I am working on integrating the Turkish evals into lighteval.
It's fascinating to understand what's happening behind the scenes of benchmarks!
The current OpenLLMTurkishLeaderboard v0.2 consists of 6 tasks:
- gsm8k: high-quality grade school math problems
- mmlu: massive multitask language understanding
- truthfulqa: measures whether a language model is truthful in generating answers to questions
- winogrande: adversarial winograd schema challenge
- hellaswag: evaluating commonsense natural language inference
- arc: ai2 reasoning challenge
and implementing these tasks in lighteval is not that hard!
Just follow the Adding a Custom Task page.
You need a prompt function like this!
```python
from lighteval.tasks.requests import Doc

def prompt_fn(line, task_name: str = None):
    # convert one dataset row into a Doc that lighteval can evaluate
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
        instruction="",
    )
```
This function returns a Doc: query is your prompt, choices are the candidate answers, and gold_index is the index of the right answer (or a list of indices, if you have multiple right answers).
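To make that concrete, here is how a hypothetical multiple-choice row would flow through prompt_fn. The column names match the function above, but real Turkish datasets may name their columns differently, and the task_name string is just illustrative:

```python
# hypothetical dataset row; real datasets may use other column names
line = {
    "question": "Su deniz seviyesinde hangi sıcaklıkta kaynar?",
    "choices": ["0°C", "50°C", "100°C", "200°C"],
    "gold": 2,  # index of the correct choice, "100°C"
}

doc = prompt_fn(line, task_name="community|arc_tr")
print(doc.query)       # the question shown to the model
print(doc.choices)     # [" 0°C", " 50°C", " 100°C", " 200°C"]
print(doc.gold_index)  # 2
```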
Then you need a task configuration for lighteval:
```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig

task = LightevalTaskConfig(
    name="myothertask",
    # must be defined in the file or imported from
    # src/lighteval/tasks/tasks_prompt_formatting.py
    prompt_function=prompt_fn,
    suite=["community"],
    hf_repo="",
    hf_subset="default",
    hf_avail_splits=[],
    evaluation_splits=[],
    few_shots_split=None,
    few_shots_select=None,
    metric=[],  # select your metric in Metrics
)
```
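As a rough sketch, a filled-in config for a Turkish ARC-style task might look like the following. The hf_repo, splits, few-shot settings, and metric here are assumptions for illustration, so substitute whatever your actual dataset provides:

```python
from lighteval.metrics.metrics import Metrics

# sketch only: the repo name and splits are placeholders, not a real dataset
arc_tr = LightevalTaskConfig(
    name="arc_tr",
    prompt_function=prompt_fn,
    suite=["community"],
    hf_repo="your-org/arc-tr",                # hypothetical dataset repo
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="sequential",
    metric=[Metrics.loglikelihood_acc_norm],  # normalized multiple-choice accuracy
)
```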
Then you need to add your task to the TASKS_TABLE list.
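Following the docs' convention, that is just a module-level list at the bottom of your custom task file, for example:

```python
# every task defined in this file; lighteval picks them up via --custom-tasks
TASKS_TABLE = [task]
```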
After getting these steps done for every task, you need to run the evaluate command:
```bash
lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "community|{custom_task}|0|0" \
    --custom-tasks {path_to_your_custom_task_file}
```
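(As far as I understand the docs, the task string follows the {suite}|{task}|{num_few_shot}|{auto_reduce} pattern, so 0|0 here means zero-shot evaluation with no automatic few-shot reduction for long prompts.)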
Ta-daa! 🎉
I've already created an issue in lighteval, and I'll open a pull request for this contribution soon.