Prepare Models 

Step 1: Prepare Your Model 

You may evaluate either a custom-trained model or any existing model.

Make sure your model accepts JSON-formatted input files.

Step 2: Make Predictions

Your model’s output must follow the standardized JSON format shown below. This ensures compatibility with the centralized evaluation protocol.

{
  "task": "name_of_the_task",
  "skill": "name_of_the_skill_cluster",
  "type": "task_type",
  "data": [
      {
          "id": "sample_id",
          "prediction": "your model's output text or file path"
      }
      // Additional prediction entries...
  ]
}
task:

The official name of the task (e.g., video_captioning or image_vqa), as defined in the benchmark documentation.

skill:

The name of the skill cluster to which the task belongs (e.g., VQA, Captioning, ASR).

type:

The task type, such as comprehension or generation, depending on the nature of the problem.

data:

A list of prediction records, each containing:

id:

The unique identifier for the data sample, which must exactly match the id field provided in the input file.

prediction:

The model’s prediction.

Note

For text-based tasks (e.g., classification, QA, captioning), this should be a string representing the predicted text.

For generation tasks involving images or videos, the prediction should be a file path pointing to the generated output.
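As an illustration of the format described above, the following is a minimal Python sketch that assembles a prediction file from a mapping of sample ids to predictions. The helper names build_submission and save_submission, the example task values, and the output filename are assumptions made for this example and are not part of the benchmark's tooling.

```python
import json


def build_submission(task, skill, task_type, predictions):
    """Assemble the prediction structure.

    `predictions` maps each sample id to the model's output: a text string
    for text-based tasks, or a file path for image/video generation tasks.
    """
    return {
        "task": task,
        "skill": skill,
        "type": task_type,
        "data": [
            {"id": sample_id, "prediction": prediction}
            for sample_id, prediction in predictions.items()
        ],
    }


def save_submission(submission, path):
    """Write the structure to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(submission, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical values; use the names defined in the benchmark documentation.
    submission = build_submission(
        task="image_vqa",
        skill="VQA",
        task_type="comprehension",
        predictions={"sample_id": "your model's output text"},
    )
    save_submission(submission, "predictions.json")
```

Writing the file with json.dump yields strict JSON, so the placeholder comment shown in the template above should not appear in a real prediction file.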

Note: Ensure predictions are generated for every data point in the evaluation set.
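One way to check this before submitting is sketched below. It compares the ids in a prediction structure against the ids expected from the input file. The function name check_coverage is hypothetical, and because the layout of the input file is not specified here, the expected ids are passed in as a plain list that you extract yourself.

```python
from collections import Counter


def check_coverage(expected_ids, submission):
    """Compare predicted ids against the expected ids.

    Returns three sorted lists: ids with no prediction, ids that do not
    appear in the evaluation set, and ids predicted more than once.
    """
    predicted = [entry["id"] for entry in submission["data"]]
    counts = Counter(predicted)
    expected = set(expected_ids)
    missing = sorted(expected - set(counts))
    unexpected = sorted(set(counts) - expected)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return missing, unexpected, duplicates


if __name__ == "__main__":
    # Hypothetical ids for illustration only.
    submission = {"data": [{"id": "s1", "prediction": "a cat"}]}
    missing, unexpected, duplicates = check_coverage(["s1", "s2"], submission)
    print("missing:", missing)
    print("unexpected:", unexpected)
    print("duplicates:", duplicates)
```

An empty result for all three lists indicates that every sample has exactly one prediction and that each id matches one from the input file.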