Prepare Models
Step 1: Prepare Your Model
You may evaluate either a custom-trained model or any existing model.
Make sure your model accepts JSON-formatted input files.
Step 2: Make Predictions
Your model’s output must follow the standardized JSON format shown below. This ensures compatibility with the centralized evaluation protocol.
{
  "task": "name_of_the_task",
  "skill": "name_of_the_skill_cluster",
  "type": "task_type",
  "data": [
    {
      "id": "sample_id",
      "prediction": "your model's output text or file path"
    }
    // Additional prediction entries...
  ]
}
- task: The official name of the task (e.g., video_captioning or image_vqa), as defined in the benchmark documentation.
- skill: The name of the skill cluster to which the task belongs (e.g., VQA, Captioning, ASR).
- type: The task type, such as comprehension or generation, depending on the nature of the problem.
- data: A list of prediction records, each containing:
  - id: The unique identifier for the data sample, which must exactly match the id field provided in the input file.
  - prediction: The model’s prediction.
Note: For text-based tasks (e.g., classification, QA, captioning), the prediction should be a string representing the predicted text. For generation tasks involving images or videos, the prediction should be a file path pointing to the generated output.
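The field descriptions above can be sketched as a small helper that assembles a prediction file. This is a minimal illustration, not an official tool: the function names `build_submission` and `save_submission`, the sample ids, and the pairing of `image_vqa` with `VQA` and `comprehension` are assumptions made for the example. Use the values defined in the benchmark documentation for your task.

```python
import json
from pathlib import Path


def build_submission(task, skill, task_type, predictions):
    """Assemble a prediction file body in the documented layout.

    `predictions` maps each sample id to the predicted text, or to the
    path of the generated file for image/video generation tasks.
    """
    return {
        "task": task,
        "skill": skill,
        "type": task_type,
        "data": [
            # Paths are converted to strings so the record stays JSON-serializable.
            {"id": sample_id, "prediction": str(pred)}
            for sample_id, pred in predictions.items()
        ],
    }


def save_submission(submission, out_path):
    """Write the submission to disk as UTF-8 JSON."""
    Path(out_path).write_text(
        json.dumps(submission, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


# Hypothetical example values; the sample ids here are placeholders.
example = build_submission(
    task="image_vqa",
    skill="VQA",
    task_type="comprehension",
    predictions={"sample_001": "a cat", "sample_002": "two"},
)
print(json.dumps(example, indent=2))
```

Calling `save_submission(example, "predictions.json")` would then write the file; the output filename is also an assumption, since this page does not prescribe one.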
Note: Ensure predictions are generated for every data point in the evaluation set.
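One way to check this before submitting is to compare the ids in your prediction file against the ids from the input file. The sketch below assumes you have already collected the expected ids into a list; `check_coverage` is a hypothetical helper name, not part of any benchmark tooling.

```python
from collections import Counter


def check_coverage(expected_ids, submission):
    """Compare predicted ids with the expected ids.

    Returns three sorted lists: ids with no prediction, ids that are not
    in the evaluation set, and ids that appear more than once.
    """
    counts = Counter(record["id"] for record in submission["data"])
    expected = set(expected_ids)
    predicted = set(counts)
    missing = sorted(expected - predicted)
    unexpected = sorted(predicted - expected)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, unexpected, duplicated


# Hypothetical ids for illustration only.
submission = {
    "data": [
        {"id": "sample_001", "prediction": "a cat"},
        {"id": "sample_003", "prediction": "two"},
    ]
}
missing, unexpected, duplicated = check_coverage(
    ["sample_001", "sample_002"], submission
)
print("missing:", missing)
print("unexpected:", unexpected)
print("duplicated:", duplicated)
```

A file is ready to submit under this check only when all three lists are empty.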