Prepare Files
Step 1: Determine Your Target Data Category
The first step is to determine the target data category that best aligns with your model’s evaluation goals and readiness.
To accommodate varying computational budgets and model development stages, General-Bench is split into two distinct subsets: the closed set and the open set.
Closed Set: both the inputs and labels of all samples are publicly available, for free open-world use (e.g., academic experiments and comparisons).
Open Set: only the sample inputs are available; this subset is used for leaderboard ranking. Participants must submit their predictions to us for internal evaluation.

Scope Definitions
Tasks are further categorized into the following four evaluation scopes:
Scope-A: Full-spectrum Hero
Covers all modalities and task types under the General-Level benchmark. Designed for highly capable, general-purpose multimodal models.
Scope-B: Modality-specific Unified Hero
Focuses on a single modality or combinations of related modalities (e.g., image, video, audio, 3D), targeting modality-level generalists.
Scope-C: Comprehension/Generation Hero
Separates tasks into comprehension vs. generation within each modality. Suitable for lightweight or early-stage models due to its lower entry threshold.
Scope-D: Skill-specific Hero
Provides fine-grained evaluation focused on specific task clusters such as VQA, captioning, or speech recognition, making it ideal for partial generalists.
Step 2: Download the Dataset
Based on your selected scope and target tasks, download the corresponding dataset subsets.
Available Evaluation Datasets
- Scope-C:
Notes
If a scope contains multiple sub-tasks, please download each dataset individually.
All datasets follow a standardized file structure. For details, refer to Dataset Format.
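As a concrete illustration of that last point, a downloaded subset can be sanity-checked programmatically before evaluation. The sketch below is hypothetical: the file names `annotation.json` and `images/`, and the `data`/`file` keys, are assumptions for demonstration only; consult the Dataset Format page for the benchmark's actual layout.

```python
import json
import tempfile
from pathlib import Path

def check_subset(root: Path) -> bool:
    """Check that a dataset subset has an annotation file whose samples
    all reference existing media files.

    NOTE: 'annotation.json' and the 'data'/'file' schema are hypothetical
    names used for illustration; see the Dataset Format page for the
    real structure."""
    ann = root / "annotation.json"
    if not ann.is_file():
        return False
    data = json.loads(ann.read_text())
    # Every sample is expected to reference a media file that exists on disk.
    return all((root / s["file"]).is_file() for s in data.get("data", []))

# Build a tiny mock subset to demonstrate the check.
root = Path(tempfile.mkdtemp())
(root / "images").mkdir()
(root / "images" / "0001.jpg").write_bytes(b"\xff\xd8")  # stub JPEG bytes
(root / "annotation.json").write_text(
    json.dumps({"data": [{"file": "images/0001.jpg", "question": "?"}]})
)
print(check_subset(root))  # True for this mock subset
```

Running a check like this for each downloaded sub-task dataset catches missing or misplaced files early, before submitting predictions.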