Prepare Files

Step 1: Determine Your Target Data Category

The first step is to choose the target data category that best matches your model's evaluation goals and current stage of development.

To accommodate varying computational budgets and model development stages, General-Bench is split into two distinct subsets: the closed set and the open set.

  • Closed Set: both sample inputs and labels are publicly available, for free open-world use (e.g., academic experiments and comparisons).

  • Open Set: only sample inputs are available; this set is used for leaderboard ranking. Participants submit their predictions to us for internal evaluation.

[Figure: dual-set design]
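When targeting the open set, predictions must be packaged for submission. The sketch below shows one minimal way to serialize predictions to JSON; the `build_submission` helper and the `sample_id`/`prediction` field names are illustrative assumptions, not the official submission schema.

```python
import json

def build_submission(predictions, output_path):
    """Pack open-set predictions into a JSON file for submission.

    `predictions` maps a sample ID to the model's prediction.
    NOTE: the record layout here is hypothetical; check the official
    submission instructions for the required format.
    """
    records = [
        {"sample_id": sid, "prediction": pred}
        for sid, pred in sorted(predictions.items())
    ]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records
```

Sorting by sample ID keeps the output deterministic, which makes diffs between submission files easy to review.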

Scope Definitions

Tasks are further categorized into the following four evaluation scopes:

  • Scope-A: Full-spectrum Hero

    Covers all modalities and task types under the General-Level benchmark. Designed for highly capable, general-purpose multimodal models.

  • Scope-B: Modality-specific Unified Hero

    Focuses on a single modality or combinations of related modalities (e.g., image, video, audio, 3D), targeting modality-level generalists.

  • Scope-C: Comprehension/Generation Hero

    Separates tasks into comprehension vs. generation within each modality. Suitable for lightweight or early-stage models due to its lower entry threshold.

  • Scope-D: Skill-specific Hero

    Provides fine-grained evaluation focused on specific task clusters such as VQA, captioning, or speech recognition, which makes it ideal for partial generalists.
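The four scopes above can be summarized as a small selection heuristic. This is only an illustrative sketch of the decision logic, not official guidance; the `suggest_scope` helper, its arguments, and the modality list are all assumptions.

```python
# Hypothetical helper mirroring the Scope-A/B/C/D descriptions above.
ALL_MODALITIES = frozenset({"image", "video", "audio", "3d", "language"})

def suggest_scope(modalities, comprehension, generation, skill_only=False):
    """Suggest an evaluation scope from a model's coverage.

    modalities     -- iterable of modality names the model supports
    comprehension  -- model handles comprehension tasks
    generation     -- model handles generation tasks
    skill_only     -- model targets specific skill clusters (e.g., VQA)
    """
    mods = frozenset(m.lower() for m in modalities)
    if skill_only:
        return "Scope-D"  # fine-grained, skill-specific evaluation
    if comprehension and generation:
        # Unified models: full-spectrum if all modalities are covered.
        return "Scope-A" if mods >= ALL_MODALITIES else "Scope-B"
    return "Scope-C"      # comprehension-only or generation-only
```

In practice the choice also depends on compute budget, since Scope-A implies downloading and evaluating far more task data than Scope-D.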

Step 2: Download the Dataset

Based on your selected scope and target tasks, download the corresponding dataset subsets.

Available Evaluation Datasets

Notes

  • If a scope contains multiple sub-tasks, please download each dataset individually.

  • All datasets follow a standardized file structure. For details, refer to Dataset Format.
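After downloading, it can help to verify that every required sub-task dataset is present before running evaluation. The sketch below assumes a hypothetical layout of one directory per sub-task, each containing an `annotation.json` file; consult Dataset Format for the actual structure and adjust `required_file` accordingly.

```python
from pathlib import Path

def missing_subtasks(root, subtasks, required_file="annotation.json"):
    """Return the sub-task datasets still missing under `root`.

    Assumes one directory per sub-task holding `required_file`.
    NOTE: this layout is illustrative; see Dataset Format for the
    standardized file structure the datasets actually follow.
    """
    root = Path(root)
    return [
        task for task in subtasks
        if not (root / task / required_file).is_file()
    ]
```

A non-empty return value tells you which datasets still need to be downloaded individually, per the note above.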