Leaderboard under Scope A: Full-spectrum Hero 👑
This is the full-spectrum leaderboard covering all modalities and tasks under General-Level, intended for highly capable, general-purpose multimodal generalists (foundation models or agents). Scope-A requires models to be evaluated on the full General-Bench dataset, spanning all modalities, paradigms, skills, and tasks, while also accounting for the model's synergy across comprehension and generation tasks, as well as across modalities. Scope-A is therefore the most representative leaderboard for assessing true multimodal generalists and aligns most faithfully with our intended design of the General-Level evaluation protocol. Naturally, it is also the most challenging, requiring substantial computational resources to complete the full evaluation.
However, to lower the participation barrier for practitioners and reduce the engineering complexity of Scope-A evaluation, we derive a smaller subset (v-small) from the full General-Bench (v-full) dataset. This subset still spans all modalities, paradigms, and skills, but with fewer tasks and a reduced data volume, making it more concise and lightweight. It preserves the core evaluation criteria of General-Bench as much as possible, while the reduced data size significantly lowers the evaluation cost. We refer to the complete leaderboard as the Full Board, and the reduced one as the Quick Board (Tasks ↓ 11.07%, Data volume ↓ 95.22%).
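For reference, the reduction figures above are simple percentage decreases from the Full Board to the Quick Board. The sketch below illustrates the arithmetic with hypothetical task and sample counts; the numbers are placeholders, not the actual General-Bench statistics.

```python
# Minimal sketch: percentage reduction from the Full Board to the Quick Board.
# The counts below are hypothetical placeholders, not actual General-Bench figures.

def reduction(full: int, quick: int) -> float:
    """Percentage decrease going from the full set to the reduced subset."""
    return (full - quick) / full * 100

full_tasks, quick_tasks = 100, 89              # hypothetical task counts
full_samples, quick_samples = 100_000, 5_000   # hypothetical sample counts

print(f"Tasks       ↓{reduction(full_tasks, quick_tasks):.2f}%")      # ↓11.00%
print(f"Data volume ↓{reduction(full_samples, quick_samples):.2f}%")  # ↓95.00%
```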
📌 Go to the [All-Tasks] page to find all tasks included in the Full Board.