Path to Multimodal Generalist

Leaderboard under Scope-B: Modality-specific Unified Hero 💎

In many cases, especially with current-generation MLLMs, models cannot truly handle all modalities at once. Most existing multimodal generalists excel in a specific modality or a limited combination of modalities; we call these modality-wise generalists. Examples include Gemini, InternVL, DeepSeek-VL, Emu3, and Janus-Pro. To accommodate them, Scope-B, unlike Scope-A, provides modality-specific leaderboards, each focused on a single modality or a partial combination of modalities, for evaluating modality-wise generalists.

Scope-B consists of 7 separate leaderboards: 4 for single modalities and 3 for combined modalities. Each leaderboard is built on a dataset designed to measure a model's overall capability within that modality, as well as its within-modality synergy and generalization. This design lets Scope-B leaderboards still reflect all four General-Levels defined in our framework. Because each Scope-B leaderboard involves less data and lower evaluation overhead, it substantially lowers the barrier and cost for practitioners while preserving the core evaluation principles of General-Level. As a result, modality-wise generalists that are especially strong in a given modality can stand out under this scope.

Submit your model's evaluation results to the leaderboard.

Download the evaluation subset for the Image group (ID: #S-B-I).

Sub-boards: Image, Video, Audio, Image-Video, Audio-Video, Audio-Image-Video.

📌 See the [All-Tasks] page for all tasks included in the Image group.

Image Hero at Level-2 Ranking

Image Hero at Level-3 Ranking

Image Hero at Level-4 Ranking

Image Hero at Level-5 Ranking