Dataset Category
General-Bench, a companion massive multimodal benchmark dataset, encompasses a broad spectrum of skills, modalities, formats, and capabilities.
Specifically, it comprises 145 multimodal skills and contains 702 tasks under 5 major modalities of both comprehension and generation, covering 29 domains across fully free-form task formats (with varied, raw evaluation metrics), with over 325,876 samples in total (a number that will continue to grow).
To accommodate varying computational budgets and model development stages, General-Bench provides a dual-set evaluation protocol by dividing the test data for each task into two distinct subsets: the closed set and the open set.
General-Bench-Closeset: This subset is designated specifically for leaderboard submissions. Only the input data is released publicly, while the corresponding ground-truth annotations are kept hidden. Participants submit their model predictions, which are centrally evaluated and ranked on the leaderboard.
General-Bench-Openset: This subset includes both the input data and the corresponding ground-truth outputs. This allows researchers and developers to freely explore the dataset, conduct custom analyses, or use the data for publications and internal benchmarking, without participating in the official leaderboard.
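The practical difference between the two subsets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`id`, `input`, `answer`), the `run_task` helper, the exact-match metric, and the shape of the prediction records are all hypothetical and do not reflect the official General-Bench data format or leaderboard submission schema. The idea is simply that samples released with ground truth can be scored locally, while samples released without it can only yield a predictions file for central evaluation.

```python
import json
from typing import Callable, Optional


def run_task(
    samples: list[dict],
    predict: Callable[[str], str],
    metric: Optional[Callable[[str, str], float]] = None,
) -> dict:
    """Run a model over one task and score it locally when possible.

    Hypothetical record layout: {"id": ..., "input": ..., "answer": ...},
    where "answer" is present only if ground truth was released.
    """
    predictions = [
        {"id": s["id"], "prediction": predict(s["input"])} for s in samples
    ]
    # Local scoring needs a metric and ground truth on every sample.
    has_truth = bool(samples) and all("answer" in s for s in samples)
    if has_truth and metric is not None:
        scores = [
            metric(p["prediction"], s["answer"])
            for p, s in zip(predictions, samples)
        ]
        return {
            "mode": "local",
            "score": sum(scores) / len(scores),
            "predictions": predictions,
        }
    # Without ground truth, the predictions are all we can produce;
    # they would be submitted for central evaluation.
    return {"mode": "submission", "score": None, "predictions": predictions}


def exact_match(pred: str, truth: str) -> float:
    """Toy metric (an assumption): case-insensitive exact match."""
    return 1.0 if pred.strip().lower() == truth.strip().lower() else 0.0


def toy_model(text: str) -> str:
    """Stand-in for a real model: upper-cases its input."""
    return text.upper()


# Samples with ground truth: scored locally.
labelled = [
    {"id": "a1", "input": "cat", "answer": "CAT"},
    {"id": "a2", "input": "dog", "answer": "bird"},
]
# Samples without ground truth: predictions only.
unlabelled = [{"id": "b1", "input": "fish"}]

local = run_task(labelled, toy_model, exact_match)
submission = run_task(unlabelled, toy_model, exact_match)

print(local["mode"], local["score"])
print(json.dumps(submission["predictions"]))
```

Because each task may use its own raw metric, the metric is passed in as a function instead of being fixed inside the helper.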