New Dataset

Contribute a New Dataset

We welcome contributions of new datasets that extend General-Bench's ability to evaluate the diverse and complex skills of generalist models. We especially encourage datasets that target underrepresented skills, or that combine multiple skills or modalities in ways that challenge current multimodal systems.

If you would like to add a new dataset to the benchmark, please follow the steps below:

Step 1: Prepare Your Dataset

Ensure that your dataset adheres to the standardized data format defined in our Data Format documentation. This includes:

  • A clear task name

  • An associated skill category

  • Input/output samples formatted according to our specifications

The dataset should be well-structured and ready for seamless integration into our evaluation pipeline.
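A quick self-check before submitting can catch missing fields early. The sketch below is only illustrative: the field names `task_name`, `skill_category`, `samples`, `input`, and `output` are assumptions made for this example, and the authoritative schema is the one in the Data Format documentation.

```python
from __future__ import annotations

# Hypothetical field names; the real schema is defined in the Data Format documentation.
REQUIRED_FIELDS = ("task_name", "skill_category", "samples")


def validate_dataset(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []

    # Every top-level field must be present.
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")

    # The task name and skill category must be non-empty strings.
    for name in ("task_name", "skill_category"):
        if name in record:
            value = record[name]
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name} must be a non-empty string")

    # Each sample must carry both an input and an expected output.
    if "samples" in record:
        samples = record["samples"]
        if not isinstance(samples, list) or not samples:
            problems.append("samples must be a non-empty list")
        else:
            for i, sample in enumerate(samples):
                if not isinstance(sample, dict) or "input" not in sample or "output" not in sample:
                    problems.append(f"sample {i} needs both 'input' and 'output'")

    return problems
```

Returning a list of problems rather than raising on the first one lets a contributor fix everything in a single pass.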

Step 2: Prepare Evaluation Protocol and Documentation

You must also provide detailed documentation describing your evaluation protocol, including:

1. Evaluation Metric(s):

Clearly define how performance will be measured (e.g., BLEU, F1, Accuracy, or CIDEr).

2. Evaluation Code:

Submit a working script (preferably in Python) that implements the metric computation. Please ensure it is modular and readable.

3. README File:

Provide a README file that includes:

  • Installation instructions and environment dependencies

  • Parameter usage and configuration options

  • Example commands for running the evaluation script

This will ensure that your dataset and evaluation method are easy to reproduce and integrate.
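As a rough illustration of what a modular evaluation script might look like, the sketch below computes exact-match accuracy from two JSONL files. The file layout, the `prediction` and `output` field names, and the `--predictions`, `--references`, and `--metric` flags are all assumptions made for this example, not a format required by General-Bench.

```python
import argparse
import json


def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("cannot score an empty dataset")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Registry so further metrics can be added without touching the CLI code.
METRICS = {"accuracy": accuracy}


def load_jsonl(path, key):
    """Read one JSON object per line and return the values stored under `key`."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line)[key] for line in handle if line.strip()]


def main(argv=None):
    """Parse arguments, score the files, print the result as JSON, and return the score."""
    parser = argparse.ArgumentParser(description="Score predictions against references.")
    # Hypothetical flags and field names, chosen only for this sketch.
    parser.add_argument("--predictions", required=True, help="JSONL file with a 'prediction' field")
    parser.add_argument("--references", required=True, help="JSONL file with an 'output' field")
    parser.add_argument("--metric", choices=sorted(METRICS), default="accuracy")
    args = parser.parse_args(argv)

    predictions = load_jsonl(args.predictions, "prediction")
    references = load_jsonl(args.references, "output")
    score = METRICS[args.metric](predictions, references)
    print(json.dumps({"metric": args.metric, "score": score}))
    return score
```

Calling `main()` under an `if __name__ == "__main__":` guard would turn this into a command-line script. A README could then document an example command such as `python evaluate.py --predictions preds.jsonl --references refs.jsonl`, where the script and file names are placeholders.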

Step 3: Submit Your Dataset

Once all required files are ready, you may submit your dataset via our official contribution portal: [Submission Link Placeholder].

Please include:

  • Dataset archive or link (e.g., a .zip file or a GitHub repository link)

  • Evaluation script and documentation

  • Contact email for follow-up if necessary

Your contribution will be reviewed by the benchmark maintainers. Approved datasets will be added to the open evaluation pool and made available to the community through the platform.
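For contributors who submit a .zip archive, the bundling step could be scripted along the following lines. This is a sketch under assumptions: the function name `build_submission` and the default required file names `README.md` and `evaluate.py` are hypothetical, and the portal may expect a different layout.

```python
import zipfile
from pathlib import Path


def build_submission(source_dir, archive_path, required=("README.md", "evaluate.py")):
    """Zip every file under source_dir, refusing to proceed if a required file is absent."""
    source = Path(source_dir)

    # Fail early if the documentation or evaluation script is missing.
    missing = [name for name in required if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {', '.join(missing)}")

    archive = Path(archive_path).resolve()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside source_dir.
            if path.is_file() and path.resolve() != archive:
                bundle.write(path, arcname=path.relative_to(source).as_posix())
    return archive
```

Storing paths relative to the source directory keeps the archive free of machine-specific absolute paths.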