Evaluation Metrics

Metric List

Because all tasks in General-Bench retain their original task definitions, without altering the output or prediction format, our evaluation methods vary with the nature of each task and its data. Below, we list the evaluation metrics and methods used across all tasks.

General

Below, we list the general evaluation metrics.

Accuracy

Accuracy is defined as the ratio of correctly classified instances to the total number of instances.

Abbreviation:

Acc

Monotonicity:

monotonically increasing

Range:

[0,1]

Representative tasks:

Classification
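
As an illustration of the Accuracy definition above, here is a minimal Python sketch. The function name `accuracy` and the list-based inputs are assumptions made for this example and are not part of the General-Bench tooling.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Ratio of correctly classified instances to the total number of instances."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for an empty evaluation set")
    # Count positions where the prediction matches the reference label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 3 of 4 predictions are correct.
labels = ["cat", "dog", "cat", "bird"]
preds = ["cat", "dog", "dog", "bird"]
print(accuracy(labels, preds))  # 0.75
```

The result lies in [0,1], and a higher value is better, matching the range and monotonicity listed above.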

Macro Accuracy

Macro Accuracy (Macro-Acc) evaluates how well a model performs on average across all classes, regardless of class imbalance.

Abbreviation:

Macro-Acc

Monotonicity:

monotonically increasing

Range:

[0,1]

Representative tasks:

Event Relation Prediction
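
The text does not give an explicit formula for Macro-Acc. The sketch below assumes the common reading: compute the accuracy within each ground-truth class, then take the unweighted mean over classes, so that rare classes count as much as frequent ones. The function name `macro_accuracy` is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def macro_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class accuracy (an assumed reading of Macro-Acc)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("macro accuracy is undefined for an empty evaluation set")

    total = defaultdict(int)    # instances per ground-truth class
    correct = defaultdict(int)  # correctly predicted instances per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Each class contributes equally, regardless of how many instances it has.
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Hypothetical imbalanced data: 8 instances of "a", 2 of "b".
labels = ["a"] * 8 + ["b"] * 2
preds = ["a"] * 8 + ["a", "b"]
print(macro_accuracy(labels, preds))  # 0.75
```

In this example 9 of 10 predictions are correct, so plain accuracy would be 0.9, while the per-class accuracies are 1.0 for "a" and 0.5 for "b", giving a macro average of 0.75. The gap shows how the macro variant exposes weak performance on a minority class.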

Human-aware Evaluation

GPT-Score

GPT-Score measures the instruction-following rate using GPT as the judge, as an alternative to human evaluation.

Abbreviation:

GPT-Score

Monotonicity:

monotonically increasing

Range:

[0,1]

Representative tasks:

Audio Question Answering

Mapping Functions of Scoring Metric

Most task evaluation metrics, such as F1, Accuracy (Acc), and ROUGE-L, produce scores within a 0-100% range and are monotonically increasing (higher is better). However, certain metrics produce scores outside this range. For example, regression-related metrics, as well as FID, FVD, and similar metrics, range from 0 to infinity and are monotonically decreasing (lower is better), while MOS scores are reported on a discrete 5-point scale. Because of these differing ranges and directions, the raw scores cannot be directly compared or aggregated on a unified scale for level score calculations. Thus, we design the following mapping functions to standardize these metrics into a 1-100% range, thereby streamlining the computation of level scoring algorithms.

  • MAE

    \[y = 2 \times \text{sigmoid}\left(\frac{50}{x}\right) -1, \quad \text{where } x \in [0, +\infty), \quad y \in (0, 1).\]

  • RMS

    \[y = 2 \times \text{sigmoid}\left(\frac{50}{x}\right) -1, \quad \text{where } x \in [0, +\infty), \quad y \in (0, 1).\]
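
The MAE and RMS mappings above share the same form, so a single helper can sketch both. The function name `map_error_to_score` and the `scale` parameter are assumptions made for this illustration. The formula is undefined at x = 0, so the sketch returns the limiting value 1.0 there, which is also an assumption rather than something stated in the text.

```python
import math


def map_error_to_score(x: float, scale: float = 50.0) -> float:
    """Map a lower-is-better error x >= 0 to a higher-is-better score.

    Implements y = 2 * sigmoid(scale / x) - 1, with scale = 50 as in the text.
    """
    if math.isnan(x) or x < 0:
        raise ValueError("error value must be a non-negative number")
    if x == 0:
        # Assumption: the formula divides by x, so at x = 0 we return its
        # limit as x -> 0+, which is 1.0 (a perfect score).
        return 1.0
    z = scale / x  # z > 0, so exp(-z) cannot overflow
    sigmoid = 1.0 / (1.0 + math.exp(-z))
    return 2.0 * sigmoid - 1.0


# Smaller errors map to higher scores.
print(round(map_error_to_score(50.0), 4))  # 0.4621
print(map_error_to_score(10.0) > map_error_to_score(100.0))  # True
```

Since 2 * sigmoid(z) - 1 equals tanh(z / 2), the mapping can equivalently be written as tanh(25 / x). It turns a decreasing, unbounded error into an increasing score between 0 and 1, which is the direction and range shared by metrics such as Acc.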