Evaluation Metrics 

Metric List 

Because every task in General-Bench retains its original definition, with no change to the output or prediction format, evaluation methods vary with the nature of each task and its data. The evaluation metrics and methods used across all tasks are listed below.

General 

The general evaluation metrics are as follows.

Accuracy

Accuracy is defined as the ratio of correctly classified instances to the total number of instances.

Abbreviation:: Acc
Monotonicity:: monotonically increasing
Range:: [0,1]
Representative tasks:: Classification
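
The definition above can be illustrated with a minimal sketch. This is not the benchmark's official scorer: the function name `accuracy` and the example labels are assumptions made for illustration only.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for an empty input")
    # Count exact matches between gold and predicted labels.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 3 of the 4 predictions match the gold labels.
gold = ["cat", "dog", "cat", "bird"]
pred = ["cat", "dog", "bird", "bird"]
score = accuracy(gold, pred)
```

The result lies in [0,1], and a higher value indicates a better model, which matches the range and monotonicity listed above.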

Macro Accuracy

Macro-Acc evaluates how well a model performs on average across all classes, regardless of class imbalance.

Abbreviation:: Macro-Acc
Monotonicity:: monotonically increasing
Range:: [0,1]
Representative tasks:: Event Relation Prediction
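
One common reading of this description is the unweighted mean of per-class accuracy, so that each class counts equally however many instances it has. The sketch below follows that reading. It is an assumption rather than the benchmark's confirmed formula, and the name `macro_accuracy` is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def macro_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class accuracy (assumed definition of Macro-Acc)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of gold instances per class
    correct = defaultdict(int)  # number of correctly predicted instances per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("macro accuracy is undefined for an empty input")
    # Average the per-class scores so every class has equal weight.
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Hypothetical imbalanced data: a model that always predicts the majority class.
gold = ["a"] * 8 + ["b"] * 2
pred = ["a"] * 10
macro = macro_accuracy(gold, pred)
```

In this example the plain accuracy is 0.8, while the macro-averaged score is 0.5 because class `b` is never predicted correctly. This is the sense in which the metric disregards class imbalance.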

Human-aware Evaluation 

GPT-Score

GPT-Score measures the instruction-following rate with GPT assistance, as an alternative to human evaluation.

Abbreviation:: GPT-Score
Monotonicity:: monotonically increasing
Range:: [0,1]
Representative tasks:: Audio Question Answering

Mapping Functions of Scoring Metric 

Although tasks use different metrics, most evaluation scores, such as F1, Accuracy (Acc), and ROUGE-L, fall within a 0-100% range and are monotonically increasing (higher is better). Some metrics, however, produce scores outside this range. Regression-related metrics, as well as FID, FVD, and similar metrics, range from 0 to infinity and are monotonically decreasing (lower is better), while MOS is reported on a discrete 5-point scale. Because the score ranges differ across tasks, the raw scores cannot be combined directly when calculating level scores. We therefore design the following mapping functions to standardize these metrics into a 0-100% range, which streamlines the computation of the level scoring algorithms.

MAE

\[y = 2 \times \text{sigmoid}\left(\frac{50}{x}\right) - 1, \quad \text{where } x \in [0, +\infty), \quad y \in (0, 1).\]

RMS

\[y = 2 \times \text{sigmoid}\left(\frac{50}{x}\right) -1, \quad \text{where } x \in [0, +\infty), \quad y \in (0, 1).\]
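
The mapping shared by MAE and RMS can be sketched as below. The function name `map_error_metric` and the `scale` parameter are illustrative assumptions. The value returned for x = 0 is also an assumption: the expression 50/x is undefined there, so the sketch returns the limit of the mapping as x approaches 0 from above.

```python
import math


def map_error_metric(x: float, scale: float = 50.0) -> float:
    """Map a lower-is-better error x >= 0 to a higher-is-better score.

    Implements y = 2 * sigmoid(scale / x) - 1 with scale = 50 as in the text.
    """
    if x < 0:
        raise ValueError("error metrics such as MAE and RMS are non-negative")
    if x == 0:
        # Assumption: use the limit as x -> 0+, where sigmoid(50 / x) -> 1.
        return 1.0
    z = scale / x
    # z is positive, so exp(-z) is at most 1 and cannot overflow.
    sigmoid = 1.0 / (1.0 + math.exp(-z))
    return 2.0 * sigmoid - 1.0


# A smaller error yields a larger mapped score.
for err in (1.0, 50.0, 1000.0):
    print(f"x = {err:>7.1f} -> y = {map_error_metric(err):.4f}")
```

Since 2 · sigmoid(z) − 1 equals tanh(z/2), the mapped score approaches 1 for small errors and 0 for very large ones, which turns a monotonically decreasing metric into a monotonically increasing one on the same scale as Acc or F1.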