Evaluation Metrics
Metric List
Since all tasks in General-Bench retain their original task definitions without altering the output or prediction format, our evaluation methods vary according to the nature of each task and its data. Below, we list the evaluation metrics and methods used across all tasks.
General
Below, we list the general evaluation metrics.
- Accuracy
Accuracy is defined as the ratio of correctly classified instances to the total number of instances.
- Abbreviation:
Acc
- Monotonicity:
monotonically increasing
- Range:
[0,1]
- Representative tasks:
Classification
- Macro Accuracy
Macro-Acc evaluates how well a model performs on average across all classes, regardless of class imbalance.
- Abbreviation:
Macro-Acc
- Monotonicity:
monotonically increasing
- Range:
[0,1]
- Representative tasks:
Event Relation Prediction
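The two definitions above can be illustrated with a minimal Python sketch. It assumes that Macro-Acc is computed as the unweighted mean of per-class accuracy (the fraction of each class's instances that are predicted correctly); the benchmark's exact implementation is not given here, and the function names `accuracy` and `macro_accuracy` are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Ratio of correctly classified instances to all instances."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (assumed definition).

    Each class contributes equally, so a rare class counts as much
    as a frequent one.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # instances per true class
    correct = defaultdict(int)  # correctly predicted instances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Imbalanced toy example: a model that always predicts "a".
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a"]
acc = accuracy(y_true, y_pred)
macro = macro_accuracy(y_true, y_pred)
```

On this toy input, Acc rewards the majority-class guess while Macro-Acc penalizes the completely missed class "b", which is why the two metrics can diverge on imbalanced data.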
Human-aware Evaluation
- GPT-Score
GPT-Score evaluates the instruction-following rate with GPT assistance, as an alternative to human evaluation.
- Abbreviation:
GPT-Score
- Monotonicity:
monotonically increasing
- Range:
[0,1]
- Representative tasks:
Audio Question Answering
Mapping Functions of Scoring Metric
Most task evaluation scores, despite using different metrics such as F1, Accuracy (Acc), and ROUGE-L, fall within a 0-100% range and follow a monotonically increasing trend (higher is better). However, certain task metrics produce scores outside this range. For example, regression-related metrics, as well as FID, FVD, and similar metrics, range from 0 to infinity and follow a monotonically decreasing trend (lower is better). In contrast, MOS scores are reported on a discrete 5-point scale. Because score ranges vary across tasks, the raw scores cannot be combined directly on a unified scale for level score calculations. Thus, we design the following mapping functions to standardize these metrics into a 1-100% range, thereby streamlining the computation of level scoring algorithms.
MAE
\[y = 2 \times \text{sigmoid}\left(\frac{50}{x}\right) -1 , \quad \text{where } x \in [0, +\infty), \quad y \in (0, 1).\]
RMS
\[y = 2 \times \text{sigmoid}\left(\frac{50}{x}\right) -1, \quad \text{where } x \in [0, +\infty), \quad y \in (0, 1).\]
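As a rough illustration, the MAE and RMS mapping above can be sketched in Python. The function names `sigmoid` and `map_error_score` are hypothetical, as is the `scale` parameter, which simply exposes the constant 50 from the formula. Since 50/x is undefined at x = 0, this sketch assumes the limiting value y = 1 for a zero error; that boundary handling is an assumption, not something stated in the source.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def map_error_score(x, scale=50.0):
    """Map an error-style metric x >= 0 (lower is better) to a score y
    where higher is better, using y = 2 * sigmoid(scale / x) - 1.
    """
    if x < 0:
        raise ValueError("error metrics such as MAE or RMS must be non-negative")
    if x == 0:
        # Assumed boundary handling: the limit of the formula as x -> 0+.
        return 1.0
    return 2.0 * sigmoid(scale / x) - 1.0


# Smaller errors map to higher scores.
scores = [map_error_score(x) for x in (1.0, 10.0, 50.0, 100.0)]
```

Multiplying the result by 100 expresses it as a percentage. The mapping reverses the direction of the metric, so a monotonically decreasing error becomes a monotonically increasing score comparable with Acc or F1.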