Statistics of the Image Comprehension Skills

This section presents detailed statistics for skills categorized under the Image modality. The image-related tasks are grouped based on their functional objectives and evaluated using standard multimodal benchmarks.

I-C-1 Behavior Recognition (Behavior Recog)

ATM Booths Suspicious Behavior Recognition

This task focuses on detecting anomalous or suspicious behaviors in ATM booths using multimodal cues.

Abbreviation:

ATM Behavior Reg

Domain:

General

Capability:

Content Recognition

Data Source:

ATM Booths Anomaly Recognition

Number:

1000

SoTA Specialist:

CLIP

Metrics:

Acc

Human Action Recognition

This task targets the recognition of common human actions in diverse contexts.

Abbreviation:

Human Action Recog

Domain:

General

Capability:

Content Recognition, Commonsense Knowledge

Data Source:

Human Action Recognition

Number:

3000

SoTA Specialist:

CLIP

Metrics:

Acc
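Both Behavior Recog tasks report Acc. The source does not define the metric, so the sketch below assumes it means plain classification accuracy: the fraction of samples whose predicted label exactly matches the reference label. The function name `accuracy` and the example labels are hypothetical and are not taken from the benchmark's code or data.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their reference label.

    Assumption: "Acc" is exact-match classification accuracy.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label equals the reference label.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical labels, for illustration only.
example_predictions = ["normal", "suspicious", "normal", "normal"]
example_references = ["normal", "suspicious", "suspicious", "normal"]
example_score = accuracy(example_predictions, example_references)
```

Here three of the four hypothetical predictions match, so `example_score` is 0.75. Under this assumption, a score on the 1000-sample ATM task would be the number of correct predictions divided by 1000.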

I-C-2 Code Generation (Code Gen)

Sketch-to-HTML Code Generation

This task involves generating front-end HTML code from hand-drawn sketches.

Abbreviation:

Sketch2HTML Gen

Domain:

Code

Capability:

Creativity and Innovation, Reasoning Ability

Data Source:

Sketch2Code

Number:

500

SoTA Specialist:

CLIP

Metrics:

Acc
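Every task entry in this section uses the same fields, so the entries can be held as simple records. The following is a minimal sketch of such a record, not part of the benchmark's tooling: the class name `TaskStats`, its field names, and the helper `samples_per_skill` are hypothetical, and the values are copied verbatim from the three entries in this section.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TaskStats:
    """One task entry; field names are illustrative, not an official schema."""
    skill: str
    name: str
    abbreviation: str
    domain: str
    capabilities: Tuple[str, ...]
    data_source: str
    number: int
    sota_specialist: str
    metrics: Tuple[str, ...]


# Values transcribed from the entries listed in this section.
TASKS = [
    TaskStats("I-C-1 Behavior Recog", "ATM Booths Suspicious Behavior Recognition",
              "ATM Behavior Reg", "General", ("Content Recognition",),
              "ATM Booths Anomaly Recognition", 1000, "CLIP", ("Acc",)),
    TaskStats("I-C-1 Behavior Recog", "Human Action Recognition",
              "Human Action Recog", "General",
              ("Content Recognition", "Commonsense Knowledge"),
              "Human Action Recognition", 3000, "CLIP", ("Acc",)),
    TaskStats("I-C-2 Code Gen", "Sketch-to-HTML Code Generation",
              "Sketch2HTML Gen", "Code",
              ("Creativity and Innovation", "Reasoning Ability"),
              "Sketch2Code", 500, "CLIP", ("Acc",)),
]


def samples_per_skill(tasks: Iterable[TaskStats]) -> Dict[str, int]:
    """Sum the per-task sample counts within each skill."""
    totals: Dict[str, int] = defaultdict(int)
    for task in tasks:
        totals[task.skill] += task.number
    return dict(totals)
```

With the three entries listed here, `samples_per_skill(TASKS)` gives 4000 samples for I-C-1 Behavior Recog and 500 for I-C-2 Code Gen.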