Statistics of the Image Comprehension Skills
This section presents detailed statistics for skills categorized under the Image modality.
The image-related tasks are grouped based on their functional objectives and evaluated using standard multimodal benchmarks.
I-C-1 Behavior Recognition (Behavior Recog)
- ATM Booths Suspicious Behavior Recognition
This task focuses on detecting anomalous or suspicious behaviors in ATM booths using multimodal cues.
- Abbreviation: ATM Behavior Reg
- Domain: General
- Capability: Content Recognition
- Data Source: ATM Booths Anomaly Recognition
- Number: 1000
- SoTA Specialist: CLIP
- Metrics: Acc
- Human Action Recognition
This task targets the recognition of common human actions in diverse contexts.
- Abbreviation: Human Action Recog
- Domain: General
- Capability: Content Recognition, Commonsense Knowledge
- Data Source: Human Action Recognition
- Number: 3000
- SoTA Specialist: CLIP
- Metrics: Acc
I-C-2 Code Generation (Code Gen)
- Sketch-to-HTML Code Generation
This task involves generating front-end HTML code from hand-drawn sketches.
- Abbreviation: Sketch2HTML Gen
- Domain: Code
- Capability: Creativity and Innovation, Reasoning Ability
- Data Source: Sketch2Code
- Number: 500
- SoTA Specialist: CLIP
- Metrics: Acc
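For readers who want to work with these task cards programmatically, the following is a minimal sketch of how the entries listed above could be stored as records and summarized per skill. The `TaskCard` class, its field names, and the helper functions are hypothetical and not part of the benchmark; the sketch also assumes that "Number" is the sample count of each task and that "Acc" is plain exact-match accuracy, neither of which is spelled out in this section.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCard:
    """Hypothetical record mirroring the fields of one task entry."""
    skill: str            # skill identifier, e.g. "I-C-1"
    name: str
    abbreviation: str
    domain: str
    capabilities: tuple
    data_source: str
    number: int           # assumed to be the number of samples
    sota_specialist: str
    metric: str


# Values copied from the task entries in this section.
TASKS = [
    TaskCard("I-C-1", "ATM Booths Suspicious Behavior Recognition",
             "ATM Behavior Reg", "General", ("Content Recognition",),
             "ATM Booths Anomaly Recognition", 1000, "CLIP", "Acc"),
    TaskCard("I-C-1", "Human Action Recognition",
             "Human Action Recog", "General",
             ("Content Recognition", "Commonsense Knowledge"),
             "Human Action Recognition", 3000, "CLIP", "Acc"),
    TaskCard("I-C-2", "Sketch-to-HTML Code Generation",
             "Sketch2HTML Gen", "Code",
             ("Creativity and Innovation", "Reasoning Ability"),
             "Sketch2Code", 500, "CLIP", "Acc"),
]


def samples_per_skill(tasks):
    """Sum the 'Number' field of all tasks that share a skill identifier."""
    totals = defaultdict(int)
    for task in tasks:
        totals[task.skill] += task.number
    return dict(totals)


def accuracy(predictions, references):
    """Exact-match accuracy; an assumed reading of the 'Acc' metric."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    for skill, total in samples_per_skill(TASKS).items():
        print(skill, total)
```

Keeping each card as an immutable record makes it straightforward to group tasks by skill, domain, or capability without re-parsing the list.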