Statistics of the 3D Comprehension Skills
This section presents detailed statistics for skills categorized under the 3D modality.
3D comprehension tasks emphasize structural understanding, spatial reasoning, and object recognition in three-dimensional environments.
Such tasks often require models to operate on point clouds, voxel grids, or mesh representations, and are evaluated using geometry-aware metrics.
D-C-2 3D Structure and Environment Classification (3D-Struct Cls)
This cluster focuses on classifying larger 3D structures and environments.
- 3D Furniture Classification
This task classifies 3D models of furniture such as chairs and tables.
- Abbreviation:
3D-Furniture Cls
- Domain:
General
- Capability:
Content Recognition
- Data Source:
ModelNet
- Number:
400
- SoTA Specialist:
PointGST
- Metrics:
Acc
- 3D Structure Classification
This task classifies architectural structures in 3D data.
- Abbreviation:
3D-Struct Cls
- Domain:
General
- Capability:
Content Recognition
- Data Source:
ModelNet
- Number:
40
- SoTA Specialist:
PointGST
- Metrics:
Acc
D-C-3 Transportation and Technology Object Classification (Tech Cls)
This cluster focuses on classifying technological and vehicle-related objects.
- 3D Electronic Classification
This task classifies electronic gadgets and equipment from 3D models.
- Abbreviation:
3D-Electronic Cls
- Domain:
General
- Capability:
Content Recognition
- Data Source:
ModelNet
- Number:
140
- SoTA Specialist:
PointGST
- Metrics:
Acc
- 3D Vehicle Classification
This task classifies vehicles using 3D object data.
- Abbreviation:
3D-Vehicle Cls
- Domain:
General
- Capability:
Content Recognition
- Data Source:
ModelNet
- Number:
200
- SoTA Specialist:
PointGST
- Metrics:
Acc
D-C-4 3D Indoor Scene Semantic Segmentation (Indoor-Scene Seg)
This cluster focuses on segmenting functional regions and items in 3D indoor environments.
- 3D Indoor Appliance Semantic Segmentation
This task segments different appliances in indoor 3D scans.
- Abbreviation:
3D-Appliance Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ScanNet
- Number:
142
- SoTA Specialist:
ODIN
- Metrics:
mIoU
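As an illustration of the mIoU metric listed for the segmentation tasks, the sketch below averages per-class intersection-over-union over flat per-point labels. The function name `mean_iou` and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for this example. It is not the benchmark's evaluation code.

```python
def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes for two equal-length lists of per-point labels."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # assumption: classes absent from both lists are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```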
D-C-5 3D Outdoor Scene Semantic Segmentation (Outdoor-Scene Seg)
This cluster focuses on segmenting semantic regions in outdoor 3D scenes.
- 3D Outdoor Semantic Segmentation
This task segments outdoor environments into labeled regions in 3D.
- Abbreviation:
3D-Outdr Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
SemanticKITTI
- Number:
4071
- SoTA Specialist:
OpenPCSeg
- Metrics:
mIoU
D-C-6 3D Indoor Scene Instance Segmentation (Indoor-Inst Seg)
This cluster focuses on segmenting individual object instances within indoor scenes.
- 3D Indoor Instance Segmentation
This task segments each individual object instance in a 3D indoor environment.
- Abbreviation:
3D-In-Instance Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ScanNet
- Number:
142
- SoTA Specialist:
SphericalMask
- Metrics:
mIoU
D-C-7 3D Pose Estimation (Pose Est)
This cluster focuses on solving 3D odometry problems involving geometric tracking over time.
- 3D Odometry
This task focuses on estimating 3D motion trajectories using odometry techniques.
- Abbreviation:
3D Odometry
- Domain:
Geometry
- Capability:
Problem Solving
- Data Source:
KITTI
- Number:
10
- SoTA Specialist:
CT-ICP
- Metrics:
RTE
D-C-8 3D Part Segmentation (Part Seg)
This cluster focuses on segmenting parts of 3D objects across different categories using commonsense spatial knowledge.
- 3D Aircraft Part Segmentation
This task segments structural parts of 3D aircraft models.
- Abbreviation:
3D-Aircraft Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ShapeNet Part
- Number:
523
- SoTA Specialist:
SPOTR
- Metrics:
Instance mIoU
- 3D Personal Item Part Segmentation
This task segments parts of everyday personal items (e.g., bags, glasses) in 3D.
- Abbreviation:
3D-Person Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ShapeNet Part
- Number:
346
- SoTA Specialist:
SPOTR
- Metrics:
Instance mIoU
- 3D Vehicle Part Segmentation
This task segments different parts of vehicles (e.g., wheels, doors) in 3D models.
- Abbreviation:
3D-Vehicle Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ShapeNet Part
- Number:
288
- SoTA Specialist:
SPOTR
- Metrics:
Instance mIoU
- 3D Furniture Part Segmentation
This task segments functional components of furniture in 3D.
- Abbreviation:
3D-Furniture Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ShapeNet Part
- Number:
2128
- SoTA Specialist:
SPOTR
- Metrics:
Instance mIoU
- 3D Tableware Part Segmentation
This task segments sub-parts of tableware (e.g., handles, bases) in 3D models.
- Abbreviation:
3D-Tableware Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ShapeNet Part
- Number:
377
- SoTA Specialist:
SPOTR
- Metrics:
Instance mIoU
- 3D Weapon Part Segmentation
This task segments weapon components (e.g., barrel, grip) in 3D object models.
- Abbreviation:
3D-Weapon Seg
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ShapeNet Part
- Number:
133
- SoTA Specialist:
SPOTR
- Metrics:
Instance mIoU
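Instance mIoU, as commonly reported on ShapeNet Part, first averages IoU over the parts of each shape and then averages those per-shape scores over all shapes. The sketch below follows that reading. The helper names `shape_miou` and `instance_miou`, the input layout, and the convention that a part missing from both prediction and ground truth scores 1.0 are assumptions for this example, not the benchmark's own implementation.

```python
def shape_miou(pred, gt, part_ids):
    """Mean IoU over the candidate parts of a single shape."""
    ious = []
    for part in part_ids:
        inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
        union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
        # Assumed convention: a part absent from both prediction and
        # ground truth counts as a perfect match.
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / len(ious)


def instance_miou(shapes):
    """Average per-shape mIoU over all shapes.

    Each entry of `shapes` is (pred_labels, gt_labels, part_ids_of_category).
    """
    scores = [shape_miou(pred, gt, parts) for pred, gt, parts in shapes]
    return sum(scores) / len(scores)


shapes = [
    ([0, 0, 1, 1], [0, 1, 1, 1], [0, 1]),  # per-shape mIoU = 7/12
    ([2, 2, 2], [2, 2, 2], [2, 3]),        # part 3 absent in both, so mIoU = 1
]
score = instance_miou(shapes)
```

Because every shape contributes equally, categories with many shapes (such as furniture here) weigh more heavily than they would under a class-averaged mIoU.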
D-C-9 3D Tracking (3D Track)
This cluster focuses on tracking 3D objects over time in dynamic scenes using spatial reasoning.
- 3D Tracking
This task tracks objects in 3D scenes through time.
- Abbreviation:
3D Track
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
nuScenes
- Number:
500
- SoTA Specialist:
CenterPoint
- Metrics:
AMOTA
D-C-10 3D Geometry Feature Analysis (3D-Geo Analy)
This cluster focuses on analyzing geometric features of 3D objects, such as normals and curvature.
- 3D Normal Estimation
This task estimates surface normals for points in 3D scenes.
- Abbreviation:
3D-Normal Est
- Domain:
Geometry
- Capability:
Problem Solving
- Data Source:
PCPNet
- Number:
108
- SoTA Specialist:
SHS-Net
- Metrics:
RMSE
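For normal estimation, RMSE is typically computed over the angle between predicted and ground-truth normals. The sketch below assumes angles in degrees and unoriented normals (a flipped normal counts as correct). Both choices, and the function name `normal_angle_rmse`, are assumptions for illustration rather than the benchmark's stated protocol.

```python
import math


def normal_angle_rmse(pred_normals, gt_normals):
    """RMSE (degrees) of the angle between paired non-zero 3D normals."""
    if len(pred_normals) != len(gt_normals) or not pred_normals:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = []
    for p, g in zip(pred_normals, gt_normals):
        dot = sum(a * b for a, b in zip(p, g))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        # abs() ignores the sign of the normal (unoriented assumption);
        # min() guards against rounding slightly above 1.
        cos_angle = min(1.0, abs(dot) / norm)
        angle = math.degrees(math.acos(cos_angle))
        squared.append(angle * angle)
    return math.sqrt(sum(squared) / len(squared))


# One flipped normal (0 degrees) and one perpendicular normal (90 degrees).
rmse = normal_angle_rmse([(0, 0, 1), (1, 0, 0)], [(0, 0, -1), (0, 1, 0)])
```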
D-C-11 3D Detection (3D Det)
This cluster focuses on detecting and recognizing objects from 3D input scenes.
- 3D Detection
This task detects and classifies objects from 3D point clouds.
- Abbreviation:
3D Detection
- Domain:
General
- Capability:
Content Recognition
- Data Source:
nuScenes
- Number:
500
- SoTA Specialist:
BEVFusion
- Metrics:
mAP
D-C-12 3D Question Answering (3D QA)
This cluster focuses on answering spatial and situational questions based on 3D scenes.
- 3D Spatial Scene Question Answering
This task answers spatial reasoning questions about 3D scenes.
- Abbreviation:
3D-Space QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ScanQA
- Number:
4675
- SoTA Specialist:
SIG3D
- Metrics:
BLEU@4
- 3D Situated Question Answering on “What”
This task focuses on answering “what”-type questions about object presence in 3D.
- Abbreviation:
3D-“What” QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
SQA3D
- Number:
1147
- SoTA Specialist:
SIG3D
- Metrics:
EM@1
- 3D Situated Question Answering on “Is”
This task focuses on verifying facts via “is”-type binary questions in 3D.
- Abbreviation:
3D-“Is” QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
SQA3D
- Number:
652
- SoTA Specialist:
SIG3D
- Metrics:
EM@1
- 3D Situated Question Answering on “How”
This task involves answering “how”-type questions in 3D, often about quantities or distances.
- Abbreviation:
3D-“How” QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
SQA3D
- Number:
465
- SoTA Specialist:
SIG3D
- Metrics:
EM@1
- 3D Situated Question Answering on “Can”
This task involves action-feasibility questions in 3D scenes.
- Abbreviation:
3D-“Can” QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ScanQA
- Number:
684
- SoTA Specialist:
SIG3D
- Metrics:
EM@1
- 3D Situated Question Answering on “Which”
This task asks models to select correct objects in response to “which”-type queries.
- Abbreviation:
3D-“Which” QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ScanQA
- Number:
622
- SoTA Specialist:
SIG3D
- Metrics:
EM@1
- 3D Situated Question Answering on “Other”
This task addresses miscellaneous 3D questions outside the common categories.
- Abbreviation:
3D-“Other” QA
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
ScanQA
- Number:
566
- SoTA Specialist:
SIG3D
- Metrics:
EM@1
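The EM@1 metric used by the situated QA tasks scores a question as correct when the model's top-ranked answer exactly matches a reference answer. A minimal sketch follows, assuming case-insensitive matching with whitespace collapsed and allowing several acceptable references per question. The names `normalize` and `exact_match_at_1` and the normalization rule are hypothetical, not taken from the benchmark.

```python
def normalize(text):
    """Lowercase and collapse whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match_at_1(top1_answers, references):
    """Fraction of questions whose top-1 answer matches any reference.

    top1_answers: list of answer strings, one per question.
    references:   list of lists of acceptable answer strings.
    """
    if len(top1_answers) != len(references):
        raise ValueError("need one reference list per answer")
    if not top1_answers:
        return 0.0
    hits = sum(
        1
        for answer, refs in zip(top1_answers, references)
        if normalize(answer) in {normalize(r) for r in refs}
    )
    return hits / len(top1_answers)


# Two of the three top-1 answers match a reference.
em = exact_match_at_1(
    ["Chair", "two", "yes"],
    [["chair"], ["three"], ["yes", "yeah"]],
)
```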
D-C-13 3D Motion Understanding (3D-Motion Analy)
This cluster focuses on understanding movement within 3D environments and generating descriptive captions for dynamic scenes.
- 3D Motion Captioning
This task generates natural language captions describing 3D motion.
- Abbreviation:
3D-Motion Cap
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
KIT-ML
- Number:
4383
- SoTA Specialist:
2ET-Interpretable
- Metrics:
BLEU-4