Statistics of the 3D Comprehension Skills

This section presents detailed statistics for skills categorized under the 3D modality. 3D comprehension tasks emphasize structural understanding, spatial reasoning, and object recognition in three-dimensional environments. Such tasks often require models to operate on point clouds, voxel grids, or mesh representations, and are evaluated using geometry-aware metrics.

D-C-2 3D Structure and Environment Classification (3D-Struct Cls)

This cluster focuses on classifying larger 3D structures and environments.

3D Furniture Classification

This task classifies 3D models of furniture such as chairs and tables.

Abbreviation:

3D-Furniture Cls

Domain:

General

Capability:

Content Recognition

Data Source:

ModelNet

Number:

400

SoTA Specialist:

PointGST

Metrics:

Acc

3D Structure Classification

This task classifies architectural structures in 3D data.

Abbreviation:

3D-Struct Cls

Domain:

General

Capability:

Content Recognition

Data Source:

ModelNet

Number:

40

SoTA Specialist:

PointGST

Metrics:

Acc
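
For reference, the Acc metric listed for the classification tasks can be read as plain top-1 accuracy. The following is a minimal sketch under that assumption; the function name `accuracy` and the sample labels are hypothetical and are not taken from the benchmark's evaluation code.

```python
def accuracy(predicted, target):
    """Fraction of samples whose predicted label equals the ground-truth label."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predicted, target))
    return correct / len(target)


# Hypothetical labels for five furniture shapes (illustrative only).
pred = ["chair", "table", "chair", "sofa", "desk"]
gold = ["chair", "table", "stool", "sofa", "desk"]
print(accuracy(pred, gold))  # 0.8
```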

D-C-3 Transportation and Technology Object Classification (Tech Cls)

This cluster focuses on classifying technological and vehicle-related objects.

3D Electronic Classification

This task classifies electronic gadgets and equipment from 3D models.

Abbreviation:

3D-Electronic Cls

Domain:

General

Capability:

Content Recognition

Data Source:

ModelNet

Number:

140

SoTA Specialist:

PointGST

Metrics:

Acc

3D Vehicle Classification

This task classifies vehicles using 3D object data.

Abbreviation:

3D-Vehicle Cls

Domain:

General

Capability:

Content Recognition

Data Source:

ModelNet

Number:

200

SoTA Specialist:

PointGST

Metrics:

Acc

D-C-4 3D Indoor Scene Semantic Segmentation (Indoor-Scene Seg)

This cluster focuses on segmenting functional regions and items in 3D indoor environments.

3D Indoor Appliance Semantic Segmentation

This task segments different appliances in indoor 3D scans.

Abbreviation:

3D-Appliance Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ScanNet

Number:

142

SoTA Specialist:

ODIN

Metrics:

mIoU
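
The mIoU metric used for the semantic segmentation tasks is commonly computed as the per-class intersection-over-union averaged over classes. The sketch below assumes integer class labels per point and averages only over classes that occur in the prediction or the ground truth; handling of ignored or unlabeled points, which real benchmarks often apply, is left out. The name `mean_iou` and the toy labels are hypothetical.

```python
def mean_iou(pred, gold, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gold):
        if p == g:
            # The point belongs to both the predicted and the true region.
            inter[p] += 1
            union[p] += 1
        else:
            # The point enlarges the union of both classes involved.
            union[p] += 1
            union[g] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("no labeled points to evaluate")
    return sum(ious) / len(ious)


# Toy per-point labels for a six-point scan (illustrative only).
pred = [0, 0, 1, 1, 2, 2]
gold = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, gold, num_classes=3))  # per-class IoUs are 1/3, 2/3 and 1/2
```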

D-C-5 3D Outdoor Scene Semantic Segmentation (Outdoor-Scene Seg)

This cluster focuses on segmenting semantic regions in outdoor 3D scenes.

3D Outdoor Semantic Segmentation

This task segments outdoor environments into labeled regions in 3D.

Abbreviation:

3D-Outdr Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

SemanticKITTI

Number:

4071

SoTA Specialist:

OpenPCSeg

Metrics:

mIoU

D-C-6 3D Indoor Scene Instance Segmentation (Indoor-Inst Seg)

This cluster focuses on segmenting individual object instances within indoor scenes.

3D Indoor Instance Segmentation

This task segments each individual object instance in a 3D indoor environment.

Abbreviation:

3D-In-Instance Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ScanNet

Number:

142

SoTA Specialist:

SphericalMask

Metrics:

mIoU

D-C-7 3D Pose Estimation (Pose Est)

This cluster focuses on 3D pose estimation, posed here as odometry: recovering the sensor's motion from frame to frame over time.

3D Odometry

This task focuses on estimating 3D motion trajectories using odometry techniques.

Abbreviation:

3D Odometry

Domain:

Geometry

Capability:

Problem Solving

Data Source:

KITTI

Number:

10

SoTA Specialist:

CT-ICP

Metrics:

RTE
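
Assuming RTE denotes relative translation error, the idea can be sketched as the translation drift between two frames divided by the distance actually travelled. The code below is a simplified illustration over fixed frame steps. The official KITTI odometry protocol averages over sub-sequences of several fixed path lengths, which this sketch does not reproduce. Poses are assumed to be `(R, t)` pairs with a 3x3 rotation and a 3-vector translation, and all function names are hypothetical.

```python
import math


def transpose_apply(R, v):
    """Return R^T v for a 3x3 rotation R (nested lists) and a 3-vector v."""
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]


def relative_translation(pose_a, pose_b):
    """Position of pose_b expressed in the local frame of pose_a."""
    (Ra, ta), (_, tb) = pose_a, pose_b
    return transpose_apply(Ra, [tb[k] - ta[k] for k in range(3)])


def relative_translation_error(gt_poses, est_poses, step=1):
    """Mean translation drift per unit displacement over frame pairs (i, i + step)."""
    ratios = []
    for i in range(len(gt_poses) - step):
        gt_rel = relative_translation(gt_poses[i], gt_poses[i + step])
        est_rel = relative_translation(est_poses[i], est_poses[i + step])
        displacement = math.sqrt(sum(x * x for x in gt_rel))
        if displacement == 0:
            continue  # skip stationary segments
        ratios.append(math.dist(gt_rel, est_rel) / displacement)
    if not ratios:
        raise ValueError("no moving segments to evaluate")
    return sum(ratios) / len(ratios)


# Toy straight-line trajectory with identity rotations (illustrative only).
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
gt = [(I3, [0.0, 0, 0]), (I3, [1.0, 0, 0]), (I3, [2.0, 0, 0])]
est = [(I3, [0.0, 0, 0]), (I3, [1.1, 0, 0]), (I3, [2.1, 0, 0])]
print(relative_translation_error(gt, est))  # roughly 0.05, i.e. about 5% drift
```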

D-C-8 3D Part Segmentation (Part Seg)

This cluster focuses on segmenting parts of 3D objects across different categories using commonsense spatial knowledge.

3D Aircraft Part Segmentation

This task segments structural parts of 3D aircraft models.

Abbreviation:

3D-Aircraft Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ShapeNet Part

Number:

523

SoTA Specialist:

SPOTR

Metrics:

Instance mIoU

3D Personal Item Part Segmentation

This task segments parts of everyday personal items (e.g., bags, glasses) in 3D.

Abbreviation:

3D-Person Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ShapeNet Part

Number:

346

SoTA Specialist:

SPOTR

Metrics:

Instance mIoU

3D Vehicle Part Segmentation

This task segments different parts of vehicles (e.g., wheels, doors) in 3D models.

Abbreviation:

3D-Vehicle Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ShapeNet Part

Number:

288

SoTA Specialist:

SPOTR

Metrics:

Instance mIoU

3D Furniture Part Segmentation

This task segments functional components of furniture in 3D.

Abbreviation:

3D-Furniture Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ShapeNet Part

Number:

2128

SoTA Specialist:

SPOTR

Metrics:

Instance mIoU

3D Tableware Part Segmentation

This task segments sub-parts of tableware (e.g., handles, bases) in 3D models.

Abbreviation:

3D-Tableware Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ShapeNet Part

Number:

377

SoTA Specialist:

SPOTR

Metrics:

Instance mIoU

3D Weapon Part Segmentation

This task segments weapon components (e.g., barrel, grip) in 3D object models.

Abbreviation:

3D-Weapon Seg

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ShapeNet Part

Number:

133

SoTA Specialist:

SPOTR

Metrics:

Instance mIoU
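
Instance mIoU on ShapeNet Part is usually understood as the IoU computed per shape, averaged over that shape's category parts, and then averaged over all shapes regardless of category. The sketch below follows the common convention that a part absent from both prediction and ground truth counts as IoU 1. Whether the benchmark applies exactly this convention is an assumption, and the function names and toy data are hypothetical.

```python
def shape_iou(pred, gold, part_ids):
    """Mean IoU over the part labels defined for one shape's category."""
    ious = []
    for part in part_ids:
        inter = sum(1 for p, g in zip(pred, gold) if p == part and g == part)
        union = sum(1 for p, g in zip(pred, gold) if p == part or g == part)
        # A part missing from both prediction and ground truth counts as perfect.
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / len(ious)


def instance_miou(shapes):
    """Average the per-shape IoU over all shapes, regardless of category."""
    if not shapes:
        raise ValueError("no shapes to evaluate")
    return sum(shape_iou(*s) for s in shapes) / len(shapes)


# Each entry: (predicted part labels, true part labels, parts of the category).
shapes = [
    ([0, 0, 1, 1], [0, 0, 1, 1], [0, 1]),     # perfect prediction
    ([0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 2]),  # part 1 missed, part 2 absent
]
print(instance_miou(shapes))  # 0.75
```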

D-C-9 3D Tracking (3D Track)

This cluster focuses on tracking 3D objects over time in dynamic scenes using spatial reasoning.

3D Tracking

This task tracks objects in 3D scenes through time.

Abbreviation:

3D Track

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

nuScenes

Number:

500

SoTA Specialist:

CenterPoint

Metrics:

AMOTA

D-C-10 3D Geometry Feature Analysis (3D-Geo Analy)

This cluster focuses on analyzing geometric features of 3D objects, such as normals and curvature.

3D Normal Estimation

This task estimates surface normals for points in 3D scenes.

Abbreviation:

3D-Normal Est

Domain:

Geometry

Capability:

Problem Solving

Data Source:

PCPNet dataset

Number:

108

SoTA Specialist:

SHS-Net

Metrics:

RMSE
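
For normal estimation, RMSE is typically taken over the angle between predicted and reference normals. The sketch below assumes unoriented normals (a flipped normal is not penalized) and reports the error in degrees. Both choices are assumptions about the evaluation protocol, and the name `normal_angle_rmse` is hypothetical. Input vectors are assumed to be non-zero.

```python
import math


def normal_angle_rmse(pred, gold):
    """RMSE in degrees of the unoriented angle between normal pairs."""
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and equally long")
    squared = []
    for p, g in zip(pred, gold):
        dot = sum(a * b for a, b in zip(p, g))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        # abs() ignores the sign of the normal; min() guards against rounding.
        cos = min(1.0, abs(dot) / norm)
        squared.append(math.degrees(math.acos(cos)) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Toy normals: one exact, one flipped, one perpendicular (illustrative only).
pred = [(0, 0, 1), (0, 0, -1), (1, 0, 0)]
gold = [(0, 0, 1), (0, 0, 1), (0, 0, 1)]
print(normal_angle_rmse(pred, gold))  # angles are 0, 0 and 90 degrees
```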

D-C-11 3D Detection (3D Det)

This cluster focuses on detecting and recognizing objects from 3D input scenes.

3D Detection

This task detects and classifies objects from 3D point clouds.

Abbreviation:

3D Detection

Domain:

General

Capability:

Content Recognition

Data Source:

nuScenes

Number:

500

SoTA Specialist:

BEVFusion

Metrics:

mAP

D-C-12 3D Question Answering (3D QA)

This cluster focuses on answering spatial and situational questions based on 3D scenes.

3D Spatial Scene Question Answering

This task answers spatial reasoning questions about 3D scenes.

Abbreviation:

3D-Space QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ScanQA

Number:

4675

SoTA Specialist:

SIG3D

Metrics:

BLEU@4

3D Situated Question Answering on “What”

This task focuses on answering “what”-type questions about object presence in 3D.

Abbreviation:

3D-“What” QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

SQA3D

Number:

1147

SoTA Specialist:

SIG3D

Metrics:

EM@1

3D Situated Question Answering on “Is”

This task focuses on verifying facts via “is”-type binary questions in 3D.

Abbreviation:

3D-“Is” QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

SQA3D

Number:

652

SoTA Specialist:

SIG3D

Metrics:

EM@1

3D Situated Question Answering on “How”

This task involves answering “how”-type questions in 3D, often about quantities or distances.

Abbreviation:

3D-“How” QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

SQA3D

Number:

465

SoTA Specialist:

SIG3D

Metrics:

EM@1

3D Situated Question Answering on “Can”

This task involves action-feasibility questions in 3D scenes.

Abbreviation:

3D-“Can” QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ScanQA

Number:

684

SoTA Specialist:

SIG3D

Metrics:

EM@1

3D Situated Question Answering on “Which”

This task asks models to select correct objects in response to “which”-type queries.

Abbreviation:

3D-“Which” QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ScanQA

Number:

622

SoTA Specialist:

SIG3D

Metrics:

EM@1

3D Situated Question Answering on “Other”

This task addresses miscellaneous 3D questions outside the common categories.

Abbreviation:

3D-“Other” QA

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

ScanQA

Number:

566

SoTA Specialist:

SIG3D

Metrics:

EM@1
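
The EM@1 metric used for the situated question answering tasks can be read as the share of questions whose top-ranked answer exactly matches a reference answer. The sketch below assumes light normalization (lowercasing and whitespace collapsing) before comparison. The actual normalization used by the benchmark is not stated here, and the function names and sample answers are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().strip().split())


def exact_match_at_1(ranked_predictions, gold_answers):
    """Share of questions whose top-ranked answer equals any reference answer."""
    if not gold_answers or len(ranked_predictions) != len(gold_answers):
        raise ValueError("inputs must be non-empty and equally long")
    hits = 0
    for ranked, golds in zip(ranked_predictions, gold_answers):
        top1 = normalize(ranked[0]) if ranked else ""
        hits += top1 in {normalize(g) for g in golds}
    return hits / len(gold_answers)


# Ranked model answers and reference answers for three toy questions.
ranked = [["chair", "table"], ["Yes"], ["two", "three"]]
gold = [["chair"], ["yes"], ["three"]]
print(exact_match_at_1(ranked, gold))  # two of the three top-1 answers match
```

Only the first entry of each ranked list is scored, so the third question counts as a miss even though the correct answer appears in second place.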

D-C-13 3D Motion Understanding (3D-Motion Analy)

This cluster focuses on understanding movement within 3D environments and generating descriptive captions for dynamic scenes.

3D Motion Captioning

This task generates natural language captions describing 3D motion.

Abbreviation:

3D-Motion Cap

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

KIT-ML

Number:

4383

SoTA Specialist:

2ET-Interpretable

Metrics:

BLEU-4
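
BLEU-4 (also listed as BLEU@4 for 3D-Space QA) is the geometric mean of clipped 1-gram to 4-gram precisions multiplied by a brevity penalty. The following is a minimal corpus-level sketch, assuming whitespace tokenization, a single reference per caption, and no smoothing. Published evaluations typically rely on an established toolkit, so the numbers from this sketch may differ from reported scores. The names `ngrams` and `corpus_bleu4` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references):
    """Corpus-level BLEU-4, one reference per candidate, no smoothing."""
    matched = [0] * 4
    total = [0] * 4
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if min(matched) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity = 1.0 if cand_len >= ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


# Toy motion captions (illustrative only).
reference = "a person walks forward".split()
print(corpus_bleu4([reference], [reference]))  # 1.0
print(corpus_bleu4(["a person walks forward slowly".split()], [reference]))
```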