Statistics of the Language Skills

This section presents detailed statistics for skills categorized under the Language modality. Language tasks assess a wide range of linguistic and cognitive capabilities, including comprehension, commonsense reasoning, intent understanding, and natural language inference. Models are expected to process and reason over textual inputs to produce accurate and context-aware outputs.

L-1 Cognitive Question (Cog QA)

This cluster evaluates models on different types of reasoning and cognitive inference abilities based on textual input.

Commonsense Reasoning

This task evaluates the ability to apply general world knowledge to infer plausible outcomes.

Abbreviation:

Common Reason

Domain:

General

Capability:

Commonsense Knowledge

Data Source:

commonsense_qa

Number:

510

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc
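
As a rough illustration of the Acc metric used for this and many later tasks, the sketch below computes accuracy as the fraction of exact label matches. The option letters are made-up examples, and exact string matching is an assumption; the listing does not state how answers are normalized before comparison.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical multiple-choice answers (option letters) for five questions.
preds = ["A", "C", "B", "E", "D"]
golds = ["A", "C", "D", "E", "B"]
print(accuracy(preds, golds))  # 3 of 5 match -> 0.6
```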

Causal Reasoning

This task focuses on identifying cause-effect relationships between events.

Abbreviation:

Causal Reason

Domain:

General

Capability:

Causality Discrimination

Data Source:

corr2cause

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc
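
Every task in this section is described by the same seven fields. One possible way to hold such entries in code is sketched below; the class name `SkillEntry` and its attribute names are hypothetical, and reading "Number" as the count of evaluation instances is an assumption. The two records are copied from the entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillEntry:
    """One task entry; attribute names are illustrative, not from the source."""
    name: str
    abbreviation: str
    domain: str
    capability: str
    data_source: str
    number: int  # assumed to be the number of evaluation instances
    sota_specialist: str
    metric: str


entries = [
    SkillEntry("Commonsense Reasoning", "Common Reason", "General",
               "Commonsense Knowledge", "commonsense_qa", 510,
               "Flan-T5-Large", "Acc"),
    SkillEntry("Causal Reasoning", "Causal Reason", "General",
               "Causality Discrimination", "corr2cause", 500,
               "Flan-T5-Large", "Acc"),
]

# Aggregate the "Number" field across the recorded tasks.
total = sum(e.number for e in entries)
print(total)  # 1010
```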

Document-Level Causal Reasoning

This task measures the ability to extract and reason over causal relations across a full document.

Abbreviation:

Doc-Causal Reason

Domain:

General

Capability:

Reasoning Ability

Data Source:

qa4mre

Number:

564

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1
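
For the generative tasks scored with BLEU-1, a minimal sketch of the metric is clipped unigram precision multiplied by a brevity penalty. The version below assumes a single reference, lowercasing, and whitespace tokenization; the tokenizer and reference handling actually used for these tasks are not specified here.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Single-reference BLEU-1: clipped unigram precision times brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Each candidate word is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(cand)
    # Candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


score = bleu1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))  # 5 of 6 unigrams match -> 0.8333
```

The clipping step is what prevents a degenerate output such as "the the the" from scoring highly against a reference that contains "the" only once.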

Counterfactual Reasoning

This task evaluates a model’s ability to reason about alternate scenarios.

Abbreviation:

Counterfact Reason

Domain:

General

Capability:

Causality Discrimination

Data Source:

Counterfactual-Reasoning-Capacity-of-Large-Language-Models (crc)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

F1

Analogical Reasoning

This task tests the ability to identify abstract relational similarities between different pairs.

Abbreviation:

Analog Reason

Domain:

General

Capability:

Reasoning Ability

Data Source:

BATS

Number:

456

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Multi-Hop Question Answering

This task involves reasoning over multiple supporting facts to answer a question.

Abbreviation:

Multi-Hop QA

Domain:

General

Capability:

Reasoning Ability

Data Source:

hotpotQA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Temporal Reasoning

This task evaluates the understanding and inference of time-based relationships between events.

Abbreviation:

Temporal Reason

Domain:

General

Capability:

Reasoning Ability

Data Source:

time_dial (Qin et al.)

L-2 Ethical NLP (Ethics NLP)

This cluster assesses the ability of models to reason about or respond appropriately in ethically sensitive or culturally aware contexts.

Ethical Reasoning

This task evaluates moral and ethical reasoning ability in text.

Abbreviation:

Ethical Reason

Domain:

Ethics

Capability:

Ethical Awareness

Data Source:

ethics

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Truthful Question Answering

This task focuses on generating factually and ethically accurate responses to questions.

Abbreviation:

Truthful QA

Domain:

Ethics

Capability:

Ethical Awareness

Data Source:

truthful_qa

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Legal Question Answering

This task involves interpreting and answering law-related questions ethically and accurately.

Abbreviation:

Legal QA

Domain:

Law

Capability:

Ethical Awareness

Data Source:

JEC-QA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Bias Detection

This task detects socially or culturally biased statements in text.

Abbreviation:

Bias Detect

Domain:

Culture

Capability:

Affective Analysis

Data Source:

WNC

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1
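
The classification tasks in this section report Micro-F1, which pools true positives, false positives, and false negatives over all classes before computing precision and recall. The sketch below assumes single-label classification, and the label names are invented for illustration. In that single-label setting every error counts as one false positive and one false negative, so Micro-F1 coincides with accuracy.

```python
from collections import Counter


def micro_f1(predictions, references):
    """Micro-averaged F1 for single-label classification."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, gold in zip(predictions, references):
        if pred == gold:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    total_tp = sum(tp.values())
    if total_tp == 0:
        return 0.0
    precision = total_tp / (total_tp + sum(fp.values()))
    recall = total_tp / (total_tp + sum(fn.values()))
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for four sentences.
preds = ["biased", "neutral", "biased", "neutral"]
golds = ["biased", "biased", "biased", "neutral"]
print(micro_f1(preds, golds))  # 3 of 4 correct -> 0.75
```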

Offensive Classification

This task identifies offensive or harmful language based on social norms.

Abbreviation:

Offensive Classify

Domain:

Culture

Capability:

Affective Analysis

Data Source:

offensive-speech (ock)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Hate Speech Detection

This task flags hate-driven or discriminatory content in language.

Abbreviation:

Hate-Speech Detect

Domain:

Culture

Capability:

Affective Analysis

Data Source:

hate-speech (ock)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Spam Detection

This task detects and filters spam or unsolicited content.

Abbreviation:

Spam Detect

Domain:

Culture

Capability:

Causality Discrimination

Data Source:

spam-email (spa)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Fake News Detection

This task detects false or misleading news content.

Abbreviation:

Fake-News Detect

Domain:

Humanities

Capability:

Ethical Awareness

Data Source:

fake-news (fak)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Fact Verification

This task verifies the truthfulness of claims based on supporting evidence.

Abbreviation:

Fact Verify

Domain:

General

Capability:

Causality Discrimination

Data Source:

Counterfactual

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

L-3 Domain-Specific QA (Domain QA)

This cluster evaluates models on question answering across specialized knowledge domains such as biomedicine, healthcare, and engineering.

Biomedical Question Answering

This task covers factual and reasoning questions in the biomedical field.

Abbreviation:

Biomedical QA

Domain:

Biomedical

Capability:

Reasoning Ability

Data Source:

BioASQ (bqb)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Medical Question Answering

This task targets open-domain medical question answering from clinical or patient-related queries.

Abbreviation:

Medical QA

Domain:

Medical

Capability:

Reasoning Ability

Data Source:

ChatDoctor-HealthCareMagic-Knowledge

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Technical Question Answering

This task covers questions in technical or engineering domains such as programming and systems.

Abbreviation:

Tech QA

Domain:

Engineering

Capability:

Reasoning Ability

Data Source:

Stack Overflow Data (sod)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Engineering Question Answering

This task evaluates problem-solving and factual answering in engineering contexts.

Abbreviation:

Eng QA

Domain:

Engineering

Capability:

Reasoning Ability

Data Source:

University of Bath (eqa)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Science Question Answering

This task involves answering questions based on scientific knowledge and reasoning.

Abbreviation:

Science QA

Domain:

Science

Capability:

Reasoning Ability

Data Source:

sciQ

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Earth Question Answering

This task evaluates earth science-related question answering skills.

Abbreviation:

Earth QA

Domain:

Earth

Capability:

Reasoning Ability

Data Source:

Manual

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Nature Question Answering

This task assesses the ability to answer questions about nature and environment.

Abbreviation:

Nature QA

Domain:

Nature

Capability:

Reasoning Ability

Data Source:

Manual

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

L-4 Social QA (Social QA)

This cluster focuses on social and humanistic domains, measuring models’ reasoning ability in areas such as philosophy, art, business, and history.

Humanities Question Answering

This task evaluates reasoning in the humanities including literature, culture, and society.

Abbreviation:

Humanities QA

Domain:

Humanities

Capability:

Reasoning Ability

Data Source:

squad_v2

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Social Science Question Answering

This task focuses on questions from sociology, economics, and related disciplines.

Abbreviation:

Social-Sci QA

Domain:

Social

Capability:

Reasoning Ability

Data Source:

Social IQA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Philosophical Question Answering

This task evaluates a model’s reasoning within abstract and philosophical concepts.

Abbreviation:

Philosophy QA

Domain:

Philosophy

Capability:

Reasoning Ability

Data Source:

strix-philosophy-qa (qpa)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Art Question Answering

This task evaluates question answering related to visual or performing arts.

Abbreviation:

Art QA

Domain:

Art

Capability:

Reasoning Ability

Data Source:

Manual

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Business Question Answering

This task covers reasoning and problem-solving in business scenarios.

Abbreviation:

Business QA

Domain:

Business

Capability:

Reasoning Ability

Data Source:

Manual

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

History Question Answering

This task focuses on historical fact retrieval and reasoning from textual sources.

Abbreviation:

History QA

Domain:

History

Capability:

Reasoning Ability

Data Source:

wiki-reading

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

L-5 Non-Traditional QA (Non-Trad QA)

This cluster includes question answering formats that are less conventional, often grounded in informal or creative content.

Fairytale Question Answering

This task focuses on answering questions about narrative and fictional content.

Abbreviation:

Fairytale QA

Domain:

Culture

Capability:

Reasoning Ability

Data Source:

FairytaleQA

Number:

501

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Tweet Question Answering

This task addresses question answering over social media text like tweets.

Abbreviation:

Tweet QA

Domain:

Social

Capability:

Reasoning Ability

Data Source:

tweet_qa

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

F1

Trivia Question Answering

This task targets general knowledge questions in a quiz or trivia format.

Abbreviation:

Trivial QA

Domain:

Social

Capability:

Reasoning Ability

Data Source:

trivia_qa

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

F1

L-6 Advanced QA (Advance QA)

This cluster tests more sophisticated reasoning in QA, including multilingual and conversational abilities.

Open-Domain Question Answering

This task involves retrieving and answering open-domain factual questions.

Abbreviation:

Open-Domain QA

Domain:

General

Capability:

Reasoning Ability

Data Source:

squad_v2

Number:

513

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Conversational Question Answering

This task answers context-dependent questions in a dialogue setting.

Abbreviation:

Conversation QA

Domain:

General

Capability:

Reasoning Ability

Data Source:

coqa

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Table Question Answering

This task evaluates a model’s ability to interpret and reason over tabular data.

Abbreviation:

Table QA

Domain:

General

Capability:

Reasoning Ability

Data Source:

wikitablequestions

Number:

590

SoTA Specialist:

Flan-T5-Large

Metrics:

F1

Multilingual Question Answering

This task focuses on answering questions posed in multiple languages.

Abbreviation:

Multi-Lang QA

Domain:

Linguistics

Capability:

Reasoning Ability

Data Source:

multilingual_qa (mqa)

Number:

500

SoTA Specialist:

mT5-Large

Metrics:

BLEU-1

Code-Switch Question Answering

This task involves QA with mixed-language input (code-switching).

Abbreviation:

Code-Switch QA

Domain:

Linguistics

Capability:

Reasoning Ability

Data Source:

Manual

Number:

500

SoTA Specialist:

mT5-Large

Metrics:

Acc

L-7 Math Problem Solving (Math Ability)

This cluster focuses on solving numerical and algebraic problems using mathematical reasoning.

Math Question Answering

This task targets direct mathematical reasoning and fact recall.

Abbreviation:

Math QA

Domain:

Math

Capability:

Reasoning Ability

Data Source:

math-QA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Mathematical Word Problem Solving

This task addresses multi-step word problems requiring logical mathematical reasoning.

Abbreviation:

Math Word Prob

Domain:

Math

Capability:

Problem Solving

Data Source:

math-QA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Mathematical Proof Generation

This task involves generating step-by-step mathematical proofs.

Abbreviation:

Math Proof Gen

Domain:

Math

Capability:

Problem Solving

Data Source:

math-QA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

L-8 Code Problem Solving (Code Ability)

This cluster evaluates models on understanding, generating, and repairing code, covering code explanation, defect detection, code generation, code repair, and text-to-SQL generation.

Code Explanation

This task tests the ability to explain the logic of code snippets.

Abbreviation:

Code Explain

Domain:

Code

Capability:

Problem Solving

Data Source:

CodeXGLUE

Number:

508

SoTA Specialist:

CodeT5

Metrics:

BLEU-1

Code Defect Detection

This task detects bugs or defects in code.

Abbreviation:

Code Defect Detect

Domain:

Code

Capability:

Problem Solving

Data Source:

CodeXGLUE

Number:

501

SoTA Specialist:

CodeT5

Metrics:

Acc

Code Generation

This task generates code from natural language prompts or specifications.

Abbreviation:

Code Gen

Domain:

Code

Capability:

Problem Solving

Data Source:

mbpp (mbp)

Number:

500

SoTA Specialist:

CodeT5

Metrics:

CodeBLEU

Code Repair

This task detects and corrects code defects in provided programs.

Abbreviation:

Code Repair

Domain:

Code

Capability:

Problem Solving

Data Source:

CodeXGLUE

Number:

500

SoTA Specialist:

CodeT5

Metrics:

CodeBLEU

Text-to-SQL Generation

This task generates SQL queries based on natural language input.

Abbreviation:

Txt2SQL Gen

Domain:

Code

Capability:

Problem Solving

Data Source:

Text-to-sql-v1

Number:

600

SoTA Specialist:

CodeT5

Metrics:

BLEU-1
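
To make the task format concrete, the sketch below pairs a natural language question with a SQL query and runs the query against an in-memory SQLite database. The table, rows, question, and query are all invented for illustration. Note that the listing scores this task with BLEU-1 on the generated query text; executing the query, as done here, is only a way to inspect what a prediction means and is not part of the stated setup.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Engineering", 120), ("Grace", "Engineering", 110), ("Linus", "Sales", 90)],
)

question = "What is the average salary in the Engineering department?"
# A query that a model might produce for the question above.
predicted_sql = "SELECT AVG(salary) FROM employees WHERE department = 'Engineering'"

result = conn.execute(predicted_sql).fetchone()[0]
print(result)  # 115.0
conn.close()
```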

L-9 Cross-lingual Translation (X-Lan&NMT)

This cluster focuses on multilingual understanding and translation across diverse languages.

Multilingual Translation

This task translates across a wide range of language pairs.

Abbreviation:

Multi-lang Trans

Domain:

General

Capability:

Multilingual Capability

Data Source:

Manual

Number:

504

SoTA Specialist:

Flan-T5-Large

Metrics:

ROUGE-1

Low-Resource Translation

This task handles translation between languages with limited training data.

Abbreviation:

Low-Res Trans

Domain:

General

Capability:

Multilingual Capability

Data Source:

flores_101

Number:

502

SoTA Specialist:

Flan-T5-Large

Metrics:

ROUGE-1

English-Chinese Translation

This task translates between English and Chinese.

Abbreviation:

En-Zh Trans

Domain:

General

Capability:

Multilingual Capability

Data Source:

Manual

Number:

501

SoTA Specialist:

Flan-T5-Large

Metrics:

ROUGE-1

English-French Translation

This task translates between English and French.

Abbreviation:

En-Fr Trans

Domain:

General

Capability:

Multilingual Capability

Data Source:

Manual

Number:

501

SoTA Specialist:

Flan-T5-Large

Metrics:

ROUGE-1

L-10 Text Summarization (Txt Sum)

This cluster tests summarization skills, from extractive to abstractive methods.

Extractive Summarization

This task selects key sentences from the source document.

Abbreviation:

Extract Summ

Domain:

General

Capability:

Text Manipulation

Data Source:

cnn_dailymail

Number:

503

SoTA Specialist:

BART-Large

Metrics:

ROUGE-1
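
ROUGE-1, used for the summarization and translation tasks, measures unigram overlap between a system output and a reference. The sketch below returns precision, recall, and their harmonic mean, assuming lowercasing and whitespace tokenization with no stemming. Which of the three values is reported for these tasks is not stated in this listing.

```python
from collections import Counter


def rouge1(candidate: str, reference: str):
    """Return (precision, recall, f_measure) based on unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_measure


summary = "the storm closed schools"
reference = "the storm closed schools and roads on monday"
p, r, f = rouge1(summary, reference)
print(p, r, round(f, 4))  # 1.0 0.5 0.6667
```

A short summary made up entirely of reference words reaches perfect precision but low recall, which is why the harmonic mean is often the more informative figure.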

Abstractive Summarization

This task creates novel summaries that may rephrase source content.

Abbreviation:

Abstract Summ

Domain:

General

Capability:

Text Manipulation

Data Source:

xsum

Number:

500

SoTA Specialist:

BART-Large

Metrics:

ROUGE-1

Multi-Document Summarization

This task summarizes across multiple documents into one concise output.

Abbreviation:

Multi-Doc Sum

Domain:

General

Capability:

Text Manipulation

Data Source:

multi_news

Number:

500

SoTA Specialist:

BART-Large

Metrics:

ROUGE-1

L-11 Dialogue Generation (Dialog Gen)

This cluster evaluates interactive response generation in a multi-turn conversational setting.

Multi-Turn Daily Dialogue Generation

This task generates natural responses in everyday dialogue.

Abbreviation:

Daily Dialogue Gen

Domain:

General

Capability:

Interactive Capability, Cognition Understanding

Data Source:

daily_dialogue

Number:

553

SoTA Specialist:

DialoGPT

Metrics:

BLEU-1

L-12 Text Generation (TxT Gen)

This cluster focuses on creative or corrective forms of free-form text generation.

Table-to-Text Generation

This task generates coherent text from structured tabular data.

Abbreviation:

Table-to-Text

Domain:

General

Capability:

Text Manipulation

Data Source:

ToTTo

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Text Style Transfer

This task transforms text from one style to another (e.g., formal to informal).

Abbreviation:

Style Transfer

Domain:

General

Capability:

Creativity and Innovation

Data Source:

style-transfer-paraphrase, text_style_transfer

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Story Generation

This task involves open-ended narrative generation.

Abbreviation:

Story Gen

Domain:

General

Capability:

Creativity and Innovation

Data Source:

story-generation

Number:

500

SoTA Specialist:

mFlan-T5-Large

Metrics:

BLEU-1

Paraphrase Generation

This task creates alternate phrasings of a given sentence or paragraph.

Abbreviation:

Paraphrase

Domain:

General

Capability:

Creativity and Innovation

Data Source:

manual (sto)

Number:

633

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

Grammar Correction

This task corrects grammatical errors in text while preserving the original meaning.

Abbreviation:

Grammar Correct

Domain:

General

Capability:

Text Manipulation

Data Source:

Grammar Correction (gra)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

BLEU-1

L-13 Time Series Analysis (Time Series)

This cluster focuses on learning and predicting sequential patterns over time.

Time Series Prediction

This task involves predicting future values from historical time series data.

Abbreviation:

Time Series Pred

Domain:

General

Capability:

Temporal Determination, Reasoning Ability

Data Source:

Manual

Number:

510

SoTA Specialist:

TimesFm-1.0-200m

Metrics:

RMSE
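
RMSE is the square root of the mean squared difference between predicted and observed values, so lower is better. The sketch below scores a naive last-value forecast; the series is made up, and the naive baseline is only there to give the metric something to measure, not a method described in this section.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(targets))


# Hypothetical series: observed history and the future values to predict.
history = [10.0, 12.0, 11.0, 13.0]
future = [14.0, 12.0, 15.0]

# Naive baseline: repeat the last observed value for every future step.
naive_forecast = [history[-1]] * len(future)
print(round(rmse(naive_forecast, future), 4))  # errors -1, 1, -2 -> sqrt(2) = 1.4142
```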

L-14 Content Categorization (Txt Cls)

This cluster assesses the ability to assign topic or category labels to text.

Topic Classification

This task assigns topic labels to input text.

Abbreviation:

Topic Cls

Domain:

General

Capability:

Content Recognition

Data Source:

Topic (top)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

L-15 Text Entailment (Txt Entail)

This cluster evaluates textual inference, including recognizing entailment, contradiction, and neutrality.

Natural Language Inference

This task determines whether a premise entails, contradicts, or is neutral toward a hypothesis.

Abbreviation:

NLI

Domain:

General

Capability:

Reasoning Ability, Causality Discrimination

Data Source:

SNLI

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

L-16 Semantic Analysis (Sem Analy)

This cluster focuses on extracting, comparing, and interpreting meaning from text.

Sentence Similarity Detection

This task evaluates whether two sentences have similar meanings.

Abbreviation:

Sent Similar

Domain:

General

Capability:

Reasoning Ability, Causality Discrimination

Data Source:

QuoraQP (sen)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Intent Detection

This task identifies the underlying intent of a sentence, often used in dialogue systems.

Abbreviation:

Intent Det

Domain:

General, Social, Culture

Capability:

Cognition Understanding, Reasoning Ability

Data Source:

Intent (int)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Stance Detection

This task determines whether the author’s position is in favor of, against, or neutral toward a target.

Abbreviation:

Stance Detect

Domain:

General, Social, Humanities

Capability:

Cognition Understanding, Reasoning Ability

Data Source:

SemEval2016

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Personality Analysis

This task involves inferring a subject’s personality traits based on their written text, such as essays or social media posts.

Abbreviation:

Person Analy

Domain:

Business

Capability:

Reasoning Ability, Commonsense Knowledge, Cognition Understanding

Data Source:

Essay2007

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

L-17 Affective Computing (Affect Computing)

This cluster focuses on analyzing emotions, opinions, and sentiments expressed in language, especially in business and social domains.

Humor Detection

This task determines whether a sentence contains humor.

Abbreviation:

Humor Detect

Domain:

Culture

Capability:

Affective Analysis

Data Source:

Reddit

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Sarcasm Detection

This task identifies sarcastic statements in text.

Abbreviation:

Sarcasm Det

Domain:

Culture

Capability:

Affective Analysis

Data Source:

Sarcasm

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Sentiment Classification

This task classifies text sentiment as positive, negative, or neutral.

Abbreviation:

Sentiment Cls

Domain:

General

Capability:

Affective Analysis

Data Source:

SST5

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Mental Health Toxicity Detection

This task detects toxic content in mental health-related text.

Abbreviation:

Mental-Toxic Det

Domain:

Humanities

Capability:

Affective Analysis

Data Source:

MentalHealth (men)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Financial Sentiment Analysis

This task analyzes sentiment in financial texts.

Abbreviation:

Finance Cls

Domain:

Finance

Capability:

Affective Analysis

Data Source:

Financial Sentiment Analysis

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Metaphor Detection

This task detects metaphorical language use.

Abbreviation:

Metaphor Det

Domain:

Culture

Capability:

Affective Analysis

Data Source:

Metaphor

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Aspect Category Detection

This task identifies categories for aspect-based sentiment analysis.

Abbreviation:

Aspect Det

Domain:

Business

Capability:

Affective Analysis

Data Source:

res15

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Aspect Sentiment Classification

This task classifies sentiment of each aspect in text.

Abbreviation:

Aspect-Senti Cls

Domain:

Business

Capability:

Affective Analysis

Data Source:

res15

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Aspect Term Extraction

This task extracts aspect terms from sentences.

Abbreviation:

Aspect-Term Ext

Domain:

Business

Capability:

Affective Analysis

Data Source:

res15

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Target-oriented Opinion Words Extraction

This task extracts sentiment words associated with targets.

Abbreviation:

Opinion Ext

Domain:

Business

Capability:

Affective Analysis

Data Source:

TOWE

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Aspect-Opinion Pair Extraction

This task extracts aspect-opinion pairs from sentences.

Abbreviation:

AO-Pair Ext

Domain:

Business

Capability:

Affective Analysis

Data Source:

SDRN

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

End-to-End ABSA

This task performs all ABSA steps (aspect, opinion, sentiment) in a unified manner.

Abbreviation:

ABSA

Domain:

Business

Capability:

Affective Analysis

Data Source:

ABSA

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Aspect Sentiment Triplet Extraction

This task extracts aspect-opinion-sentiment triplets.

Abbreviation:

Senti-Triplet Ext

Domain:

Business

Capability:

Affective Analysis

Data Source:

ASTE

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1
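
For extraction tasks such as this one, Micro-F1 is commonly computed over predicted and gold tuples pooled across all sentences. The sketch below assumes exact matching, where a triplet counts as correct only if aspect, opinion, and sentiment all agree; the matching rule used for these tasks is not specified here, and the example triplets are invented.

```python
def micro_f1_over_sets(predicted, gold):
    """Micro-F1 over per-sentence sets of tuples, using exact tuple matching."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical (aspect, opinion, sentiment) triplets for two sentences.
gold = [
    {("pizza", "delicious", "positive"), ("service", "slow", "negative")},
    {("staff", "friendly", "positive")},
]
pred = [
    {("pizza", "delicious", "positive"), ("service", "slow", "positive")},
    {("staff", "friendly", "positive")},
]
print(round(micro_f1_over_sets(pred, gold), 4))  # 2 of 3 triplets match -> 0.6667
```

The second predicted triplet finds the right aspect and opinion but the wrong polarity, so under exact matching it counts as both a false positive and a false negative.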

Aspect-Category-Sentiment Detection

This task detects sentiment for both aspect and category.

Abbreviation:

ACS Det

Domain:

Business

Capability:

Affective Analysis

Data Source:

res15

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Aspect Sentiment Quad Prediction

This task predicts aspect-category-opinion-sentiment quads.

Abbreviation:

Senti-Quad Pred

Domain:

Business

Capability:

Affective Analysis

Data Source:

res16

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Dialogue-Level Sentiment Quadruple Extraction

This task extracts sentiment quads at the dialogue level.

Abbreviation:

Dialog Sent Quad

Domain:

Business

Capability:

Affective Analysis

Data Source:

DiaASQ

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Soccer Sentiment Classification

This task classifies sentiment in sports commentary or fan speech.

Abbreviation:

Soccer SA

Domain:

Sports

Capability:

Affective Analysis

Data Source:

FIFA-SA (soc)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

Comparative Opinion Quintuple Extraction

This task extracts quintuples involving comparisons between entities.

Abbreviation:

Opinion-Quin Ext

Domain:

Business

Capability:

Affective Analysis

Data Source:

Camrare

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

L-18 Named Entity Recognition (NER)

This cluster focuses on extracting structured entities from domain-specific texts across various disciplines.

Scientific NER

This task detects scientific entities in research-related text.

Abbreviation:

Sci NER

Domain:

Physical Science

Capability:

Content Recognition

Data Source:

SciER

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1
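
Span-level scoring for NER usually starts by turning token tags into entity spans, which are then compared as sets. The sketch below assumes BIO tagging and exact span matching; the tag scheme used by the benchmark is not stated here, and the entity types "Method" and "Dataset" are illustrative.

```python
def bio_to_spans(tags):
    """Convert BIO tags into a set of (start, end, type) spans, end exclusive."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # An I- tag without a matching open span is treated as a new span.
            start, label = i, tag[2:]
    return spans


gold_tags = ["B-Method", "I-Method", "O", "B-Dataset"]
pred_tags = ["B-Method", "I-Method", "O", "O"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
matched = len(gold_spans & pred_spans)
print(matched, len(pred_spans), len(gold_spans))  # 1 1 2
```

Here the single predicted span is correct, giving precision 1.0 and recall 0.5; pooling such counts over the whole test set yields the micro-averaged score.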

Temporal NER

This task extracts time expressions and temporal entities from text.

Abbreviation:

Temporal NER

Domain:

Engineering

Capability:

Content Recognition

Data Source:

TimeBank (tem)

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Pathology NER

This task identifies disease- or diagnosis-related entities in biomedical texts.

Abbreviation:

Path NER

Domain:

Biology

Capability:

Content Recognition

Data Source:

Pathology NER (pat)

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Cybersecurity NER

This task extracts cybersecurity-related named entities.

Abbreviation:

Cyber NER

Domain:

Physical Science

Capability:

Content Recognition

Data Source:

CyNER (cyb)

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Geological NER

This task identifies geological terminology in text.

Abbreviation:

Geo NER

Domain:

Geography

Capability:

Content Recognition

Data Source:

GEO-NER (geo)

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Legal NER

This task identifies legal entities in political or legislative documents.

Abbreviation:

Legal NER

Domain:

Politics

Capability:

Content Recognition

Data Source:

InLegalNER

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Organization Recognition

This task identifies named organizations in text.

Abbreviation:

Organ NER

Domain:

Politics

Capability:

Content Recognition

Data Source:

CoNLL2003

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Person Recognition

This task identifies person names in text.

Abbreviation:

Person NER

Domain:

General

Capability:

Content Recognition

Data Source:

CoNLL2003

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Location Recognition

This task identifies location entities in text.

Abbreviation:

Location NER

Domain:

General

Capability:

Content Recognition

Data Source:

CoNLL2003

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Climate Change NER

This task extracts entities related to climate change.

Abbreviation:

Climate NER

Domain:

Climate

Capability:

Content Recognition

Data Source:

Climate-Change-NER

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Gene/Protein NER

This task extracts gene or protein entities from biomedical text.

Abbreviation:

Gene&Protein NER

Domain:

Biology

Capability:

Content Recognition

Data Source:

Genia

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Chemical NER

This task detects chemical entities in scientific text.

Abbreviation:

Chem NER

Domain:

Chemistry

Capability:

Content Recognition

Data Source:

CHEMDNER (che, a)

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

Disease NER

This task extracts disease-related named entities.

Abbreviation:

Disease NER

Domain:

Biology

Capability:

Content Recognition

Data Source:

Disease-NER (dis)

Number:

500

SoTA Specialist:

W2NER

Metrics:

Micro-F1

L-20 Event Extraction (Event Ext)

This cluster focuses on identifying and reasoning about events in text.

Event Trigger Detection

This task detects event triggers in text.

Abbreviation:

Event-Trigger Det

Domain:

General

Capability:

Content Recognition

Data Source:

TAC-KBP

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1

Temporal Event Reasoning

This task involves reasoning over temporal relations between events.

Abbreviation:

Temp-Event Reason

Domain:

General

Capability:

Content Recognition, Temporal Determination

Data Source:

TimeBank (tem)

Number:

500

SoTA Specialist:

Roberta-Large

Metrics:

Micro-F1

L-21 Semantic Parsing (Sem Par)

This cluster covers structured semantic understanding of sentence meaning and predicate-argument structures.

Event Coreference Resolution

This task clusters textual mentions referring to the same event.

Abbreviation:

Event-Coref Res

Domain:

Linguistics

Capability:

Content Recognition

Data Source:

TAC-KBP 2015 (eve)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Semantic Role Labeling

This task labels predicate-argument structures.

Abbreviation:

SRL

Domain:

Linguistics

Capability:

Content Recognition

Data Source:

OntoNotes (srl)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Abstract Meaning Representation

This task parses sentences into graph-based semantic representations.

Abbreviation:

AMR

Domain:

Linguistics

Capability:

Content Recognition

Data Source:

AMR 2.0 amrldc (amr)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

L-22 Linguistic Parsing (Ling Par)

This cluster focuses on syntactic structure parsing of natural language sentences.

Dependency Parsing

This task parses dependency trees for input sentences.

Abbreviation:

Dep Parse

Domain:

Linguistics

Capability:

Content Recognition

Data Source:

Universal Dependency (srl)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Constituency Parsing

This task parses syntactic constituency trees.

Abbreviation:

Const Parse

Domain:

Linguistics

Capability:

Content Recognition

Data Source:

OntoNotes (dpl)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Acc

Part-of-Speech Tagging

This task tags each token with its part-of-speech label.

Abbreviation:

POS

Domain:

Linguistics

Capability:

Content Recognition

Data Source:

Wall Street Journal (pos)

Number:

500

SoTA Specialist:

Flan-T5-Large

Metrics:

Micro-F1