Statistics of the Language Skills
This section presents detailed statistics for skills categorized under the Language modality.
Language tasks assess a wide range of linguistic and cognitive capabilities, including comprehension, commonsense reasoning, intent understanding, and natural language inference.
Models are expected to process and reason over textual inputs to produce accurate and context-aware outputs.
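Each task entry in this section lists the same seven fields (Abbreviation, Domain, Capability, Data Source, Number, SoTA Specialist, Metrics). As a minimal sketch of how one such entry could be held in code, the record below mirrors those fields. The class name `TaskRecord` and its attribute names are illustrative assumptions, not part of the benchmark's tooling, and `number` is assumed to be the count of evaluation instances.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskRecord:
    """One task entry, mirroring the fields listed per task (hypothetical type)."""
    name: str
    abbreviation: str
    domain: Tuple[str, ...]       # some tasks list several domains
    capability: Tuple[str, ...]   # some tasks list several capabilities
    data_source: str
    number: int                   # assumed: number of evaluation instances
    sota_specialist: str
    metric: str


# Values copied from the Commonsense Reasoning entry of this section.
commonsense = TaskRecord(
    name="Commonsense Reasoning",
    abbreviation="Common Reason",
    domain=("General",),
    capability=("Commonsense Knowledge",),
    data_source="commonsense_qa",
    number=510,
    sota_specialist="Flan-T5-Large",
    metric="Acc",
)

print(commonsense.abbreviation, commonsense.number, commonsense.metric)
```

Tuples are used for Domain and Capability because several entries (for example Intent Detection) list more than one value.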
L-1 Cognitive Question (Cog QA)
This cluster evaluates models on different types of reasoning and cognitive inference abilities based on textual input.
- Commonsense Reasoning
This task evaluates the ability to apply general world knowledge to infer plausible outcomes.
- Abbreviation:
Common Reason
- Domain:
General
- Capability:
Commonsense Knowledge
- Data Source:
commonsense_qa
- Number:
510
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Causal Reasoning
This task focuses on identifying cause-effect relationships between events.
- Abbreviation:
Causal Reason
- Domain:
General
- Capability:
Causality Discrimination
- Data Source:
corr2cause
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Document-Level Causal Reasoning
This task measures the ability to extract and reason over causal relations across a full document.
- Abbreviation:
Doc-Causal Reason
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
qa4mre
- Number:
564
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Counterfactual Reasoning
This task evaluates a model’s ability to reason about alternate scenarios.
- Abbreviation:
Counterfact Reason
- Domain:
General
- Capability:
Causality Discrimination
- Data Source:
Counterfactual-Reasoning-Capacity-of-Large-Language-Models (crc)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
F1
- Analogical Reasoning
This task tests the ability to identify abstract relational similarities between different pairs.
- Abbreviation:
Analog Reason
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
BATS
- Number:
456
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Multi-Hop Question Answering
This task involves reasoning over multiple supporting facts to answer a question.
- Abbreviation:
Multi-Hop QA
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
hotpotQA
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Temporal Reasoning
This task evaluates the understanding and inference of time-based relationships between events.
- Abbreviation:
Temporal Reason
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
time_dial (Qin et al.,_
L-2 Ethical NLP (Ethics NLP)
This cluster assesses the ability of models to reason about or respond appropriately in ethically sensitive or culturally aware contexts.
- Ethical Reasoning
This task evaluates moral and ethical reasoning ability in text.
- Abbreviation:
Ethical Reason
- Domain:
Ethics
- Capability:
Ethical Awareness
- Data Source:
ethics
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Truthful Question Answering
This task focuses on generating factually and ethically accurate responses to questions.
- Abbreviation:
Truthful QA
- Domain:
Ethics
- Capability:
Ethical Awareness
- Data Source:
truthful_qa
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Legal Question Answering
This task involves interpreting and answering law-related questions ethically and accurately.
- Abbreviation:
Legal QA
- Domain:
Law
- Capability:
Ethical Awareness
- Data Source:
JEC-QA
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Bias Detection
This task detects socially or culturally biased statements in text.
- Abbreviation:
Bias Detect
- Domain:
Culture
- Capability:
Affective Analysis
- Data Source:
WNC
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Offensive Classification
This task identifies offensive or harmful language based on social norms.
- Abbreviation:
Offensive Classify
- Domain:
Culture
- Capability:
Affective Analysis
- Data Source:
offensive-speech (ock)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Hate Speech Detection
This task flags hate-driven or discriminatory content in language.
- Abbreviation:
Hate-Speech Detect
- Domain:
Culture
- Capability:
Affective Analysis
- Data Source:
hate-speech (ock)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Spam Detection
This task detects and filters spam or unsolicited content.
- Abbreviation:
Spam Detect
- Domain:
Culture
- Capability:
Causality Discrimination
- Data Source:
spam-email (spa)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Fake News Detection
This task detects false or misleading news content.
- Abbreviation:
Fake-News Detect
- Domain:
Humanities
- Capability:
Ethical Awareness
- Data Source:
fake-news (fak)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Fact Verification
This task verifies the truthfulness of claims based on supporting evidence.
- Abbreviation:
Fact Verify
- Domain:
General
- Capability:
Causality Discrimination
- Data Source:
Counterfactual
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
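The detection and classification tasks in this cluster are scored with Micro-F1. The sketch below shows one standard way to compute it for single-label classification, pooling true positives, false positives, and false negatives over all classes. The function name and the toy labels are illustrative assumptions; the source does not describe its scoring script.

```python
from typing import Hashable, Sequence


def micro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Micro-averaged F1 for single-label classification.

    Counts are pooled over all classes: each wrong prediction is one false
    positive (for the predicted class) and one false negative (for the gold
    class).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = len(pred) - tp
    fn = len(gold) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels, invented for illustration only.
gold = ["spam", "ham", "ham", "spam"]
pred = ["spam", "ham", "spam", "spam"]
print(micro_f1(gold, pred))  # 0.75
```

When every instance has exactly one gold label and one predicted label, this pooled score coincides with plain accuracy; the two differ only in multi-label or extraction settings.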
L-3 Domain-Specific QA (Domain QA)
This cluster evaluates models on question answering across specialized knowledge domains such as biomedicine, healthcare, and engineering.
- Biomedical Question Answering
This task covers factual and reasoning questions in the biomedical field.
- Abbreviation:
Biomedical QA
- Domain:
Biomedical
- Capability:
Reasoning Ability
- Data Source:
BioASQ (bqb)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Medical Question Answering
This task targets open-domain medical question answering from clinical or patient-related queries.
- Abbreviation:
Medical QA
- Domain:
Medical
- Capability:
Reasoning Ability
- Data Source:
ChatDoctor-HealthCareMagic-Knowledge
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Technical Question Answering
This task covers questions in technical or engineering domains such as programming and systems.
- Abbreviation:
Tech QA
- Domain:
Engineering
- Capability:
Reasoning Ability
- Data Source:
Stack Overflow Data (sod)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Engineering Question Answering
This task evaluates problem-solving and factual answering in engineering contexts.
- Abbreviation:
Eng QA
- Domain:
Engineering
- Capability:
Reasoning Ability
- Data Source:
University of Bath (eqa)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Science Question Answering
This task involves answering questions based on scientific knowledge and reasoning.
- Abbreviation:
Science QA
- Domain:
Science
- Capability:
Reasoning Ability
- Data Source:
sciQ
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Earth Question Answering
This task evaluates earth science-related question answering skills.
- Abbreviation:
Earth QA
- Domain:
Earth
- Capability:
Reasoning Ability
- Data Source:
Manual
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Nature Question Answering
This task assesses the ability to answer questions about nature and environment.
- Abbreviation:
Nature QA
- Domain:
Nature
- Capability:
Reasoning Ability
- Data Source:
Manual
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
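Several tasks in this cluster (Biomedical QA, Medical QA, Tech QA, Eng QA) are scored with BLEU-1. A minimal single-reference sketch is given below: clipped unigram precision multiplied by the brevity penalty. Whitespace tokenization, the single-reference setting, and the example sentences are assumptions made for illustration; the source does not state which BLEU implementation it uses.

```python
import math
from collections import Counter
from typing import List


def bleu1(reference: List[str], hypothesis: List[str]) -> float:
    """BLEU-1 against one reference: clipped unigram precision x brevity penalty."""
    if not hypothesis:
        return 0.0
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Each hypothesis token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    precision = clipped / len(hypothesis)
    # The brevity penalty discounts hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * precision


# Toy sentences, invented for illustration only.
ref = "insulin lowers blood glucose levels".split()
hyp = "insulin lowers blood sugar levels".split()
print(bleu1(ref, hyp))  # 0.8
```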
L-5 Non-Traditional QA (Non-Trad QA)
This cluster includes question answering formats that are less conventional, often grounded in informal or creative content.
- Fairytale Question Answering
This task focuses on answering questions about narrative and fictional content.
- Abbreviation:
Fairytale QA
- Domain:
Culture
- Capability:
Reasoning Ability
- Data Source:
FairytaleQA
- Number:
501
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Tweet Question Answering
This task addresses question answering over social media text like tweets.
- Abbreviation:
Tweet QA
- Domain:
Social
- Capability:
Reasoning Ability
- Data Source:
tweet_qa
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
F1
- Trivial QA
This task targets general knowledge questions in a quiz or trivia format.
- Abbreviation:
Trivial QA
- Domain:
Social
- Capability:
Reasoning Ability
- Data Source:
trivia_qa
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
F1
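Tweet QA and Trivial QA report F1, but the source does not say how it is computed. For free-form answers a common choice is token-overlap F1 in the style of SQuAD evaluation, which is what the sketch below assumes. Lowercasing, whitespace tokenization, and the example strings are likewise illustrative assumptions.

```python
from collections import Counter


def token_f1(gold: str, pred: str) -> float:
    """Token-overlap F1 between a gold answer and a predicted answer."""
    gold_tokens = gold.lower().split()
    pred_tokens = pred.lower().split()
    if not gold_tokens or not pred_tokens:
        # Two empty answers count as a match; one empty answer scores zero.
        return float(gold_tokens == pred_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy answers, invented for illustration only.
print(token_f1("the Eiffel Tower", "Eiffel Tower in Paris"))
```

In the toy example two of four predicted tokens and two of three gold tokens overlap, so precision is 1/2 and recall is 2/3.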
L-6 Advanced QA (Advance QA)
This cluster tests more sophisticated reasoning in QA, including multilingual and conversational abilities.
- Open-Domain Question Answering
This task involves retrieving and answering open-domain factual questions.
- Abbreviation:
Open-Domain QA
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
squad_v2
- Number:
513
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Conversational Question Answering
This task answers context-dependent questions in a dialogue setting.
- Abbreviation:
Conversation QA
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
coqa
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Table Question Answering
This task evaluates a model’s ability to interpret and reason over tabular data.
- Abbreviation:
Table QA
- Domain:
General
- Capability:
Reasoning Ability
- Data Source:
wikitablequestions
- Number:
590
- SoTA Specialist:
Flan-T5-Large
- Metrics:
F1
- Multilingual Question Answering
This task focuses on answering questions posed in multiple languages.
- Abbreviation:
Multi-Lang QA
- Domain:
Linguistics
- Capability:
Reasoning Ability
- Data Source:
multilingual_qa (mqa)
- Number:
500
- SoTA Specialist:
mT5-Large
- Metrics:
BLEU-1
- Code-Switch Question Answering
This task involves QA with mixed-language input (code-switching).
- Abbreviation:
Code-Switch QA
- Domain:
Linguistics
- Capability:
Reasoning Ability
- Data Source:
Manual
- Number:
500
- SoTA Specialist:
mT5-Large
- Metrics:
Acc
L-7 Math Problem Solving (Math Ability)
This cluster focuses on solving numerical and algebraic problems using mathematical reasoning.
- Math Question Answering
This task targets direct mathematical reasoning and fact recall.
- Abbreviation:
Math QA
- Domain:
Math
- Capability:
Reasoning Ability
- Data Source:
math-QA
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Mathematical Word Problem Solving
This task addresses multi-step word problems requiring logical mathematical reasoning.
- Abbreviation:
Math Word Prob
- Domain:
Math
- Capability:
Problem Solving
- Data Source:
math-QA
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Mathematical Proof Generation
This task involves generating step-by-step mathematical proofs.
- Abbreviation:
Math Proof Gen
- Domain:
Math
- Capability:
Problem Solving
- Data Source:
math-QA
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
L-8 Code Problem Solving (Code Ability)
This cluster evaluates models on understanding, generating, and repairing code, covering code explanation, defect detection, code generation, code repair, and text-to-SQL generation.
- Code Explanation
This task tests the ability to explain the logic of code snippets.
- Abbreviation:
Code Explain
- Domain:
Code
- Capability:
Problem Solving
- Data Source:
CodeXGLUE
- Number:
508
- SoTA Specialist:
CodeT5
- Metrics:
BLEU-1
- Code Defect Detection
This task detects bugs or defects in code.
- Abbreviation:
Code Defect Detect
- Domain:
Code
- Capability:
Problem Solving
- Data Source:
CodeXGLUE
- Number:
501
- SoTA Specialist:
CodeT5
- Metrics:
Acc
- Code Generation
This task generates code from natural language prompts or specifications.
- Abbreviation:
Code Gen
- Domain:
Code
- Capability:
Problem Solving
- Data Source:
mbpp (mbp)
- Number:
500
- SoTA Specialist:
CodeT5
- Metrics:
CodeBLEU
- Code Repair
This task detects and corrects code defects in provided programs.
- Abbreviation:
Code Repair
- Domain:
Code
- Capability:
Problem Solving
- Data Source:
CodeXGLUE
- Number:
500
- SoTA Specialist:
CodeT5
- Metrics:
CodeBLEU
- Text-to-SQL Generation
This task generates SQL queries based on natural language input.
- Abbreviation:
Txt2SQL Gen
- Domain:
Code
- Capability:
Problem Solving
- Data Source:
Text-to-sql-v1
- Number:
600
- SoTA Specialist:
CodeT5
- Metrics:
BLEU-1
L-9 Cross-lingual Translation (X-Lan&NMT)
This cluster focuses on multilingual understanding and translation across diverse languages.
- Multilingual Translation
This task translates across a wide range of language pairs.
- Abbreviation:
Multi-lang Trans
- Domain:
General
- Capability:
Multilingual Capability
- Data Source:
Manual
- Number:
504
- SoTA Specialist:
Flan-T5-Large
- Metrics:
ROUGE-1
- Low-Resource Translation
This task handles translation between languages with limited training data.
- Abbreviation:
Low-Res Trans
- Domain:
General
- Capability:
Multilingual Capability
- Data Source:
flores_101
- Number:
502
- SoTA Specialist:
Flan-T5-Large
- Metrics:
ROUGE-1
- English-Chinese Translation
This task translates between English and Chinese.
- Abbreviation:
En-Zh Trans
- Domain:
General
- Capability:
Multilingual Capability
- Data Source:
Manual
- Number:
501
- SoTA Specialist:
Flan-T5-Large
- Metrics:
ROUGE-1
- English-French Translation
This task translates between English and French.
- Abbreviation:
En-Fr Trans
- Domain:
General
- Capability:
Multilingual Capability
- Data Source:
Manual
- Number:
501
- SoTA Specialist:
Flan-T5-Large
- Metrics:
ROUGE-1
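The translation tasks in this cluster are scored with ROUGE-1, which measures unigram overlap between a candidate and a reference. The sketch below computes precision, recall, and F1, since the source does not state which variant is reported. Whitespace tokenization is an assumption that would not suit unsegmented text such as Chinese, and the example sentences are invented.

```python
from collections import Counter
from typing import Dict, List


def rouge1(reference: List[str], candidate: List[str]) -> Dict[str, float]:
    """ROUGE-1 precision, recall, and F1 from clipped unigram overlap."""
    overlap = sum((Counter(reference) & Counter(candidate)).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy sentences, invented for illustration only.
ref = "the cat sat on the mat".split()
cand = "the cat is on the mat".split()
scores = rouge1(ref, cand)
print(scores)
```

Five of the six tokens overlap in this example, so all three values equal 5/6.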
L-10 Text Summarization (Txt Sum)
This cluster tests summarization skills, from extractive to abstractive methods.
- Extractive Summarization
This task selects key sentences from the source document.
- Abbreviation:
Extract Summ
- Domain:
General
- Capability:
Text Manipulation
- Data Source:
cnn_dailymail
- Number:
503
- SoTA Specialist:
BART-Large
- Metrics:
ROUGE-1
- Abstractive Summarization
This task creates novel summaries that may rephrase source content.
- Abbreviation:
Abstract Summ
- Domain:
General
- Capability:
Text Manipulation
- Data Source:
xsum
- Number:
500
- SoTA Specialist:
BART-Large
- Metrics:
ROUGE-1
- Multi-Document Summarization
This task condenses multiple documents into one concise summary.
- Abbreviation:
Multi-Doc Sum
- Domain:
General
- Capability:
Text Manipulation
- Data Source:
multi_news
- Number:
500
- SoTA Specialist:
BART-Large
- Metrics:
ROUGE-1
L-11 Dialogue Generation (Dialog Gen)
This cluster evaluates interactive response generation in a multi-turn conversational setting.
- Multi-Turn Daily Dialogue Generation
This task generates natural responses in everyday dialogue.
- Abbreviation:
Daily Dialogue Gen
- Domain:
General
- Capability:
Interactive Capability, Cognition Understanding
- Data Source:
daily_dialogue
- Number:
553
- SoTA Specialist:
DialoGPT
- Metrics:
BLEU-1
L-12 Text Generation (TxT Gen)
This cluster focuses on creative or corrective forms of free-form text generation.
- Table-to-Text Generation
This task generates coherent text from structured tabular data.
- Abbreviation:
Table-to-Text
- Domain:
General
- Capability:
Text Manipulation
- Data Source:
ToTTo
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Text Style Transfer
This task transforms text from one style to another (e.g., formal to informal).
- Abbreviation:
Style Transfer
- Domain:
General
- Capability:
Creativity and Innovation
- Data Source:
style-transfer-paraphrase, text_style_transfer
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Story Generation
This task involves open-ended narrative generation.
- Abbreviation:
Story Gen
- Domain:
General
- Capability:
Creativity and Innovation
- Data Source:
story-generation
- Number:
500
- SoTA Specialist:
mFlan-T5-Large
- Metrics:
BLEU-1
- Paraphrase Generation
This task creates alternate phrasings of a given sentence or paragraph.
- Abbreviation:
Paraphrase
- Domain:
General
- Capability:
Creativity and Innovation
- Data Source:
manual (sto)
- Number:
633
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
- Grammar Correction
This task corrects grammatical errors in text while preserving the original meaning.
- Abbreviation:
Grammar Correct
- Domain:
General
- Capability:
Text Manipulation
- Data Source:
Grammar Correction (gra)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
BLEU-1
L-13 Time Series Analysis (Time Series)
This cluster focuses on learning and predicting sequential patterns over time.
- Time Series Prediction
This task involves predicting future values from historical time series data.
- Abbreviation:
Time Series Pred
- Domain:
General
- Capability:
Temporal Determination, Reasoning Ability
- Data Source:
Manual
- Number:
510
- SoTA Specialist:
TimesFm-1.0-200m
- Metrics:
RMSE
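Time Series Prediction is scored with RMSE, the square root of the mean squared difference between forecast and observed values; lower is better. A minimal sketch follows, with a hypothetical function name and invented toy values.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy series, invented for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 18.0]
print(rmse(actual, forecast))  # sqrt(1.5), about 1.22
```

Because the errors are squared before averaging, the single error of 2 in the last step contributes four times as much as each error of 1.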
L-14 Content Categorization (Txt Cls)
This cluster assesses the ability to assign topic or category labels to text.
- Topic Classification
This task assigns topic labels to input text.
- Abbreviation:
Topic Cls
- Domain:
General
- Capability:
Content Recognition
- Data Source:
Topic (top)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
L-15 Text Entailment (Txt Entail)
This cluster evaluates textual inference, including recognizing entailment, contradiction, and neutrality.
- Natural Language Inference
This task determines whether a hypothesis is entailed by a premise.
- Abbreviation:
NLI
- Domain:
General
- Capability:
Reasoning Ability, Causality Discrimination
- Data Source:
SNLI
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
L-16 Semantic Analysis (Sem Analy)
This cluster focuses on extracting, comparing, and interpreting meaning from text.
- Sentence Similarity Detection
This task evaluates whether two sentences have similar meanings.
- Abbreviation:
Sent Similar
- Domain:
General
- Capability:
Reasoning Ability, Causality Discrimination
- Data Source:
QuoraQP (sen)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Intent Detection
This task identifies the underlying intent of a sentence, often used in dialogue systems.
- Abbreviation:
Intent Det
- Domain:
General, Social, Culture
- Capability:
Cognition Understanding, Reasoning Ability
- Data Source:
Intent (int)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Stance Detection
This task determines whether the author’s position is in favor of, against, or neutral toward a target.
- Abbreviation:
Stance Detect
- Domain:
General, Social, Humanities
- Capability:
Cognition Understanding, Reasoning Ability
- Data Source:
SemEval2016
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Personality Analysis
This task involves inferring a subject’s personality traits based on their written text, such as essays or social media posts.
- Abbreviation:
Person Analy
- Domain:
Business
- Capability:
Reasoning Ability, Commonsense Knowledge, Cognition Understanding
- Data Source:
Essay2007
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
L-17 Affective Computing (Affect Computing)
This cluster focuses on analyzing emotions, opinions, and sentiments expressed in language, especially in business and social domains.
- Humor Detection
This task determines whether a sentence contains humor.
- Abbreviation:
Humor Detect
- Domain:
Culture
- Capability:
Affective Analysis
- Data Source:
Reddit
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Sarcasm Detection
This task identifies sarcastic statements in text.
- Abbreviation:
Sarcasm Det
- Domain:
Culture
- Capability:
Affective Analysis
- Data Source:
Sarcasm
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Sentiment Classification
This task classifies text sentiment as positive, negative, or neutral.
- Abbreviation:
Sentiment Cls
- Domain:
General
- Capability:
Affective Analysis
- Data Source:
SST5
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Mental Health Toxicity Detection
This task detects toxic content in mental-health-related text.
- Abbreviation:
Mental-Toxic Det
- Domain:
Humanities
- Capability:
Affective Analysis
- Data Source:
MentalHealth (men)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Financial Sentiment Analysis
This task analyzes sentiment in financial texts.
- Abbreviation:
Finance Cls
- Domain:
Finance
- Capability:
Affective Analysis
- Data Source:
Financial Sentiment Analysis
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Metaphor Detection
This task detects metaphorical language use.
- Abbreviation:
Metaphor Det
- Domain:
Culture
- Capability:
Affective Analysis
- Data Source:
Metaphor
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Aspect Category Detection
This task identifies categories for aspect-based sentiment analysis.
- Abbreviation:
Aspect Det
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
res15
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Aspect Sentiment Classification
This task classifies the sentiment expressed toward each aspect in the text.
- Abbreviation:
Aspect-Senti Cls
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
res15
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Aspect Term Extraction
This task extracts aspect terms from sentences.
- Abbreviation:
Aspect-Term Ext
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
res15
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Target-oriented Opinion Words Extraction
This task extracts sentiment words associated with targets.
- Abbreviation:
Opinion Ext
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
TOWE
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Aspect-Opinion Pair Extraction
This task extracts aspect-opinion pairs from sentences.
- Abbreviation:
AO-Pair Ext
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
SDRN
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- End-to-End ABSA
This task performs all ABSA steps (aspect, opinion, sentiment) in a unified manner.
- Abbreviation:
ABSA
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
ABSA
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Aspect Sentiment Triplet Extraction
This task extracts aspect-opinion-sentiment triplets.
- Abbreviation:
Senti-Triplet Ext
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
ASTE
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Aspect-Category-Sentiment Detection
This task detects sentiment for both aspect and category.
- Abbreviation:
ACS Det
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
res15
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Aspect Sentiment Quad Prediction
This task predicts aspect-category-opinion-sentiment quads.
- Abbreviation:
Senti-Quad Pred
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
res16
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Dialogue-Level Sentiment Quadruple Extraction
This task extracts sentiment quads at the dialogue level.
- Abbreviation:
Dialog Sent Quad
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
diasag
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Soccer Sentiment Classification
This task classifies sentiment in sports commentary or fan speech.
- Abbreviation:
Soccer SA
- Domain:
Sports
- Capability:
Affective Analysis
- Data Source:
FIFA-SA (soc)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
- Comparative Opinion Quintuple Extraction
This task extracts quintuples involving comparisons between entities.
- Abbreviation:
Opinion-Quin Ext
- Domain:
Business
- Capability:
Affective Analysis
- Data Source:
Camrare
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
L-18 Named Entity Recognition (NER)
This cluster focuses on extracting structured entities from domain-specific texts across various disciplines.
- Scientific NER
This task detects scientific entities in research-related text.
- Abbreviation:
Sci NER
- Domain:
Physical Science
- Capability:
Content Recognition
- Data Source:
SciER
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Temporal NER
This task extracts time expressions and temporal entities from text.
- Abbreviation:
Temporal NER
- Domain:
Engineering
- Capability:
Content Recognition
- Data Source:
TimeBank (tem)
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Pathology NER
This task identifies disease- or diagnosis-related entities in biomedical texts.
- Abbreviation:
Path NER
- Domain:
Biology
- Capability:
Content Recognition
- Data Source:
Pathology NER (pat)
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Cybersecurity NER
This task extracts cybersecurity-related named entities.
- Abbreviation:
Cyber NER
- Domain:
Physical Science
- Capability:
Content Recognition
- Data Source:
CyNER (cyb)
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Geological NER
This task identifies geological terminology in text.
- Abbreviation:
Geo NER
- Domain:
Geography
- Capability:
Content Recognition
- Data Source:
GEO-NER (geo)
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Legal NER
This task identifies legal entities in political or legislative documents.
- Abbreviation:
Legal NER
- Domain:
Politics
- Capability:
Content Recognition
- Data Source:
InLegalNER
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Organization Recognition
This task identifies named organizations in text.
- Abbreviation:
Organ NER
- Domain:
Politics
- Capability:
Content Recognition
- Data Source:
CoNLL2003
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Person Recognition
This task identifies person names in text.
- Abbreviation:
Person NER
- Domain:
General
- Capability:
Content Recognition
- Data Source:
CoNLL2003
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Location Recognition
This task identifies location entities in text.
- Abbreviation:
Location NER
- Domain:
General
- Capability:
Content Recognition
- Data Source:
CoNLL2003
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Climate Change NER
This task extracts entities related to climate change.
- Abbreviation:
Climate NER
- Domain:
Climate
- Capability:
Content Recognition
- Data Source:
Climate-Change-NER
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Gene/Protein NER
This task extracts gene or protein entities from biomedical text.
- Abbreviation:
Gene&Protein NER
- Domain:
Biology
- Capability:
Content Recognition
- Data Source:
Genia
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Chemical NER
This task detects chemical entities in scientific text.
- Abbreviation:
Chem NER
- Domain:
Chemistry
- Capability:
Content Recognition
- Data Source:
CHEMDNER (che, a)
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
- Disease NER
This task extracts disease-related named entities.
- Abbreviation:
Disease NER
- Domain:
Biology
- Capability:
Content Recognition
- Data Source:
Disease-NER (dis)
- Number:
500
- SoTA Specialist:
W2NER
- Metrics:
Micro-F1
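All NER tasks in this cluster report Micro-F1. For extraction tasks this is usually computed over predicted entities rather than over instances; the sketch below assumes the strict criterion, where a prediction counts only if its span boundaries and entity type both match a gold entity exactly. The matching criterion, the `(start, end, type)` representation, and the toy annotations are assumptions, as the source does not describe its scoring script.

```python
from typing import List, Set, Tuple

# (start token index, end token index, entity type); illustrative representation.
Entity = Tuple[int, int, str]


def entity_micro_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Micro-F1 over entities, pooled across sentences, with exact matching."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy annotations for two sentences, invented for illustration only.
gold = [{(0, 2, "DISEASE")}, {(3, 4, "CHEMICAL"), (7, 9, "DISEASE")}]
pred = [{(0, 2, "DISEASE")}, {(3, 4, "CHEMICAL"), (6, 9, "DISEASE")}]
print(entity_micro_f1(gold, pred))
```

The last predicted entity has the right type but a wrong start boundary, so it counts as both a false positive and a false negative, giving precision and recall of 2/3 each.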
L-20 Event Extraction (Event Ext)
This cluster focuses on identifying and reasoning about events in text.
- Event Trigger Detection
This task detects event triggers in text.
- Abbreviation:
Event-Trigger Det
- Domain:
General
- Capability:
Content Recognition
- Data Source:
TAC-KBP
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1
- Temporal Event Reasoning
This task involves reasoning over temporal relations between events.
- Abbreviation:
Temp-Event Reason
- Domain:
General
- Capability:
Content Recognition, Temporal Determination
- Data Source:
TimeBank (tem)
- Number:
500
- SoTA Specialist:
Roberta-Large
- Metrics:
Micro-F1
L-21 Semantic Parsing (Sem Par)
This cluster covers structured semantic understanding of sentence meaning and predicate-argument structures.
- Event Coreference Resolution
This task clusters textual mentions referring to the same event.
- Abbreviation:
Event-Coref Res
- Domain:
Linguistics
- Capability:
Content Recognition
- Data Source:
TAC-KBP 2015 (eve)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Semantic Role Labeling
This task labels predicate-argument structures.
- Abbreviation:
SRL
- Domain:
Linguistics
- Capability:
Content Recognition
- Data Source:
OntoNotes (srl)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Abstract Meaning Representation
This task parses sentences into graph-based semantic representations.
- Abbreviation:
AMR
- Domain:
Linguistics
- Capability:
Content Recognition
- Data Source:
AMR 2.0 amrldc (amr)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
L-22 Linguistic Parsing (Ling Par)
This cluster focuses on syntactic structure parsing of natural language sentences.
- Dependency Parsing
This task produces a dependency tree for each input sentence.
- Abbreviation:
Dep Parse
- Domain:
Linguistics
- Capability:
Content Recognition
- Data Source:
Universal Dependency (srl)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Constituency Parsing
This task produces a syntactic constituency tree for each input sentence.
- Abbreviation:
Const Parse
- Domain:
Linguistics
- Capability:
Content Recognition
- Data Source:
OntoNotes (dpl)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Acc
- Part-of-Speech
This task tags each token with its part-of-speech label.
- Abbreviation:
POS
- Domain:
Linguistics
- Capability:
Content Recognition
- Data Source:
Wall Street Journal (pos)
- Number:
500
- SoTA Specialist:
Flan-T5-Large
- Metrics:
Micro-F1