With the ubiquity of Generative AI, it has become very easy to generate fake scientific papers. This can erode public trust and undermine the foundations of science: are we standing on the shoulders of robots? The Detecting Automatically Generated Papers (DAGPAP) competition aims to encourage the development of robust, reliable systems for detecting AI-generated scientific text, utilizing a diverse dataset and a variety of machine learning models across a number of scientific domains.
Dan Li, Elsevier
Anita de Waard, Elsevier
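As a concrete illustration of the kind of system the task invites, the sketch below trains a simple character n-gram classifier to separate human-written from machine-generated text. It is not an official baseline; the file name, column names, and label convention are assumptions made only for the sake of the example.

```python
# Minimal baseline sketch for AI-generated text detection (illustrative only).
# Assumes a CSV with a "text" column and a binary "label" column
# (0 = human-written, 1 = machine-generated); names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("dagpap_train.csv")  # hypothetical file name
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Character n-grams tend to transfer reasonably well across scientific domains.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), max_features=100_000),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```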
Generative AI-enhanced academic research assistants are transforming how research is conducted. By allowing users to pose research-related questions in natural language, these systems can generate structured and concise summaries supported by relevant references. However, hallucinations — unsupported claims introduced by large language models — remain a significant obstacle to fully trusting these automatically generated scientific answers.
This shared task invites participants to develop and evaluate systems that detect hallucinations in automatically generated scientific answers. The dataset comprises research-oriented questions sourced from subject matter experts, along with corresponding answers and references. The answers are produced by several well-performing retrieval-augmented generation (RAG) systems that index millions of published academic abstracts. Each answer is annotated to indicate whether it includes unsupported claims that are not grounded in the provided references. Two levels of labeling will be provided: a three-class scheme (entailment, neutral, contradiction) and a more detailed scheme encompassing 10+ fine-grained categories (to be specified later). Teams are challenged to classify claims into the appropriate categories, with evaluation metrics focusing on the precision and recall of detecting unsupported claims.
Dan Li, Colin Kehan Zhang, Bogdan Palfi, Adrian Raudaschl, Anita de Waard (Elsevier).
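To make the three-class scheme concrete, here is a minimal sketch of a natural language inference (NLI) baseline that labels each (reference, claim) pair as entailment, neutral, or contradiction and flags a claim as unsupported when no reference entails it. The model choice and the aggregation heuristic are assumptions, not the official task setup.

```python
# NLI-style baseline sketch for flagging unsupported claims (illustrative;
# model name and the "unsupported unless entailed" heuristic are assumptions).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # labels: 0=contradiction, 1=neutral, 2=entailment
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def nli_label(premise: str, hypothesis: str) -> str:
    """Classify a (reference abstract, claim) pair into an NLI label."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))].lower()

def is_unsupported(claim: str, references: list[str]) -> bool:
    """Treat a claim as unsupported unless at least one reference entails it."""
    return not any(nli_label(ref, claim) == "entailment" for ref in references)
```

Precision and recall of the unsupported class can then be computed by comparing these predictions against the gold annotations.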
You are invited to participate in the shared task “Context25: Evidence and Grounding Context Identification for Scientific Claims”, co-located with the 5th Workshop on Scholarly Document Processing (SDP 2025) to be held at ACL 2025. Participants in the competition are also invited to submit papers describing their findings.
Interpreting scientific claims in the context of empirical findings is a valuable practice, yet extremely time-consuming for researchers. Such interpretation requires identifying key results that provide supporting evidence from research papers, and contextualizing these results with associated methodological details (e.g., measures and samples). In this shared task, we are interested in automating the identification of key results (or evidence) as well as of additional grounding context, to make claim interpretation more efficient.
Joel Chan (University of Maryland)
Matthew Akamatsu (University of Washington)
Aakanksha Naik (Allen Institute for AI)
Scholarly articles convey valuable information not only through unstructured text but also via (semi-)structured figures such as charts and diagrams. Automatically interpreting the semantics of knowledge encoded in these figures can be beneficial for downstream tasks such as question answering (QA).
In the SciVQA challenge, participants will develop multimodal QA systems using a dataset of scientific figures from ACL Anthology and arXiv papers. Each figure is annotated with seven QA pairs and includes metadata such as caption, ID, figure type (e.g., compound, line graph, bar chart, scatter plot), publication title, DOI, URL, and QA pair type. This shared task specifically focuses on closed-ended visual questions (i.e., those addressing visual attributes such as colour, shape, size, or height) and non-visual questions (those not addressing visual attributes of the figure). Systems will be evaluated using Accuracy, BLEU, METEOR, and ROUGE scores. Automated evaluation of submitted systems will be conducted through the Codabench platform (a link will be provided soon).
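The snippet below sketches how the stated metrics could be computed with the Hugging Face evaluate library; the shared task's exact evaluation configuration, tokenization, and accuracy definition may differ, and the example answers are purely illustrative.

```python
# Illustrative metric computation; the official evaluation setup may differ.
import evaluate

predictions = ["The blue line peaks at epoch 10."]  # hypothetical system answers
references = ["The blue line peaks at epoch 10."]   # hypothetical gold answers

bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)

# Exact-match accuracy after light normalisation (one plausible definition).
accuracy = sum(
    p.strip().lower() == r.strip().lower() for p, r in zip(predictions, references)
) / len(predictions)

print(bleu["bleu"], meteor["meteor"], rouge["rougeL"], accuracy)
```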
Social media facilitates discussions on critical issues such as climate change, but it also contributes to the rapid dissemination of misinformation, which complicates efforts to maintain an informed public and create evidence-based policies. In this shared task, we emphasise the need to link public discourse to peer-reviewed scholarly articles by gathering English-language claims about climate change from social media, together with a corpus of about 400K abstracts of publications from the climate science domain. Participants will be asked to retrieve relevant abstracts for each claim (subtask I) and to classify the relation between the claim and an abstract as ‘supports’, ‘refutes’, or ‘not enough information’ (subtask II). A link to the competition webpage will be provided soon.
Aida Usmanova (Leuphana University Lüneburg)
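As an informal starting point for subtask I, the sketch below embeds claims and abstracts with a general-purpose sentence encoder and retrieves the most similar abstracts by cosine similarity. The encoder choice, top-k value, and toy data are assumptions, not task requirements; subtask II could then reuse an NLI-style classifier over the retrieved claim-abstract pairs.

```python
# Dense-retrieval sketch for claim-to-abstract matching (illustrative only).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder

abstracts = ["...abstract 1...", "...abstract 2..."]      # hypothetical corpus
claims = ["Global sea levels rose faster after 1990."]    # hypothetical claims

abstract_emb = model.encode(abstracts, convert_to_tensor=True, normalize_embeddings=True)
claim_emb = model.encode(claims, convert_to_tensor=True, normalize_embeddings=True)

# For each claim, retrieve the top-k most similar abstracts by cosine similarity.
hits = util.semantic_search(claim_emb, abstract_emb, top_k=5)
for claim, claim_hits in zip(claims, hits):
    for hit in claim_hits:
        print(claim, "->", abstracts[hit["corpus_id"]], round(hit["score"], 3))
```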
Software plays an essential role in scientific research and is considered one of the crucial entity types in scholarly documents. However, software is usually not cited formally in academic documents, resulting in a wide variety of informal software mentions. Automatically identifying and disambiguating software mentions, their related attributes, and the purpose of each mention contributes to the understanding, accessibility, and reproducibility of research, but it is a challenging task. We are extending the first iteration of the shared task, SOMD 2024, with new challenges.
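To illustrate one common way of framing mention identification, the sketch below represents software mentions and attributes as BIO tags over tokens and collects contiguous spans. The tag set shown is a simplified assumption, not the official SOMD annotation scheme.

```python
# BIO-tagging sketch for software mention identification (tag set is assumed).
tokens = ["We", "analysed", "the", "data", "with", "SPSS", "version", "25", "."]
tags = ["O", "O", "O", "O", "O", "B-Software", "B-Version", "I-Version", "O"]

def extract_mentions(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collect (entity_type, surface_form) spans from BIO tags."""
    mentions, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                mentions.append((current_type, " ".join(current)))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                mentions.append((current_type, " ".join(current)))
            current, current_type = [], None
    if current:
        mentions.append((current_type, " ".join(current)))
    return mentions

print(extract_mentions(tokens, tags))  # [('Software', 'SPSS'), ('Version', 'version 25')]
```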