With the ubiquity of Generative AI, it has become very easy to generate fake scientific papers. This can erode public trust and undermine the foundations of science: are we standing on the shoulders of robots? The Detecting Automatically Generated Papers (DAGPAP) competition aims to encourage the development of robust, reliable systems for detecting AI-generated scientific text, utilizing a diverse dataset and a variety of machine learning models across a number of scientific domains.
Dan Li, Elsevier
Anita de Waard, Elsevier
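As a concrete illustration of the kind of system the task invites, the sketch below trains a simple character n-gram classifier to separate human-written from machine-generated text. It is not an official baseline; the file name, column names, and label convention are assumptions made only for the sake of the example.

```python
# Minimal baseline sketch for AI-generated text detection (illustrative only).
# Assumes a CSV with a "text" column and a binary "label" column
# (0 = human-written, 1 = machine-generated); names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("dagpap_train.csv")  # hypothetical file name
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Character n-grams tend to transfer reasonably well across scientific domains.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), max_features=100_000),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```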
Generative AI-enhanced academic research assistants are transforming how research is conducted. By allowing users to pose research-related questions in natural language, these systems can generate structured and concise summaries supported by relevant references. However, hallucinations — unsupported claims introduced by large language models — remain a significant obstacle to fully trusting these automatically generated scientific answers.
This shared task invites participants to develop and evaluate systems that detect hallucinations in automatically generated scientific answers. The dataset comprises research-oriented questions sourced from subject matter experts, along with corresponding answers and references. The answers are produced by several well-performing retrieval-augmented generation (RAG) systems that index millions of published academic abstracts. Each answer is annotated to indicate whether it includes unsupported claims that are not grounded in the provided references. Two levels of labeling will be provided: a three-class scheme (entailment, neutral, contradiction) and a more detailed scheme encompassing 10+ fine-grained categories (to be specified later). Teams are challenged to classify claims into the appropriate categories, with evaluation metrics focusing on the precision and recall of detecting unsupported claims.
Dan Li, Colin Kehan Zhang, Bogdan Palfi, Adrian Raudaschl, Anita de Waard (Elsevier).
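To make the three-class scheme concrete, here is a minimal sketch of a natural language inference (NLI) baseline that labels each (reference, claim) pair as entailment, neutral, or contradiction and flags a claim as unsupported when no reference entails it. The model choice and the aggregation heuristic are assumptions, not the official task setup.

```python
# NLI-style baseline sketch for flagging unsupported claims (illustrative;
# model name and the "unsupported unless entailed" heuristic are assumptions).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # labels: 0=contradiction, 1=neutral, 2=entailment
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def nli_label(premise: str, hypothesis: str) -> str:
    """Classify a (reference abstract, claim) pair into an NLI label."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))].lower()

def is_unsupported(claim: str, references: list[str]) -> bool:
    """Treat a claim as unsupported unless at least one reference entails it."""
    return not any(nli_label(ref, claim) == "entailment" for ref in references)
```

Precision and recall of the unsupported class can then be computed by comparing these predictions against the gold annotations.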
You are invited to participate in the shared task “Context25: Evidence and Grounding Context Identification for Scientific Claims”, co-located with the 5th Workshop on Scholarly Document Processing (SDP 2025) to be held at ACL 2025. Participants in the competition are also invited to submit papers describing their findings.
Interpreting scientific claims in the context of empirical findings is a valuable practice, yet extremely time-consuming for researchers. Such interpretation requires identifying key results that provide supporting evidence from research papers, and contextualizing these results with associated methodological details (e.g., measures and samples). In this shared task, we are interested in automating the identification of key results (or evidence) as well as of additional grounding context, to make claim interpretation more efficient.
Joel Chan (University of Maryland)
Matthew Akamatsu (University of Washington)
Aakanksha Naik (Allen Institute for AI)
Scholarly articles convey valuable information not only through unstructured text but also via (semi-)structured figures such as charts and diagrams. Automatically interpreting the semantics of knowledge encoded in these figures can be beneficial for downstream tasks such as question answering (QA).
In the SciVQA challenge, participants will develop multimodal QA systems using a dataset of scientific figures from ACL Anthology and arXiv papers. Each figure is annotated with seven QA pairs and includes metadata such as caption, ID, figure type (e.g., compound, line graph, bar chart, scatter plot), publication title, DOI, URL, and QA pair type. This shared task specifically focuses on closed-ended visual questions (i.e., those addressing visual attributes such as colour, shape, size, or height) and non-visual questions (those not addressing visual attributes of the figure). Systems will be evaluated using Accuracy, BLEU, METEOR, and ROUGE scores. Automated evaluation of submitted systems will be conducted through the Codabench platform (a link will be provided soon).
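The snippet below sketches how the stated metrics could be computed with the Hugging Face evaluate library; the shared task's exact evaluation configuration, tokenization, and accuracy definition may differ, and the example answers are purely illustrative.

```python
# Illustrative metric computation; the official evaluation setup may differ.
import evaluate

predictions = ["The blue line peaks at epoch 10."]  # hypothetical system answers
references = ["The blue line peaks at epoch 10."]   # hypothetical gold answers

bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)

# Exact-match accuracy after light normalisation (one plausible definition).
accuracy = sum(
    p.strip().lower() == r.strip().lower() for p, r in zip(predictions, references)
) / len(predictions)

print(bleu["bleu"], meteor["meteor"], rouge["rougeL"], accuracy)
```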
Social media facilitates discussions on critical issues such as climate change, but it also contributes to the rapid dissemination of misinformation, which complicates efforts to maintain an informed public and create evidence-based policies. In this shared task, we emphasise the need to link public discourse to peer-reviewed scholarly articles by gathering English-language claims about climate change from social media, together with a corpus of about 400K abstracts of publications from the climate science domain. Participants will be asked to retrieve relevant abstracts for each claim (subtask I) and to classify the relation between the claim and an abstract as ‘supports’, ‘refutes’, or ‘not enough information’ (subtask II). A link to the competition webpage will be provided soon.
Aida Usmanova (Leuphana University Lüneburg)
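As an informal starting point for subtask I, the sketch below embeds claims and abstracts with a general-purpose sentence encoder and retrieves the most similar abstracts by cosine similarity. The encoder choice, top-k value, and toy data are assumptions, not task requirements; subtask II could then reuse an NLI-style classifier over the retrieved claim-abstract pairs.

```python
# Dense-retrieval sketch for claim-to-abstract matching (illustrative only).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder

abstracts = ["...abstract 1...", "...abstract 2..."]      # hypothetical corpus
claims = ["Global sea levels rose faster after 1990."]    # hypothetical claims

abstract_emb = model.encode(abstracts, convert_to_tensor=True, normalize_embeddings=True)
claim_emb = model.encode(claims, convert_to_tensor=True, normalize_embeddings=True)

# For each claim, retrieve the top-k most similar abstracts by cosine similarity.
hits = util.semantic_search(claim_emb, abstract_emb, top_k=5)
for claim, claim_hits in zip(claims, hits):
    for hit in claim_hits:
        print(claim, "->", abstracts[hit["corpus_id"]], round(hit["score"], 3))
```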
Software plays an essential role in scientific research and is considered one of the crucial entity types in scholarly documents. However, software is usually not cited formally in academic documents, resulting in a wide variety of informal software mentions. Automatically identifying and disambiguating software mentions, their related attributes, and the purpose of each mention contributes to the understanding, accessibility, and reproducibility of research, but it is a challenging task. We are extending the first iteration of the shared task, SOMD 2024, with new challenges.
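To illustrate one common way of framing mention identification, the sketch below represents software mentions and attributes as BIO tags over tokens and collects contiguous spans. The tag set shown is a simplified assumption, not the official SOMD annotation scheme.

```python
# BIO-tagging sketch for software mention identification (tag set is assumed).
tokens = ["We", "analysed", "the", "data", "with", "SPSS", "version", "25", "."]
tags = ["O", "O", "O", "O", "O", "B-Software", "B-Version", "I-Version", "O"]

def extract_mentions(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collect (entity_type, surface_form) spans from BIO tags."""
    mentions, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                mentions.append((current_type, " ".join(current)))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                mentions.append((current_type, " ".join(current)))
            current, current_type = [], None
    if current:
        mentions.append((current_type, " ".join(current)))
    return mentions

print(extract_mentions(tokens, tags))  # [('Software', 'SPSS'), ('Version', 'version 25')]
```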