SciHal: Hallucination Detection for Scientific Content

Generative AI-enhanced academic research assistants are transforming how research is conducted. By allowing users to pose research-related questions in natural language, these systems can generate structured and concise summaries supported by relevant references. However, hallucinations — unsupported claims introduced by large language models — remain a significant obstacle to fully trusting these automatically generated scientific answers.

SciHal ("Hallucination Detection for Scientific Content") invites participants to detect hallucinated claims in the answers to scientific questions generated by GenAI-powered research assistants.

The dataset comprises research-oriented questions sourced from subject matter experts, along with corresponding answers and references. The answers are produced by real-world retrieval-augmented generation (RAG) systems that index millions of published academic abstracts. Each answer is annotated to indicate whether it includes unsupported claims that are not grounded in the provided references. Two levels of labeling will be provided: a coarse three-class scheme (entailment, neutral, contradiction) and a fine-grained scheme encompassing 10+ categories. Teams are challenged to classify claims into the appropriate categories, with evaluation focusing on the precision, recall, and F1 of detecting unsupported claims.
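To make the evaluation concrete, here is a minimal sketch of claim-level scoring under the coarse three-class scheme, assuming that claims labeled "neutral" or "contradiction" count as unsupported and serve as the positive class. The label names and this aggregation are assumptions for illustration, not the official scorer.

```python
# Hedged sketch: precision/recall/F1 for detecting unsupported claims,
# assuming the coarse labels "entailment", "neutral", "contradiction"
# and that the latter two count as unsupported (an assumption).

UNSUPPORTED = {"neutral", "contradiction"}

def unsupported_prf(gold, pred):
    """Precision, recall, and F1 with 'unsupported' as the positive class."""
    tp = sum(g in UNSUPPORTED and p in UNSUPPORTED for g, p in zip(gold, pred))
    fp = sum(g not in UNSUPPORTED and p in UNSUPPORTED for g, p in zip(gold, pred))
    fn = sum(g in UNSUPPORTED and p not in UNSUPPORTED for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["entailment", "neutral", "contradiction", "entailment"]
pred = ["entailment", "neutral", "entailment", "neutral"]
print(unsupported_prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

The same pattern extends to the fine-grained scheme by treating each category as its own positive class and averaging.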

Registration

Participants should fill out the Registration Form. Once it is submitted, you will receive a confirmation email. After that, you can register for the competition on Kaggle via https://www.kaggle.com/competitions/hallucination-detection-scientific-content-2025. After registering, you'll find a train_sample.csv dataset containing a few samples. The training data will be uploaded on April 10 and the test data on May 1.

Competition Platform

The shared task is available on Kaggle. All updates and details on the competition will be published there.

Important Dates

  • Registration open: April 1, 2025
  • Release of training data: April 10, 2025
  • Release of testing data: May 1, 2025
  • Deadline for system submissions: May 30, 2025
  • Paper submission deadline: May 30, 2025
  • Notification of acceptance: June 9, 2025
  • Camera-ready paper due: June 16, 2025
  • Workshop: July 31, 2025

Organizers

Dan Li (Elsevier)

Bogdan Palfi (Elsevier)

Colin Kehan Zhang (Elsevier)



Contact: sdproc2025@googlegroups.com

Sign up for updates: https://groups.google.com/g/sdproc-updates

Follow us: https://twitter.com/SDPWorkshop
