Systematic literature reviews aim to comprehensively summarize evidence from all available studies relevant to a question. In the context of medicine, such reviews constitute the highest quality evidence used to inform clinical care. However, reviews are expensive to produce manually; (semi-)automation via NLP may facilitate faster evidence synthesis without sacrificing rigor. We introduce the task of multi-document summarization for generating review summaries. This task uses two datasets of review summaries derived from the scientific literature [1][2]. Participating teams are evaluated using automated and human evaluation metrics. We also encourage contributions which extend this task and dataset, e.g., by proposing scaffolding tasks, methods for model interpretability, and improved automated evaluation methods in this domain.
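As a rough illustration of the task framing (not an official baseline or the organizers' pipeline), the sketch below concatenates the abstracts of several included studies and generates a review-style summary with a generic seq2seq summarizer from the HuggingFace transformers library; the study texts and the choice of checkpoint are placeholders.

```python
# Minimal illustrative sketch of multi-document review summarization:
# concatenate study abstracts and generate a single review-style summary.
from transformers import pipeline

# Hypothetical study abstracts for one review; the shared-task datasets
# provide many such (included studies, target review summary) groups.
study_abstracts = [
    "Trial A reports a modest reduction in symptoms with treatment X.",
    "Trial B finds no significant difference between treatment X and placebo.",
    "Trial C observes improved outcomes with treatment X in older adults.",
]

# "facebook/bart-large-cnn" is only an illustrative checkpoint; long-input
# models are better suited to full multi-study inputs.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

joined_input = " ".join(study_abstracts)
summary = summarizer(joined_input, max_length=128, min_length=32)[0]["summary_text"]
print(summary)
```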
Details about data access, task evaluation, and more are available here.
Please join our mailing list to receive updates or email lucyw@allenai.org to be added to our Slack workspace.
Lucy Lu Wang, Allen Institute for AI
Jay DeYoung, Northeastern University
Byron Wallace, Northeastern University
Please email mslr-organizers@googlegroups.com to contact the organizers.
There are increasing reports that research papers can be written by computers, which raises a series of concerns (e.g., see [1]). In this challenge we explore the state of the art in detecting automatically generated papers. We frame the detection problem as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. To this end we will provide a corpus of over 4,000 automatically written papers, based on the work by Cabanac et al. [2], as well as documents collected by our publishing and editorial teams. As a control, we will provide a corpus of openly accessible human-written papers from the same scientific domains.
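To make the binary classification framing concrete, here is a minimal sketch (an illustration only, not an official baseline) that trains a TF-IDF plus logistic regression classifier on a couple of hypothetical labeled excerpts using scikit-learn; the excerpts and labels are invented for illustration.

```python
# Illustrative sketch of the detection task: label text excerpts as
# human-written (0) or machine-generated (1).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training excerpts; the shared task supplies the real corpus.
excerpts = [
    "We evaluate our method on three benchmark datasets and report results.",
    "The profound bonding of the deep neural organization was exploited to learn.",
]
labels = [0, 1]  # 0 = human-written, 1 = machine-generated

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(excerpts, labels)

# Predict the label of a new (hypothetical) excerpt.
print(clf.predict(["The consolidated organization of flimsy parameters was prepared."]))
```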
We also encourage contributions that aim to extend this dataset with other computer-generated scientific papers, or papers that propose valid metrics to assess automatically generated papers against those written by humans.
Anita de Waard, Elsevier
Yury Kashnitsky, Elsevier
Guillaume Cabanac, University of Toulouse
Cyril Labbé, Université Grenoble Alpes
Alexander Magazinov, Yandex
Most existing work on scientific document summarization focuses on generating short, abstract-like summaries. The LongSumm task focuses on generating high-quality long summaries for scientific literature. This is the 3rd iteration of LongSumm [1]. At SDP 2021, LongSumm received 50 submissions from 8 different teams. Evaluation results are reported on a public leaderboard.
Guy Feigenblat, Piiano Privacy Solutions
Michal Shmueli-Scheuer, IBM Research
For this shared task, we focus on concepts specific to social science literature, namely survey variables. We build on the original work of [1], [2] and propose an evaluation exercise on the task of "Variable Detection and Linking". Survey variable mention identification in texts can be seen as a multi-label classification problem: Given a sentence in a document (in our case: a scientific publication in the social sciences), and a list of unique variables (from a reference vocabulary of survey variables), the task is to classify which variables, if any, are mentioned in each sentence.
We split the task into two sub-tasks: a) variable detection and b) variable disambiguation. Variable detection deals with identifying whether a sentence contains a variable mention or not, whereas variable disambiguation focuses on identifying which variable from the vocabulary is mentioned in a given sentence.
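One possible way to approach the two sub-tasks, sketched below purely for illustration (not the official baseline), is to treat detection as a thresholded decision over sentence-to-variable similarity and disambiguation as ranking the variables by that similarity; the variable vocabulary, sentence, and threshold are hypothetical, and the TF-IDF similarity is a deliberately simple stand-in for stronger matching models.

```python
# Illustrative sketch: detection as a thresholded decision, disambiguation as
# similarity ranking of a sentence against survey-variable descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical reference vocabulary of survey variables (id -> description).
variables = {
    "v1": "Trust in the national parliament",
    "v2": "Frequency of attending religious services",
    "v3": "Self-reported household income",
}

sentence = "Respondents who attend religious services weekly report higher trust."

vectorizer = TfidfVectorizer().fit(list(variables.values()) + [sentence])
sims = cosine_similarity(
    vectorizer.transform([sentence]),
    vectorizer.transform(list(variables.values())),
)[0]

threshold = 0.1  # illustrative, untuned threshold
if sims.max() >= threshold:
    # a) detection: the sentence is flagged as containing a variable mention.
    # b) disambiguation: rank candidate variables by similarity.
    ranked = sorted(zip(variables, sims), key=lambda x: -x[1])
    print("Variable mention detected; candidates:", ranked)
else:
    print("No variable mention detected.")
```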
This task is organized by the VAriable Detection, Interlinking and Summarization (VADIS) project.
Link to the SV-Ident 2022 page (more info to come): https://vadis-project.github.io/sv-ident-sdp2022/
Simone Paolo Ponzetto, University of Mannheim
Andrea Zielinski, Fraunhofer ISI
Tornike Tsereteli, University of Stuttgart
Yavuz Selim Kartal, GESIS
Philipp Mayr, GESIS
With the demise of the widely used Microsoft Academic Graph (MAG) [1], [2] at the end of 2021, the scholarly document processing community faces a pressing need to replace MAG with an open-source, community-supported service. A number of challenging data processing tasks are essential for the scalable creation of a comprehensive scholarly graph, i.e., a graph of entities including but not limited to research papers, their authors, research organisations, and research themes. This shared task will evaluate three key sub-tasks involved in the generation of a scholarly graph:
Test and evaluation data will be supplied by the CORE aggregator [3].
Pre-register your team here and we'll keep you posted with competition updates and timelines.
Petr Knoth, Open University
David Pride, Open University
Ronin Wu, Iris.ai
Generating summaries of scientific documents is known to be a challenging task. The majority of existing work in summarization assumes a single best gold summary for each given document. Having only one gold summary negatively impacts our ability to evaluate the quality of summarization systems, since writing summaries is a subjective activity. At the same time, annotating multiple gold summaries for scientific documents can be extremely expensive, as it requires domain experts to read and understand long scientific documents. This shared task enables exploring methods for generating multi-perspective summaries. We introduce a novel summarization corpus, leveraging data from scientific peer reviews to capture diverse perspectives from the reader's point of view.
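One way multiple gold summaries can be used at evaluation time, shown below as a hedged sketch rather than the official MuP evaluation protocol, is to score a system summary against each reference and credit the best match; the example uses the rouge-score package and invented summary texts.

```python
# Illustrative multi-reference scoring: take the best ROUGE score over all
# available gold summaries for a given system output.
from rouge_score import rouge_scorer

system_summary = "The paper proposes a retrieval-augmented summarizer for long documents."
references = [
    "This work introduces a summarization model that retrieves salient passages first.",
    "The authors study long-document summarization with a retrieval component.",
]

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
best_rouge1 = max(
    scorer.score(ref, system_summary)["rouge1"].fmeasure for ref in references
)
print(f"Best ROUGE-1 F1 over references: {best_rouge1:.3f}")
```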
Guy Feigenblat, Piiano, Israel
Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel
Arman Cohan, Allen Institute for AI, Seattle, USA
Tirthankar Ghosal, Charles University, Czech Republic