Systematic literature reviews aim to comprehensively summarize evidence from all available studies relevant to a question. In the context of medicine, such reviews constitute the highest quality evidence used to inform clinical care. However, reviews are expensive to produce manually; (semi-)automation via NLP may facilitate faster evidence synthesis without sacrificing rigor. We introduce the task of multi-document summarization for generating review summaries. This task uses two datasets of review summaries derived from the scientific literature [1][2]. Participating teams are evaluated using automated and human evaluation metrics. We also encourage contributions which extend this task and dataset, e.g., by proposing scaffolding tasks, methods for model interpretability, and improved automated evaluation methods in this domain.
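As a concrete illustration of the automated portion of the evaluation, the sketch below scores a generated review summary against a reference using ROUGE, a metric commonly used for summarization. The example texts, metric configuration, and the rouge-score package are assumptions for illustration only, not the official evaluation script.

```python
# Hedged sketch: scoring a generated review summary against a reference
# with ROUGE (via the rouge-score package). Texts and metric choices are
# illustrative; see the task page for the official evaluation setup.
from rouge_score import rouge_scorer

reference = "Evidence suggests the intervention reduces symptom severity."
generated = "The reviewed studies suggest the intervention reduces symptoms."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(name, round(score.fmeasure, 3))
```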
Details about data access, task evaluation, and more are available here.
Please join our mailing list to receive updates or email lucyw@allenai.org to be added to our Slack workspace.
Lucy Lu Wang, Allen Institute for AI
Jay DeYoung, Northeastern University
Byron Wallace, Northeastern University
Please email mslr-organizers@googlegroups.com to contact the organizers.
There are increasing reports that research papers can be written by computers, which raises a range of concerns (e.g., see [1]). In this challenge we explore the state of the art in detecting automatically generated papers. We frame detection as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. To this end we are releasing a corpus of over 4,000 excerpts from automatically written papers, based on the work of Cabanac et al. [2], as well as documents collected by our publishing and editorial teams. As a test set, participants are provided with a corpus five times larger, consisting of openly accessible human-written and generated papers from the same scientific domains.
We also encourage contributions that aim to extend this dataset with other computer-generated scientific papers, or papers that propose valid metrics to assess automatically generated papers against those written by humans.
The competition is now live on the Kaggle platform.
The DAGPap Shared Task is scored using standard binary classification evaluation metrics. Each excerpt should be labeled as human-written or machine-generated. We then compute the F1 score over all test excerpts.
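For reference, the scoring described above amounts to computing F1 over binary labels. A minimal sketch, assuming 0 = human-written and 1 = machine-generated (the label encoding is an assumption, not the official one):

```python
# Minimal sketch of F1 scoring over test excerpts.
# Label encoding (0 = human-written, 1 = machine-generated) is assumed.
from sklearn.metrics import f1_score

gold      = [0, 1, 1, 0, 1]  # reference labels for five test excerpts
predicted = [0, 1, 0, 0, 1]  # a system's predictions

print(f1_score(gold, predicted))  # F1 over all test excerpts
```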
All participants are encouraged to submit publications describing their solutions to the SDP 2022 workshop. Please follow the SDP 2022 workshop submission instructions when submitting to the shared task.
For any questions about the competition please use the competition discussion forum. For any other questions (about paper submission, the workshop, etc.) please contact dagpap2022@googlegroups.com.
Yury Kashnitsky, Elsevier
Anita de Waard, Elsevier
Cyril Labbé, Université Grenoble Alpes
Georgios Tsatsaronis, Elsevier
Catriona Fennell, Elsevier
Drahomira Herrmannova, Elsevier
Most existing work on scientific document summarization focuses on generating short, abstract-like summaries. The LongSumm task instead studies the generation of high-quality long summaries of scientific literature. This is the 3rd iteration of LongSumm [1]. At SDP 2021, LongSumm received 50 submissions from 8 different teams, with evaluation results reported on a public leaderboard.
Please join our task here to receive updates, or email us at longsumm.shared.task@gmail.com with any questions.
Guy Feigenblat, Piiano Privacy Solutions
Michal Shmueli-Scheuer, IBM Research
For this shared task, we focus on concepts specific to social science literature, namely survey variables. We build on the original work of [1], [2] and propose an evaluation exercise on the task of "Variable Detection and Linking". Survey variable mention identification in texts can be seen as a multi-label classification problem: given a sentence from a document (in our case, a scientific publication in the social sciences) and a list of unique variables (from a reference vocabulary of survey variables), the task is to classify which variables, if any, are mentioned in that sentence.
We split the task into two sub-tasks: a) variable detection and b) variable disambiguation. Variable detection deals with identifying whether a sentence contains a variable mention or not, whereas variable disambiguation focuses on identifying which variable from the vocabulary is specifically mentioned in a certain sentence.
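To make the two sub-tasks concrete, the sketch below frames disambiguation as ranking vocabulary entries by their similarity to a sentence. The variable IDs, example sentence, threshold, and TF-IDF baseline are illustrative assumptions, not the official data or baseline.

```python
# Hedged sketch: a toy similarity baseline for variable detection and
# disambiguation. Vocabulary entries, sentence, and threshold are assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vocabulary = {
    "v101": "Trust in the national government",
    "v102": "Frequency of attending religious services",
}
sentence = "Respondents were asked how much they trust their government."

vectorizer = TfidfVectorizer().fit(list(vocabulary.values()) + [sentence])
sims = cosine_similarity(
    vectorizer.transform([sentence]),
    vectorizer.transform(list(vocabulary.values())),
)[0]

# Sub-task (a): detection -- does any candidate score above a threshold?
# Sub-task (b): disambiguation -- which candidate scores highest?
best_id, best_score = max(zip(vocabulary, sims), key=lambda pair: pair[1])
print("contains variable mention:", best_score > 0.1)
print("best candidate:", best_id, round(best_score, 3))
```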
This task is organized by the VAriable Detection, Interlinking and Summarization (VADIS) project.
Link to the SV-Ident 2022 page (more info to come): https://vadis-project.github.io/sv-ident-sdp2022/
Simone Paolo Ponzetto, University of Mannheim
Andrea Zielinski, Fraunhofer ISI
Tornike Tsereteli, University of Stuttgart
Yavuz Selim Kartal, GESIS
Philipp Mayr, GESIS
With the demise of the widely used Microsoft Academic Graph (MAG) [1], [2] at the end of 2021, the scholarly document processing community faces a pressing need to replace MAG with an open-source, community-supported service. A number of challenging data processing tasks are essential for the scalable creation of a comprehensive scholarly graph, i.e., a graph of entities including, but not limited to, research papers, their authors, research organisations, and research themes. This shared task will evaluate three key sub-tasks involved in the generation of such a graph.
Test and evaluation data will be supplied by the CORE aggregator [3].
Pre-register your team here and we'll keep you posted with competition updates and timelines.
Petr Knoth, Open University
David Pride, Open University
Ronin Wu, Iris.ai
Generating summaries of scientific documents is known to be a challenging task. The majority of existing work in summarization assumes a single best gold summary for each document. Having only one gold summary limits our ability to evaluate the quality of summarization systems, since writing summaries is a subjective activity. At the same time, annotating multiple gold summaries for scientific documents can be extremely expensive, as it requires domain experts to read and understand long scientific documents. This shared task enables exploring methods for generating multi-perspective summaries. We introduce a novel summarization corpus, leveraging data from scientific peer reviews to capture diverse perspectives from the reader's point of view.
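One practical consequence of having multiple gold summaries is that automated metrics are typically aggregated over references, for example by taking the maximum score. The sketch below illustrates this with ROUGE; the texts, metric, and max-over-references aggregation are assumptions for illustration, not the official MuP evaluation.

```python
# Hedged sketch: scoring one generated summary against several reference
# summaries and keeping the best match. Illustrative only.
from rouge_score import rouge_scorer

references = [
    "The paper proposes a new summarization corpus built from peer reviews.",
    "A corpus of multi-perspective summaries for scientific papers is introduced.",
]
generated = "The authors introduce a peer-review-based corpus of multi-perspective summaries."

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
best = max(scorer.score(ref, generated)["rouge1"].fmeasure for ref in references)
print("max ROUGE-1 F1 over references:", round(best, 3))
```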
Please join our task here.
Please refer to our MuP task page for more details.
Guy Feigenblat, Piiano, Israel
Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel
Arman Cohan, Allen Institute for AI, Seattle, USA
Tirthankar Ghosal, Charles University, Czech Republic