Most of the work on scientific document summarization focuses on generating relatively short, abstract-like summaries. While such a length constraint may be sufficient for summarizing news articles, it falls far short for summarizing scientific work. A summary that short resembles an abstract rather than a summary that covers all the salient information conveyed in a given paper. Writing such summaries requires expertise and a deep understanding of a scientific domain, as can be found in some researchers' blogs.
The LongSumm task leverages blog posts created by researchers in the NLP and machine learning communities and uses them as reference summaries against which submissions are compared.
The corpus for this task includes a training set of 1,705 extractive summaries and around 700 abstractive summaries of NLP and machine learning papers. The extractive summaries are based on video talks from the associated conferences (TalkSumm; Lev et al., 2019), and the abstractive summaries are drawn from blog posts created by NLP and ML researchers. In addition, we create a test set of abstractive summaries. Each submission is judged against one reference summary (the gold summary) using ROUGE and should not exceed 600 words.
This is the second year LongSumm is being hosted; the results from LongSumm @ SDP 2020 are reported on a public leaderboard. In 2021, the task continues to expand by incorporating additional summaries.
The task is defined as follows: given a scientific paper, generate a summary of up to 600 words that covers the salient information it conveys.
The Long Summary Task will be scored by using several ROUGE metrics to compare the system output against the gold-standard summary. The intrinsic evaluation uses ROUGE-1, ROUGE-2, ROUGE-L, and skip-gram based ROUGE metrics. In addition, a randomly selected subset of the summaries will undergo human evaluation.
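For local development, system output can be scored against a reference summary before submission. The sketch below is a minimal example, assuming the Python rouge-score package; the official evaluation may use a different ROUGE implementation and settings.

# Minimal sketch: score a system summary against a gold summary with ROUGE.
# Assumes the `rouge-score` package (pip install rouge-score); the official
# LongSumm scoring may use a different implementation and settings.
from rouge_score import rouge_scorer

MAX_WORDS = 600  # LongSumm summaries should not exceed 600 words

def truncate(text, max_words=MAX_WORDS):
    # Keep only the first `max_words` whitespace-separated tokens.
    return " ".join(text.split()[:max_words])

def rouge_f1(system_summary, gold_summary):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    scores = scorer.score(gold_summary, truncate(system_summary))
    return {name: score.fmeasure for name, score in scores.items()}

print(rouge_f1("the model summarizes long scientific papers",
               "the proposed model generates long summaries of scientific papers"))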
The training data is composed of abstractive and extractive summaries. To download both datasets, and for further details, see the LongSumm GitHub repository.
Guy Feigenblat, IBM Research AI
Michal Shmueli-Scheuer, IBM Research AI
Please contact shmueli@il.ibm.com and guyf@il.ibm.com with questions about this shared task.
Due to the rapid growth in scientific literature, it is difficult for scientists to stay up-to-date on the latest findings. This challenge is especially acute during pandemics due to the risk of making decisions based on outdated or incomplete information. There is a need for AI systems that can help scientists manage information overload and support scientific fact checking and evidence synthesis.
In the SCIVER shared task, participants will build systems that take a scientific claim and a corpus of research abstracts as input, and identify all abstracts that support or refute the claim, together with the rationale sentences (evidence) that justify each label.
To register, please send an email to the organizers at sciver-info@allenai.org with:
We will use the SciFact dataset of 1,409 expert-annotated biomedical claims verified against 5,183 abstracts from peer-reviewed publications. Download the full dataset here. You can also find baseline models and starter code on the GitHub repo. Find out more details from the EMNLP 2020 paper.
For each claim, we provide the claim text, its evidence (the abstracts containing evidence, the evidence sets of sentences within each abstract, and a label indicating whether each set supports or refutes the claim), and the IDs of the cited documents.
An example of a claim paired with evidence from two abstracts is shown below.
{
"id": 52,
"claim": "ALDH1 expression is associated with poorer prognosis for breast cancer primary tumors.",
"evidence": {
"11": [ // 2 evidence sets in document 11 support the claim.
{"sentences": [0, 1], // Sentences 0 and 1, taken together, support the claim.
"label": "SUPPORT"},
{"sentences": [11], // Sentence 11, on its own, supports the claim.
"label": "SUPPORT"}
],
"15": [ // A single evidence set in document 15 supports the claim.
{"sentences": [4],
"label": "SUPPORT"}
]
},
"cited_doc_ids": [11, 15]
}
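To get a feel for the data, the claims can be loaded and their evidence inspected with a few lines of Python. This is a minimal sketch, assuming the claims are distributed as a JSON-lines file; the file name claims_train.jsonl is an assumption and may differ in the released download.

# Minimal sketch: iterate over SciFact claims and print their evidence.
# The file name `claims_train.jsonl` is an assumption; check the dataset
# download for the actual file names.
import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

claims = load_jsonl("claims_train.jsonl")
for claim in claims[:5]:
    print(claim["id"], claim["claim"])
    for doc_id, evidence_sets in claim.get("evidence", {}).items():
        for ev in evidence_sets:
            print(f"  doc {doc_id}: sentences {ev['sentences']} -> {ev['label']}")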
We will use the SciFact public leaderboard as the official submission portal for the SciVER task. Please read the online instructions for how to make submissions.
The final evaluation will use test claims whose relevant abstracts, labels, and evidence are hidden; the abstracts are drawn from the same released corpus. For each claim, the system is expected to predict which abstracts contain relevant evidence. Each predicted abstract must be annotated with two pieces of information: the predicted rationale sentences and the predicted label.
An example prediction is shown below:
{
"id": 52,
"evidence": {
"11": {
"sentences": [1, 11, 13], // Predicted rationale sentences.
"label": "SUPPORT" // Predicted label.
},
"16": {
"sentences": [18, 20],
"label": "REFUTES"
}
}
}
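Predictions in this shape can be serialized as one JSON object per claim. The sketch below is an illustration only, assuming a JSON-lines output file; the exact file name and format expected by the leaderboard are given in the submission instructions.

# Minimal sketch: write predictions as one JSON object per claim, following
# the structure shown above. The output file name and exact submission format
# are assumptions; see the leaderboard instructions for the requirements.
import json

predictions = [
    {
        "id": 52,
        "evidence": {
            "11": {"sentences": [1, 11, 13], "label": "SUPPORT"},
            "16": {"sentences": [18, 20], "label": "REFUTES"},
        },
    },
]

with open("predictions.jsonl", "w", encoding="utf-8") as f:
    for prediction in predictions:
        f.write(json.dumps(prediction) + "\n")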
Two evaluation metrics will be used: abstract-level evaluation and sentence-level evaluation. For a full description, see Section 4 of the SciFact paper.
Here's a simple step-by-step example showing how these metrics are calculated.
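Both metrics are F1 scores over the system's predictions. As a rough illustration of the arithmetic only (the exact rules for when a predicted abstract or rationale sentence counts as correct are defined in the paper), suppose a system makes 4 predictions, 3 of which are correct, while the gold data contains 5 items:
$$P = \frac{3}{4} = 0.75, \quad R = \frac{3}{5} = 0.60, \quad \mbox{F1} = \frac{2 \times 0.75 \times 0.60}{0.75 + 0.60} \approx 0.67$$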
Dave Wadden, University of Washington
Kyle Lo, Allen Institute for Artificial Intelligence (AI2)
Iz Beltagy, Allen Institute for Artificial Intelligence (AI2)
Anita de Waard, Elsevier, USA
Tirthankar Ghosal, Indian Institute of Technology Patna, India
Recent years have witnessed a massive increase in the amount of scientific literature and research data published online, shedding light on advancements across different domains. The introduction of aggregator services like CORE [1] has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations [2]. The shared task organized as part of SDP 2021 focuses on classifying citation contexts in research publications based on their influence and purpose.
Subtask A: A task for identifying the purpose of a citation. Multiclass classification of citations into one of six classes: Background, Uses, Compare_Contrast, Motivation, Extension, and Future.
Subtask B: A task for identifying the importance of a citation. Binary classification of citations into one of two classes: Incidental, and Influential.
The participants will be provided with a labeled dataset of 3000 instances annotated using the ACT platform [3].
The dataset is provided in CSV format and contains the fields shown in the sample entry below.
Each citation context in the dataset contains an "#AUTHOR_TAG" label, which marks the citation being considered; all other fields in the dataset hold the values associated with that #AUTHOR_TAG. The possible values of citation_class_label correspond to the six citation purposes listed under Subtask A, and those of citation_influence_label to the two classes (Incidental and Influential) listed under Subtask B.
The following table shows a sample entry from the training dataset.
unique_id | 1998
core_id | 81605842
citing_title | Everolimus improves behavioral deficits in a patient with autism associated with tuberous sclerosis: a case report
citing_author | Ryouhei Ishii
cited_title | Learning disability and epilepsy in an epidemiological sample of individuals with tuberous sclerosis complex
cited_author | Joinson
citation_context | West syndrome (infantile spasms) is the commonest epileptic disorder, which is associated with more intellectual disability and a less favorable neurological outcome (#AUTHOR_TAG et al, 2003)
citation_class_label | 4
citation_influence_label | 1
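To get started with the data, the training CSV can be loaded with pandas and a simple text baseline trained on the citation context. This is only a sketch: the file name train.csv is an assumption, and the column names follow the sample entry above.

# Minimal Subtask A baseline sketch: TF-IDF features over the citation context
# plus a linear classifier, scored with the official F1-macro metric.
# The file name `train.csv` is an assumption; column names follow the sample
# entry above and may differ in the released data.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("train.csv")

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(
    baseline, df["citation_context"], df["citation_class_label"],
    cv=5, scoring="f1_macro",
)
print("Subtask A cross-validated F1-macro:", scores.mean())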
A sample training dataset can be downloaded by filling in the shared task registration form. The full training dataset will be released shortly via the Kaggle platform.
The ACL-ARC dataset [4], which is compatible with our ACT dataset, may be used by participants during the competition.
The evaluation will be conducted using withheld test data containing 1,000 instances. The evaluation metric is the macro-averaged F1 score:
$$\mbox{F1-macro} = \frac{1}{n} \sum_{i=1}^{n}{\frac{2 \times P_i \times R_i}{P_i + R_i}}$$
where $n$ is the number of classes and $P_i$ and $R_i$ are the precision and recall for class $i$.
The shared task is hosted on the Kaggle platform. Note that the two subtasks are hosted as separate competitions on Kaggle. Please make sure you sign in/register on Kaggle before opening the following links.
To participate in the 3C Shared Task:
Each team can participate in either subtask or in both. The submission files need to be in CSV format with the following fields:
Upload your solutions via Kaggle.
To submit your paper and code to the 3C Shared Task, please register here and use the 3C shared task submission link. When uploading the paper and code, please use the following naming convention:
[kaggle-team-name]_SDP2021_task_[A/B]
where A/B represents the subtask for which you are submitting. If you use the same approach for both subtasks, there is no need to write separate papers.
Petr Knoth, Open University, UK
Suchetha N. Kunnath, Open University, UK
David Pride, Open University, UK
Kuansan Wang, Microsoft Research
Dasha Herrmannova, Oak Ridge National Laboratory
If you have any questions about this shared task, please contact david.pride@open.ac.uk and suchetha.nambanoor-kunnath@open.ac.uk.