Software plays an essential role in scientific research and is considered one of the crucial entity types in scholarly documents. However, software is usually not cited formally in academic documents, which results in many informal software mentions. Automatically identifying and disambiguating software mentions, their attributes, and the purpose for which the software is mentioned contributes to a better understanding, accessibility, and reproducibility of research, but it is a challenging task (Schindler et al., 2021).
This competition invites participants to develop a system that detects software mentions and their attributes as named entities in scholarly texts and classifies the relations between pairs of these entities. The dataset consists of sentences from full-text scholarly documents annotated with named entities and relations. It covers various software types, such as Operating Systems or Applications, and attributes such as URLs and version numbers. The task emphasizes joint learning of Named Entity Recognition (NER) and Relation Extraction (RE) (Hennen et al., 2024; Cabot & Navigli, 2021; Wadden et al., 2019; Ye et al., 2022) to improve computational efficiency and model accuracy, moving away from traditional pipeline approaches (Zeng et al., 2014; Zhang et al., 2017). Prior work has shown that integrating NER and RE into a single model, rather than running them in sequence, can substantially improve performance (Li & Ji, 2014).
Platform: Participants will submit their entries on the Codabench platform. Please follow this link to participate. The competition will proceed in two phases:
We will upload the dataset shortly after the competition begins.
We evaluate submissions using the F1 score, the harmonic mean of precision and recall, for both Named Entity Recognition (NER) and Relation Extraction (RE). For each of the two test phases, we will calculate the macro-averaged F1 score under exact-match criteria (Nakayama, 2018), meaning that a predicted entity counts as correct only if both its boundaries and its type match the gold annotation.
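The scoring rule above can be sketched as follows. This is a minimal illustration, not the official scorer: it assumes annotations are given as hypothetical (key, label) pairs, where the key pins down an exact span, for example (sentence_id, start, end) for an entity or (sentence_id, head_span, tail_span) for a relation. The seqeval library (Nakayama, 2018) works on BIO-tagged token sequences instead, and how the organizers combine NER and RE scores is not specified here, so the sketch scores each task separately.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

# Hypothetical item format (an assumption, not the official dataset format):
# (key, label), where key identifies the exact span, e.g. (sentence_id, start, end)
# for an entity or (sentence_id, head_span, tail_span) for a relation.
Item = Tuple[Hashable, str]


def macro_f1_exact(gold: Iterable[Item], pred: Iterable[Item]) -> float:
    """Macro-averaged F1; a prediction is correct only if key AND label match exactly."""
    gold_set, pred_set = set(gold), set(pred)

    # Group gold and predicted items by their label (entity type or relation type).
    by_label = defaultdict(lambda: {"gold": set(), "pred": set()})
    for item in gold_set:
        by_label[item[1]]["gold"].add(item)
    for item in pred_set:
        by_label[item[1]]["pred"].add(item)

    f1_scores = []
    for sets in by_label.values():
        tp = len(sets["gold"] & sets["pred"])  # exact matches only
        precision = tp / len(sets["pred"]) if sets["pred"] else 0.0
        recall = tp / len(sets["gold"]) if sets["gold"] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    # Unweighted mean over labels: rare labels weigh as much as frequent ones.
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Usage: the "Version" span is off by one token, so that label scores F1 = 0.
gold = [((0, 3, 5), "Application"), ((0, 7, 8), "Version"), ((1, 0, 2), "URL")]
pred = [((0, 3, 5), "Application"), ((0, 7, 9), "Version"), ((1, 0, 2), "URL")]
print(macro_f1_exact(gold, pred))  # 2/3: labels score 1.0, 0.0, 1.0
```

Because the average is unweighted, a system that misses every mention of a rare attribute type is penalized as heavily as one that misses a frequent type, which is the usual motivation for reporting macro rather than micro F1.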
Sharmila Upadhyaya (GESIS Leibniz Institut für Sozialwissenschaften, Germany)
Frank Krueger (Wismar University of Applied Sciences, Germany)
Stefan Dietze (GESIS Leibniz Institut für Sozialwissenschaften, Cologne & Heinrich-Heine-University Düsseldorf, Germany)
For inquiries: somd25@googlegroups.com. Join our Google Group for updates and discussions related to the competition.
This work has received funding through the DFG project NFDI4DS (no. 460234259).
We wish to thank NFDI4DS for both funding and support. Special thanks go to all institutions and actors engaged in the association and its goals.
For more information about NFDI4DS, visit the website.