Software Mention Detection (SOMD) 2025
Software plays an essential role in scientific research and is considered one of the crucial entity types in scholarly documents. However, software is usually not cited formally in academic documents, resulting in various informal software mentions. Automatic identification and disambiguation of software mentions, their related attributes, and the purpose of each mention contribute to better understanding, accessibility, and reproducibility of research, but this remains a challenging task (Schindler et al., 2021).
This competition invites participants to develop a system that detects software mentions and their attributes as named entities in scholarly texts and classifies the relations between these entity pairs (see the participation link below). The dataset consists of sentences from full-text scholarly documents annotated with named entities and relations. It covers various software types, such as operating systems and applications, and attributes such as URLs and version numbers.
This task emphasizes the joint learning of Named Entity Recognition (NER) and Relation Extraction (RE) (Hennen et al., 2024; Huguet Cabot & Navigli, 2021; Wadden et al., 2019; Ye et al., 2022) to improve computational efficiency and model accuracy, moving away from traditional pipeline approaches (Zeng et al., 2014; Zhang et al., 2017). Effective integration of NER and RE has been shown to boost performance significantly (Li & Ji, 2014).
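To make the joint task concrete, the sketch below shows what one annotated sentence might look like as a Python data structure; the sentence, entity types, and relation name are illustrative assumptions based on the types mentioned above, not the official SOMD annotation schema.

```python
# Hypothetical annotated sentence (illustrative labels, not the official schema):
# NER identifies software mentions and their attributes; RE links attribute
# entities to the software mention they describe.
example = {
    "sentence": "We analysed the data with SPSS 22.0 for Windows.",
    "entities": [
        {"id": 0, "text": "SPSS",    "type": "Application"},      # software mention
        {"id": 1, "text": "22.0",    "type": "Version"},          # attribute
        {"id": 2, "text": "Windows", "type": "OperatingSystem"},  # software mention
    ],
    "relations": [
        {"head": 1, "tail": 0, "type": "Version_of"},  # "22.0" is the version of "SPSS"
    ],
}
```

A joint model predicts both the entity spans and the relations between them in a single pass, rather than running a separate relation classifier on the output of an NER pipeline.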
Competition Platform and Phases
Platform: Participants will submit their entries on the Codabench platform. Please follow this Link to Participate.
The competition will proceed in two phases:
- Phase I: Participants will develop their models using a training set drawn from the same distribution as the first test set.
- Phase II: The second test set, consisting of scholarly documents sampled from computer science journals in PubMed Central, will test the generalization of the developed systems to out-of-distribution data.
Dataset
The dataset is made available on the competition platform.
Evaluation
We evaluate submissions using the F1 score, which combines the precision and recall of both Named Entity Recognition (NER) and Relation Extraction (RE). We will calculate the macro-averaged F1 score using exact-match criteria (Nakayama, 2018) for each of the two test phases.
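For orientation, the following is a minimal sketch of an entity-level, exact-match, macro-averaged F1 computation with the seqeval library (Nakayama, 2018) cited above; the BIO labels are illustrative rather than the official SOMD tag set, and the evaluation script on the platform is authoritative.

```python
# Minimal sketch: exact-match (strict), macro-averaged F1 with seqeval.
# The BIO labels below are illustrative, not the official SOMD tag set.
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

y_true = [["B-Application", "I-Application", "O", "B-Version"]]
y_pred = [["B-Application", "I-Application", "O", "O"]]

# mode="strict" with an explicit tagging scheme counts an entity as correct
# only if both its boundaries and its type match exactly; average="macro"
# averages the per-type F1 scores.
print(f1_score(y_true, y_pred, average="macro", mode="strict", scheme=IOB2))
```

Here the Application entity is matched exactly (F1 = 1.0) while the Version entity is missed (F1 = 0.0), giving a macro F1 of 0.5.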
Competition Timeline Overview
- Competition Registration starts on February 24, 2025
- First phase: dataset release (train and test data): February 27, 2025
- First phase ends (submission closes): March 18, 2025
- Second phase data release: March 18, 2025
- Competition ends (Phase II submission closes): April 4, 2025
- Paper submission deadline: April 17, 2025
- Notification of acceptance: May 1, 2025
- Camera-ready paper deadline for workshop: May 16, 2025
- Workshop date: July 21-August 1, 2025
Paper Submission Guidelines
- Paper Submission Portal: Submit your paper via the following link: Submission Portal
- Formatting Guidelines: Your paper must be formatted according to the official ACL submission guidelines. For further details, please refer to: ACL Submission Details
- ACL Template: Please use the official ACL template available at: ACL Template on GitHub
- Paper Length Options: You may submit either a long paper (8 pages) or a short paper (4 pages).
- Reproducible Model Requirement: In addition to your paper, you must submit a reproducible model that incorporates your system, provided via a GitHub repository. Ensure that the repository is public and contains all files, documentation, and instructions necessary to fully reproduce your results.
- Final Notes: Ensure that your submission is complete and adheres to all the guidelines above. Submissions that do not comply with the formatting or reproducibility requirements may be rejected.
Organizers
Sharmila Upadhyaya (GESIS Leibniz Institut für Sozialwissenschaften, Germany)
Wolfgang Otto (GESIS Leibniz Institut für Sozialwissenschaften, Germany)
Frank Krueger (Wismar University of Applied Sciences, Germany)
Stefan Dietze (GESIS Leibniz Institut für Sozialwissenschaften, Cologne & Heinrich-Heine-University Düsseldorf, Germany)
For inquiries: somd25@googlegroups.com. Join our Google Group for updates and discussions related to the competition.
Funding
This work has received funding through the DFG project NFDI4DS (no. 460234259).
We wish to thank NFDI4DS for both funding and support. Special thanks go to all institutions and actors committed to the association and its goals.
For more information about NFDI4DS, visit the website.
References
- Hennen, M., Babl, F., & Geierhos, M. (2024). ITER: Iterative Transformer-based Entity Recognition and Relation Extraction. In Findings of EMNLP 2024, 11209-11223. DOI: 10.18653/v1/2024.findings-emnlp.655
- Huguet Cabot, P.-L., & Navigli, R. (2021). REBEL: Relation Extraction By End-to-end Language Generation. In Findings of EMNLP 2021, 2370-2381. DOI: 10.18653/v1/2021.findings-emnlp.204
- Li, Q., & Ji, H. (2014). Incremental Joint Extraction of Entity Mentions and Relations. In Proceedings of ACL 2014, 402-412. DOI: 10.3115/v1/P14-1038
- Nakayama, H. (2018). seqeval: A Python framework for sequence labeling evaluation. Software available from https://github.com/chakki-works/seqeval
- Schindler, D., Bensmann, F., Dietze, S., & Krüger, F. (2021). SoMeSci - A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles. In Proceedings of CIKM 2021, 4574-4583. DOI: 10.1145/3459637.3482017
- Wadden, D., Wennberg, U., Luan, Y., & Hajishirzi, H. (2019). Entity, Relation, and Event Extraction with Contextualized Span Representations. In Proceedings of EMNLP-IJCNLP 2019, 5783-5788. DOI: 10.18653/v1/D19-1585
- Ye, D., Lin, Y., Li, P., & Sun, M. (2022). Packed Levitated Marker for Entity and Relation Extraction. In Proceedings of ACL 2022, 4904-4917. DOI: 10.18653/v1/2022.acl-long.337
- Zeng, D., Liu, K., Lai, S., Zhou, G., & Zhao, J. (2014). Relation Classification via Convolutional Deep Neural Network. In Proceedings of COLING 2014. ACL Anthology
- Zhang, Y., Zhong, V., Chen, D., Angeli, G., & Manning, C. D. (2017). Position-aware Attention and Supervised Data Improve Slot Filling. In Proceedings of EMNLP 2017, 35-45. DOI: 10.18653/v1/D17-1004