Associate Professor, University of Copenhagen
Isabelle Augenstein is an associate professor at the University of Copenhagen, Department of Computer Science, where she heads the Copenhagen Natural Language Understanding research group as well as the Natural Language Processing section. She also co-heads the research team at CheckStep Ltd, a content moderation start-up. Her main research interests are fact checking, low-resource learning and explainability. Before starting a faculty position, she was a postdoctoral research associate at UCL, mainly investigating machine reading from scientific articles. She has a PhD in Computer Science from the University of Sheffield. She currently holds a prestigious DFF Sapere Aude Research Leader fellowship on 'Learning to Explain Attitudes on Social Media'. Isabelle Augenstein is the current president of the ACL Special Interest Group on Representation Learning (SIGREP), as well as a co-founder of Widening NLP (WiNLP).
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges that should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) ensuring that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline remaining challenges.
Professor, Bar Ilan University
Digitization and search have revolutionized information access. Yet, current search systems are all geared towards a specific kind of information need. The majority of systems are precision-oriented, retrieving the most relevant documents on a given topic. Some expert-oriented systems are recall-focused, aiming to find all documents on a given topic. Some systems provide snippets rather than full documents, and recent advances in QA make it possible to highlight the spans where the answer may be. In all these cases, the user needs to look at all the returned answers and process them themselves. This works very well if the answer you are looking for is written in a single document, or a few documents. But not all information needs are like that.
We present a different kind of search system, which is geared toward answering information needs whose answers are based on aggregation of pieces of information over a large corpus. The key component is allowing users to define query elements that act as variables, or "captures", which are then extracted from each matching result, and presented in aggregation. This allows us to formulate queries to answer questions such as "what are various ways of referring to leprosy", "what are common reported incubation periods for covid-19", "what are the kinds of treatments considered in the literature for Alzheimer's disease", "what is being coated by fibronectin" and so on. We demonstrate SPIKE, a publicly available prototype of such an extractive search system, and discuss its current capabilities as well as its limitations and future directions.
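The capture-and-aggregate idea can be illustrated with a minimal sketch. This is not SPIKE's actual query language or implementation (SPIKE operates over linguistic structures, not raw strings); here, hypothetically, a regular-expression named group stands in for a capture variable, and extracted values are tallied across a toy corpus.

```python
import re
from collections import Counter

# Toy corpus standing in for a large document collection.
corpus = [
    "The incubation period for covid-19 is 5 days.",
    "One reported incubation period for covid-19 is 14 days.",
    "The incubation period for covid-19 is 5 days in most patients.",
]

def extractive_search(query, docs):
    """Match `query` against every document; each named group acts as a
    capture variable, and its extracted values are aggregated."""
    counts = Counter()
    for doc in docs:
        for match in re.finditer(query, doc):
            for name, value in match.groupdict().items():
                counts[(name, value)] += 1
    return counts

# The named group (?P<period>...) plays the role of a capture.
query = r"incubation period for covid-19 is (?P<period>\d+ days)"
print(extractive_search(query, corpus).most_common())
# [(('period', '5 days'), 2), (('period', '14 days'), 1)]
```

The user thus sees an aggregated distribution of answers ("5 days" twice, "14 days" once) rather than a ranked list of documents to read one by one.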
Assistant Professor, University of Washington
Hanna Hajishirzi is an Assistant Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a Research Fellow at the Allen Institute for AI. Her research spans different areas in NLP and AI, focusing on developing machine learning algorithms that represent, comprehend, and reason about diverse forms of data at large scale. Applications for these algorithms include question answering, reading comprehension, representation learning, knowledge extraction, and conversational dialogue. Honors include the Sloan Fellowship, Intel Rising Star, Allen Distinguished Investigator Award, multiple best paper and honorable mention awards, and several industry research faculty awards. Hanna received her PhD from the University of Illinois and spent a year as a postdoc at Disney Research and CMU.
Enormous amounts of ever-changing knowledge are available online in diverse emergent textual styles (e.g., news vs. science text). Recent advances in deep learning algorithms, large-scale datasets, and industry-scale computational resources are spurring progress in many Natural Language Processing (NLP) tasks. Nevertheless, current models lack the ability to understand emergent domains such as scientific articles related to Covid-19 when training data are scarce. This talk presents some recent efforts in our lab to address the problem of textual comprehension and reasoning about scientific articles. First, I discuss our multi-task learning approach for identifying and classifying entities and their relations in scientific articles. I further show how we can extend this approach to extract mechanism relations from Covid-19 articles to construct a scientific knowledge graph, which supports advanced search for medical doctors. Second, I introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that supports or refutes a given scientific claim, and to identify rationales justifying each decision. I finally show that our claim verification system is able to identify plausible evidence for 70% of claims relevant to COVID-19 on the CORD-19 corpus.
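The two-stage shape of the claim verification task described above (select evidence abstracts, then label each as supporting or refuting with rationale sentences) can be caricatured in a short sketch. This is not the lab's actual system: real pipelines use neural retrievers and classifiers, whereas the lexical-overlap retrieval and negation-based stance rule below are illustrative stand-ins, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    abstract_id: int
    label: str        # "SUPPORTS" or "REFUTES"
    rationales: list  # indices of sentences justifying the label

def verify_claim(claim, abstracts):
    """Toy pipeline: (1) retrieve abstracts whose sentences share
    vocabulary with the claim, (2) label each retrieved abstract and
    return the rationale sentences behind the decision."""
    claim_words = set(claim.lower().split())
    verdicts = []
    for i, abstract in enumerate(abstracts):
        sentences = abstract.split(". ")
        # Rationales: sentences with enough lexical overlap with the claim.
        rationales = [j for j, s in enumerate(sentences)
                      if len(claim_words & set(s.lower().split())) >= 3]
        if not rationales:
            continue  # abstract not retrieved as evidence
        # Toy stance rule: a negation in a rationale sentence -> REFUTES.
        refutes = any("not" in sentences[j].lower().split()
                      for j in rationales)
        verdicts.append(Verdict(i, "REFUTES" if refutes else "SUPPORTS",
                                rationales))
    return verdicts

claim = "masks reduce transmission of covid-19"
abstracts = [
    "We study masks. Wearing masks reduces transmission of covid-19 significantly.",
    "Background text. Our data show masks do not reduce transmission of covid-19.",
    "An unrelated paper about protein folding.",
]
for v in verify_claim(claim, abstracts):
    print(v)
```

Even this caricature exposes the key outputs of the task: a per-abstract label plus the specific sentences that justify it, which is what makes the predictions auditable by domain experts.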