Kathleen R. McKeown
Henry and Gertrude Rothschild Professor of Computer Science at Columbia University
Founding Director of the Data Science Institute

Industrial AI

2023 IEEE Technical Field Award Recipient

June 5, 11:15am
Location: Magnolia

What Can We Do About Large Language Models that Lie? Case Studies in Text Summarization

Text summarization is a sub-field within natural language processing that aims to automatically generate short, paragraph-length summaries given an input document. Much of the work has been done on news, but there have also been efforts on many other genres, including journal articles, medical documents, email, dialog, legal documents and even creative texts such as novel chapters. The advent of large language models promises a new level of performance in summarization, enabling the generation of summaries that are far more fluent, coherent and relevant than was previously possible. However, these models also introduce a major new problem: they hallucinate, inventing facts out of thin air. They may incorrectly intermingle facts from the input, they may introduce facts that were not mentioned at all, and, worse yet, they may even make up things that are not true in the real world. In this talk, I will discuss our work on characterizing the kinds of errors that can occur with different models, along with methods that we have developed to help mitigate hallucination in language modeling approaches to text summarization for a variety of genres.

Kathleen R. McKeown is the Henry and Gertrude Rothschild Professor of Computer Science at Columbia University and the Founding Director of the Data Science Institute, serving as Director from 2012 to 2017. She is also an Amazon Scholar. In earlier years, she served as Department Chair (1998-2003) and as Vice Dean for Research for the School of Engineering and Applied Science (2010-2012). A leading scholar and researcher in the field of natural language processing, McKeown focuses her research on the use of data for societal problems; her interests include text summarization, question answering, natural language generation, social media analysis and multilingual applications. She has received numerous honors and awards, including the 2023 IEEE Innovation in Societal Infrastructure Award. She is an elected member of the American Philosophical Society and of the American Academy of Arts and Sciences, an American Association of Artificial Intelligence Fellow, a Founding Fellow of the Association for Computational Linguistics and an Association for Computing Machinery Fellow. Early on she received the National Science Foundation Presidential Young Investigator Award and a National Science Foundation Faculty Award for Women. In 2010, she won both the Columbia Great Teacher Award (an honor bestowed by the students) and the Anita Borg Woman of Vision Award for Innovation.

June 5, 9:15am
Location: Magnolia

2023 IEEE INNOVATION IN SOCIETAL INFRASTRUCTURE AWARD

Sponsored by Hitachi Ltd. and the IEEE Computer Society

For pushing the boundaries of natural language processing for social media analysis, news summarization, crisis informatics, and creating a digital library for patient care.

Kathleen McKeown has developed natural language processing techniques that promote social good. She has created methods that summarize and analyze news reports on past disasters to provide updates on current disasters as they unfold. Her work in social media uses data science to understand the personal narratives of those who have experienced a disaster and employs sentiment detection to identify where people are still suffering. Earlier in her career, she developed a system for the personalized search and summarization of medical literature to build a digital library for patient care, one that presents helpful information to patients and their families in lay terms.