While a career in computer science was certain, it was during her Ph.D. work at Stanford that Chen found her niche. “I’m most interested in the intersection of human language, knowledge, and computers,” says Chen, whose work on natural language processing and machine learning has earned her awards and coveted fellowships, including one at Facebook. Her main focus is on developing algorithms for computers to better comprehend human languages.
Chen’s Work: A Sampling
COMPUTER KNOWLEDGE Computers have historically been able to interpret basic words and simple sentence syntax. Chen is pushing beyond that: “I’m creating a computer program that can read a document or passage and answer comprehension questions,” she says. Her aim is to expand the reading comprehension of digital assistants such as Siri and Alexa so that they can more deeply “understand” the texts they’re working from. This would improve the accuracy of online search results, broaden the body of text these assistants can draw knowledge from, make their responses more fluent, and increase the number of commands and queries they can handle (e.g., “How many authorized COVID-19 vaccines are there?”).
WORDS INTO ACTION Although “a lot of human knowledge is stored in text,” Chen says, much of it is not explicitly stated in a way that computers can understand. Think of a recipe: With minimal training, people understand how the list of ingredients and the instructions relate to the cooking steps. Computers don’t. Chen is working on extracting this implicit knowledge “so we can use it in robotics or intelligent systems.” This might include translating procedural texts into physical actions (so that a robot could follow any recipe and actually cook a meal, for example) or enabling a personal digital assistant to tell you what to do next when you’re cooking.
LEARNING LANGUAGE When you’re drafting an email or typing a text, you may notice that your smart device suggests words and phrases to use next. This is a result of research into statistical patterns in language, Chen explains. For a computer to learn language patterns, it needs a large amount of raw data. Computers “learn” from these data in two ways: “unsupervised,” in which they scan texts that are freely and widely available online; and “supervised,” in which humans feed them annotated texts. Chen is developing methods that let computers pick up language patterns with less supervised learning and apply existing annotated data to additional domains. This could reduce device response times, improve the accuracy of results, and expand the repertoire of helpful suggestions.
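For readers curious about what “statistical patterns in language” can look like in practice, here is a minimal sketch of next-word suggestion in Python. It is not Chen’s method or the approach any particular device uses; it is a simple bigram count, one of the most basic ways to learn from raw, unannotated text. The function names `train_bigrams` and `suggest` and the four-sentence corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word is followed by each other word.

    This is the "unsupervised" case: the input is raw text with no
    human annotation.
    """
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def suggest(follows, word, k=3):
    """Return up to k words seen most often right after `word`."""
    counts = follows.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

# Hypothetical toy corpus; a real system would learn from far more text.
corpus = [
    "see you at the meeting",
    "see you at lunch",
    "see you soon",
    "thank you for the update",
]

model = train_bigrams(corpus)
print(suggest(model, "you"))  # "at" ranks first: it follows "you" twice
```

With only four sentences the suggestions are crude, which hints at why such systems need a large amount of raw data before their suggestions become useful.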