Bioinformatics, Statistical Proteomics and Genomics and Systems Biology



Natural Language Processing

Project: Natural Language Processing Technology for Guided Study of Bioinformatics

Agency: CSE Directorate for Computer and Information Science and Engineering

PI: Roth, D.

Co-PIs: Rodriguez-Zas, S. L.; Pellegrino, J.; Zhai, C.; Litman, D.

Award Number: 0428472



Recent advances in Natural Language Processing, in particular the ability to use unstructured data to answer natural language questions, are very exciting from an educational perspective. They offer the promise of systems that can automatically respond to students' questions, thus supporting not only a guided but also an open-ended, exploration-based approach to learning. Developing software that supports students' learning requires constructing the right kind of environment: one that facilitates rather than inhibits inquiry through a known knowledge space and provides a jumping-off point for finding or generating new knowledge. The goal of this project is to apply research in Computer Science, particularly Natural Language Processing, and in the Learning Sciences to the development of an intelligent tutor that can provide this environment. This tutor will inhabit a human-computer interactive environment in which the computer is able to detect and track the user's cognitive and academic state and act on this knowledge to help the student identify and access relevant knowledge, contribute factual information the student may need, and guide the student in selecting potentially relevant subtasks. The testbed domain involves high school and undergraduate students studying concepts in Bioinformatics. It builds on the large amount of biological data and software made freely available on the Web by the Bioinformatics community and, specifically, makes use of the Biology Workbench system developed at NCSA.
In the context of this project, researchers will (1) develop the machine learning, natural language, and inference methods needed to robustly support a level of natural language understanding sufficient to "understand" students and their queries well enough to direct them to the right material, make relevant suggestions, and develop a meaningful dialog in the context of the subject matter; (2) create a system that can accommodate different student backgrounds and goals, behave appropriately, and learn as it does so; and (3) study how students learn and how to support their learning in a computer-aided context. This project will contribute to the understanding of how students learn in a computer-aided environment and will use that understanding to develop improved methods for supporting learning in these environments. This has the potential for substantial educational impact on large classes, distance education, and self-paced instruction. The project's computational results in areas such as natural-language-based human-machine interaction, adaptive dialog management, user-sensitive information retrieval and extraction, and machine learning would be widely applicable to many other domains, including intelligent information access and interactive support systems for senior citizens and other groups.


Associated Publications: