Human language technology (HLT) comprises computational, mathematical, and statistical tools for the analysis of language, and applies to a wide range of areas that need language assistance. Drawing on theory and research from computational linguistics, speech technology, and cognitive modeling of language production, HLT is used in domains such as machine translation (MT), information extraction (IE) from documents, and computational models of foreign language acquisition. Advances in four areas have fueled a recent surge in HLT research: 1) increased computational resources, 2) greater availability of document pairs in native and foreign languages to serve as training data for MT, 3) better metrics for evaluating HLT performance on real-world tasks, and 4) improved techniques for building computational linguistic models. These technical developments have increased the interest of federal agencies and industry in the fundamental theory and applications of HLT, as reflected in several interrelated multidisciplinary projects at Kansas State University (K-State). These include current projects in the application areas of:
- Homeland security, biosecurity and defense: Assisting analysts by automatically annotating, summarizing, translating, and relating documents from multiple sources.
- Medical informatics and bioinformatics: Developing models for customized views of documents in medicine, genome biology, and environmental science.
- Language curation and discourse analysis: Cataloguing and analyzing components of extant languages, including endangered languages, based on dialogues and discourses.
- Cognitive linguistics and foreign languages: Modeling the cognitive and social processes of language users, such as students learning foreign languages and translators.
- Digital libraries: Categorizing or annotating documents by extracting document descriptions, keywords, and summary information.
These projects have been funded by federal agencies including NSF, ONR, and NIH, and by industrial sponsors such as American Diagnostic Medicine. Although several facets of HLT are being studied by independent groups at K-State, there is too little coordination among these groups to integrate their scientific and technological advances. This puts K-State at a disadvantage in developing working systems, demonstrations, cross-cutting research programs, and curricula in areas where its researchers possess individual and complementary technical strengths. The overall goal of the proposed program is to address this pressing need through an Initiative in Human Language Technology that will develop and organize a research infrastructure in document-centered HLT. The scope of activities of this program includes:
- Building a software infrastructure for automated natural language processing and learning to facilitate rapid prototyping of experiments, demos, and course materials in HLT.
- Stimulating competitive research in computational linguistics, particularly MT, IE, and modeling of foreign language acquisition.
- Developing an interdisciplinary training program in HLT consisting of an undergraduate minor and a graduate degree concentration.
- Supporting HLT applications: translation software, other assistive technologies, digital document analysis, crisis management, and endangered language preservation.
Details of the machine translation (MT) aspects are to appear in comptranslation; I will post information about the information extraction (IE) and foreign language acquisition aspects here in my own LJ if there is any interest. I'm also open to suggestions for communities on LJ and other venues where I can post workshop announcements and calls for discussion.