The field of research concerned with developing lifelike computer characters that behave like sensitive and effective teachers, testers, therapists, trainers and tutors is only twenty years old, yet the effectiveness of systems already developed portends enormous future benefits to individuals and society. There are literally tens of millions of individuals on this planet who could benefit from individualized assessments and treatments, but it is impossible to provide even a fraction of this population with the help they deserve. The vision is to invent virtual tutors and therapists that are imbued with the pedagogical and clinical knowledge and interpersonal behaviors of highly effective practitioners. Achieving this vision requires research breakthroughs in spoken dialog, computer vision and character animation technologies, and the integration of these technologies into computer programs informed by research on how to design multimedia programs that optimize clinical outcomes during interaction with virtual tutors. Boulder Learning has the underlying technologies and is working toward the realization of this vision. Success will mean a world in which effective treatments are accessible to all who need them.
Since its founding in October 1998, researchers at CSLR have focused on the research and development of programs that incorporate lifelike computer characters that interact with people like sensitive and effective tutors and clinicians. Our work comprises
(a) research and development of the language and animation technologies that power virtual tutors,
(b) development of infrastructure to support these research and development activities, and
(c) development and assessment of programs that are designed to use virtual tutors to provide effective clinical treatments to children and adults.
To date, CSLR has developed eight programs that use virtual tutors. Each of these programs is currently under development and being tested with human subjects. Each has been developed in close collaboration with "domain experts": reading researchers, teachers and/or clinicians who have developed treatments that have been demonstrated to be effective in the laboratory, classroom or clinic. The programs include:
The lifelike computer characters in each of these programs use one of the 3D characters in the CU Animate system, developed by Dr. Jiyong Ma and his colleagues at CSLR. In all of our applications, the virtual tutor or therapist produces accurate and natural visual speech, using a novel technique invented at CSLR that concatenates motion-capture data collected from human lips. In all of our applications to date, the visual speech is synchronized with a recorded human voice, since the human voice is a remarkable instrument that conveys emotion and enthusiasm, and imparts personality to the virtual tutor. The synchronization of the human voice with the movements of the lips in all of our programs is fully automatic: a voice talent (who may be a clinician) records an utterance, and the utterance and the associated text string are input to the alignment system. The system transforms the text string into a sequence of expected phonetic segments, and the SONIC speech recognition system aligns these phonetic segments to the recorded speech. The waveform is then played at the appropriate point in an application, and the time-aligned phonetic segments are used to inform the CU Animate system when and how to move the lips and regions of the lower face of the 3D model. An algorithm developed by Jie Yan uses a set of rules to move the head and face while the character talks. In some applications, specific animation sequences are used to portray emotions when the virtual tutor is speaking or responding to the speech of the user.
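The alignment-driven animation described above can be sketched in miniature. This is an illustrative example only, not CSLR code: the actual system concatenates motion-capture lip data, while this sketch uses a simple phone-to-viseme lookup as a stand-in, and the function and table names (`PHONE_TO_VISEME`, `viseme_keyframes`) are hypothetical. A forced aligner such as SONIC would produce the time-aligned phone tuples; here an example alignment for the word "hello" is hard-coded.

```python
# Illustrative phone-to-mouth-shape table (ARPAbet-style phones).
# The real system drives lip motion from concatenated motion-capture data.
PHONE_TO_VISEME = {
    "sil": "neutral",
    "hh": "aspiration",
    "eh": "open_mid",
    "l": "tongue_up",
    "ow": "rounded",
}

def viseme_keyframes(alignment):
    """Convert time-aligned phones (phone, start_sec, end_sec) into
    (time, viseme) keyframes, merging consecutive identical shapes."""
    keyframes = []
    for phone, start, _end in alignment:
        viseme = PHONE_TO_VISEME.get(phone, "neutral")
        if not keyframes or keyframes[-1][1] != viseme:
            keyframes.append((start, viseme))
    return keyframes

# Example output a forced aligner might return for "hello".
alignment = [
    ("sil", 0.00, 0.12),
    ("hh", 0.12, 0.20),
    ("eh", 0.20, 0.35),
    ("l", 0.35, 0.45),
    ("ow", 0.45, 0.70),
    ("sil", 0.70, 0.80),
]

for t, v in viseme_keyframes(alignment):
    print(f"{t:.2f}s -> {v}")
```

In the full pipeline, these keyframes would be handed to the animation engine, which interpolates the lower-face geometry between them while the recorded waveform plays.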
I believe that a great way to advance science is to create and share tools, technologies and systems that enable research, and to test these products through collaboration with other scientists. I have followed this agenda since 1990, when I founded the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute in Beaverton, Oregon. Between 1990 and 1998, CSLU developed and freely distributed over 20 different annotated speech corpora in 22 languages. These corpora are widely used today, and have stimulated research and development of speech recognition, speaker recognition and language identification systems around the world. In addition, my colleagues and I created and distributed the CSLU Toolkit, a software environment for collecting and annotating speech data, and for researching and developing spoken language applications that incorporate natural dialog interaction with virtual tutors. The CSLU Toolkit has been installed at over 25,000 sites in over 100 countries and has become a major tool for research and education. For example, Cliff Nass and his students at Stanford have published over a dozen journal articles using the CSLU Toolkit. The Toolkit has also proven to be a powerful platform for application development; for example, the Vocabulary Wizard developed by Jacques de Villiers at CSLU is a wonderful platform for rapidly designing applications for teaching vocabulary. The Vocabulary Wizard was used to generate hundreds of applications at the Tucker Maxon School in Portland, Oregon to teach new vocabulary, speech perception, speech production and reading skills to students with profound hearing loss. (See below for a brief description of this project.)
CSLR has continued this tradition, and today provides leadership in developing and distributing tools and technologies that will support research and development of a new generation of virtual tutor systems. These activities take many forms.
My research on virtual tutors was initiated in 1997 when I was awarded a three-year "Challenge Grant" of $1.8 million from the National Science Foundation (with Dominic Massaro and Alex Waibel as co-Principal Investigators). The goal of this grant was to develop and integrate computer speech recognition, speech synthesis and character animation technologies into the CSLU Toolkit, and to use the toolkit to design applications to teach speech and language skills to students at the Tucker Maxon School in Portland, Oregon. The research produced a number of articles and, more importantly, significant and lasting benefits to the students who used the program (and to the many students who continue to use the program today).
Looking back on this wonderful project, it seems to me that a guardian angel must have guided our efforts. I was at CSLU in 1996, working on the CSLU Toolkit, which by then integrated computer speech recognition (developed at CSLU) and the Festival speech synthesis system (developed by Paul Taylor and Alan Black at the University of Edinburgh). Using the CSLU Toolkit's Rapid Application Developer, or RAD, a graphical user interface for designing spoken dialogs, it was possible to develop a number of sophisticated applications, such as conversing with the system to retrieve weather forecasts from a Web site. Also in 1996 I reconnected with Dom Massaro, who invented the Baldi system with Michael Cohen. Baldi is a 3D talking head with very accurate lips. I had not seen Dom in over 25 years, since I was a grad student at UC Riverside and Dom was a postdoc at UC San Diego. Shortly after I read the NSF Challenge Grant program announcement, I ran into a colleague at a supermarket (one I had never gone to before) who used to work at the Oregon Graduate Institute. Kathy told me she had left OGI and was now working at the Tucker Maxon Oral School (now the Tucker Maxon School), a school that used an oral approach to instruction and language training for students with profound hearing loss. At that moment, the proverbial light bulb went on in my mind. I asked her if she thought Tucker Maxon might like to partner on an NSF grant proposal to develop a talking head that would converse with their students to teach speech and language skills. Dom was as excited as I was, we submitted the proposal, and to our great surprise, it was awarded. The research was conducted between 1997 and 2000.
On March 15, 2001, ABC TV's Prime Time Thursday featured a segment in which Baldi, the 3D talking head invented by Dom Massaro and Michael Cohen at UC Santa Cruz, was shown helping children learn new vocabulary and dramatically improving their speech recognition and production skills. Prime Time introduced the segment with the words "This is what a small miracle looks like." The National Science Foundation also featured the Baldi project on the NSF home page during March and April 2001. The Baldi project brought great benefits and joy to many children, and continues to benefit children with sensory and cognitive disabilities through the efforts of Animated Speech Corporation, which has licensed the technology and extended its capabilities. Sadly, the premiere of the Prime Time segment was overshadowed by the tragic death of Mike Macon, an exceptional speech researcher who created the voice of Baldi and who died the evening of the broadcast. After moving to CU and establishing the Center for Spoken Language Research, I applied for an NSF ITR grant to develop perceptive animated agents that could be used as virtual tutors and therapists. This grant was awarded, and the technologies developed with support from this and other grants (e.g., the SONIC speech recognition system, the CU Animate system) led to subsequent grants and projects resulting in virtual tutoring and therapy programs.