Computerized Map Responds To Speech And Gestures
Penn State researchers have developed
a prototype system to help visitors locate campus parking lots and buildings
by talking with a computer-controlled map that responds not only to the
spoken word, but also to natural hand gestures.
Project leader Dr. Rajeev Sharma, assistant
professor of computer science and engineering, says, "There still is a
lot of work to be done but we have a pretty fair ground-level demonstration
model of a system in which a person can interact with a computer by using
the most natural human mode of communication -- talking while gesturing.
"Besides the current application, the
system could potentially be adapted to help tourists locate the sights
in large cities, shoppers to find stores in malls, visitors to find patients
in hospitals or even for roles in crisis management, mission planning and
briefing," he adds.
In a recent demonstration, Sanshzar
Kettebekov, a doctoral student, stood about 5 feet away from a map of the
Penn State campus projected on a 4-foot-by-3-foot screen. "Scroll," he
said gently into the cordless microphone attached to his T-shirt, and the
map moved. "Stop," Kettebekov directed and the map did.
He waved his hand in the air and a
little red hand appeared on the screen. As Kettebekov continued to gesture
with his hand, the on-screen hand followed it, like a cursor obeying a
mouse. When the red hand settled on one of the buildings, Kettebekov said,
"Show me the nearest parking lot," and a bright blue line immediately appeared
and connected the building to the closest lot.
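The demonstration above pairs a spoken command with the position of the on-screen hand. A minimal sketch of that pairing follows. It is an illustration only, not the Penn State software: the building and lot names, their coordinates, and the functions `nearest` and `handle_command` are all hypothetical, and straight-line distance stands in for whatever measure the real system uses.

```python
import math

# Hypothetical map data: names and coordinates are illustrative only,
# not taken from the actual Penn State campus map.
BUILDINGS = {"Old Main": (40.0, 60.0), "Creamery": (75.0, 80.0)}
PARKING_LOTS = {"Lot A": (35.0, 50.0), "Lot B": (90.0, 85.0), "Lot C": (10.0, 10.0)}


def nearest(point, places):
    """Return the name of the place closest to `point` (straight-line distance)."""
    return min(places, key=lambda name: math.dist(point, places[name]))


def handle_command(utterance, cursor_xy):
    """Combine a recognized utterance with the current cursor position.

    Returns (building, lot) for the one command this sketch understands,
    or None for anything else.
    """
    if "nearest parking lot" in utterance.lower():
        # The gesture supplies the "where": snap the cursor to a building.
        building = nearest(cursor_xy, BUILDINGS)
        # The speech supplies the "what": find the lot closest to that building.
        lot = nearest(BUILDINGS[building], PARKING_LOTS)
        return building, lot
    return None


if __name__ == "__main__":
    print(handle_command("Show me the nearest parking lot", (42.0, 58.0)))
```

In this sketch the gesture answers "which building" and the speech answers "what to do with it", which is the division of labor the demonstration suggests.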
The system is based on off-the-shelf equipment.
The computer is a standard PC workstation equipped with a video camera,
the system's "eye" on the gesturing human. A commercially available speech
recognition package currently takes care of the conversation. However,
the Penn State researchers developed new gesture recognition software and
used footage of TV weather broadcasters narrating the weather to "train"
it.
The new Penn State gesture recognition
software is based on a technique called Hidden Markov Models (HMMs), a statistical
method for recognizing patterns that vary over time. HMMs had been used previously
in gesture recognition systems, but only for predefined gestures, such as
sign language.
The new Penn State approach, based
on weathercaster movements, enables the computer to recognize and "understand"
a rich store of natural gestures that occur in combination with speech.
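To give a rough idea of how an HMM can tell one gesture from another, the sketch below scores a short sequence of hand observations against two toy models and picks the more likely one, using the standard forward algorithm. Everything specific here is an assumption made for illustration: the two gesture names, the two-symbol observation alphabet (0 = hand still, 1 = hand moving), and all probabilities. The Penn State models were trained on video of weathercasters and are not reproduced here.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM
    (forward algorithm).

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of observing symbol o while in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical toy models. Observation symbols: 0 = hand still, 1 = hand moving.
MODELS = {
    # "point": move toward a target, then hold still on it.
    "point": dict(
        start=[1.0, 0.0],
        trans=[[0.5, 0.5], [0.0, 1.0]],
        emit=[[0.2, 0.8], [0.9, 0.1]],
    ),
    # "wave": keep moving throughout.
    "wave": dict(
        start=[1.0, 0.0],
        trans=[[0.5, 0.5], [0.5, 0.5]],
        emit=[[0.1, 0.9], [0.3, 0.7]],
    ),
}


def classify(obs):
    """Return the name of the model that assigns `obs` the highest likelihood."""
    return max(MODELS, key=lambda name: forward_likelihood(obs, **MODELS[name]))


if __name__ == "__main__":
    print(classify([1, 1, 0, 0]))  # move, move, hold, hold
    print(classify([1, 1, 1, 1]))  # continuous motion
```

A real recognizer would work on continuous features extracted from video and would learn its probabilities from training footage rather than having them typed in by hand.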
At this point, although the system
recognizes quite a few human gestures and spoken words, it doesn't like
small talk. You can't tell it, "Well, I'd like to go to the Creamery for
an ice cream cone first and then stop off at Old Main before parking at
Beaver Stadium." At least not yet.
Yuhui Zhou, a master's degree candidate,
has a background in linguistics. She is working on dialog design and feedback
systems that will enable the computer to extract the most salient information
from a human conversation stream.
Jiongyu Cai, a doctoral candidate, is
working on extracting the salient gestures from the random hand waving
that most people use while talking. Kettebekov is trying to understand
the combination of speech and gestures so that he can develop software
that enables the computer to interpret gestures in the speech context.
The research team is also paying attention
to the fact that people from different cultures gesture differently but,
at present, plans call for the map to respond only to English.
Sharma says, "Computer users have been
slaves to the mouse and the keyboard too long. The equipment has, so far,
limited the potential for human interaction with computers. Incorporating
gesture, which computer vision makes possible, allows us to imagine all
kinds of potential applications.
"For example, I can imagine a computer
you wear on your head, like a virtual reality helmet, that could help you
repair your PC by telling you what to do and then 'watching' as you do
it. Or, a wearable computerized surgical aide that could help direct a
surgeon to the precise location of a tumor.
"For now, our group will be working
on trying to enable the computer to more effectively talk back to the user.
We'd like to model the human/computer dialog so that the display could
interactively influence the user input, enabling the computer to play a
more active role in the natural speech/gesture interface," he adds.
The research group has detailed the
new system in a paper, "Toward Interpretation of Natural Speech/Gesture:
Spatial Planning on a Virtual Map," published in the Proceedings of the
Army Research Laboratory Annual Symposium on Advanced Display, held
in February 1999 in Adelphi, Md. The work on gesture recognition is detailed
in Indrajit Poddar's master's thesis, completed in May, entitled "Continuous
Recognition of Deictic Gestures for Multimodal Interfaces."
The research was supported, in part,
by grants from the National Science Foundation and the Army Research Laboratory.
Related website: Dr. Sharma's home page
[Contact: Dr. Rajeev Sharma]
20-Aug-1999