Prosody for Dialog Systems
Investigators
Elizabeth Shriberg
Andreas Stolcke
Harry Bratt
Luciana Ferrer
Kemal Sönmez
Project Summary
SRI is investigating the use of prosody, the rhythm and melody of
speech, in voice input to human-computer dialog systems. Current
dialog systems often model prosody on the output side, to
produce acceptable synthesized speech, but few systems use prosody on
the input side, because doing so is a difficult task.
Nevertheless, we believe that pursuing this goal will be worth the
effort, because prosody is one of the main cues that people use to
convey information to each other.
Prosody can enhance spoken
interaction with dialog systems in several important ways: for example, by
detecting user emotions (such as frustration or boredom), by detecting
disfluencies and repairs, by locating the endpoints of user utterances,
and by distinguishing statements from questions.
SRI research in these areas was funded by the Intelligent Systems program at
NASA's Ames Research Center,
and by DARPA through the
Reliable Omni-Present Automatic Recognition (ROAR) program.
The project has also benefited from two prior SRI projects,
Hidden Event Modeling
and Information Extraction from Speech, and
the research on emotion involves an ongoing collaboration with
ICSI.
Recent Publications and Presentations:
- E. Shriberg, A. Stolcke, and J. Ang (2001),
  Prosody-Based Detection of Annoyance and Frustration in Communicator Dialogs.
  Presentation at the DARPA ROAR Workshop, Orlando, FL, Nov. 30, 2001.
  (PowerPoint)
- E. Shriberg and A. Stolcke (2002),
  Harnessing Speech Prosody for Human-Computer Interaction.
  Presentation at the NASA Intelligent Systems Workshop, Pensacola, FL, Feb. 26, 2002.
  (PowerPoint)
- J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke (2002),
  Prosody-Based Automatic Detection of Annoyance and Frustration in Human-Computer Dialog.
  Proc. Intl. Conf. on Spoken Language Processing, Denver, vol. 3, pp. 2037-2040.
  (PDF)
- L. Ferrer, E. Shriberg, and A. Stolcke (2002),
  Is the Speaker Done Yet? Faster and More Accurate End-of-Utterance Detection Using Prosody in Human-Computer Dialog.
  Proc. Intl. Conf. on Spoken Language Processing, Denver, vol. 3, pp. 2061-2064.
  (PDF)
- L. Ferrer, E. Shriberg, and A. Stolcke (2003),
  A Prosody-Based Approach to End-of-Utterance Detection That Does Not Require Speech Recognition.
  Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Hong Kong, vol. 1, pp. 608-611.
  (PDF)