Meeting Recognition and Understanding
Andreas Stolcke
Elizabeth Shriberg
Dimitra Vergyri
Gokhan Tur
SRI is carrying out research and development of techniques that
allow the accurate automatic transcription and higher-level processing
of multi-party meetings.
Our main focus in this area is on speech recognition for meetings and
on the use of prosody for extracting beyond-the-words information,
such as sentence boundaries, dialog acts, and emotions ("hot spots").
This work is currently supported by several projects:
CALO Meeting Assistant
The CALO Meeting Assistant is part of the larger
Cognitive Assistant that Learns and Organizes project,
and aims to construct an interactive agent that provides
online and offline assistance to meeting participants.
This is a team effort with groups from CMU, GA Tech, MIT, OHSU,
Stanford CSLI, and U. Washington.
The SRI STAR Lab's main contributions are in meeting recognition and dialog act
segmentation and tagging.
Speech Processing for Meetings (with ICSI)
A long-term collaboration with the speech group at
International Computer Science Institute
has investigated many aspects of automatic meeting processing, from
data collection to extraction of "hot spots", or regions of high
participant involvement.
This work is funded by a variety of sources, including a
National Science Foundation ITR grant
with ICSI, Columbia University, and the University of Washington on
"Mapping Meetings",
and the European AMI and IM2 projects.
As part of this collaboration we are also regular participants in the
NIST Rich Transcription Spring evaluations, which have focused on
meeting diarization and recognition.
Representative Publications
A. Stolcke (2011),
Making the Most from Multiple Microphones in Meeting Recognition,
Proc. IEEE ICASSP, pp. 4992-4995, Prague.
X. Lei, W. Wang, & A. Stolcke (2010),
Unsupervised Domain Adaptation With Multiple Acoustic Models,
Proc. IEEE Spoken Language Technology Workshop,
pp. 235-240, Berkeley, CA.
G. Tur, A. Stolcke, L. Voss, S. Peters, D. Hakkani-Tür, J. Dowding, B. Favre,
R. Fernández, M. Frampton, M. Frandsen, C. Frederickson, M. Graciarena,
D. Kintzing, K. Leveque, S. Mason, J. Niekrasz, M. Purver, K. Riedhammer,
E. Shriberg, J. Tien, D. Vergyri, & F. Yang (2010),
The CALO Meeting Assistant system,
IEEE Trans. Audio, Speech, and Language Processing 18, 1601-1611.
A. Stolcke, G. Friedland, & D. Imseng (2010),
Leveraging Speaker Diarization for Meeting Recognition from Distant Microphones,
Proc. IEEE ICASSP, pp. 4390-4393, Dallas.
D. Vergyri, A. Stolcke, & G. Tur (2009),
Exploiting User Feedback for Language Model Adaptation in
Meeting Recognition,
Proc. IEEE ICASSP, pp. 4737-4740, Taipei.
G. Tur, A. Stolcke, L. Voss, J. Dowding, B. Favre, R. Fernández, M. Frampton,
M. Frandsen, C. Frederickson, M. Graciarena, D. Hakkani-Tür, D. Kintzing,
K. Leveque, S. Mason, J. Niekrasz, S. Peters, M. Purver, K. Riedhammer,
E. Shriberg, J. Tien, D. Vergyri, & F. Yang (2008),
The CALO Meeting Speech Recognition and Understanding System,
Proc. IEEE Spoken Language Technology Workshop, pp. 69-72, Goa, India.
A. Stolcke, X. Anguera, K. Boakye, O. Cetin, A. Janin, M. Magimai-Doss,
C. Wooters, & J. Zheng (2008),
The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System,
R. Stiefelhagen, R. Bowers, and J. Fiscus (eds.),
CLEAR 2007 and RT 2007,
Springer Lecture Notes in Computer Science 4625,
pp. 450-463.
J. Zheng & A. Stolcke (2007),
fMPE-MAP: Improved Discriminative Adaptation for Modeling New Domains,
to appear in Proc. Interspeech/Eurospeech, Antwerp.
G. Tur & A. Stolcke (2007),
Unsupervised Language Model Adaptation for Meeting Recognition,
Proc. IEEE ICASSP, vol. 4, pp. 173-176, Honolulu, Hawaii.
A. Janin, A. Stolcke, X. Anguera, K. Boakye, O. Cetin, J. Frankel, & J. Zheng (2006),
The ICSI-SRI Spring 2006 Meeting Recognition System.
Machine Learning for Multimodal Interaction:
Third International Workshop, MLMI 2006,
Springer Lecture Notes in Computer Science Series,
S. Renals, S. Bengio, & J. Fiscus, editors, pp. 444-456.
© 2006 Springer-Verlag.
K. Boakye & A. Stolcke (2006),
Improved Speech Activity Detection Using Cross-Channel Features for Recognition
of Multiparty Meetings.
Proc. ICSLP, pp. 1962-1965, Pittsburgh.
M. Zimmermann, A. Stolcke, & E. Shriberg (2006),
Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings.
Proc. IEEE ICASSP, vol. 1, pp. 581-584, Toulouse.
A. Stolcke, X. Anguera, K. Boakye, O. Cetin, F. Grezl, A. Janin,
A. Mandal, B. Peskin, C. Wooters, & J. Zheng (2005),
Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005
Speech-to-Text Evaluation System.
Proc. NIST MLMI Meeting Recognition Workshop, Edinburgh.
Also in
Machine Learning for Multimodal Interaction:
Second International Workshop, MLMI 2005,
Springer Lecture Notes in Computer Science Series,
Volume 3869, S. Renals and S. Bengio, editors, pp. 463-475.
© 2006 Springer-Verlag.
M. Zimmermann, Y. Liu, E. Shriberg, & A. Stolcke (2005),
A* Based Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings,
Proc. IEEE Automatic Speech Recognition and Understanding Workshop,
pp. 215-219, San Juan, Puerto Rico.
N. Mirghafori, A. Stolcke, C. Wooters, T. Pirinen, I. Bulyko,
D. Gelbart, M. Graciarena, S. Otterson, B. Peskin, & M. Ostendorf (2004),
From Switchboard to Meetings:
Development of the 2004 ICSI-SRI-UW Meeting Recognition System.
Proc. Intl. Conf. Spoken Language Processing,
pp. 1957-1960, Jeju, Korea.
E. Shriberg, R. Dhillon, S. Bhagat, J. Ang, & H. Carvey (2004),
The ICSI Meeting Recorder Dialog Act (MRDA) Corpus.
Proc. 5th SIGdial Workshop on Discourse and Dialogue, M. Strube and C.
Sidner (Eds.), April 30 - May 1, Cambridge, MA, pp. 97-100.
B. Wrede & E. Shriberg (2003),
The Relationship Between Dialogue Acts and Hot Spots in Meetings.
Proc. IEEE Automatic Speech Recognition and Understanding Workshop,
St. Thomas, U.S. Virgin Islands.
A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan,
B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, & C. Wooters (2003),
The ICSI Meeting Corpus.
Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing,
Hong Kong.
D. Baron, E. Shriberg, & A. Stolcke (2002),
Automatic Punctuation and Disfluency Detection in Multi-Party
Meetings Using Prosodic and Lexical Cues.
Proc. Intl. Conf. on Spoken Language Processing,
vol. 2, pp. 949-952, Denver.
T. Pfau, D.P.W. Ellis, & A. Stolcke (2001),
Multispeaker Speech Activity Detection for the ICSI Meeting Recorder.
Proc. IEEE Automatic Speech Recognition and Understanding Workshop,
pp. 107-110,
Madonna di Campiglio, Italy.
E. Shriberg, A. Stolcke, & D. Baron (2001),
Observations on Overlap: Findings and Implications for
Automatic Processing of Multi-Party Conversation.
Proc. EUROSPEECH, vol. 2, pp. 1359-1362,
Aalborg, Denmark.