Speech Technology and Research (STAR) Laboratory Seminar Series
Past talks: 2010
Abstract: Statistical language models (LMs) are a key component of large vocabulary continuous speech recognition systems, since they help a system discriminate among hypotheses and select the utterances with the highest likelihood. To do so, these models gather short word-sequence probabilities (n-gram probabilities) trained once and for all on very large multi-topic text corpora. However, since n-gram probabilities can change with topics, such baseline LMs suffer from a lack of adequacy when transcribing topic-specific spoken documents. In this presentation, the problem of topic LM adaptation will be discussed through an unsupervised method which aims to avoid the use of any a priori knowledge about encountered topics and to integrate natural language processing techniques. More precisely, we will see how classical techniques can be specialized to take into account the lexico-syntactic characteristics of topics, while limitations of classical n-gram LMs will be highlighted. This latter point will lead to a discussion of more promising LM types for addressing adaptation problems.
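As a rough illustration of the ideas above (not the speaker's method), one common way to adapt a general-purpose n-gram LM to a topic is linear interpolation: maximum-likelihood bigram probabilities are estimated separately from a large general corpus and a small topic-specific corpus, then mixed with a weight `lam`. The corpora, function names, and weight below are all illustrative assumptions, and real systems would add smoothing for unseen n-grams.

```python
from collections import Counter

def bigram_counts(tokens):
    """Return (bigram counts, unigram counts) for a token sequence."""
    return Counter(zip(tokens, tokens[1:])), Counter(tokens)

def bigram_prob(w_prev, w, bigrams, unigrams):
    """Maximum-likelihood bigram probability P(w | w_prev); 0 if the history is unseen."""
    if unigrams[w_prev] == 0:
        return 0.0
    return bigrams[(w_prev, w)] / unigrams[w_prev]

def adapted_prob(w_prev, w, general, topic, lam=0.7):
    """Interpolate a general-domain and a topic-specific bigram model.

    lam weights the general model; (1 - lam) weights the topic model.
    """
    gen_bi, gen_uni = general
    top_bi, top_uni = topic
    return (lam * bigram_prob(w_prev, w, gen_bi, gen_uni)
            + (1 - lam) * bigram_prob(w_prev, w, top_bi, top_uni))

# Toy corpora (assumed for illustration only).
general = bigram_counts("the cat sat on the mat".split())
topic = bigram_counts("the model adapts to the topic".split())

print(adapted_prob("the", "cat", general, topic, lam=0.7))  # 0.7 * 0.5 + 0.3 * 0.0 = 0.35
```

The interpolation weight would normally be tuned (e.g. by minimizing perplexity on held-out topic data) rather than fixed, which is one reason fully unsupervised adaptation, as discussed in the talk, is challenging.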
Abstract: For many years the human auditory system has been an inspiration for developers of automatic speech recognition systems because of its ability to interpret speech accurately in a wide variety of difficult acoustical environments. This talk will discuss the application of physiologically-motivated and psychophysically-motivated approaches to signal processing that facilitate robust automatic speech recognition. The talk will begin by reviewing selected aspects of auditory processing that are believed to be especially relevant to speech perception, and that were components of signal processing schemes proposed in the 1980s. We will review and discuss the motivation for, and the structure of, classical and contemporary computational models of auditory processing that have been applied to speech recognition, and we will evaluate and compare their impact on improving speech recognition accuracy. Finally, we will discuss some of the reasons why we believe that progress to date has been limited, and share insights that we have gleaned about auditory processing from recent work at Carnegie Mellon.
Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Electrical and Computer Engineering, Computer Science, and Biomedical Engineering Departments, the Language Technologies Institute, and a Lecturer in the School of Music. Much of Dr. Stern's current research is in spoken language systems, where he is particularly concerned with the development of techniques with which automatic speech recognition can be made more robust with respect to changes in environment and acoustical ambience. He has also developed sentence parsing and speaker adaptation algorithms for earlier CMU speech systems. In addition to his work in speech recognition, Dr. Stern has worked extensively in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a Fellow of the Acoustical Society of America, the 2008-2009 Distinguished Lecturer of the International Speech Communication Association, a recipient of the Allen Newell Award for Research Excellence in 1992, and he served as General Chair of Interspeech 2006. He is also a member of the IEEE and the Audio Engineering Society.