Speech Technology and Research (STAR) Laboratory Seminar Series
Past talks: 2006
Abstract:
Most modern speech recognizers are based on continuous-density hidden Markov models (CD-HMMs). The hidden states in these CD-HMMs model different phonemes or sub-phonetic elements, while the observations model cepstral feature vectors. Distributions of cepstral feature vectors are most often represented by Gaussian mixture models (GMMs). The accuracy of the recognizer depends critically on the careful estimation of GMM parameters. The most basic approach is maximum likelihood (ML) estimation, usually carried out with the expectation-maximization (EM) algorithm; the main attraction of EM is that no free parameters need to be tuned for its convergence. In general, however, maximum likelihood training criteria do not optimize classification error rates directly, and alternative training criteria that track error rates more explicitly tend to perform better. Two well-known examples are discriminative methods such as conditional maximum likelihood (CML)/maximum mutual information (MMI) and minimum classification error (MCE). In this talk, I will present a new framework for discriminative training called large margin hidden Markov models. Inspired by the principles of large margin classification, a well-studied framework in statistical learning, large margin HMMs are trained so that the correct labeling sequence is separated from incorrect labeling sequences by a margin that grows in proportion to the number of labeling mistakes. The training is cast as a convex optimization that maximizes these margins. I will describe the framework and the training algorithm of large margin HMMs, and present experimental results from applying this training criterion to building phoneme recognizers. We found significantly improved phoneme recognition accuracy on the TIMIT speech corpus. We also systematically compared large margin training to other leading discriminative training methods and found greater error reduction from baseline systems than with either CML or MCE. Joint work with Dr. Lawrence K. Saul (U. of California, San Diego).
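As a rough illustration of the margin criterion sketched above (not the authors' implementation), the Python fragment below penalizes any competing label sequence whose score comes within a Hamming-distance margin of the correct sequence's score. The per-frame scores, the candidate set, and all function names here are hypothetical placeholders chosen for clarity; in the actual work the competing sequences are handled implicitly and the optimization is convex in the GMM parameters.

import numpy as np

def hamming_distance(y_true, y_other):
    # Number of frames where the two label sequences disagree.
    return sum(a != b for a, b in zip(y_true, y_other))

def sequence_score(frame_scores, labels):
    # Sum of per-frame discriminant scores along one label sequence.
    # frame_scores[t, s] is a (hypothetical) score for state s at frame t.
    return sum(frame_scores[t, s] for t, s in enumerate(labels))

def large_margin_hinge_loss(frame_scores, y_true, candidate_labelings):
    # Hinge loss for one utterance: the correct labeling should outscore
    # every competitor by at least the number of labeling mistakes.
    true_score = sequence_score(frame_scores, y_true)
    violations = []
    for y in candidate_labelings:
        if list(y) == list(y_true):
            continue
        margin = hamming_distance(y_true, y)  # margin grows with label errors
        violations.append(margin + sequence_score(frame_scores, y) - true_score)
    return max(0.0, max(violations)) if violations else 0.0

# Toy usage with made-up numbers: 3 frames, 2 states.
scores = np.array([[2.0, 0.0], [0.5, 1.5], [1.0, 1.0]])
print(large_margin_hinge_loss(scores, [0, 1, 0], [[0, 1, 1], [1, 1, 0]]))

The loss is zero only when every competing sequence is beaten by the required margin, which mirrors the idea that the separation demanded of a competitor scales with how many of its labels are wrong.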
References:
Fei Sha and Lawrence K. Saul (2006). Large margin Gaussian mixture models for automatic speech recognition. To appear in Proc. of the Neural Information Processing Systems Conference (NIPS 2006), Vancouver, Canada.
Fei Sha and Lawrence K. Saul (2006). Large margin Gaussian mixture modeling for phonetic classification and recognition. Proc. of ICASSP 2006, Toulouse, France.
Fei Sha and Lawrence K. Saul (2007). Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models. Submitted to ICASSP 2007.