Speech Technology and Research (STAR) Laboratory Seminar Series
Past talks: 2008
Abstract:
In 1949, Warren Weaver suggested applying cryptanalytic approaches to the problem of automatic language translation. ("When I look at an article in Russian, I say: this is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.") Claude Shannon had just laid a foundation of information theory for cryptology, while Alan Turing and others had developed practical techniques and machinery. Shannon's work was declassified in 1949, and Turing's in 1996. The history of postwar cryptology has not yet been written. Weaver's language translation idea was finally picked up in the 1990s, and since then there has been tremendous progress in statistical language translation. We take large, human-translated text collections (up to half a billion words) and train models. Some models can be viewed as word substitution/transposition ciphers, while others are linguistically more sophisticated. The need for large translated texts often perplexes newcomers, who ask: (1) can I train translation systems without parallel text, and (2) how much text do I need? These questions annoy those working in the field, but Turing and Shannon would relate: as codebreakers, they did not have the luxury of parallel plaintext/ciphertext collections, and short ciphers were the epitome of data sparsity. We'll look at these two questions in this talk.
Abstract:
After a short introduction to summarization, I'll describe two meeting data sets (AMI and ICSI) and their annotations. Current state-of-the-art automatic evaluation methods include ROUGE, which is rooted in text summarization, and a weighted precision measure. In preparation for understanding the limits of extractive summarization, I'll give detailed examples for these measures. Baseline and limit results matter because prior work on meeting summarization has repeatedly changed preprocessing, summary lengths, and evaluation criteria, which makes it very hard to compare algorithms and results. Accompanying new results with baseline and limit results for the same conditions allows such a comparison. To do so, I introduce two simple baselines for summarization (random selection and longest utterances). To determine the upper limit, we map the summarization problem to a knapsack problem, searching for the subset of utterances that achieves the best evaluation score while satisfying a given length constraint. We solve that optimization problem with an integer linear program and give results for manual transcripts and ASR data. Finally, I give a brief outlook on further work in meeting summarization.
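As a rough illustration of the two baselines and the knapsack upper limit described in this abstract, here is a minimal Python sketch. It assumes each utterance has a word length and an additive score, which roughly fits a weighted precision measure but not ROUGE, and it uses dynamic programming rather than the integer linear program used in the talk. All function names and the toy numbers are hypothetical.

```python
import random

# Each utterance is (length_in_words, score). Scores are assumed additive,
# which is a simplification made for this sketch only.

def longest_utterances(utterances, budget):
    """Baseline: take the longest utterances that still fit the budget."""
    chosen, used = [], 0
    by_length = sorted(range(len(utterances)), key=lambda i: -utterances[i][0])
    for idx in by_length:
        length = utterances[idx][0]
        if used + length <= budget:
            chosen.append(idx)
            used += length
    return sorted(chosen)

def random_selection(utterances, budget, seed=0):
    """Baseline: take utterances in random order while they fit the budget."""
    rng = random.Random(seed)
    order = list(range(len(utterances)))
    rng.shuffle(order)
    chosen, used = [], 0
    for idx in order:
        length = utterances[idx][0]
        if used + length <= budget:
            chosen.append(idx)
            used += length
    return sorted(chosen)

def oracle_subset(utterances, budget):
    """Upper limit: 0/1 knapsack by dynamic programming.

    Returns (best_total_score, chosen_indices) with total length <= budget.
    """
    # best[c] holds the best (score, indices) achievable with capacity c.
    best = [(0.0, [])] * (budget + 1)
    for idx, (length, score) in enumerate(utterances):
        # Iterate capacities downwards so each utterance is used at most once.
        for cap in range(budget, length - 1, -1):
            candidate = best[cap - length][0] + score
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - length][1] + [idx])
    return best[budget]

def total_score(utterances, indices):
    return sum(utterances[i][1] for i in indices)

toy = [(5, 3.0), (4, 2.5), (3, 2.0), (2, 1.0)]  # made-up lengths and scores
oracle_score, oracle_indices = oracle_subset(toy, budget=7)
baseline_indices = longest_utterances(toy, budget=7)
```

On this toy input the oracle picks utterances 1 and 2, while the longest-utterance baseline picks 0 and 3, so the gap between the two scores gives a sense of the headroom available to any extractive system under the same length constraint.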
Abstract:
We describe our early experience building and optimizing GOOG-411, a fully automated, voice-enabled business finder. We show how taking an iterative approach to system development allows us to optimize the various components of the system, thereby progressively improving user-facing metrics. We show the contributions of different data sources to recognition accuracy. For business listing language models, we see a nearly linear performance increase with the logarithm of the amount of training data. To date, we have improved our correct accept rate by 25% absolute, and increased our transfer rate by 35% absolute.
Brian Strope has been working on building, testing, deploying, and re-optimizing GOOG-411 for the last couple of years. Before that he worked on acoustic modeling, speech detection, and application tuning at Nuance. His PhD from UCLA is on signal processing, perceptual experiments, and ASR robustness. In a past life he designed workstation hardware for HP, and he currently spends a lot of his spare time playing golf with his 5 3/4 year old son.
Francoise Beaufays is a research scientist at Google, where she develops speech recognition products and researches ways to optimize their performance. For the last 2+ years she has focussed mostly on building and growing GOOG-411. Prior to Google, she was a researcher in speech recognition at SRI and then Nuance. She holds a PhD in EE from Stanford. Francoise spends a lot of her spare time with her 5 and 7 year old daughters, Gina and Barbara.
Abstract:
In this talk, we present some results of our on-going work on English to Turkish statistical machine translation. Turkish is an agglutinative language with very rich inflectional and derivational morphology. Turkish also has free constituent order, with almost no formal ordering constraints at the sentence level. These properties, together with the fact that Turkish-English parallel corpora are scarce compared with those for other languages popular in SMT research, raise interesting issues for SMT involving Turkish. After a discussion of the relevant aspects of Turkish, we investigate different representational granularities for sub-lexical representation. We find that (i) representing both Turkish and English at the morpheme level but with some selective morpheme grouping on the Turkish side of the training data, (ii) augmenting the training data with "sentences" comprising only the "content words" of the original training data, and (iii) re-ranking the n-best decoder outputs by combining translation model scores with word-level language model scores, provide a non-trivial improvement over a fully word-based baseline model. Additional improvements are obtained by iterative model training (which may very loosely be called "statistical post-editing"), by augmenting training data with phrase pairs which are high-probability translations of each other, and by "word-repair", that is, automatically identifying and correcting morphologically malformed words. Despite our relatively limited training data, we improve from 19.77 BLEU for the baseline, to 28.41 BLEU for a 42% relative improvement.
We also touch briefly on the suitability of BLEU for languages like Turkish and present an overview of our BLEU+ tool, which considers root and morphological proximity when comparing candidate sentence words to reference sentence words and also provides various oracle BLEU scores.
Kemal Oflazer received his PhD in Computer Science from Carnegie Mellon University in 1987. He is currently a faculty member at Sabanci University, associated with the Computer Science program, where he directs the Human Language and Speech Processing Laboratory. He is mainly interested in Natural Language Processing with specific applications to Turkish. Currently he is working on statistical machine translation (MT) between English and Turkish and developing NLP-based applications for language learning. He is especially well known for his work on applying finite state methods to language processing and on error-tolerant finite state recognition. Two recent studies of particular interest are an extension of BLEU, called BLEU+, for the evaluation of MT systems for morphologically rich languages, and the adaptation of the Turkish MT system to other Turkic languages, such as Uzbek or Turkmen. He has co-authored more than 100 international conference and peer-reviewed journal papers. Prof. Oflazer is on the editorial boards of Computational Linguistics, Machine Translation, and a number of other journals. He has been on the organizing committees of EACL'09, IJCNLP'08, EACL'06, ACL'05, ACL'04, EACL'03, and many others.
Abstract:
Minimum error rate training (MERT) is a widely used learning procedure for statistical machine translation models. I will contrast three search strategies for MERT: Powell's method, the variant of coordinate descent found in the Moses MERT utility, and a novel stochastic method that outperforms both of these. I will also present a method for regularizing the MERT objective that achieves statistically significant gains when combined with both Powell's method and coordinate descent.
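To make the idea of coordinate-wise search for MERT concrete, here is a toy Python sketch. It is not the Moses MERT utility, Powell's method, or the stochastic method from the talk: it assumes a small n-best list per sentence, a sentence-level error that simply sums over the corpus (real MERT typically optimizes corpus-level BLEU), and a coarse grid in place of an exact line search. All names and numbers are hypothetical.

```python
def corpus_error(weights, nbest_lists):
    """Sum the error of the top-scoring hypothesis for every sentence.

    Each hypothesis is (feature_vector, error); the model score is the
    dot product of the weights with the feature vector.
    """
    total = 0.0
    for hypotheses in nbest_lists:
        best = max(
            hypotheses,
            key=lambda h: sum(w * f for w, f in zip(weights, h[0])),
        )
        total += best[1]
    return total

def coordinate_descent(nbest_lists, dim, grid, sweeps=5):
    """Optimize one weight at a time, trying every value on a fixed grid."""
    weights = [0.0] * dim
    best_err = corpus_error(weights, nbest_lists)
    for _ in range(sweeps):
        improved = False
        for d in range(dim):
            for value in grid:
                trial = weights[:d] + [value] + weights[d + 1:]
                err = corpus_error(trial, nbest_lists)
                if err < best_err:  # keep only strict improvements
                    weights, best_err, improved = trial, err, True
        if not improved:
            break  # a full sweep changed nothing, so stop early
    return weights, best_err

# Two toy sentences with two hypotheses each (made-up features and errors).
toy_nbest = [
    [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)],
    [((1.0, 1.0), 0.0), ((2.0, 0.0), 1.0)],
]
start_err = corpus_error([0.0, 0.0], toy_nbest)
tuned_weights, tuned_err = coordinate_descent(toy_nbest, dim=2, grid=[-1.0, 0.0, 1.0])
```

Because the error surface is piecewise constant in the weights, gradient methods do not apply directly, which is why search strategies like the ones contrasted in this talk matter.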
Abstract:
Models that align phrases instead of words offer an appealing alternative to the standard relative frequency estimates of phrase translation probabilities. But, while some effective word alignment models (Model 1, Model 2 & HMM) can be estimated tractably with EM, phrase alignment models cannot. I'll talk about how to show that estimation and inference under these models are intractable. Then, I'll present two useful approximation techniques. First, I'll talk about how to cast phrase alignment search as an integer linear programming (ILP) problem and find the optimal alignment reliably and quickly with off-the-shelf ILP software. Some applications of this technique include training phrase alignment models and interpreting the output of word alignment models. Second, we'll look at how to estimate translation probabilities under a phrase alignment model using a Gibbs sampling procedure. The sampler has some nice asymptotic convergence properties and also seems to produce good results in practice. I'll walk through the different models we've trained and how they performed.