Acoustic model and pronunciation adaptation in automatic speech recognition

Date of Award




Degree Name

Doctor of Philosophy (Ph.D.)


Electrical and Computer Engineering

First Committee Member

Michael Scordilis - Committee Chair


The general goal of this thesis is to improve the performance of state-of-the-art statistical automatic speech recognition (ASR) operating under real-time conditions. Such an operating environment requires the system to adapt to acoustic and linguistic mismatch between development and deployment conditions. For this to be practical, the system must cope with sparse new training data and run in real time with modest computational and memory usage. Toward this goal, the investigation proceeds within the framework of statistical speech recognition and focuses on two aspects of ASR: (a) statistical model adaptation and its extension to acoustic model adaptation, and (b) pronunciation adaptation. The basic strategy is to adjust the parameters of the pre-trained system, based on information from new data, to fit the characteristics of the new environment. We explore the acoustic mismatch, pronunciation variations, and changes in background and channel conditions that are actually observed, and investigate their effects on real-time statistical speech recognition. We then propose new methods for online acoustic model adaptation and online pronunciation adaptation, and generalize state-level pronunciation variation modeling so that it is performed jointly with acoustic adaptation.

We propose a new online EM alternative for adaptive Gaussian mixture model (GMM) training, together with a new unsupervised adaptation algorithm that dramatically reduces both computation and memory requirements at the expense of only slight performance degradation compared with its supervised counterpart. The proposed online EM alternative is not restricted to GMMs and may be more generally applicable to statistical adaptive training.

The investigation continues with the modeling of pronunciation variations, for which we propose an adaptive method based on an incremental decision tree. By coupling each pronunciation rewriting rule with an associated occurrence probability and the underlying acoustic model, both partial changes and new pronunciations can be modeled with greater flexibility and accuracy. The results demonstrate the effectiveness of the proposed method.


Engineering, Electronics and Electrical

Link to Full Text

