Confidence measures as a search guide in speech recognition

Date of Award




Degree Name

Doctor of Philosophy (Ph.D.)

First Committee Member

Michael Scordilis, Committee Chair


Despite significant advances in speech and language technologies, speech recognition systems are still not perfect. Every time a recognition hypothesis is produced, some degree of uncertainty is inherent in it. Utterance verification was proposed as a backup technique for verifying the reliability of speech recognition results. It uses quantitative scores, such as confidence measures, to estimate the reliability of a recognition decision. However, it generally provides confidence information only after the recognition phase. The maximum expected benefit is therefore confined to verifying the final decoding output, with no mechanism for detecting and avoiding errors early.

In this thesis, an online confidence estimation and hypothesis verification approach is introduced. By incorporating confidence information early in the search phase, the recognizer may be directed toward the most promising paths, which may lead to a more accurate final decoding result. Three techniques are proposed for this purpose.

The first technique is Confidence Based Pruning (CBP). Here, confidence information acts as an online filter applied to word-level partial hypotheses, deciding whether to keep them for future expansion or discard them from the search space.

The second technique is Confidence Based Language Modeling (CBLM). Here, confidence information is used to adjust the language model score. This confidence-based tuning gives the language model score more weight in regions of well-matched acoustics and makes it play second fiddle when the acoustics are ambiguous. The main advantage of this technique is a reduction in errors caused by the language model overwhelming the acoustic evidence. These errors usually take the form of inserted words that have no acoustic evidence.

The third technique is Confidence Based Fast Match (CBFM). Here, confidence information is used to look ahead in time, identify search extensions with poor acoustic scores, and discard them before the expensive detailed match evaluation is applied. With this technique, a considerable improvement in decoding speed can be achieved with very little sacrifice in accuracy.

Incorporating confidence measures in the decoding process imposes some constraints on the type of measure that can be used. The measure has to be computationally inexpensive so that it does not affect the efficiency of the search process. It should also be extracted synchronously from the online information that is available during the search. In this thesis, two confidence measures, the posterior probability measure and the average base-phone rank measure, are investigated for online confidence estimation during the search process. Both measures satisfy the efficiency and synchronization conditions. Moreover, they have the advantage of being derived only from the acoustic model; therefore, they can be used as a tuning parameter for the language model score in the proposed CBLM technique without suffering from circular reasoning. Because the two measures are derived from two different views of the acoustic scores, they are also integrated into a composite measure using a neural network to build a more robust measure.
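To make the roles of CBP and CBLM more concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes a posterior-style confidence computed as a softmax over the acoustic log-likelihoods of competing units at each frame (with uniform priors), a fixed pruning threshold, and a language model weight that is simply multiplied by the confidence. All function names, the threshold value, and the base weight are hypothetical and chosen only for illustration.

```python
import math


def frame_posterior(log_likelihoods, target):
    """Posterior of `target` among competing units at one frame.

    Assumption: uniform priors, so the posterior reduces to a softmax
    over the acoustic log-likelihoods.
    """
    peak = max(log_likelihoods.values())  # subtract the max for numerical stability
    total = sum(math.exp(v - peak) for v in log_likelihoods.values())
    return math.exp(log_likelihoods[target] - peak) / total


def word_confidence(frames, units):
    """Average frame posterior over the units aligned to a word (value in [0, 1])."""
    posteriors = [frame_posterior(f, u) for f, u in zip(frames, units)]
    return sum(posteriors) / len(posteriors)


def keep_hypothesis(confidence, threshold=0.3):
    """CBP-style filter: keep a word-level partial hypothesis only if its
    confidence reaches the (hypothetical) threshold."""
    return confidence >= threshold


def combined_score(acoustic_logprob, lm_logprob, confidence, base_lm_weight=10.0):
    """CBLM-style score: the language model weight grows with acoustic
    confidence, so the language model counts for less when the acoustics
    are ambiguous. The multiplicative form is an assumption."""
    return acoustic_logprob + base_lm_weight * confidence * lm_logprob


# Usage: two frames, each with two equally likely competing units.
frames = [{"a": 0.0, "b": 0.0}, {"a": 0.0, "b": 0.0}]
conf = word_confidence(frames, ["a", "b"])
if keep_hypothesis(conf):
    score = combined_score(-100.0, -2.0, conf)
```

Because the confidence here depends only on acoustic scores, using it to scale the language model weight does not feed language model evidence back into itself, which matches the circular-reasoning concern raised in the abstract.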


Computer Science
