Publication Date
2007-01-01
Availability
Open access
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical and Computer Engineering (Engineering)
Date of Defense
2007-11-21
First Committee Member
Michael Scordilis - Committee Chair
Second Committee Member
Xiaodang Cai - Committee Member
Third Committee Member
Subramanian Ramakrishnan - Committee Member
Abstract
In automatic speech recognition (ASR) systems, training is critical to the system's success. Communication media, whether analog (such as analog landline phones) or digital (such as VoIP), distort the speaker's speech signal, often in very complex ways. Linear distortion occurs in all channels, in either the magnitude or the phase spectrum. Non-linear but time-invariant distortion appears in all real systems. Digital systems also suffer network effects that produce packet losses, delays, and repeated packets. Finally, one cannot know in advance what path a signal will take, so some error or distortion along the way is almost a certainty. The channel introduces an acoustic mismatch between the speaker's signal and the data on which the ASR was trained, which results in poor recognition performance. The approach so far has been to try to undo the havoc produced by the channels, i.e., to compensate for the channel's behavior. In this thesis, we instead try to characterize the effects of different transmission media and use that characterization as an inexpensive and repeatable way to train ASR systems.
Keywords
Speech Recognition; System Identification; Channel Modeling; Adaptive Filters; Filter; ARMA; IIR; FIR; ASR; ROC; Optimization; Search; Cyclic Coordinate; Minima; Optimum; Feature; MFCC; Mel Frequency; Cepstral Coefficient; CMS; RASTA; Differential; Breakdown Effect; RLS; RMS; LMS; NLMS; Self-orthogonalizing; Norm; Space; Feature Space; Time Domain; Frequency Domain; Cepstral Domain; Cepstrum; Error Minimization; Error; Measure; Telephone; Landline; Cellphone; Voice over IP; VoIP; IP; Jitter; Network; Gilbert-Elliott Model; Impulse Response; Distance Gain Ratio; MOS; PESQ; Perceptual; Audio; Speech; Sound; Voice; Vocal; Model; Channel; System; Training; Simulation; Simulate; IBM; UM-IBM Project; Linear; Nonlinear; Time-invariant; LTI; Dynamic
Recommended Citation
Sklar, Alexander Gabriel, "Channel Modeling Applied to Robust Automatic Speech Recognition" (2007). Open Access Theses. 87.
https://scholarlyrepository.miami.edu/oa_theses/87