Psychoacoustic modeling for perceptual audio coding and watermarking with improved synchronization

Date of Award




Degree Name

Doctor of Philosophy (Ph.D.)


Electrical and Computer Engineering

First Committee Member

Michael S. Scordilis, Committee Chair


The availability of increased computational power and the proliferation of the Internet have facilitated the production and distribution of unauthorized copies of multimedia information. As a result, the problem of copyright protection has attracted the interest of the scientific and business communities worldwide. The most promising solution seems to be watermarking, in which the original data is marked with ownership information hidden imperceptibly in the original signal. Compared to embedding watermarks into still images, audio watermarking is much more challenging because the human auditory system is extremely sensitive to changes in the audio signal. Understanding human perception processes and incorporating them into effective psychoacoustic models is the key to successful watermarking. Aside from psychoacoustic modeling, synchronization is also an important component of a successful watermarking system: to recover the embedded watermark from the watermarked signal, the detector must first know where the embedded watermark begins.

In this dissertation proposal, we focus on those two issues. We propose a psychoacoustic model based on the discrete wavelet packet transform (DWPT). This model takes advantage of the flexibility of DWPT decomposition to closely approximate the critical bands and provide precise masking thresholds, resulting in an increased extent of inaudible spectrum and a reduction of the sum to signal masking ratio (SSMR) compared to existing competing techniques. The proposed psychoacoustic model has direct application to digital perceptual audio coding as well as digital audio watermarking.

For digital perceptual audio coding, the greater extent of inaudible spectrum provided by the psychoacoustic model results in more audio samples being quantized to zero, leading to a decreased compression bit rate. The reduction of SSMR, on the other hand, allows a coarser quantization step, which further cuts the bits needed to represent audio in the audible spectrum areas. In other words, audio compressed with the proposed digital perceptual codec achieves better subjective quality than an existing coding standard operating at the same information rate, as demonstrated by the subjective listening test.

Digital audio watermarking applications benefit from the proposed psychoacoustic model in two ways. More watermarks can be embedded in the inaudible spectrum, which increases the watermark payload, and higher-energy watermarks can be embedded in the audible spectrum areas, which leads to improved robustness and greater resilience to attacks and signal transformations than existing techniques, as shown by the experimental results.

Finally, we introduce a fast and robust synchronization algorithm for watermarking that exploits the consistency of the signal energy distribution under varying transformation conditions and uses a matched-filter approach in a fast search to determine the precise watermark location. The proposed synchronization method achieves error-free sample-to-sample synchronization under different attacks and signal transformations and shows very high robustness to severe malicious time-scaling manipulation.
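
The matched-filter search mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it omits the energy-distribution analysis and the fast search entirely, and simply correlates the received signal with a known synchronization sequence at every lag. The sync sequence, host signal, embedding strength, and the function name `matched_filter_offset` are all hypothetical choices made for illustration.

```python
import random


def matched_filter_offset(received, template):
    """Return the lag where `template` correlates most strongly with `received`.

    Exhaustive sliding correlation: a plain matched filter, with none of the
    search-space reduction that the abstract attributes to the proposed method.
    """
    n, m = len(received), len(template)
    best_lag, best_score = 0, float("-inf")
    for lag in range(n - m + 1):
        score = sum(received[lag + k] * template[k] for k in range(m))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(0)

# Hypothetical pseudo-random +/-1 synchronization code known to the detector.
sync = [rng.choice((-1.0, 1.0)) for _ in range(128)]

# Stand-in for the host audio: white Gaussian noise (assumption, not real audio).
host = [rng.gauss(0.0, 1.0) for _ in range(600)]

# Embed the sync code additively at a position unknown to the detector.
# The strength is exaggerated so the toy example is easy to follow.
true_offset = 137
strength = 2.0
marked = list(host)
for k, chip in enumerate(sync):
    marked[true_offset + k] += strength * chip

found = matched_filter_offset(marked, sync)
print("embedded at", true_offset, "detected at", found)
```

In a real system the embedding strength would be bounded by the masking thresholds from the psychoacoustic model rather than fixed, and an exhaustive search over all lags would be too slow for long signals, which is the motivation for the fast search the abstract describes.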


Engineering, Electronics and Electrical
