Publication Date

2016-01-16

Availability

Open access

Embargo Period

2016-01-20

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PHD)

Department

Electrical and Computer Engineering (Engineering)

Date of Defense

2015-12-14

First Committee Member

Miroslav Kubat

Second Committee Member

Kamal Premaratne

Third Committee Member

Mei-Ling Shyu

Fourth Committee Member

Xiaodong Cai

Fifth Committee Member

James N. Wilson

Abstract

Traditional classification techniques assume samples are described by vectors of features. However, in some domains samples are gathered by measuring a variable with respect to two or more other variables: for given values of x and y, measure z. In such domains, samples are more naturally described by matrices or by higher-dimensional arrays. We present a novel latent Dirichlet allocation (LDA)-based approach for modeling and analyzing fluorescence spectroscopy excitation-emission matrices (EEMs) and other three-way datasets. We introduce parallels between topic modeling and three-way arrays that allow us to adapt LDA-based methods to latent fluorophore studies. The proposed framework views the EEMs as being generated from an underlying hidden pool of fluorophore compounds, and provides a latent fluorophore-space representation of an EEM. We show that this LDA-based model can increase classification performance, especially when paired with parallel factor analysis (PARAFAC), which is perhaps the most widely used tool for dealing with EEMs. Our experiments show that the proposed LDA-based algorithm is in some cases more robust than PARAFAC to certain types of noise and data disturbances. We also observe that pairing this LDA-based method with PARAFAC leads to an improvement in classification performance and to added robustness at high peak signal-to-noise ratio (PSNR) values. We also present an extended graphical model that incorporates the effect of outside variables that may affect fluorescent expression of certain compounds. The extended model offers further insight into the interaction between these variables and the latent fluorophore components while facilitating the model building process. The performance of machine learning algorithms is known to be impaired if the representation of the individual classes in the training set is imbalanced, i.e., one class outnumbers the other class(es). Such is the case for several experiments in this work. Many approaches to deal with this problem have been developed, none of them totally satisfactory. Here we propose membership-based minority oversampling (MeMO) as yet another possible solution, and explore, experimentally, the conditions under which it outperforms earlier attempts. Finally, we introduce a Dempster-Shafer based fusion model that is intended to adaptively merge the PARAFAC and LDA-based models when their outputs are being used for classification purposes.
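
As a rough illustration of the kind of evidence fusion mentioned above, the following sketch applies the standard Dempster's rule of combination to two mass functions over a two-class frame. It is not the adaptive fusion model of the dissertation, whose details are not given in this abstract; the function name `dempster_combine`, the class labels, and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the standard Dempster's rule.

    Each mass function maps a frozenset of class labels to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical mass assignments (assumed values, not results from the work).
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B  # mass on the whole frame expresses ignorance
m_parafac = {A: 0.6, B: 0.1, AB: 0.3}
m_lda = {A: 0.5, B: 0.2, AB: 0.3}

fused = dempster_combine(m_parafac, m_lda)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

In this toy setting, mass placed on the full set {A, B} lets a classifier express uncertainty, so a less confident model has less influence on the fused decision.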

Keywords

Excitation Emission Matrices; Multi-way Analysis; Probabilistic Graphical Models; Dempster-Shafer Theory
