Publication Date

2012-04-21

Availability

Open access

Embargo Period

2012-04-21

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering (Engineering)

Date of Defense

2011-04-26

First Committee Member

Miroslav Kubat

Second Committee Member

Kamal Premaratne

Third Committee Member

Akmal A. Younis

Fourth Committee Member

Nigel M. John

Fifth Committee Member

Maria M. Llabre

Abstract

The traditional approach to automated classification assumes that each object should be assigned to exactly one of two or more classes. However, some real-world applications depart from this generic scenario in two important ways. First, each example can belong to several classes simultaneously (multi-label classification). Second, the classes can be hierarchically ordered in the sense that some are more specific versions of others (hierarchical classification). Seeking to address both of these issues, the presented work deals with “hierarchical multi-label classification”. In non-hierarchical multi-label classification, a survey of the literature indicates that good performance is achieved when a Support Vector Machine (SVM) is used to induce each class separately. That said, some experiments suggest that further improvement can be achieved by explicitly dealing with the problem of imbalanced training sets. The author proposes a solution in the form of a technique referred to as R-SVM; the idea is to re-adjust the SVM-hyperplane offset accordingly. Experiments in the first part of this dissertation rely on data from text-categorization domains. More important, however, is the second part, which focuses on hierarchical multi-label classification. Here, the author proposes a new technique, HR-SVM, which constitutes a hierarchical extension of R-SVM that proceeds in a top-down fashion and adds a new mechanism to correct errors propagated from classifiers at higher levels of the hierarchy. The system has been subjected to experiments with data from the field of gene function prediction. The results show that the new technique compares favorably with other existing approaches along various performance criteria.
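
As a rough illustration of what re-adjusting a hyperplane offset can mean for an imbalanced class, the sketch below shifts the decision threshold applied to a classifier's raw scores. It is not the dissertation's R-SVM procedure: the abstract does not state the selection criterion, so choosing the threshold that maximizes F1 on held-out scores is an assumption made here, and the names `f1_score` and `adjust_offset` are hypothetical.

```python
from typing import Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[bool]) -> float:
    """F1 for the positive class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def adjust_offset(scores: Sequence[float], y_true: Sequence[int]) -> Tuple[float, float]:
    """Pick a decision threshold for one class (assumed criterion: best F1).

    `scores` are signed distances w.x + b from an already trained classifier.
    An example is predicted positive when its score >= threshold, so a
    threshold below 0 moves the boundary toward the negative side.
    """
    distinct = sorted(set(scores))
    # Candidates: the default threshold 0.0 plus the midpoints between
    # consecutive distinct scores.
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_t, best_f1 = 0.0, f1_score(y_true, [s >= 0.0 for s in scores])
    for t in candidates:
        f1 = f1_score(y_true, [s >= t for s in scores])
        if f1 > best_f1:  # keep the default unless strictly better
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Made-up validation scores for a rare class: most positives fall
# slightly on the negative side of the default boundary.
scores = [-1.2, -0.8, -0.4, -0.3, -0.1, 0.2]
labels = [0, 0, 0, 1, 1, 1]
threshold, f1 = adjust_offset(scores, labels)
print(threshold, f1)
```

In a one-classifier-per-class setup such as the one the abstract describes, a step like this would be run separately for each class, ideally on data not used to train that class's classifier.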

Keywords

hierarchical multi-label classification; support vector machines; threshold adjustment; decision trees; text categorization; gene-function prediction
