Publication Date

2016-03-18

Availability

Open access

Embargo Period

2016-03-18

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PHD)

Department

Electrical and Computer Engineering (Engineering)

Date of Defense

2016-03-03

First Committee Member

Miroslav Kubat

Second Committee Member

Kamal Premaratne

Third Committee Member

Mei-Ling Shyu

Fourth Committee Member

Nigel John

Fifth Committee Member

Ubbo Visser

Abstract

Classical machine learning algorithms were designed to automatically classify examples that belong to mutually exclusive classes: each example belongs to exactly one class out of a finite set. In realistic applications, however, examples often belong to more than one class at the same time. For example, a text document that belongs to Geography may also be labeled as Geology. Perhaps because of the popularity of its applications, this category of problems has attracted great research interest over the past decade. A widely used approach, called Binary Relevance (BR), induces a separate classifier for each class to determine whether that class is relevant for an example. Although BR has shown some success, researchers have pointed out a critical drawback. By targeting each class independently, the learner does not model class correlations: knowing that an example belongs to class X may indicate that it is also likely to belong to class Y. Conversely, the same information can make the example less likely to belong to class Z. Research groups have sought to incorporate class correlation information into BR by using the class labels as additional example features. Because the classes of unseen instances are unknown, the missing values are typically filled in using the outputs of other classifiers, which makes these features prone to errors. This dissertation identifies two weaknesses in existing methods: unnecessary label correlations and error propagation. To overcome these problems, it introduces a new multi-label classification method called PruDent. Experiments over a broad range of benchmark datasets indicate that PruDent compares favorably with existing state-of-the-art methods. Additionally, PruDent improves classification accuracy while maintaining a complexity that is linear in the number of classes.

Keywords

multi-label classification; stacking; machine learning; chaining; data mining; artificial intelligence
