Publication Date

2008-01-01

Availability

Open access

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical and Computer Engineering (Engineering)

Date of Defense

2008-04-15

First Committee Member

Dr. Miroslav Kubat - Committee Chair

Second Committee Member

Dr. Moiez A. Tapia - Committee Co-Chair

Third Committee Member

Dr. Huseyin Kocak - Committee Member

Abstract

Because of the explosion of digital and online text information, the automatic organization of documents has become a very important research area. There are two main machine learning approaches to organizing digital documents. One is the supervised approach, in which pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; the other is the unsupervised approach, which requires no human intervention or labeled documents at any point in the process. In this thesis, we concentrate on the supervised learning task of document classification. One of the most important tasks in information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories, a situation referred to as multi-label classification. Multi-label classification domains have been encountered in diverse fields. Most existing machine learning techniques for multi-label classification domains are computationally very expensive, because the documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents characterized by a large number of features. We also aim to achieve the highest possible accuracy while maintaining high computational performance in text document categorization.

Keywords

Decision Trees; Multi-label Classification
