Publication Date
2008-01-01
Availability
Open access
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical and Computer Engineering (Engineering)
Date of Defense
2008-04-15
First Committee Member
Dr. Miroslav Kubat - Committee Chair
Second Committee Member
Dr. Moiez A. Tapia - Committee Co-Chair
Third Committee Member
Dr. Huseyin Kocak - Committee Member
Abstract
Because of the explosion of digital and online text information, the automatic organization of documents has become a very important research area. There are two main machine learning approaches to organizing digital documents. One is the supervised approach, where pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; the other is the unsupervised approach, which requires no human intervention or labeled documents at any point in the process. In this thesis, we concentrate on the supervised learning task of document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories, a situation referred to as multi-label classification. Multi-label classification domains have been encountered in diverse fields. Most existing machine learning techniques for multi-label classification domains are extremely expensive because the documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents characterized by a large number of features. We also aim to achieve the highest possible accuracy while maintaining high computational performance in text document categorization.
Keywords
Decision Trees; Multi-label Classification
Recommended Citation
Sendur, Zeynel, "Text Document Categorization by Machine Learning" (2008). Open Access Theses. 209.
http://scholarlyrepository.miami.edu/oa_theses/209