Title :
A Framework for the Classification of Unstructured Data
Author :
Ostrowski, David Alfred
Abstract :
The growing number of sources and the growing quantity of unstructured information have created a further need to categorize and interpret their content. This paper describes the design of an interchangeable framework to support learning from an unstructured data source. Our approach supports the integration of two or more learning mechanisms with a traditional indexing method. The goal is to identify higher semantic content and more meaningful keyword combinations, considering both supervised and unsupervised techniques. In one specific implementation, Bayesian learning and clustering are integrated to support a boost parameter for the classification of unstructured text. We find that an implementation of this framework, applied to a set of Reuters news feeds, supports a vastly improved recognition rate. Our effort is directed towards making associations between structured and unstructured information.
Keywords :
Bayes methods; pattern classification; text analysis; unsupervised learning; Bayesian learning; indexing method; keyword combinations; semantic content; supervised technique; unstructured data classification; unstructured information; unstructured text classification; unsupervised technique; Employment; Engines; Indexing; Learning systems; Machine learning; Machine learning algorithms; Ontologies; Sections; Supervised learning; Technological innovation; Bayesian Learning; Clustering; Lucene Index; Unstructured Data
Conference_Title :
Semantic Computing, 2009. ICSC '09. IEEE International Conference on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-4962-0
Electronic_ISBN :
978-0-7695-3800-6
DOI :
10.1109/ICSC.2009.48