Title :
Text mining: Finding right documents from large collection of unstructured documents
Author :
Amarakoon, Savidu ; Caldera, Amitha
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
Abstract :
In our day-to-day lives we come across unstructured data in many forms. These include books, journals, audio and video files, and unstructured text such as emails, web pages and documents. Such data can be a vital source for making informed decisions. For example, in any company there is a set of people who can be identified as the most valuable members of its workforce. Identifying what they have in common, and identifying others like them, would help improve the output of the company. This is the basis on which this research was carried out. The central aim of the research was to use text mining techniques to mine a set of documents, identify the characteristics they have in common, and then identify other documents that contain these characteristics.
Keywords :
data mining; text analysis; right document finding; text mining techniques; unstructured document large collection; Indexing; Java; Libraries; Portable document format; Text mining; Document-based Searching; Lucene; Unstructured Data
Conference_Title :
Data Mining and Intelligent Information Technology Applications (ICMiA), 2011 3rd International Conference on
Conference_Location :
Macao
Print_ISBN :
978-1-4673-0231-9