Text mining: Finding right documents from large collection of unstructured documents

Author

Amarakoon, Savidu ; Caldera, Amitha

Author_Institution

Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka

fYear

2011

fDate

24-26 Oct. 2011

Firstpage

5

Lastpage

10

Abstract

In our day-to-day life we come across unstructured data in many forms. These include books, journals, audio/video files, and unstructured text such as emails, web pages and documents. Such data can be a vital source for making informed decisions. For example, in any company there is a set of people who can be identified as the most valuable members of its workforce. Identifying what is common among them, and identifying others like them, would undoubtedly improve the output of the company. This is the premise on which this research was carried out. The central aim of the research was to use text mining techniques to mine the data in a set of documents, identify the characteristics common to them, and then identify other documents that contain these characteristics.
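The abstract does not specify how the common characteristics are extracted or matched. The sketch below is only one plausible illustration, under the assumption that a "characteristic" is a word that appears in every exemplar document and that other documents are ranked by the fraction of those words they contain. Although the keywords mention Lucene, this sketch deliberately uses plain Java collections instead of Lucene; the class `CommonTermFinder`, its methods `terms`, `commonTerms`, `score` and `findSimilar`, and the `threshold` parameter are all hypothetical names, not the authors' implementation.

```java
import java.util.*;

/**
 * Hypothetical sketch (not the authors' method): treat words shared by every
 * exemplar document as the "common characteristics", then rank candidate
 * documents by the fraction of those words they contain.
 */
public class CommonTermFinder {

    // Lowercase, split on runs of non-letter/non-digit characters,
    // and drop tokens of two characters or fewer (a crude stop-word filter).
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.length() > 2) {
                result.add(token);
            }
        }
        return result;
    }

    // Intersection of the term sets of all exemplar documents.
    static Set<String> commonTerms(List<String> exemplars) {
        Set<String> common = null;
        for (String doc : exemplars) {
            Set<String> t = terms(doc);
            if (common == null) {
                common = t;
            } else {
                common.retainAll(t);
            }
        }
        return common == null ? new HashSet<>() : common;
    }

    // Fraction of the common terms that occur in the candidate document.
    static double score(Set<String> common, String candidate) {
        if (common.isEmpty()) {
            return 0.0;
        }
        Set<String> t = terms(candidate);
        int hits = 0;
        for (String term : common) {
            if (t.contains(term)) {
                hits++;
            }
        }
        return (double) hits / common.size();
    }

    // Candidates whose score reaches the (assumed) threshold, best match first.
    static List<String> findSimilar(List<String> exemplars, List<String> candidates, double threshold) {
        Set<String> common = commonTerms(exemplars);
        List<String> matches = new ArrayList<>();
        for (String c : candidates) {
            if (score(common, c) >= threshold) {
                matches.add(c);
            }
        }
        matches.sort(Comparator.comparingDouble((String c) -> score(common, c)).reversed());
        return matches;
    }

    public static void main(String[] args) {
        List<String> exemplars = List.of(
                "Senior Java developer with Lucene and search experience",
                "Built Lucene search services in Java for five years");
        List<String> candidates = List.of(
                "Java engineer who maintains a Lucene search cluster",
                "Accountant experienced in payroll and auditing");
        System.out.println("Common terms: " + commonTerms(exemplars));
        System.out.println("Matches: " + findSimilar(exemplars, candidates, 0.5));
    }
}
```

A strict intersection is simple but brittle: one exemplar missing a word removes it from the profile. A weighted scheme (for example, TF-IDF scores or a Lucene query built from the profile terms) would be a natural refinement, but which approach the paper actually used is not stated here.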

Keywords

data mining; text analysis; right document finding; text mining techniques; unstructured document large collection; Indexing; Java; Libraries; Portable document format; Text mining; Data Mining; Document-based Searching; Lucene; Text Mining; Unstructured Data;

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Mining and Intelligent Information Technology Applications (ICMiA), 2011 3rd International Conference on

Conference_Location

Macao

Print_ISBN

978-1-4673-0231-9

Type

conf

Filename

6108390