Regional Information Center for Science and Technology (RICEST) - Processing of unstructured data for information extraction

DocumentCode :

1877730

Title :

Processing of unstructured data for information extraction

Author :

Ingle, V.A.

Author_Institution :

Dept. of Comput. Sci. & IT, Dr. B.A.M. Univ., Aurangabad, India

fYear :

2012

fDate :

6-8 Dec. 2012

Firstpage :

Lastpage :

Abstract :

Unstructured data have no predetermined form or structure and consist largely of text, so they do not fit well into relational tables. Most enterprise data today can be considered unstructured. Typical unstructured sources include emails, reports, contracts, transcripts of telephone conversations, and other communications. Web pages also contain links and references to external, often unstructured, content such as images, XML files, animations and databases. This paper focuses on extracting features from HTML pages using tokenization and non-negative matrix factorization (NMF). Text is classified using a bag-of-words approach. The workbench is a dataset collected from university-domain web pages.
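
The pipeline named in the abstract (tokenize HTML, build bag-of-words vectors, factorize them) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no algorithmic details, so the tag-stripping rule, the alphabetic-token regex, the multiplicative-update form of the factorization, and every function name here (`tokenize`, `bag_of_words`, `nmf`) are assumptions made for illustration. It also reads the keyword "NMF" as non-negative matrix factorization.

```python
import random
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def tokenize(html):
    """Strip markup, lowercase, and split into alphabetic tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.parts).lower())


def bag_of_words(docs):
    """Return (sorted vocabulary, document-term count matrix)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for w in vocab] for c in counts]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate v by w @ h (both non-negative) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h
```

Under these assumptions, each row of `w` gives a page's weights over `k` latent features and each row of `h` gives a feature's weights over the vocabulary. A classifier of the kind the abstract mentions could take either the raw count rows or the rows of `w` as input.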

Keywords :

Web sites; XML; information retrieval; text analysis; text detection; HTML pages; XML files; animations; contracts; databases; emails; enterprise data; feature extraction; images; information extraction; nonmatrix factorization; relational tables; reports; telephone conversation transcripts; textual data; tokenization; university domain Web pages; unstructured content; unstructured data processing; unstructured systems; Information Extraction; NMF; Text Mining; Tokenization; Unstructured data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Engineering (NUiCONE), 2012 Nirma University International Conference on

Conference_Location :

Ahmedabad

Print_ISBN :

978-1-4673-1720-7

Type :

conf

DOI :

10.1109/NUICONE.2012.6493202

Filename :

6493202

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1877730