DocumentCode
3762542
Title
Comparative study between Part-of-Speech and statistical methods of text extraction in the tourism domain
Author
Guson P. Kuntarto;Fahmi L. Moechtar;Berkah I. Santoso;Irwan P. Gunawan
Author_Institution
Information Systems Department, Universitas Bakrie, Jakarta, Indonesia 12920
fYear
2015
Firstpage
1
Lastpage
6
Abstract
This paper compares two text extraction methods: a linguistic method (Part-of-Speech, POS) and a statistical method (Term Frequency-Inverse Document Frequency, TF-IDF). Text extraction was performed as part of ontology population in the Indonesian tourism domain. The paper also contributes a multimedia corpus built from three websites in the Balinese tourism domain. The performance of each method is evaluated using several relevance measures. The statistical method was found to give higher relevance than the linguistic method. Our analysis attributes this to limitations of the reference terms used in the initial ontology from our previous research.
Keywords
"Ontologies","Pragmatics","Sociology","Statistical analysis","Data mining","Engines"
Publisher
ieee
Conference_Titel
Information Technology Systems and Innovation (ICITSI), 2015 International Conference on
Print_ISBN
978-1-4673-6663-2
Type
conf
DOI
10.1109/ICITSI.2015.7437675
Filename
7437675
Link To Document