DocumentCode
476747
Title
Measuring the representativeness of index terms in literary texts: an experiment on the Quran
Author
Rahman, Hayati Abd ; Noah, Shahrul Azman ; Jimenez-Salazar, Hector
Author_Institution
Faculty of Information Science and Technology, National University of Malaysia, 43600 Bangi, Malaysia
Volume
2
fYear
2008
fDate
26-28 Aug. 2008
Firstpage
1
Lastpage
5
Abstract
A concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in applications such as information retrieval, document browsing and document classification. One of the important tasks in constructing a concept hierarchy is identifying suitable terms so that the domain vocabulary has an appropriate size. One way to achieve such a size is term reduction. The aim of this paper is to examine how effectively term selection methods reduce the size of the vocabulary. An experiment was conducted on the Quran, which is treated as a literary text. The experiment compares the entropy method, the transition point method and a hybrid of the transition point and entropy methods against the Vector Space Model (VSM). Results indicate that the transition point method is more effective than the other methods at reducing the size of the vocabulary while still preserving the important terms in the literary documents.
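The abstract does not spell out how the transition point is computed. Below is a minimal sketch, assuming the common formulation from the transition-point literature: TP = (sqrt(1 + 8*I1) - 1) / 2, where I1 is the number of terms that occur exactly once, and terms whose frequency lies near TP are kept. The function names and the neighbourhood width u = 0.4 are illustrative assumptions, not values taken from this paper.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Compute TP from I1, the number of terms with frequency exactly 1."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(1 + 8 * i1) - 1) / 2


def select_terms(tokens, u=0.4):
    """Keep terms whose frequency falls within [TP*(1-u), TP*(1+u)].

    u is an assumed neighbourhood width (hypothetical default), not from the paper.
    """
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = tp * (1 - u), tp * (1 + u)
    # Terms far below TP (rare) or far above TP (very frequent) are discarded.
    return sorted(term for term, f in freqs.items() if low <= f <= high)


if __name__ == "__main__":
    tokens = "x y z w w v v v".split()
    print(transition_point(Counter(tokens)))  # 2.0
    print(select_terms(tokens))  # ['w']
```

For the tokens `x y z w w v v v`, three terms occur once, so TP = 2 and only `w` (frequency 2) falls inside the window [1.2, 2.8].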
Keywords
Buildings; Distortion measurement; Entropy; Indexing; Information retrieval; Information science; Natural languages; Noise reduction; Vocabulary; Writing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology, 2008. ITSim 2008. International Symposium on
Conference_Location
Kuala Lumpur, Malaysia
Print_ISBN
978-1-4244-2327-9
Electronic_ISBN
978-1-4244-2328-6
Type
conf
DOI
10.1109/ITSIM.2008.4631699
Filename
4631699