Title :
Measuring the representativeness of index terms in literary texts: an experiment on the Quran
Author :
Rahman, Hayati Abd ; Noah, Shahrul Azman ; Jimenez-Salazar, Hector
Author_Institution :
Faculty of Information Science and Technology, National University of Malaysia, 43600 Bangi, Malaysia
Abstract :
A concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in applications such as information retrieval, document browsing and document classification. One of the important tasks in constructing a concept hierarchy is identifying suitable terms while keeping the domain vocabulary at an appropriate size. One way of achieving such a size is term reduction. The aim of this paper is to examine how effectively term selection methods reduce the size of the vocabulary. An experiment was conducted on the Quran, which is assumed to be a literary text. The experiment compares the entropy method, the transition point method, and a hybrid of the transition point and entropy methods with the Vector Space Model (VSM). Results indicate that the transition point method is more effective than the others at reducing the size of the vocabulary while preserving the important terms that occur in the literary documents.
Keywords :
Buildings; Distortion measurement; Entropy; Indexing; Information retrieval; Information science; Natural languages; Noise reduction; Vocabulary; Writing
Conference_Title :
Information Technology, 2008. ITSim 2008. International Symposium on
Conference_Location :
Kuala Lumpur, Malaysia
Print_ISBN :
978-1-4244-2327-9
Electronic_ISBN :
978-1-4244-2328-6
DOI :
10.1109/ITSIM.2008.4631699