• DocumentCode
    476747
  • Title
    Measuring the representativeness of index terms in literary texts: an experiment on the Quran
  • Author
    Rahman, Hayati Abd ; Noah, Shahrul Azman ; Jimenez-Salazar, Hector
  • Author_Institution
    Faculty of Information Science and Technology, National University of Malaysia, 43600 Bangi, Malaysia
  • Volume
    2
  • fYear
    2008
  • fDate
    26-28 Aug. 2008
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    A concept hierarchy is a hierarchically organized collection of domain concepts. It is useful in applications such as information retrieval, document browsing and document classification. An important task in constructing a concept hierarchy is identifying suitable terms while keeping the domain vocabulary at an appropriate size. One way of achieving such a size is term reduction. This paper examines how effectively term selection methods reduce the size of the vocabulary. An experiment was conducted on the Quran, which is assumed to be a literary text. The experiment compares the entropy method, the transition point method and a hybrid of the transition point and entropy methods with the Vector Space Model (VSM). Results indicate that the transition point method is more effective than the others at reducing the size of the vocabulary while preserving the important terms in the literary documents.
  • Keywords
    Buildings; Distortion measurement; Entropy; Indexing; Information retrieval; Information science; Natural languages; Noise reduction; Vocabulary; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology, 2008. ITSim 2008. International Symposium on
  • Conference_Location
    Kuala Lumpur, Malaysia
  • Print_ISBN
    978-1-4244-2327-9
  • Electronic_ISBN
    978-1-4244-2328-6
  • Type
    conf
  • DOI
    10.1109/ITSIM.2008.4631699
  • Filename
    4631699