• DocumentCode
    1886184
  • Title

    Efficient retrieval of Malay language documents using Latent Semantic Indexing

  • Author

    Sadjirin, Roslan ; Rahman, Nurazzah Abd

  • Author_Institution
    Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
  • Volume
    3
  • fYear
    2010
  • fDate
    15-17 June 2010
  • Firstpage
    1410
  • Lastpage
    1415
  • Abstract
    The main objective of this research is to investigate whether Latent Semantic Indexing (LSI) improves retrieval effectiveness on Malay documents compared with the exact term-matching technique. LSI is a mathematical approach that uses Singular Value Decomposition (SVD) to discover the important associations between terms and terms, terms and documents, and documents and documents. Cosine similarity is used to measure the similarity between the query and the terms as well as the documents. This research uses a Malay language test collection consisting of 210 Malay documents, queries and relevance judgments, together with a Malay stemmer to stem Malay terms. Results and analyses show that the LSI retrieval method outperformed the exact term-matching technique, despite the longer processing time it required during indexing. The best retrieval effectiveness for Malay documents in this domain, 80.2 percent, is achieved when the k-dimension is 4 and the threshold value is 0.8.
  • Keywords
    indexing; information retrieval; natural language processing; singular value decomposition; word processing; Malay language document retrieval; Malay language test collection; Malay stemmer; exact term matching technique; latent semantic indexing; singular value decomposition; DSL; Decision support systems; Latent Semantic Analysis; Latent Semantic Indexing; Malay Information Retrieval;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Information Technology (ITSim), 2010 International Symposium in
  • Conference_Location
    Kuala Lumpur
  • ISSN
    2155-897
  • Print_ISBN
    978-1-4244-6715-0
  • Type

    conf

  • DOI
    10.1109/ITSIM.2010.5561613
  • Filename
    5561613
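
The retrieval pipeline summarized in the abstract (truncated SVD of a term-document matrix, cosine similarity between the query and the documents, and a similarity threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the standard query fold-in formula, and the toy matrix in the usage note are assumptions made for illustration, and stemming and term weighting are omitted. Only the parameter names k and threshold come from the abstract, which reports k = 4 and 0.8 as the best setting on its own collection.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Truncated SVD of a (terms x documents) matrix.

    Returns the k leading term vectors, singular values, and document
    vectors (one row per document). Assumes the k leading singular
    values are non-zero.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T


def fold_in_query(query_vec, u_k, s_k):
    """Project a term-space query into the k-dimensional LSI space.

    Uses the common fold-in formula q_k = q^T U_k S_k^{-1} (an assumption;
    the abstract does not say how queries are projected).
    """
    return query_vec @ u_k / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def retrieve(term_doc, query_vec, k, threshold):
    """Return (document index, score) pairs whose cosine score >= threshold."""
    u_k, s_k, docs_k = lsi_fit(term_doc, k)
    q_k = fold_in_query(query_vec, u_k, s_k)
    scores = [cosine(q_k, d) for d in docs_k]
    return [(i, sc) for i, sc in enumerate(scores) if sc >= threshold]
```

As a usage example on hypothetical data, a 4-term by 3-document count matrix can be passed as `retrieve(term_doc, query_vec, k=2, threshold=0.8)`, where `query_vec` holds the query's term counts in the same term order as the matrix rows. Exact term matching, by contrast, would compare `query_vec` against the raw columns of `term_doc` without the SVD step.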