DocumentCode
1886184
Title
Efficient retrieval of Malay language documents using Latent Semantic Indexing
Author
Sadjirin, Roslan ; Rahman, Nurazzah Abd
Author_Institution
Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
Volume
3
fYear
2010
fDate
15-17 June 2010
Firstpage
1410
Lastpage
1415
Abstract
The main objective of this research is to investigate whether Latent Semantic Indexing (LSI) improves retrieval effectiveness on Malay documents compared with an exact term-matching technique. LSI is a mathematical approach that uses Singular Value Decomposition (SVD) to discover the important associations between terms and terms, terms and documents, and documents and documents. Cosine similarity is used to measure the similarity between the query and the terms as well as the documents. This research uses a Malay Language Test Collection consisting of 210 Malay documents, queries, and relevance judgments, together with a Malay stemmer to stem Malay terms. Results and analyses show that the LSI retrieval method outperformed the exact term-matching technique despite the longer processing time it took during indexing. The best retrieval effectiveness for Malay documents in this domain, 80.2 percent, is achieved when the k-dimension is 4 and the threshold value is 0.8.
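The pipeline described in the abstract (SVD of a term-by-document matrix, truncation to k dimensions, cosine similarity against a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `lsi_retrieve` and `exact_match`, the five-term toy vocabulary, the four toy documents, and the query projection convention (projecting both query and documents onto the first k left singular vectors) are all assumptions made for illustration. The toy example uses k = 2 because the matrix is tiny; the abstract's k = 4 applies to the 210-document collection, not to this data.

```python
import numpy as np

def lsi_retrieve(term_doc, query, k, threshold):
    """Return indices of documents whose cosine similarity to the query,
    measured in a rank-k LSI space, is at least `threshold`."""
    # Thin SVD of the term-by-document matrix: term_doc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]                 # keep the k largest singular directions
    docs_k = term_doc.T @ U_k      # each document projected onto U_k
    query_k = query @ U_k          # query projected the same way
    # Cosine similarity between the query and every document;
    # pairs with a zero norm get similarity 0 instead of dividing by zero.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    sims = np.divide(docs_k @ query_k, norms,
                     out=np.zeros(len(docs_k)), where=norms > 0)
    return [j for j, sim in enumerate(sims) if sim >= threshold]

def exact_match(term_doc, query):
    """Baseline: documents that literally contain at least one query term."""
    return [j for j in range(term_doc.shape[1])
            if (term_doc[:, j] * query).sum() > 0]

# Hypothetical toy data (not from the test collection).
# Rows are stemmed terms, columns are documents d0..d3.
terms = ["kereta", "motokar", "jalan", "makan", "nasi"]
term_doc = np.array([
    [1, 0, 0, 0],   # kereta
    [0, 1, 0, 0],   # motokar
    [1, 1, 0, 0],   # jalan
    [0, 0, 1, 0],   # makan
    [0, 0, 1, 1],   # nasi
], dtype=float)

# Query containing the single term "motokar"
query = np.array([0, 1, 0, 0, 0], dtype=float)

exact_hits = exact_match(term_doc, query)
lsi_hits = lsi_retrieve(term_doc, query, k=2, threshold=0.8)
print(exact_hits)  # [1]
print(lsi_hits)    # [0, 1]
```

In this toy case exact matching finds only the document that contains "motokar", while the rank-2 LSI space also retrieves the document that uses "kereta", because both terms co-occur with "jalan". The threshold of 0.8 is the value reported in the abstract; how it interacts with k on real data would need to be checked against the collection itself.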
Keywords
indexing; information retrieval; natural language processing; singular value decomposition; word processing; Malay language document retrieval; Malay language test collection; Malay stemmer; exact term matching technique; latent semantic indexing; singular value decomposition; DSL; Decision support systems; Latent Semantic Analysis; Latent Semantic Indexing; Malay Information Retrieval;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology (ITSim), 2010 International Symposium in
Conference_Location
Kuala Lumpur
ISSN
2155-897
Print_ISBN
978-1-4244-6715-0
Type
conf
DOI
10.1109/ITSIM.2010.5561613
Filename
5561613