DocumentCode :
2313427
Title :
Improving the retrieval performance by using distance-based bigram
Author :
Aimmanee, P. ; Theeramunkong, T.
Author_Institution :
Sirindhorn Int. Inst. of Technol., Thammasat Univ., Patumthani
fYear :
2009
fDate :
6-9 May 2009
Firstpage :
744
Lastpage :
747
Abstract :
In this paper, we discuss a new scheme for forming and weighting terms, called the distance-based bigram. In this scheme, the distance between two words is taken into account both when forming a new term and when assigning its weight. The scheme is applied to vector formation in the vector space model alongside the standard term-forming schemes, unigram and bigram. The test domains are English and Thai medical corpora. The results show that the proposed method performs well on the Thai corpus when only a few returned documents are needed. Within the first ten percent of recall, our method improves precision over the standard unigram by nearly 30%.
Keywords :
document handling; information retrieval; natural languages; Thai medical corpora; distance-based bigram; vector formation; Biomedical engineering; Computer networks; Computer science education; Diseases; IP networks; Internet; Medical tests; Space technology; bigrams; unigrams; vector space model
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2009. ECTI-CON 2009. 6th International Conference on
Conference_Location :
Pattaya, Chonburi
Print_ISBN :
978-1-4244-3387-2
Electronic_ISBN :
978-1-4244-3388-9
Type :
conf
DOI :
10.1109/ECTICON.2009.5137154
Filename :
5137154