DocumentCode
2594519
Title
CASIT: Content Based Identification of Textual Information in a Large Database
Author
Guezouli, Larbi ; Essafi, Hassane
Author_Institution
Comput. Sci. Dept., Batna Univ., Batna, Algeria
fYear
2010
fDate
20-23 April 2010
Firstpage
621
Lastpage
625
Abstract
This paper describes the CASIT model (CAlculation of SImilarity of Text). Starting from a coarse comparison of text documents based on the Latent Semantic Indexing (LSI) model, the CASIT method then calculates, at a finer level, the rate of similarity between model text documents and the documents compared against them. Our approach takes into account the neighbourhood of each word, which makes it possible to weight the words in the calculation of the score.
Keywords
text analysis; CASIT model; calculation of similarity of text; content based identification; latent semantic indexing model; text documents; textual information; Application software; Computer science; Conferences; Databases; Filters; Frequency; Indexing; Information retrieval; Large scale integration; Matrix decomposition; CASIT; Component; LSI; textual research; vectorial model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Networking and Applications Workshops (WAINA), 2010 IEEE 24th International Conference on
Conference_Location
Perth, WA
Print_ISBN
978-1-4244-6701-3
Type
conf
DOI
10.1109/WAINA.2010.133
Filename
5480625