DocumentCode :
256517
Title :
Studying the effects of conflicting tokenization on LSA dimension reduction
Author :
Fahsi, Mahmoud ; Benslimane, Sidi Mohamed
Author_Institution :
Comput. Sci. Dept., Djillali Liabes Univ., Sidi Bel Abbes, Algeria
fYear :
2014
fDate :
14-16 April 2014
Firstpage :
542
Lastpage :
546
Abstract :
The need for dimension reduction in term selection and recommendation is growing, as is the integration of natural language processing modules into existing architectures and semantic web systems such as search engines. The existence of multiple tokenization techniques for the same text is a persistent problem in current semantic search engine practice and creates a non-trivial problem for query expansion and its efficiency in general. In this work we study the effect of the tokenization technique on the selection of query expansion terms within statistical latent semantic indexing. Finally, we discuss the results from a corpus-linguistic point of view.
Keywords :
indexing; natural language processing; query processing; search engines; semantic Web; statistical analysis; LSA dimension reduction; conflicting tokenization; multiples tokenization technique; natural language processing module; query expansion; search engine; semantic Web system; statistical latent semantic indexing; Context; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Semantics; Vectors; Dimension Reduction; LSA; Query Expansion; Tokenisation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia Computing and Systems (ICMCS), 2014 International Conference on
Conference_Location :
Marrakech
Print_ISBN :
978-1-4799-3823-0
Type :
conf
DOI :
10.1109/ICMCS.2014.6911367
Filename :
6911367