DocumentCode
256517
Title
Studying the effects of conflicting tokenization on LSA dimension reduction
Author
Fahsi, Mahmoud ; Benslimane, Sidi Mohamed
Author_Institution
Comput. Sci. Dept., Djillali Liabes Univ., Sidi Bel Abbes, Algeria
fYear
2014
fDate
14-16 April 2014
Firstpage
542
Lastpage
546
Abstract
Dimension reduction is increasingly needed for term selection and recommendation, and natural language processing modules are increasingly integrated into existing architectures and semantic web systems such as search engines. The existence of multiple tokenization techniques for the same text is a persistent problem in current semantic search engine practice and creates a non-trivial problem for query expansion and its efficiency in general. In this work we study the effect of the tokenization technique on the selection of query expansion terms within statistical latent semantic indexing. Finally, we discuss the results from a corpus-linguistic point of view.
Keywords
indexing; natural language processing; query processing; search engines; semantic Web; statistical analysis; LSA dimension reduction; conflicting tokenization; multiples tokenization technique; natural language processing module; query expansion; search engine; semantic Web system; statistical latent semantic indexing; Context; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Semantics; Vectors; Dimension Reduction; LSA; Query Expansion; Tokenisation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia Computing and Systems (ICMCS), 2014 International Conference on
Conference_Location
Marrakech
Print_ISBN
978-1-4799-3823-0
Type
conf
DOI
10.1109/ICMCS.2014.6911367
Filename
6911367