Title :
Studying the effects of conflicting tokenization on LSA dimension reduction
Author :
Fahsi, Mahmoud ; Benslimane, Sidi Mohamed
Author_Institution :
Comput. Sci. Dept., Djillali Liabes Univ., Sidi Bel Abbes, Algeria
Abstract :
With the growing need for dimension reduction in term selection and recommendation, and current trends toward integrating natural language processing modules into existing architectures and into semantic web systems such as search engines, the existence of multiple tokenization techniques for the same text remains a persistent problem in current semantic search engine practice. It creates a non-trivial problem for query expansion and for its efficiency in general. In this work we study the effect of the tokenization technique on the selection of query expansion terms within statistical latent semantic indexing. Finally, we discuss the results from a corpus-linguistic point of view.
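The record does not describe the authors' actual pipeline. The following is only a minimal sketch of why conflicting tokenization matters for LSA-based query expansion. It rests on these assumptions: a hypothetical four-document toy corpus, two hypothetical tokenizers (`tokenize_whitespace`, `tokenize_regex`), a raw-count term-document matrix, a rank `k=2` truncated SVD through NumPy, and cosine similarity between reduced term vectors as the expansion criterion. When one tokenizer keeps `term-document` as a single token and the other splits it, the matrix changes shape. That change propagates into the reduced space and can alter the candidate expansion terms.

```python
import re
import numpy as np

# Hypothetical toy corpus, used only for illustration (not the paper's data).
DOCS = [
    "state-of-the-art search engines use query expansion",
    "query expansion improves search engine recall",
    "latent semantic indexing reduces term-document dimensions",
    "semantic search relies on latent semantic indexing",
]

def tokenize_whitespace(text):
    """Hypothetical tokenizer A: lowercase, split on whitespace (keeps hyphenated compounds)."""
    return text.lower().split()

def tokenize_regex(text):
    """Hypothetical tokenizer B: lowercase, split on non-alphanumerics (breaks hyphenated compounds)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def term_document_matrix(docs, tokenize):
    """Build a raw-count term-document matrix (terms x documents) and its sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1
    return A, vocab

def lsa_term_vectors(A, k):
    """Rank-k LSA: term vectors are the first k columns of U scaled by the top-k singular values."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]

def expansion_candidates(docs, tokenize, query_term, k=2, top_n=3):
    """Rank other terms by cosine similarity to query_term in the rank-k term space."""
    A, vocab = term_document_matrix(docs, tokenize)
    if query_term not in vocab:
        return []  # the query term does not exist under this tokenization
    T = lsa_term_vectors(A, k)
    norms = np.linalg.norm(T, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for terms with null reduced vectors
    T = T / norms[:, None]
    sims = T @ T[vocab.index(query_term)]
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] != query_term]
    return ranked[:top_n]

if __name__ == "__main__":
    for name, tok in [("whitespace", tokenize_whitespace), ("regex", tokenize_regex)]:
        A, vocab = term_document_matrix(DOCS, tok)
        print(name, A.shape, expansion_candidates(DOCS, tok, "search"))
```

With this corpus, the whitespace tokenizer yields a 17-term vocabulary and the regex tokenizer a 21-term one. A query on `term` finds no candidates at all under the first tokenizer, because that token never appears in its vocabulary. Raw counts and a fixed `k` keep the sketch short. A real study would more likely apply a weighting scheme such as tf-idf and compare several values of `k`.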
Keywords :
indexing; natural language processing; query processing; search engines; semantic Web; statistical analysis; LSA dimension reduction; conflicting tokenization; multiples tokenization technique; natural language processing module; query expansion; search engine; semantic Web system; statistical latent semantic indexing; Context; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Semantics; Vectors; Dimension Reduction; LSA; Query Expansion; Tokenisation;
Conference_Title :
Multimedia Computing and Systems (ICMCS), 2014 International Conference on
Conference_Location :
Marrakech
Print_ISBN :
978-1-4799-3823-0
DOI :
10.1109/ICMCS.2014.6911367