• DocumentCode
    256517
  • Title

    Studying the effects of conflicting tokenization on LSA dimension reduction

  • Author

    Fahsi, Mahmoud ; Benslimane, Sidi Mohamed

  • Author_Institution
    Comput. Sci. Dept., Djillali Liabes Univ., Sidi Bel Abbes, Algeria
  • fYear
    2014
  • fDate
    14-16 April 2014
  • Firstpage
    542
  • Lastpage
    546
  • Abstract
    The need for dimension reduction in term selection and recommendation is growing, and natural language processing modules are increasingly integrated into existing architectures and semantic web systems such as search engines. The existence of multiple tokenization techniques for the same text is a persistent problem in current semantic search engine practice and creates a non-trivial problem for query expansion and its efficiency in general. In this work we study the effect of the tokenization technique on the selection of query expansion terms within statistical latent semantic indexing. Finally, we discuss the results from a corpus-linguistic point of view.
  • Keywords
    indexing; natural language processing; query processing; search engines; semantic Web; statistical analysis; LSA dimension reduction; conflicting tokenization; multiples tokenization technique; natural language processing module; query expansion; search engine; semantic Web system; statistical latent semantic indexing; Context; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Semantics; Vectors; Dimension Reduction; LSA; Query Expansion; Tokenisation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Computing and Systems (ICMCS), 2014 International Conference on
  • Conference_Location
    Marrakech
  • Print_ISBN
    978-1-4799-3823-0
  • Type

    conf

  • DOI
    10.1109/ICMCS.2014.6911367
  • Filename
    6911367