• DocumentCode
    1985492
  • Title
    N-gram and Local Context Analysis for Persian text retrieval
  • Author
    Aleahmad, Abolfazl ; Hakimian, Parsia ; Mahdikhani, Farzad ; Oroumchian, Farhad
  • Author_Institution
    Electr. & Comput. Eng. Dept., Univ. of Tehran, Tehran
  • fYear
    2007
  • fDate
    12-15 Feb. 2007
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Persian is one of the languages of the Middle East, and a significant number of Persian documents are available on the Web. However, relatively few studies on the retrieval of Persian documents appear in the literature. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, namely local context analysis, using different weighting schemes on a realistic corpus containing more than 160,000 news articles. We then compared our results with previous work reported on the Persian language. Our experimental results show that, among the assessed methods, the 4-gram-based vector space model with the Lnu.ltu weighting scheme has acceptable performance, and local context analysis has the best performance reported for Persian text retrieval so far.
  • Keywords
    query processing; text analysis; Middle-East; N-gram based vector space model; Persian documents; Persian text retrieval; local context analysis; query expansion method; weighting scheme; Context modeling; Encoding; Extraterrestrial measurements; Functional analysis; Fuzzy systems; Information retrieval; Natural languages; Performance analysis; Testing; Writing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
  • Conference_Location
    Sharjah
  • Print_ISBN
    978-1-4244-0778-1
  • Electronic_ISBN
    978-1-4244-1779-8
  • Type
    conf
  • DOI
    10.1109/ISSPA.2007.4555345
  • Filename
    4555345