• DocumentCode
    3490413
  • Title
    Culturomics on a Bengali Newspaper Corpus
  • Author
    Phani, S. ; Lahiri, S. ; Biswas, Arijit
  • Author_Institution
    Dept. of IT, BESU, Howrah, India
  • fYear
    2012
  • fDate
    13-15 Nov. 2012
  • Firstpage
    237
  • Lastpage
    240
  • Abstract
    We introduce culturomic studies on a corpus drawn from the leading Bengali newspaper Ananda Bazar Patrika, in the same spirit as [15]. Based on 11 years' worth of Bengali newswire text, we extract trajectories of salient words that are important in contemporary West Bengal. To the best of our knowledge, this is the first culturomic trend analysis performed on an Indic language. Our analysis yields interesting insights into word usage and cultural shift in contemporary West Bengal. Moreover, we model culturomic trajectories using ARIMA and obtain word usage predictions that closely follow actual usage patterns.
  • Keywords
    autoregressive moving average processes; cultural aspects; humanities; natural language processing; publishing; text analysis; word processing; ARIMA process; Ananda Bazar Patrika; Bengali newspaper corpus; Bengali newswire text; Indic language; West Bengal; cultural shift; culturomic trajectory model; culturomic trend analysis; salient word trajectory extraction; word usage predictions; Google; Market research; Nominations and elections; Predictive models; Smoothing methods; Time series analysis; Trajectory; ARIMA; Ananda Bazar Patrika; Bengali; Indic language; culture shift; culturomics; time series; trend analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2012 International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4673-6113-2
  • Electronic_ISBN
    978-0-7695-4886-9
  • Type
    conf
  • DOI
    10.1109/IALP.2012.68
  • Filename
    6473740
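The abstract describes two steps: extracting a word's yearly usage trajectory from the corpus, then modeling that trajectory with ARIMA to predict future usage. The sketch below illustrates both under stated assumptions. A trajectory is taken to be a word's relative frequency per year. The model is a hand-rolled ARIMA(1,1,0) without intercept, fitted by conditional least squares. The abstract does not state the model order or fitting procedure, so this is not the authors' method. The helper names `trajectory`, `fit_arima_110`, and `forecast_arima_110` are hypothetical, and the corpus layout (year mapped to a list of tokenized documents) is an assumption. In practice a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) would normally be used instead.

```python
from collections import Counter


def trajectory(corpus_by_year, word):
    """Relative frequency of `word` in each year (hypothetical helper).

    Assumes corpus_by_year maps a year to a list of tokenized documents.
    """
    traj = {}
    for year in sorted(corpus_by_year):
        counts = Counter(tok for doc in corpus_by_year[year] for tok in doc)
        total = sum(counts.values())
        # Normalize by the year's token count so years of different size compare fairly.
        traj[year] = counts[word] / total if total else 0.0
    return traj


def fit_arima_110(series):
    """Estimate phi in ARIMA(1,1,0) without intercept: d_t = phi * d_{t-1} + e_t,
    where d_t = y_t - y_{t-1}, by conditional least squares (hypothetical helper)."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    d = [b - a for a, b in zip(series, series[1:])]  # first differences
    num = sum(x * y for x, y in zip(d[:-1], d[1:]))
    den = sum(x * x for x in d[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, phi, steps):
    """Forecast `steps` future values by iterating the AR(1) recursion on the
    differences and integrating back to levels (hypothetical helper)."""
    last_level = series[-1]
    last_diff = series[-1] - series[-2]
    preds = []
    for _ in range(steps):
        last_diff = phi * last_diff      # predicted next difference
        last_level = last_level + last_diff  # undo the differencing
        preds.append(last_level)
    return preds


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper's corpus.
    toy = {2001: [["a", "b"], ["a"]], 2002: [["b", "b", "c", "d"]]}
    print(trajectory(toy, "a"))
    series = [1, 2, 4, 8]
    phi = fit_arima_110(series)
    print(phi, forecast_arima_110(series, phi, 2))
```

Differencing once (the "I" in ARIMA) is a common way to handle the upward or downward drift typical of word-usage trajectories. A real analysis would select the orders (p, d, q) from the data rather than fixing them at (1, 1, 0).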