• DocumentCode
    2447492
  • Title

    Model of Data Gathering and Processing on Tibetan and Uyghur Language

  • Author

    Weng, Yu ; Jia, Hanxin ; Ma, Qingli

  • Author_Institution
    Coll. of Inf. Eng., Minzu Univ. of China, Beijing, China
  • fYear
    2012
  • fDate
    1-3 Nov. 2012
  • Firstpage
    264
  • Lastpage
    266
  • Abstract
    This paper introduces a model for gathering and processing web data in the Tibetan and Uyghur languages, comprising a page crawler, content extraction, word segmentation, and frequency statistics and display. First, the software extracts the website's templates and uses them to extract the title and content of each web page, then transforms the HTML file into an XML file. Second, it segments the content of the XML file into words and counts them, storing the statistics in a database. Finally, a web page displays the results of the frequency statistics.
  • Keywords
    Web sites; XML; data handling; hypermedia markup languages; natural language processing; HTML; Tibetan language; Uyghur language; Web data gathering; Web page; Website templates; XML; content extraction; data processing; frequency statistics; page crawler; word segmentation; Data mining; Data models; Databases; Java; Transforms; Web pages; XML; Data Processing; Data gathering; Tibetan and Uyghur language;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Networks and Intelligent Systems (ICINIS), 2012 Fifth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-1-4673-3083-1
  • Type

    conf

  • DOI
    10.1109/ICINIS.2012.81
  • Filename
    6376538
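
The segmentation and counting step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the XML layout (`<page>` with a `<content>` child), the table name `word_freq`, and all function names are hypothetical. Splitting on whitespace and on the Tibetan tsheg and shad marks is a naive stand-in for real word segmentation, which for Tibetan typically needs a dictionary or a trained model.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: tokens are delimited by whitespace (Uyghur) or by the
# Tibetan tsheg (U+0F0B) and shad (U+0F0D). For Tibetan this yields
# syllables rather than true words.
DELIMITERS = re.compile(r"[\s\u0F0B\u0F0D]+")


def segment(text):
    """Split text into tokens, dropping empty strings."""
    return [tok for tok in DELIMITERS.split(text) if tok]


def count_words(xml_text):
    """Parse a page in the hypothetical XML layout and count its tokens."""
    root = ET.fromstring(xml_text)
    content = root.findtext("content", default="")
    return Counter(segment(content))


def store(counts, conn):
    """Add the counts to a word_freq table, accumulating across pages."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_freq "
        "(word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    for word, freq in counts.items():
        # Make sure a row exists, then add this page's count to it.
        conn.execute(
            "INSERT OR IGNORE INTO word_freq (word, freq) VALUES (?, 0)",
            (word,),
        )
        conn.execute(
            "UPDATE word_freq SET freq = freq + ? WHERE word = ?",
            (freq, word),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    page = "<page><title>demo</title><content>alpha beta alpha</content></page>"
    store(count_words(page), conn)
    for row in conn.execute("SELECT word, freq FROM word_freq ORDER BY freq DESC"):
        print(row)
```

A display page like the one the abstract mentions would then only need to query `word_freq` ordered by frequency.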