DocumentCode
2447492
Title
Model of Data Gathering and Processing on Tibetan and Uyghur Language
Author
Weng, Yu ; Jia, Hanxin ; Ma, Qingli
Author_Institution
Coll. of Inf. Eng., Minzu Univ. of China, Beijing, China
fYear
2012
fDate
1-3 Nov. 2012
Firstpage
264
Lastpage
266
Abstract
A model of web data gathering and processing for the Tibetan and Uyghur languages is introduced in this paper, comprising a page crawler, content extraction, word segmentation, and frequency statistics and display. First, the software extracts the website's templates and uses them to extract the content and title of each web page, then transforms the HTML file into an XML file. The second step is to segment the content of the XML file into words and count the words, so that the statistics can be stored in a database. Finally, a web page displays the results of the frequency statistics.
Keywords
Web sites; XML; data handling; hypermedia markup languages; natural language processing; HTML; Tibetan language; Uyghur language; Web data gathering; Web page; Website templates; content extraction; data processing; frequency statistics; page crawler; word segmentation; Data mining; Data models; Databases; Java; Transforms; Web pages; Data Processing; Data gathering; Tibetan and Uyghur language
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Networks and Intelligent Systems (ICINIS), 2012 Fifth International Conference on
Conference_Location
Tianjin
Print_ISBN
978-1-4673-3083-1
Type
conf
DOI
10.1109/ICINIS.2012.81
Filename
6376538