Title :
A Method for Collecting Tibetan-Websites
Author :
Zhi-juan, Wang ; Xiao-bin, Zhao ; Rui, Yang
Author_Institution :
Nat. Language Resource Monitoring & Res. Center, Minzu Univ. of China, Beijing, China
Abstract :
This paper first analyzes the features of Tibetan websites. It then introduces a three-step method for collecting them: first, collect web pages using Tibetan high-frequency words; second, judge whether each page is in Tibetan according to the frequency of the Tibetan syllable dot in that page; finally, derive the URL of the Tibetan website from the URL of the Tibetan web page. The method is shown to be efficient and fast in collecting Tibetan websites. The Tibetan website information collected with this method has already been submitted to the National Language Resource Monitoring & Research Center.
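The second and third steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the ratio definition (syllable dots over non-whitespace characters), and the 0.1 threshold are all assumptions made for illustration, since the abstract gives no concrete values. The only fixed fact used is that the Tibetan syllable dot (tsheg) is Unicode code point U+0F0B. The first step, querying for pages with Tibetan high-frequency words, is omitted because it depends on an unspecified search service.

```python
from urllib.parse import urlsplit

# Tibetan intersyllabic mark (tsheg), the "syllable dot" named in the abstract.
TSHEG = "\u0f0b"


def tsheg_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are syllable dots.

    This ratio definition is an assumption; the abstract only says the
    decision uses the frequency of the syllable dot in a page.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == TSHEG for c in chars) / len(chars)


def is_tibetan_page(text: str, threshold: float = 0.1) -> bool:
    """Classify a page as Tibetan when the ratio reaches the threshold.

    The default threshold is a hypothetical placeholder, not a value
    taken from the paper.
    """
    return tsheg_ratio(text) >= threshold


def site_url(page_url: str) -> str:
    """Reduce a page URL to its site root (scheme and host only)."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"


if __name__ == "__main__":
    # Two Tibetan syllables, each followed by a tsheg (8 characters, 2 dots).
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b"
    print(is_tibetan_page(sample))
    print(is_tibetan_page("hello world"))
    print(site_url("http://example.com/news/1.html"))
```

In practice the page text would first need to be extracted from HTML and decoded to Unicode, and a real threshold would have to be tuned on labeled pages; both are outside what the abstract specifies.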
Keywords :
Web sites; natural languages; National Language Resource Monitoring & Research Center; Tibetan high-frequency words; Tibetan syllable dot; Tibetan web page URL; Tibetan-Websites collection method; Encoding; Equations; HTML; Internet; Mathematical model; Monitoring; Web pages; Tibetan-websites; web page collecting; web page language;
Conference_Title :
Intelligent Networks and Intelligent Systems (ICINIS), 2011 4th International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4577-1626-3
DOI :
10.1109/ICINIS.2011.3