Title :
Unsupervised construction of a word list on tourism from Wikipedia
Author :
Dittaya Wanvarie;Sansanee Ek-atchariya;Thanakon Kaewwipat
Author_Institution :
Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University
Abstract :
Demand for domain-specific word lists is growing in language learning. We propose an unsupervised framework that extracts a word list from Wikipedia data for a language-learning class specializing in tourism. We extract topics from Wikipedia articles using non-negative matrix factorization. Each topic is then classified as tourism-related or not using articles in WikiVoyage. We select the Wikipedia paragraphs classified as in-domain and rank the words in those paragraphs by frequency. The proposed framework retrieves more than 90% of the words in the gold list, but the extracted list still includes a large number of general terms.
Conference_Title :
2015 International Computer Science and Engineering Conference (ICSEC)
DOI :
10.1109/ICSEC.2015.7401412
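
The pipeline summarized in the abstract (topic extraction with non-negative matrix factorization, in-domain topic selection, frequency ranking) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how topics are compared against WikiVoyage, so the cosine-similarity test, the `threshold` value, the dominant-topic rule for paragraphs, and all function names below are assumptions made for illustration only.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def matmul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(V, k, iters=200, seed=0):
    """Approximate V (n x m) by W (n x k) times H (k x m), all non-negative,
    using Lee-Seung multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        Wt = [list(c) for c in zip(*W)]
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[t][j] * num[t][j] / (den[t][j] + eps) for j in range(m)]
             for t in range(k)]
        Ht = [list(c) for c in zip(*H)]
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][t] * num[i][t] / (den[i][t] + eps) for t in range(k)]
             for i in range(n)]
    return W, H


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_word_list(paragraphs, reference_text, k=2, threshold=0.3):
    """Return words from in-domain paragraphs, most frequent first.

    Assumptions (not from the paper): a topic is in-domain when its word
    weights have cosine similarity >= threshold with the reference text,
    and a paragraph is in-domain when its dominant topic is in-domain.
    """
    docs = [tokenize(p) for p in paragraphs]
    vocab = sorted({w for d in docs for w in d})
    counts_per_doc = [Counter(d) for d in docs]
    V = [[c[w] for w in vocab] for c in counts_per_doc]  # paragraph-term counts
    W, H = nmf(V, k)  # W: paragraph-topic weights, H: topic-word weights

    ref = Counter(tokenize(reference_text))
    ref_vec = [ref[w] for w in vocab]
    in_domain = {t for t in range(k) if cosine(H[t], ref_vec) >= threshold}

    totals = Counter()
    for c, weights in zip(counts_per_doc, W):
        dominant = max(range(k), key=lambda t: weights[t])
        if dominant in in_domain:
            totals.update(c)
    return [w for w, _ in totals.most_common()]
```

Calling `extract_word_list(paragraphs, reference_text)` with a list of paragraph strings and a tourism reference text returns a frequency-ranked word list. The sketch applies no stop-word filtering, so general terms remain in the output, which mirrors the limitation noted in the abstract.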