DocumentCode :
3628500
Title :
A generic method for multi word extraction from Wikipedia
Author :
Bozo Bekavac;Marko Tadic
Author_Institution :
Faculty of Humanities and Social Sciences, University of Zagreb, Ivana Lučića 3, 10000, Croatia
fYear :
2008
fDate :
6/1/2008 12:00:00 AM
Firstpage :
663
Lastpage :
668
Abstract :
This paper presents a generic method for extracting multiword expressions from Wikipedia. The method exploits the properties of this specific encyclopedic genre in its HTML format and relies on article authors' intention to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia and present the results obtained.
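The core idea (harvesting the anchor text of links that authors placed to other articles, then filtering it with a pattern) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses NooJ local regular grammars, which are replaced here by a single Python regular expression as a rough stand-in. The names `WikiLinkCollector`, `MWE_PATTERN` and `extract_mwe_candidates` are hypothetical, and it is assumed that article links in the HTML take the form `/wiki/Title` and that links with a namespace prefix (containing `:`) should be skipped.

```python
import re
from html.parser import HTMLParser

# Stand-in for a local grammar (assumption): 2 to 5 space-separated words made
# of letters only (hyphenated words allowed), so anchors with digits are rejected.
WORD = r"[^\W\d_]+(?:-[^\W\d_]+)*"
MWE_PATTERN = re.compile(rf"^{WORD}(?: {WORD}){{1,4}}$")


class WikiLinkCollector(HTMLParser):
    """Collects the anchor texts of links that point to other wiki articles."""

    def __init__(self):
        super().__init__()
        self._in_article_link = False
        self._buffer = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: article links look like /wiki/Title; titles containing ":"
        # (file, category and other namespace pages) are ignored.
        if href.startswith("/wiki/") and ":" not in href[len("/wiki/"):]:
            self._in_article_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_article_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_article_link:
            # Join text chunks and normalise internal whitespace.
            self.anchors.append(" ".join("".join(self._buffer).split()))
            self._in_article_link = False


def extract_mwe_candidates(html):
    """Returns unique multiword anchor texts, in order of first appearance."""
    collector = WikiLinkCollector()
    collector.feed(html)
    collector.close()
    matches = [a for a in collector.anchors if MWE_PATTERN.match(a)]
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    page = ('<p><a href="/wiki/Sveu%C4%8Dili%C5%A1te_u_Zagrebu">Sveučilište u Zagrebu</a> '
            'i <a href="/wiki/Zagreb">Zagreb</a>, '
            '<a href="https://example.org">vanjska poveznica</a>.</p>')
    print(extract_mwe_candidates(page))
```

In this sketch the single-word link and the external link are both discarded, leaving only the multiword article link as a candidate. A real local grammar could additionally encode part-of-speech or inflectional constraints, which a plain regular expression over surface strings cannot express.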
Keywords :
"Internet","Encyclopedias","Information services","Electronic publishing","HTML","Filtering","Artificial neural networks"
Publisher :
ieee
Conference_Titel :
Information Technology Interfaces, 2008. ITI 2008. 30th International Conference on
ISSN :
1330-1012
Print_ISBN :
978-953-7138-12-7
Type :
conf
DOI :
10.1109/ITI.2008.4588490
Filename :
4588490