DocumentCode :
3051996
Title :
Labeling Turkish news stories with CRF
Author :
Kazkilinc, Seda ; Adali, Esref
Author_Institution :
Istanbul Tech. Univ., Istanbul, Turkey
fYear :
2013
fDate :
23-25 Oct. 2013
Firstpage :
1
Lastpage :
5
Abstract :
The rapid growth in the number of documents on the Web calls for semantic web applications if the Web is to reach its full potential. Extracting the important phrases in a document makes it easier to find the desired information. This paper introduces a new approach that labels the main subject, main predicate, main location, and main date of an electronic document. The main subject label tells whom or what the document is about. The main predicate label tells what the subject is or does. The main location label tells where the activities took place, and the main date label tells when they took place. This methodology extracts not only a high-level description of the content but also the attribute of a phrase in a document. Turkish news stories are selected as the experimental set. Human annotators label the stories manually to produce the training and test sets. Then, different models are implemented for each label to extract the labels automatically, and their output is compared with the manually labeled results to evaluate the approach.
Keywords :
document handling; information retrieval; natural language processing; semantic Web; statistical analysis; CRF; Turkish news stories; conditional random fields; electronic document; human annotators; label extraction; natural language processing; phrase extraction; semantic Web applications; Data mining; Equations; Feature extraction; Labeling; Machine learning algorithms; Mathematical model; Training; Conditional Random Fields; Information Extraction; Natural Language Processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Application of Information and Communication Technologies (AICT), 2013 7th International Conference on
Conference_Location :
Baku
Print_ISBN :
978-1-4673-6419-5
Type :
conf
DOI :
10.1109/ICAICT.2013.6722634
Filename :
6722634