Regional Information Center for Science and Technology - Word-level information extraction from science and technology announcements corpus based on CRF

DocumentCode :

644014

Title :

Word-level information extraction from science and technology announcements corpus based on CRF

Author :

Yushu Cao ; Jun Wang ; Lei Li

Author_Institution :

Sch. of Eng. & Appl. Sci., Univ. of Pennsylvania, Philadelphia, PA, USA

Volume :

fYear :

2012

fDate :

Oct. 30, 2012 - Nov. 1, 2012

Firstpage :

1529

Lastpage :

1533

Abstract :

Conditional Random Fields (CRFs) have been widely applied in information extraction and natural language processing. However, among corpus types, they have seen little use on corpora of science and technology declarations. In this paper, we extract word-level information from a large corpus of science and technology announcements and analyze the performance of CRF, using Naïve Bayes as a baseline for comparison. Our experiments show that CRF achieves very high precision, except on a small amount of unknown data. The Naïve Bayes model is also satisfactory in closed domains, but it always makes mistakes when the data belong to a less heavily weighted class.
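The abstract contrasts a sequence model (CRF) with a per-word Naïve Bayes baseline. The following is a minimal Python sketch of that contrast, not the authors' implementation. Everything in it is an assumption for illustration: the feature function `features`, the tag set `O`/`ORG`, the toy sentences, and all CRF weights are hypothetical and do not come from the paper. CRF weight training (normally done by maximizing the conditional log-likelihood) is omitted, so only Viterbi decoding with hand-set weights is shown.

```python
import math
from collections import Counter, defaultdict


# Hypothetical per-word features (lowercased word + capitalization flag);
# the paper's real feature set is not described in this record.
def features(word):
    return ["w=" + word.lower(), "cap=" + str(word[:1].isupper())]


# --- Naive Bayes baseline: every word is classified independently ---
def train_nb(tagged_sentences, alpha=1.0):
    tag_counts, feat_counts, vocab = Counter(), defaultdict(Counter), set()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[tag] += 1
            for f in features(word):
                feat_counts[tag][f] += 1
                vocab.add(f)
    return tag_counts, feat_counts, vocab, alpha


def predict_nb(model, word):
    tag_counts, feat_counts, vocab, alpha = model
    total = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, n in tag_counts.items():
        # log P(tag) + sum_f log P(f | tag), with add-alpha (Laplace) smoothing.
        # The log P(tag) prior is what penalizes tags with few training examples.
        denom = sum(feat_counts[tag].values()) + alpha * len(vocab)
        score = math.log(n / total)
        for f in features(word):
            score += math.log((feat_counts[tag][f] + alpha) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# --- Linear-chain CRF decoding (Viterbi) with hand-set, hypothetical weights ---
def viterbi(words, tags, emit_w, trans_w):
    if not words:
        return []

    def emit(word, tag):
        # Sum of weights of the (feature, tag) pairs that fire for this word.
        return sum(emit_w.get((f, tag), 0.0) for f in features(word))

    # best[i][tag] = (best score of any tag path ending in `tag` at i, previous tag)
    best = [{t: (emit(words[0], t), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + trans_w.get((p, t), 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit(words[i], t), prev)
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


train = [
    [("grants", "O"), ("from", "O"), ("the", "O"), ("Science", "ORG"), ("Foundation", "ORG")],
    [("deadline", "O"), ("is", "O"), ("in", "O"), ("May", "O")],
]
nb = train_nb(train)
print(predict_nb(nb, "grants"), predict_nb(nb, "Foundation"))  # → O ORG

emit_w = {("cap=True", "ORG"): 1.0, ("cap=False", "O"): 1.0,
          ("w=science", "ORG"): 1.0, ("w=may", "O"): 2.0}
trans_w = {("ORG", "ORG"): 0.5}  # rewards contiguous ORG spans
print(viterbi(["apply", "to", "Science", "Foundation", "by", "May"], ["O", "ORG"], emit_w, trans_w))
# → ['O', 'O', 'ORG', 'ORG', 'O', 'O']
```

Because `predict_nb` scores each word in isolation, its decision for a word never depends on the neighbouring tags, whereas `viterbi` scores whole tag sequences, so the `ORG`→`ORG` transition weight can reward multi-word organization names. The `log P(tag)` prior in the Naïve Bayes score is also one plausible place where a class with few training examples loses out, which may relate to the abstract's remark about less heavily weighted classes.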

Keywords :

information resources; natural language processing; scientific information systems; text analysis; CRF; closed domains; conditional random field; naïve Bayes; science and technology announcements corpus; science and technology declarations; word-level information; word-level information extraction; Data mining; Data models; Hidden Markov models; Information retrieval; Testing; Training; information extraction; science and technology corpus; word-level

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2012 IEEE 2nd International Conference on Cloud Computing and Intelligent Systems (CCIS)

Conference_Location :

Hangzhou

Print_ISBN :

978-1-4673-1855-6

Type :

conf

DOI :

10.1109/CCIS.2012.6664640

Filename :

6664640

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=644014