DocumentCode
3082220
Title
Entity Relation Extraction from geological text using Conditional Random Fields and subsequence kernels
Author
Sobhana, N.V. ; Ghosh, Soumya K. ; Mitra, Pinaki
Author_Institution
Indian Inst. of Technol., Kharagpur, Kharagpur, India
fYear
2012
fDate
7-9 Dec. 2012
Firstpage
832
Lastpage
840
Abstract
Entity relation extraction is an important research field in text mining. Extracting relations between geological entities is of great benefit for developing intelligent search tools for geology researchers. In this paper, Conditional Random Fields (CRFs) and sequence kernels are used to extract relations between entities from a geological corpus. The corpus was developed from a collection of scientific reports and articles on the geology of the Indian subcontinent. The training set, consisting of more than 200K words, has been annotated with a named entity tag set of seventeen tags and with labeled instances of part-of and near-by relations. The system recognizes part-of and near-by relations with F-measure values of 71.57% and 77.27% for T-CRF, and 78.25% and 83.71% for subsequence kernels. The extracted relations were used for query expansion in a retrieval system, achieving a gain over the baseline Mean Average Precision of 10.86% for T-CRF and 10.58% for subsequence kernels.
Keywords
data mining; geographic information systems; query processing; text analysis; F-measure values; Indian subcontinent geology; T-CRF; baseline mean average precision; conditional random fields; entity relation extraction; geological corpus; geological text; intelligent search tools; query expansion; retrieval system; scientific reports collection; sequence kernels; subsequence kernels; text mining; Feature extraction; Geology; Kernel; Labeling; Semantics; Training; Weight measurement; F-measure; Geological corpus; Mean Average Precision; Precision; Recall;
fLanguage
English
Publisher
IEEE
Conference_Titel
India Conference (INDICON), 2012 Annual IEEE
Conference_Location
Kochi
Print_ISBN
978-1-4673-2270-6
Type
conf
DOI
10.1109/INDCON.2012.6420733
Filename
6420733