DocumentCode :
1787411
Title :
Mining Semantic Structures from Syntactic Structures in Free Text Documents
Author :
Mousavi, Hojjat ; Kerr, Donald ; Iseli, Markus ; Zaniolo, Carlo
fYear :
2014
fDate :
16-18 June 2014
Firstpage :
84
Lastpage :
91
Abstract :
The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques that parse the text and use patterns to mine and analyze the parse trees, which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text-mining framework, which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract candidate terms and their grammatical relations from the parse trees. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.
Keywords :
data mining; graph theory; ontologies (artificial intelligence); query languages; query processing; statistical analysis; text analysis; NLP-based techniques; SPARQL-like query language; SemScape statistical parser; essay grading; free text documents; grammatical relations; natural language processing; news summarization; ontologies; parse trees; question answering; semantic querying; semantic relations; semantic search; semantic structure mining; syntactic structures; text graphs; text mining applications; text morphological structure; two-step pattern-based technique; weighted-graph text representation; Encyclopedias; Ontologies; Pattern matching; Semantics; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantic Computing (ICSC), 2014 IEEE International Conference on
Conference_Location :
Newport Beach, CA
Print_ISBN :
978-1-4799-4002-8
Type :
conf
DOI :
10.1109/ICSC.2014.31
Filename :
6882005