Title :
On Modeling and Querying of Text Corpora
Author :
Dingjia Liu ; Guohua Liu ; Yuanyuan Liu
Author_Institution :
Sch. of Inf. Sci. & Eng., Yanshan Univ., Qinhuangdao, China
Abstract :
This article proposes a novel data model for text corpora and discusses issues in corpus querying. First, a formalized definition of corpus data is presented. Second, a data model is proposed in terms of the relational model and is proved to be complete. On this basis, we extend the query semantics of the traditional corpus query that generates KWIC (Keyword in Context) concordances and define the resulting query problems. Finally, we investigate the data complexity of these query problems and present an experiment. These results lay a theoretical foundation for the study of modeling and querying text corpora.
Keywords :
query processing; text analysis; KWIC; corpus data; corpus query; formalized definition; keyword in context; query problems; query semantics; text corpora modeling; text corpora querying; Calculus; Complexity theory; Data models; Databases; Educational institutions; Pragmatics; Semantics; corpus; data complexity; data model; query; relational model
Conference_Title :
Computational Intelligence and Security (CIS), 2014 Tenth International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4799-7433-7
DOI :
10.1109/CIS.2014.37