DocumentCode
1791683
Title
Pairwise Topic Model via relation extraction
Author
Xiaoli Song ; Yue Shang ; Yuan Ling ; Mengwen Liu ; Xiaohua Hu
Author_Institution
Coll. of Comput. & Inf., Drexel Univ., Philadelphia, PA, USA
fYear
2014
fDate
27-30 Oct. 2014
Firstpage
96
Lastpage
103
Abstract
Topic modeling is a powerful tool for discovering the underlying topics of documents. However, the unstructured nature of raw text makes it hard to model the semantic relationships between text units (words, phrases, or sentences), and even harder to model their corresponding underlying topics. In this work, we examine the pairwise relationships between underlying topics through relation extraction. We first extract from the raw text the entity pairs that occur within a single relation tuple. Then, we model the relationship between the entity pairs by adding dependencies between the entities and their corresponding topics. We propose six versions of the Pairwise Topic Model (PTM) that simultaneously discover the latent topics and their pairwise relationships. Experiments on four data sets (AP news articles, DUC 2004 Task 2, clinical notes, and neuroscience papers) show that the PTMs are better-structured language models than the traditional topic model, Latent Dirichlet Allocation (LDA). Empirical results also show that the proposed PTMs can explicitly explain how two topics are related.
Keywords
text analysis; LDA; PTM; documents modeling; entity pairs extraction; latent Dirichlet allocation; latent topics; pairwise relationship; pairwise topic model; phrases; raw text relation tuple; relation extraction; semantic relationship; sentences; structured language model; text units; words; Data mining; Data models; Data structures; Educational institutions; Hidden Markov models; Joints; Syntactics; Pairwise Topic Modeling; Relation Extraction; Structured Data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location
Washington, DC
Type
conf
DOI
10.1109/BigData.2014.7004362
Filename
7004362