DocumentCode
3563723
Title
Construction of concept network from large numbers of texts for information examination using TF-IDF and deletion of unrelated words
Author
Doen, Yuta ; Murata, Masaki ; Otake, Ryuta ; Tokuhisa, Masato ; Qing Ma
Author_Institution
Dept. of Inf. & Electron., Tottori Univ., Tottori, Japan
fYear
2014
Firstpage
1108
Lastpage
1113
Abstract
We propose new methods for constructing, from electronic texts, a network that describes the relations among things related to a given keyword. The proposed method has two characteristics: TF-IDF and the deletion of unrelated words. We extract related words with a method based on term frequency-inverse document frequency (TF-IDF), which lets us extract only important words, and we use the TF-IDF value as the weight of an edge in the network. We also delete unrelated words from the network, because unrelated words are likely to be added when a network is expanded with new words. The proposed system deletes such words using two methods, the topic-restricted method and the topic-related method. We experimentally confirmed that the proposed TF-IDF-based related-word extraction method obtains better results than a method that uses conditional probabilities to extract related words. We also conducted experiments to verify the effectiveness of deleting unrelated words. The topic-restricted method deleted most unrelated words while retaining approximately 80% of the related words in the original network. The topic-related method deleted some unrelated words while retaining most related words in the original network.
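The abstract does not give the exact weighting formula, so the following is only a minimal sketch of TF-IDF-based related-word extraction with TF-IDF used as the edge weight. It assumes a textbook variant (term frequency counted over documents that contain the keyword, IDF = log(N / df) over the whole collection); the function names `related_words` and `build_network`, the `top_k` parameter, and the toy documents are hypothetical and do not come from the paper. The deletion of unrelated words is not sketched, because the abstract does not describe how the topic-restricted and topic-related methods work.

```python
import math
from collections import Counter


def related_words(docs, keyword, top_k=3):
    """Rank words that co-occur with `keyword` by an assumed TF-IDF score.

    TF  = number of occurrences of the word in documents containing `keyword`.
    IDF = log(N / df), where df is the number of documents containing the word.
    This is a textbook weighting, not necessarily the one used in the paper.
    """
    n_docs = len(docs)

    # Document frequency: count each word once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    # Term frequency restricted to documents that mention the keyword.
    tf = Counter()
    for doc in docs:
        if keyword in doc:
            tf.update(w for w in doc if w != keyword)

    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def build_network(docs, keyword, top_k=3):
    """Return edges {(keyword, word): weight} with TF-IDF as the edge weight."""
    return {
        (keyword, word): score
        for word, score in related_words(docs, keyword, top_k)
        if score > 0
    }


# Toy tokenized documents (illustrative only).
docs = [
    ["earthquake", "tsunami", "coast"],
    ["earthquake", "tsunami", "power"],
    ["power", "generation", "the"],
    ["the", "weather", "the"],
]
edges = build_network(docs, "earthquake")
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target}: {weight:.3f}")
```

Expanding the network would repeat the same extraction with each newly added word as the keyword, which is the step where, according to the abstract, unrelated words tend to enter.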
Keywords
probability; text analysis; TF-IDF; concept network construction; conditional probabilities; electronic text; information examination; term frequency-inverse document frequency based method; thing relation; topic-related method; topic-restricted method; unrelated word deletion; word extraction; Data mining; Earthquakes; Equations; Power generation; Semantics; Tsunami; Web sites
fLanguage
English
Publisher
IEEE
Conference_Titel
Soft Computing and Intelligent Systems (SCIS), 2014 Joint 7th International Conference on and Advanced Intelligent Systems (ISIS), 15th International Symposium on
Type
conf
DOI
10.1109/SCIS-ISIS.2014.7044701
Filename
7044701