DocumentCode :
3563723
Title :
Construction of concept network from large numbers of texts for information examination using TF-IDF and deletion of unrelated words
Author :
Doen, Yuta ; Murata, Masaki ; Otake, Ryuta ; Tokuhisa, Masato ; Qing Ma
Author_Institution :
Dept. of Inf. & Electron., Tottori Univ., Tottori, Japan
fYear :
2014
Firstpage :
1108
Lastpage :
1113
Abstract :
We propose new methods for constructing, from electronic texts, a network that describes the relations among things related to a given keyword. The proposed approach has two characteristics: TF-IDF and the deletion of unrelated words. We extract related words using a term frequency-inverse document frequency (TF-IDF)-based method, which lets us extract only important words, and we use the TF-IDF value as the weight of an edge in the network. We also delete unrelated words from the network, because unrelated words are likely to be added when a network is expanded with new words. The proposed system deletes such words using two methods, the topic-restricted method and the topic-related method. We have experimentally confirmed that the proposed TF-IDF-based related word extraction method obtains better results than a method that uses conditional probabilities to extract related words. We also conducted experiments to verify the effectiveness of deleting unrelated words. We found that the topic-restricted method could delete most unrelated words while maintaining approximately 80% of the related words from the original network. The topic-related method could delete some unrelated words while maintaining most related words from the original network.
Keywords :
probability; text analysis; TF-IDF; concept network construction; conditional probabilities; electronic text; information examination; term frequency-inverse document frequency based method; thing relation; topic-related method; topic-restricted method; unrelated word deletion; word extraction; Data mining; Earthquakes; Equations; Power generation; Semantics; Tsunami; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Soft Computing and Intelligent Systems (SCIS), 2014 Joint 7th International Conference on and Advanced Intelligent Systems (ISIS), 15th International Symposium on
Type :
conf
DOI :
10.1109/SCIS-ISIS.2014.7044701
Filename :
7044701