• DocumentCode
    3563723
  • Title
    Construction of concept network from large numbers of texts for information examination using TF-IDF and deletion of unrelated words
  • Author
    Doen, Yuta ; Murata, Masaki ; Otake, Ryuta ; Tokuhisa, Masato ; Ma, Qing
  • Author_Institution
    Dept. of Inf. & Electron., Tottori Univ., Tottori, Japan
  • fYear
    2014
  • Firstpage
    1108
  • Lastpage
    1113
  • Abstract
    We propose new methods for constructing, from electronic texts, a network that describes the relations among things related to a given keyword. The proposed methods have two characteristics: TF-IDF weighting and the deletion of unrelated words. We extract related words using a method based on term frequency-inverse document frequency (TF-IDF), which keeps only important words, and we use the TF-IDF values as edge weights in the network. Because expanding a network by adding words tends to introduce unrelated words, the proposed system deletes such words using two methods: the topic-restricted method and the topic-related method. Experiments confirmed that the proposed TF-IDF-based related-word extraction method obtains better results than a method that extracts related words using conditional probabilities. We also conducted experiments to verify the effectiveness of deleting unrelated words. The topic-restricted method deleted most unrelated words while retaining approximately 80% of the related words in the original network, whereas the topic-related method deleted some unrelated words while retaining most of the related words.
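    The TF-IDF-based extraction and network expansion summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact scoring formula, so the sketch assumes that tf(w) is the count of w in the texts that contain the keyword and that idf(w) = log(N / df(w)) over all N texts. The toy corpus `DOCS`, the helper names `related_words`, `conditional_probability` and `build_network`, and the parameters `k` and `depth` are all hypothetical. The topic-restricted and topic-related deletion methods are not described in enough detail here to model, so they are omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each text is already tokenized into words.
DOCS = [
    ["earthquake", "tsunami", "coast"],
    ["earthquake", "tsunami", "warning"],
    ["earthquake", "tsunami", "power", "plant"],
    ["earthquake", "tsunami", "damage"],
    ["tsunami", "coast", "evacuation"],
    ["stock", "market", "power"],
    ["weather", "coast", "rain"],
    ["weather", "sunny"],
    ["market", "price"],
    ["price", "stock"],
]


def related_words(docs, keyword, k=3):
    """Return up to k (word, score) pairs for words co-occurring with keyword.

    Assumed TF-IDF variant (not taken from the paper):
      tf(w)  = total count of w in texts containing the keyword
      idf(w) = log(N / df(w)), where df(w) = number of texts containing w
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per text
    tf = Counter()
    for doc in docs:
        if keyword in doc:
            tf.update(w for w in doc if w != keyword)
    scores = {w: c * math.log(n / df[w]) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Drop words with zero weight (words that occur in every text).
    return [(w, s) for w, s in ranked[:k] if s > 0]


def conditional_probability(docs, keyword, word):
    """Baseline score P(word | keyword), estimated over texts."""
    with_keyword = [d for d in docs if keyword in d]
    if not with_keyword:
        return 0.0
    return sum(word in d for d in with_keyword) / len(with_keyword)


def build_network(docs, seed, k=3, depth=2):
    """Expand outward from seed, adding edges weighted by the TF-IDF score."""
    edges = {}
    frontier, seen = [seed], {seed}
    for _ in range(depth):
        next_frontier = []
        for word in frontier:
            for related, weight in related_words(docs, word, k):
                edges[(word, related)] = weight
                if related not in seen:
                    seen.add(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return edges


if __name__ == "__main__":
    print(related_words(DOCS, "earthquake", k=2))
    print(conditional_probability(DOCS, "earthquake", "coast"))
    for (src, dst), w in build_network(DOCS, "earthquake", k=2).items():
        print(f"{src} -> {dst}: {w:.3f}")
```

    On this toy corpus, "tsunami" ranks first for "earthquake" under the assumed TF-IDF score, while the conditional-probability baseline gives "coast" and "warning" the same value (0.25) because it has no IDF term to down-weight words that are common across all texts.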
  • Keywords
    probability; text analysis; TF-IDF; concept network construction; conditional probabilities; electronic text; information examination; term frequency-inverse document frequency based method; thing relation; topic-related method; topic-restricted method; unrelated word deletion; word extraction; Data mining; Earthquakes; Equations; Power generation; Semantics; Tsunami; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS)
  • Type
    conf
  • DOI
    10.1109/SCIS-ISIS.2014.7044701
  • Filename
    7044701