• DocumentCode
    3597161
  • Title

    A new method for construction field association terms using co-occurrence words and declinable words information

  • Author

    Atlam, El-Sayed ; Fuketa, M. ; Kashiji, S. ; Nakata, H. ; Aoe, Jun-Ichi

  • Author_Institution
    Dept. of Inf. Sci. & Intelligent Syst., Tokushima Univ., Japan
  • Volume
    4
  • fYear
    2002
  • Abstract
    Readers can identify the subject of many document fields by reading only a few specific words, called field association (FA) terms. Constructing these FA terms is important for correctly determining a document's field from the small amount of word information in part of the file. The field can be determined efficiently when there are many FA terms and their frequency rate is high. If the number of level 1 FA words (words that connect directly to terminal fields) is limited, earlier methods cannot determine a document's field easily or quickly, especially when the corpus contains few documents. This paper proposes a new method for determining FA terms using the weights of co-occurrence words and declinable words that are related to a narrow association category, while eliminating the ambiguity of FA terms. Moreover, efficient FA terms are difficult to extract from frequency information alone. The proposed method uses a new co-occurrence word weighting that yields higher precision and recall than using the degree of frequency.
  • Keywords
    classification; information retrieval; natural languages; text analysis; vocabulary; association category; classification; co-occurrence words; corpus documents; declinable words; document field subject; field association terms; information retrieval; keywords; precision; recall; word weighting; Data mining; Frequency; Humans; Information retrieval; Information science; Intelligent systems; Research and development
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2002 IEEE International Conference on
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7437-1
  • Type

    conf

  • DOI
    10.1109/ICSMC.2002.1173247
  • Filename
    1173247