• DocumentCode
    1909052
  • Title
    Building New Field Association Word Candidates Automatically Using Search Engine
  • Author
    Atlam, Elsayed ; Elmarhomy, Ghada ; Morita, Kazuhiro ; Fuketa, Masao ; Aoe, Jun-Ichi
  • Author_Institution
    Dept. of Inf. Sci. & Intell. Syst., Tokushima Univ., Tokushima
  • fYear
    2007
  • fDate
    Aug. 30 2007-Sept. 1 2007
  • Firstpage
    22
  • Lastpage
    27
  • Abstract
    With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can identify the subject field of many documents by reading only a few specific Field Association (FA) words. Document fields can be determined efficiently if there are many FA words and their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words, and the new FA words are then appended to an FA word dictionary. Experimental results show that the new system can automatically append around 44% of new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is also effective for extracting the relevant FA words needed for the system design to build FA words automatically.
  • Keywords
    Internet; classification; information retrieval; search engines; automatic document classification; data organization; field association word candidates; search engine; Dictionaries; Frequency; Information science; Intelligent structures; Intelligent systems; Nominations and elections; Organizing; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1610-3
  • Electronic_ISBN
    978-1-4244-1611-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2007.4368006
  • Filename
    4368006