• DocumentCode
    3601361
  • Title
    Discovering Latent Semantics in Web Documents Using Fuzzy Clustering
  • Author
    I-Jen Chiang ; Liu, Charles Chih-Ho ; Yi-Hsin Tsai ; Kumar, Ajit
  • Author_Institution
    Grad. Inst. of Biomed. Inf., Taipei Med. Univ., Taipei, Taiwan
  • Volume
    23
  • Issue
    6
  • fYear
    2015
  • Firstpage
    2122
  • Lastpage
    2134
  • Abstract
    Web documents are heterogeneous and complex. Complicated associations exist both within a single web document and in its links to other documents. The strong interactions between terms in documents give rise to vague and ambiguous meanings, so efficient and effective clustering methods are needed to discover latent and coherent meanings in context. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called “CONCEPTS,” wherein a fuzzy linguistic measure is applied to each complex to evaluate 1) the relevance of a document to a topic, and 2) its difference from the other topics. Web contents can be clustered into topics in the hierarchy according to their fuzzy linguistic measures; web users can then further explore the CONCEPTS of web contents. Beyond its applicability to web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
  • Keywords
    computational linguistics; document handling; feature extraction; fuzzy set theory; random processes; semantic Web; CONCEPTS; Web documents; conditional random field methods; connected semantic complexes; fuzzy clustering; fuzzy linguistic topological space; latent semantics; Clustering algorithms; Context; Data mining; Neural networks; Pragmatics; Semantics; fuzzy aggregation algorithm; fuzzy semantic topology; fuzzy web hierarchical clustering; named entity recognition (NER)
  • fLanguage
    English
  • Journal_Title
    Fuzzy Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6706
  • Type
    jour
  • DOI
    10.1109/TFUZZ.2015.2403878
  • Filename
    7042824