• DocumentCode
    1910772
  • Title
    Clustering of Chinese Sentences Using the SMM Model
  • Author
    Du, Tiansang ; Xu, Xinying ; Chen, Liang ; Chang, Baobao
  • Author_Institution
    Peking Univ., Beijing
  • fYear
    2007
  • fDate
    Aug. 30 2007-Sept. 1 2007
  • Firstpage
    491
  • Lastpage
    496
  • Abstract
    This article studies clustering methods based on statistical models and applies them to the Chinese sentence clustering problem on a bilingual lexicographical platform. Treating the input as co-occurrence data, we develop the Sentence Cluster Model as a multidimensional SMM and estimate its parameters with the EM algorithm. Based on this model, we present three methods for sentence clustering and evaluate them with the Rand index in corpus experiments, using the k-means algorithm as a baseline. The discussion focuses on word sense distinction, part-of-speech distinction, and the choice of window size.
  • Keywords
    computational linguistics; expectation-maximisation algorithm; natural language processing; pattern clustering; Chinese sentence clustering problem; EM algorithm; Rand index; SMM model; bilingual lexicographical platform; parameter estimation; part-of-speech distinction; sentence cluster model; statistical model; window size choosing; word sense distinction; Clustering algorithms; Clustering methods; Computational linguistics; Dictionaries; Iterative algorithms; Maximum likelihood estimation; Multidimensional systems; Parameter estimation; Partial response channels; Statistical distributions;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1611-0
  • Electronic_ISBN
    978-1-4244-1611-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2007.4368076
  • Filename
    4368076
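
The abstract names the Rand index as the evaluation measure. As a reminder of what that measure computes, here is a minimal sketch in Python: it counts the fraction of item pairs on which two clusterings agree (both place the pair together, or both place it apart). The function name `rand_index` and the sample labels are hypothetical illustrations, not taken from the paper, and the paper's own evaluation setup may differ.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of item pairs on which two clusterings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both clusterings group it
        # together or both keep it apart.
        if same_true == same_pred:
            agree += 1
        total += 1
    return agree / total


# Hypothetical labels for four sentences (illustrative only).
gold = [0, 0, 1, 1]
predicted = [1, 1, 0, 2]
score = rand_index(gold, predicted)
print(f"Rand index: {score:.3f}")
```

Cluster label values themselves do not matter, only which items share a label, so a clustering that matches the reference up to renaming scores 1.0.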