• DocumentCode
    3423307
  • Title

    Generating a Topic Hierarchy from Dialect Texts

  • Author

    De Smet, W. ; Moens, Marie-Francine

  • Author_Institution
    ICRI, Leuven
  • fYear
    2007
  • fDate
    3-7 Sept. 2007
  • Firstpage
    249
  • Lastpage
    253
  • Abstract
    We built a system for the automatic creation of a text-based topic hierarchy, meant to be used in a geographically defined community. This poses two main problems. First, standard language and a community-related dialect appear side by side, so dialect words should be corrected to standard words as far as possible. Second, the texts must be clustered hierarchically by topic without manual intervention. We address the correction of dialect words by performing a nearest neighbor search over a dynamic set of known words, using a set of transition rules from dialect to standard words that are learned from a parallel corpus. We solve the clustering problem by implementing a hierarchical co-clustering algorithm that automatically generates a topic hierarchy of the collection and simultaneously groups documents and words into clusters.
  • Keywords
    natural language processing; text analysis; automatic hierarchic clustering; community-related dialect; dialect texts; dialect words; geographically defined community; hierarchical coclustering algorithm; standard language; text-based topic hierarchy; Application software; Cities and towns; Clustering algorithms; Computer science; Databases; Dictionaries; Document handling; Expert systems; Nearest neighbor searches; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2007. DEXA '07. 18th International Workshop on
  • Conference_Location
    Regensburg
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-2932-5
  • Type

    conf

  • DOI
    10.1109/DEXA.2007.149
  • Filename
    4312895
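
The dialect-correction step summarized in the abstract (transition rules plus a nearest neighbor search over known words) might look roughly like the Python sketch below. This is not the authors' implementation: the rule format (plain substring rewrites), the use of Levenshtein distance as the neighbor metric, the `max_distance` threshold, and all function names are assumptions made for illustration, and the record does not say how the rules are learned from the parallel corpus, so they are simply passed in here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def candidates(word, rules):
    """Forms reachable by optionally applying each (dialect, standard)
    substring rule, in order, to the word. Hypothetical rule format."""
    forms = {word}
    for src, dst in rules:
        forms |= {form.replace(src, dst) for form in forms}
    return forms


def correct(word, known_words, rules, max_distance=1):
    """Return the known word nearest to any rule-rewritten form of `word`,
    or `word` unchanged when nothing is within `max_distance` edits."""
    if word in known_words:
        return word
    scored = [(edit_distance(form, known), known)
              for form in candidates(word, rules)
              for known in known_words]
    if not scored:
        return word
    distance, best = min(scored)  # ties broken alphabetically
    return best if distance <= max_distance else word
```

Because `known_words` is an ordinary mutable set, new standard words can be added between calls, which loosely mirrors the "dynamic set of known words" mentioned in the abstract. With illustrative rules such as `[("sk", "sch")]` and a known-word set containing `"school"`, the input `"skool"` is rewritten to `"school"` and matched at distance zero.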