• DocumentCode
    1865759
  • Title

    Discover Linguistic Patterns in Parsed Corpus with Frequent Subtree Mining

  • Author

    Wang, Bo ; Zhao, Tiejun ; Yang, Muyun ; Li, Sheng

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2010
  • fDate
    9-10 Jan. 2010
  • Firstpage
    86
  • Lastpage
    89
  • Abstract
    Recognizing special linguistic patterns in a given language is helpful for many NLP applications, such as information extraction, machine translation, and parsing. State-of-the-art syntactic parsers are based on a given grammar. That grammar is context-free and cannot discover complex patterns that contain multiple linguistic units. We propose an unsupervised method to automatically discover complex linguistic patterns from a classically parsed corpus. A specialized and efficient algorithm is applied to mine the frequent subtrees in the forest, and the subtrees found are formalized as linguistic patterns. The approach is validated on the Penn Chinese Treebank using the linguistic patterns found.
  • Keywords
    context-free grammars; data mining; natural language processing; trees (mathematics); NLP applications; complex linguistic patterns; complex pattern discovery; context free grammar; discover linguistic patterns; frequent subtree mining; information extraction; machine translation; parsed corpus; parsing; syntax parsers; Application software; Computer science; Data mining; Humans; Natural language processing; Natural languages; Neural networks; Pattern recognition; linguistic patterns; parsing; subtree mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Discovery and Data Mining, 2010. WKDD '10. Third International Conference on
  • Conference_Location
    Phuket
  • Print_ISBN
    978-1-4244-5397-9
  • Electronic_ISBN
    978-1-4244-5398-6
  • Type

    conf

  • DOI
    10.1109/WKDD.2010.9
  • Filename
    5432720