• Title of article

    Machine learning of syntactic parse trees for search and classification of text

  • Author/Authors

    Boris Galitsky

  • Pages
    20
  • From page
    1072
  • To page
    1091
  • Abstract
    We build an open-source toolkit that implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization to syntactic parse trees and attempt to detect weak semantic signals in them. The generalization of syntactic parse trees, used as a syntactic similarity measure, is defined as the set of maximum common sub-trees and is performed at the level of paragraphs, sentences, phrases, and individual words. We analyze the semantic features of this similarity measure and compare it with the semantics of traditional anti-unification of terms. Nearest-neighbor machine learning is then applied to relate a sentence to a semantic class. By using a syntactic parse tree-based similarity measure instead of bag-of-words and keyword-frequency approaches, we expect to detect a weak semantic signal that is otherwise unobservable. The proposed approach is evaluated in four distinct domains where a lack of semantic information makes classification of sentences rather difficult. We describe a toolkit, part of the Apache Software Foundation project OpenNLP, designed to aid search engineers in tasks requiring text relevance assessment.
  • Keywords
    Machine learning, Parse trees, Text classification, Text search
  • Journal title
    Astroparticle Physics
  • Record number

    2047740
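
The generalization and nearest-neighbor steps named in the abstract can be illustrated with a minimal sketch. It is not the toolkit's API: it simplifies parse trees to flat phrases of (part-of-speech, lemma) tokens, and the function names and the 1.0 / 0.5 scoring weights are hypothetical choices made only for this illustration. Two tokens generalize to the shared lemma when they match, to a wildcard when only the part of speech matches, and to nothing otherwise; two phrases generalize to the highest-scoring order-preserving alignment of such tokens.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]   # (part-of-speech tag, lemma)
Phrase = List[Token]


def generalize_tokens(a: Token, b: Token) -> Optional[Token]:
    """Most specific token covering both inputs, or None if the POS tags differ."""
    if a[0] != b[0]:
        return None
    return (a[0], a[1] if a[1] == b[1] else "*")


def token_score(t: Token) -> float:
    """Hypothetical weights: a shared lemma counts more than a shared POS tag."""
    return 1.0 if t[1] != "*" else 0.5


def generalize_phrases(p: Phrase, q: Phrase) -> Tuple[float, Phrase]:
    """Best order-preserving common generalization of two phrases.

    Weighted longest-common-subsequence dynamic program; returns
    (score, generalized tokens).
    """
    n, m = len(p), len(q)
    best = [[(0.0, [])] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            candidates = [best[i + 1][j], best[i][j + 1]]
            g = generalize_tokens(p[i], q[j])
            if g is not None:
                score, tokens = best[i + 1][j + 1]
                candidates.append((score + token_score(g), [g] + tokens))
            best[i][j] = max(candidates, key=lambda c: c[0])
    return best[0][0]


def classify(sentence: Phrase, training: List[Tuple[Phrase, str]]) -> str:
    """Nearest neighbor: label of the training phrase with the highest score."""
    return max(training, key=lambda ex: generalize_phrases(sentence, ex[0])[0])[1]
```

For example, generalizing "the camera take sharp picture" with "the phone take blurry picture" (lemmatized, Penn Treebank tags) keeps "the", "take", and "picture" and replaces the differing noun and adjective with wildcards. A bag-of-words comparison would see only the three shared words, whereas this measure also records that a noun and an adjective occupy the same positions.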