• DocumentCode
    3762312
  • Title
    Deriving labeled training data for topic link detection by alternating words
  • Author
    Marc W. Abel;Soon M. Chung
  • Author_Institution
    Dept. of Computer Science and Engineering, Wright State University, Dayton, Ohio 45435, USA
  • fYear
    2015
  • Firstpage
    83
  • Lastpage
    88
  • Abstract
    Although classifiers can be trained to estimate whether two short text segments relate to a common topic, obtaining training data for supervised learning presents a hurdle. The natural approach would be to train with topic-aligned pairs of text segments from a large corpus, but nothing is available to locate such alignments. We offer that simply partitioning the words of a large document according to their odd and even positions will yield training data suitable for certain applications and sets of features. The reason is that the partitioned texts are topic-aligned along their respective lengths despite sharing no original word instances. We further show that parametrically introducing a small amount of overlap into the partitioned texts can greatly improve the precision of a classifier.
  • Keywords
    "Training data","Supervised learning","Software engineering","Backpropagation","Magnetic resonance","Image segmentation","Data visualization"
  • Publisher
    IEEE
  • Conference_Titel
    Data and Software Engineering (ICoDSE), 2015 International Conference on
  • Print_ISBN
    978-1-4673-8428-5
  • Type
    conf
  • DOI
    10.1109/ICODSE.2015.7436976
  • Filename
    7436976
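
The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `overlap` parameter (treated here as the probability that a word is also copied into the other stream), and the fixed segment size are assumptions made for illustration, since the abstract does not say how overlap is introduced or how segments are cut.

```python
import random


def alternate_partition(words, overlap=0.0, rng=None):
    """Split a word sequence into two streams by alternating positions.

    overlap is a hypothetical parameter: the probability that a word is
    also copied into the other stream. With overlap=0.0 the two streams
    share no original word instances.
    """
    rng = rng or random.Random(0)
    first, second = [], []
    for i, word in enumerate(words):
        # Words at positions 0, 2, 4, ... go to `first`, the rest to `second`.
        own, other = (first, second) if i % 2 == 0 else (second, first)
        own.append(word)
        if overlap > 0 and rng.random() < overlap:
            other.append(word)
    return first, second


def aligned_pairs(first, second, size):
    """Cut both streams into equal-sized segments and pair them by index.

    Segments with the same index come from the same region of the source
    document, so each pair is assumed to be topic-aligned.
    """
    count = min(len(first), len(second)) // size
    return [
        (first[k * size:(k + 1) * size], second[k * size:(k + 1) * size])
        for k in range(count)
    ]


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog again".split()
    a, b = alternate_partition(text)
    for left, right in aligned_pairs(a, b, size=2):
        print(left, right)
```

Pairs produced this way would serve only as positive (same-topic) examples. Negative examples would have to come from elsewhere, for instance by pairing segments drawn from distant positions or different documents, which is likewise an assumption rather than something stated in the abstract.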