DocumentCode
3762312
Title
Deriving labeled training data for topic link detection by alternating words
Author
Marc W. Abel;Soon M. Chung
Author_Institution
Dept. of Computer Science and Engineering, Wright State University, Dayton, Ohio 45435, USA
fYear
2015
Firstpage
83
Lastpage
88
Abstract
Although classifiers can be trained to estimate whether two short text segments relate to a common topic, obtaining training data for supervised learning presents a hurdle. The natural approach would be to train with topic-aligned pairs of text segments from a large corpus, but nothing is available to locate such alignments. We offer that simply partitioning the words of a large document according to their odd and even positions will yield training data suitable for certain applications and sets of features. The reason is that the partitioned texts are topic-aligned along their respective lengths despite sharing no original word instances. We further show that parametrically introducing a small amount of overlap into the partitioned texts can greatly improve the precision of a classifier.
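The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `alternate_split`, the whitespace tokenization, and the way overlap is introduced (each word is copied into the opposite half with probability `overlap`) are assumptions made for illustration only, since the abstract does not specify these details.

```python
import random


def alternate_split(text, overlap=0.0, seed=0):
    """Split a text into two word lists by alternating word positions.

    Assumption (not from the paper): `overlap` is the probability that a
    word is also copied into the opposite half, which is one possible way
    to introduce a tunable amount of shared words.
    """
    rng = random.Random(seed)
    first, second = [], []
    for i, word in enumerate(text.split()):
        # Words at 1-based odd positions go to `first`, even ones to `second`.
        target, other = (first, second) if i % 2 == 0 else (second, first)
        target.append(word)
        # With probability `overlap`, duplicate the word into the other half.
        if rng.random() < overlap:
            other.append(word)
    return first, second


if __name__ == "__main__":
    a, b = alternate_split("the quick brown fox jumps over the lazy dog")
    print(a)
    print(b)
```

With `overlap=0.0` the two halves share no original word instances yet follow the same topics along their length, which is the property the abstract relies on; raising `overlap` toward 1.0 makes the halves progressively more similar.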
Keywords
"Training data","Supervised learning","Software engineering","Backpropagation","Magnetic resonance","Image segmentation","Data visualization"
Publisher
IEEE
Conference_Titel
Data and Software Engineering (ICoDSE), 2015 International Conference on
Print_ISBN
978-1-4673-8428-5
Type
conf
DOI
10.1109/ICODSE.2015.7436976
Filename
7436976