Title :
A semisupervised associative classification method for POS tagging
Author :
Rani, Pratibha ; Pudi, Vikram ; Sharma, Dipti Misra
Author_Institution :
Int. Inst. of Inf. Technol., Hyderabad, India
Abstract :
We present a data mining approach for part-of-speech (POS) tagging, an important natural language processing (NLP) classification task. We propose a semi-supervised associative classification method for POS tagging. Existing methods for building POS taggers require extensive domain and linguistic knowledge and resources. Our method uses a combination of a small POS-tagged corpus and untagged text data as training data to build the classifier model using association rules. Our tagger also works well with very little training data. The use of semi-supervised learning provides the advantage of not requiring a large, high-quality tagged corpus. These properties make it especially suitable for resource-poor languages. Our experiments on various resource-rich, resource-moderate and resource-poor languages show good performance without using any language-specific linguistic information. We note that including such information in our method may further improve performance. Results also show that for smaller training data sizes our tagger performs better than a state-of-the-art CRF tagger using the same features as our tagger.
Keywords :
computational linguistics; data mining; learning (artificial intelligence); natural language processing; pattern classification; text analysis; CRF tagger; NLP classification task; POS tagged corpus; POS tagging; association rules; classifier model; data mining approach; linguistic knowledge; natural language processing classification task; part-of-speech tagging; resource-moderate languages; resource-poor languages; resource-rich languages; semisupervised associative classification method; semisupervised learning; training data; untagged text data; Accuracy; Association rules; Context; Hidden Markov models; Pragmatics; Tagging;
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2014 International Conference on
DOI :
10.1109/DSAA.2014.7058067