Title :
Rule-based and machine learning approach for event sentence extraction in Indonesian online news articles
Author :
Abidin, Taufik Fuadi ; Dimyathi, Rahmad ; Ferdhiana, Ridha
Author_Institution :
Dept. of Inf., Syiah Kuala Univ., Banda Aceh, Indonesia
Abstract :
As internet and web technology have matured over the last decades, the number of Indonesian online news articles on the web has grown at an unprecedented pace. In this paper, we introduce a combination of rule-based and machine learning approaches to find sentences that contain tropical disease information, such as the incidence date and the number of casualties, and we measure its accuracy. Given a set of web pages on tropical disease topics, we first use a rule-based algorithm to extract the sentences that match contextual and morphological patterns for a date and a number of casualties. We then classify the extracted sentences using a Support Vector Machine and collect those that contain tropical disease information. The results show that the proposed method works well and has good accuracy.
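The rule-based first stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual rules, so everything here is an assumption made for illustration: the regular expressions, the word lists, the helper names (`split_sentences`, `extract_candidates`), and the choice to keep a sentence when it matches at least one of the two patterns. The Support Vector Machine stage is not shown.

```python
import re

# Hypothetical patterns for illustration only; the paper's real rules are
# not given in the abstract.
MONTHS = ("Januari|Februari|Maret|April|Mei|Juni|Juli|Agustus|"
          "September|Oktober|November|Desember")

# A day number followed by an Indonesian month name and an optional year,
# e.g. "5 Maret 2014".
DATE_RE = re.compile(rf"\b\d{{1,2}}\s+(?:{MONTHS})(?:\s+\d{{4}})?\b",
                     re.IGNORECASE)

# A number followed by a word that refers to people,
# e.g. "12 orang" (12 people) or "3 pasien" (3 patients).
COUNT_RE = re.compile(r"\b\d+\s+(?:orang|warga|pasien|korban|penderita)\b",
                      re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_candidates(text):
    """Return (sentence, has_date, has_count) for each sentence that matches
    at least one pattern. Keeping either match is an assumption."""
    candidates = []
    for sentence in split_sentences(text):
        has_date = bool(DATE_RE.search(sentence))
        has_count = bool(COUNT_RE.search(sentence))
        if has_date or has_count:
            candidates.append((sentence, has_date, has_count))
    return candidates
```

In a pipeline of the kind the abstract outlines, the sentences returned by `extract_candidates` would then be turned into feature vectors and passed to a trained classifier, which decides whether each candidate actually reports a disease event.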
Keywords :
Internet; Web sites; diseases; knowledge based systems; learning (artificial intelligence); medical computing; pattern classification; support vector machines; Indonesian online news articles; Internet; Web pages; Web technology; contextual pattern; event sentence extraction; machine learning approach; morphological pattern; rule-based approach; sentence classification; sentence extraction; support vector machine; tropical disease information; Accuracy; Data mining; Dictionaries; Diseases; Feature extraction; Kernel; Support vector machines; Event sentence extraction; accuracy measure; support vector machine;
Conference_Title :
Information Technology Systems and Innovation (ICITSI), 2014 International Conference on
DOI :
10.1109/ICITSI.2014.7048232