DocumentCode :
4579
Title :
An HMM-Based Algorithm for Content Ranking and Coherence-Feature Extraction
Author :
Chien-Liang Liu ; Wen-Hoar Hsaio ; Chia-Hoang Lee ; Hsiao-Cheng Chi
Author_Institution :
Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Volume :
43
Issue :
2
fYear :
2013
fDate :
March 2013
Firstpage :
440
Lastpage :
450
Abstract :
In this paper, we propose an algorithm called the coherence hidden Markov model (HMM) to extract coherence features and rank content. Coherence HMM is a variant of the HMM that models the stochastic process of essay writing and identifies topics as hidden states, given sequenced clauses as observations. This study uses probabilistic latent semantic analysis for parameter estimation of coherence HMM. In coherence-feature extraction, support vector regression (SVR) with surface features and coherence features is used for essay grading. The experimental results indicate that SVR can benefit from coherence features. The adjacent agreement rate and the exact agreement rate are 95.24% and 59.80%, respectively. Moreover, this study submits high-scoring essays to the same experiment and finds that the adjacent agreement rate and exact agreement rate are 98.33% and 64.50%, respectively. In content ranking, we design and implement an intelligent assisted blog writing system based on the coherence-HMM ranking model. Several corpora are employed to help users efficiently compose blog articles. When users finish composing a clause or sentence, the system provides candidate texts for their reference based on the current clause or sentence content. The experimental results demonstrate that all participants can benefit from the system and save considerable time on writing articles.
Keywords :
Web sites; content management; feature extraction; hidden Markov models; parameter estimation; probability; regression analysis; support vector machines; HMM-based algorithm; SVR; adjacent agreement rate; article writing; coherence hidden Markov model; coherence-feature extraction; content ranking; essay grading; essay writing; exact agreement rate; high-scoring essays; intelligent assisted blog writing system; parameter estimation; probabilistic latent semantic analysis; sequenced clause; stochastic process; support vector regression; surface features; topic identification; Blogs; Coherence; Feature extraction; Hidden Markov models; Indexes; Parameter estimation; Writing; Coherence-feature extraction; hidden Markov model (HMM); input devices and strategies; natural language processing (NLP); predictive content;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics: Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
2168-2216
Type :
jour
DOI :
10.1109/TSMCA.2012.2207104
Filename :
6408207