DocumentCode
134209
Title
Linear model incorporating feature ranking for Chinese documents readability
Author
Gang Sun ; Zhiwei Jiang ; Qing Gu ; Daoxu Chen
Author_Institution
State Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
29
Lastpage
33
Abstract
Assessing the readability of documents is a rewarding task. In this paper, we apply linear regression models to the readability assessment of Chinese documents and put forward LiFR (Linear model incorporating Feature Ranking), which uses feature ranking to select the most appropriate text features for building the linear model. We develop text features specialized for Chinese, including surface, part-of-speech, parse-tree, and entropy features. The experimental results demonstrate that both linear and log-linear regression models are reliable for readability assessment and can achieve performance competitive with other machine learning methods, such as SVR (Support Vector Machine for Regression). The designed features also prove valuable, and feature ranking is essential for building useful linear functions.
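The abstract describes LiFR only at a high level: rank the candidate text features, keep the best ones, and fit a linear model on them. The sketch below illustrates that general idea and is not the authors' implementation. The ranking criterion (absolute Pearson correlation with the readability label), the fixed cut-off `k`, and all function names (`rank_features`, `fit_linear`, `lifr_sketch`, `predict`) are hypothetical assumptions. The log-linear variant and the Chinese-specific feature extraction are not shown.

```python
"""Hypothetical sketch of a LiFR-style pipeline: rank features, keep the top k, fit OLS.

Assumptions (not from the paper): features are ranked by |Pearson r| with the label,
and k is a fixed parameter rather than one chosen on held-out data.
"""
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def rank_features(X, y):
    """Return feature indices sorted by |Pearson r| with the label, strongest first."""
    n_feats = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feats)]
    return sorted(range(n_feats), key=lambda j: scores[j], reverse=True)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A assumed non-singular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_linear(X, y, features):
    """Ordinary least squares via the normal equations on chosen columns plus an intercept."""
    rows = [[1.0] + [row[j] for j in features] for row in X]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    return solve(XtX, Xty)  # [intercept, w_1, ..., w_k]


def predict(w, row, features):
    """Apply the fitted linear function to one document's feature vector."""
    return w[0] + sum(wi * row[j] for wi, j in zip(w[1:], features))


def lifr_sketch(X, y, k):
    """Rank all features, keep the top k, and fit a linear model on them."""
    features = rank_features(X, y)[:k]
    return features, fit_linear(X, y, features)
```

On a toy dataset where the label is exactly `2 * x0 + 1` and `x1` is noise, `lifr_sketch(X, y, k=1)` should select feature `0` and recover an intercept near 1 and a slope near 2. Ranking first and fitting second keeps the regression small and interpretable; a real system would more likely choose `k` by validation error than fix it in advance.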
Keywords
learning (artificial intelligence); natural language processing; regression analysis; speech processing; support vector machines; Chinese document readability assessment; LiFR; SVR; entropy feature; feature ranking; log-linear regression model; machine learning; parse tree; part of speech; support vector machine; text features; Abstracts; History; Learning systems; Manganese; Measurement; Software; Training; Chinese; Feature Ranking; Linear Regression Models; Readability Assessment;
fLanguage
English
Publisher
ieee
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936601
Filename
6936601