  • DocumentCode
    134209
  • Title
    Linear model incorporating feature ranking for Chinese documents readability
  • Author
    Gang Sun ; Zhiwei Jiang ; Qing Gu ; Daoxu Chen
  • Author_Institution
    State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    29
  • Lastpage
    33
  • Abstract
    Assessing the readability of documents is a worthwhile task. In this paper, we apply linear regression models to readability assessment of Chinese documents and propose LiFR (Linear model incorporating Feature Ranking), which uses feature ranking to select the most appropriate text features for building the linear model. We develop text features specialized for Chinese, including surface, part-of-speech, parse-tree, and entropy features. The experimental results demonstrate that both linear and log-linear regression models are reliable for readability assessment and achieve performance competitive with other machine learning methods, such as SVR (Support Vector Machine for Regression). The designed features also prove valuable, and feature ranking is essential for building useful linear functions.
  • Keywords
    learning (artificial intelligence); natural language processing; regression analysis; speech processing; support vector machines; Chinese document readability assessment; LiFR; SVR; entropy feature; feature ranking; log-linear regression model; machine learning; parse tree; part of speech; support vector machine; text features; Abstracts; History; Learning systems; Manganese; Measurement; Software; Training; Chinese; Feature Ranking; Linear Regression Models; Readability Assessment
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936601
  • Filename
    6936601
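
The abstract describes ranking text features and then fitting a linear or log-linear regression on the best ones. The sketch below illustrates that general idea only and is not the authors' LiFR implementation: the ranking criterion (absolute Pearson correlation with the target), the `top_k` cutoff, and the function names `rank_features`, `fit_linear`, and `predict` are all assumptions introduced here, since the record does not specify them. The log-linear variant is assumed to mean a linear fit to the logarithm of the readability level.

```python
import numpy as np


def rank_features(X, y):
    """Return feature indices sorted by |Pearson correlation| with y, strongest first.

    Assumption: correlation is used as the ranking score; the record does not
    say which criterion the paper uses.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns (zero variance) get a score of 0 instead of dividing by zero.
    scores = np.divide(np.abs(Xc.T @ yc), denom,
                       out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-scores, kind="stable")


def fit_linear(X, y, top_k, log_target=False):
    """Fit ordinary least squares on the top_k ranked features.

    With log_target=True the model is fitted to log(y), so y must be
    strictly positive (e.g. grade levels 1, 2, 3, ...).
    """
    target = np.log(y) if log_target else y
    selected = rank_features(X, target)[:top_k]
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(target)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return selected, coef


def predict(X, selected, coef, log_target=False):
    """Predict readability levels using the selected features and fitted coefficients."""
    A = np.column_stack([np.ones(X.shape[0]), X[:, selected]])
    out = A @ coef
    return np.exp(out) if log_target else out
```

In use, `X` would hold one row per document and one column per text feature (surface, part-of-speech, parse-tree, entropy), and `y` the labelled readability level. Calling `fit_linear(X, y, top_k=10)` and then `predict(X_new, selected, coef)` gives level estimates; `top_k` would normally be chosen by validation.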