• DocumentCode
    1653750
  • Title
    Measuring Comprehensibility of Web Pages Based on Link Analysis
  • Author
    Akamatsu, Kouichi ; Pattanasri, Nimit ; Jatowt, Adam ; Tanaka, Katsumi
  • Author_Institution
    Dept. of Social Inf., Kyoto Univ., Kyoto, Japan
  • Volume
    1
  • fYear
    2011
  • Firstpage
    40
  • Lastpage
    46
  • Abstract
    We hypothesize that if one page links to another, the two pages are likely to be similar in comprehensibility. To test this hypothesis, we conduct experiments using existing readability measures, investigating the relationship between links and the readability of text extracted from web pages in two datasets, consisting of English and Japanese pages. We find that links and the readability of the extracted text are correlated. Based on the hypothesis, we propose a link analysis algorithm to measure the comprehensibility of web pages. Our method is based on the TrustRank algorithm, which was originally used for combating web spam. We use the link structure to propagate readability scores from source pages selected for their comprehensibility. Experimental evaluation demonstrates that our method can improve the estimation of page comprehensibility.
  • Keywords
    Web services; information retrieval; search engines; text analysis; unsolicited e-mail; English; Japanese pages; TrustRank algorithm; Web pages; Web spam; comprehensibility; extracted text readability; link analysis; Algorithm design and analysis; Complexity theory; Educational institutions; Vocabulary; Web search; readability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on
  • Conference_Location
    Lyon
  • Print_ISBN
    978-1-4577-1373-6
  • Electronic_ISBN
    978-0-7695-4513-4
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2011.242
  • Filename
    6040494