  • DocumentCode
    1055936
  • Title
    Automatic Website Summarization by Image Content: A Case Study with Logo and Trademark Images
  • Author
    Baratis, Evdoxios; Petrakis, Euripides G. M.; Milios, Evangelos
  • Author_Institution
    Dept. of Electron. & Comput. Eng., Tech. Univ. of Crete (TUC), Chania
  • Volume
    20
  • Issue
    9
  • fYear
    2008
  • Firstpage
    1195
  • Lastpage
    1204
  • Abstract
    Image-based abstraction (or summarization) of a Web site is the process of extracting the most characteristic (or important) images from it. The criteria for measuring the importance of images in Web sites are based on their frequency of occurrence, characteristics of their content, and Web link information. As a case study, this work focuses on logo and trademark images, which are important characteristic signs of corporate Web sites or of the products presented there. The proposed method incorporates machine learning to distinguish logos and trademarks from images of other categories (e.g., landscapes, faces). Because the same logo or trademark may appear many times in various forms within the same Web site, duplicates are detected and only unique logo and trademark images are extracted. These images are then ranked by importance, taking frequency of occurrence, image content, and Web link information into account. The most important logos and trademarks are finally selected to form the image-based summary of the Web site. Evaluation results of the method on real Web sites are also presented. The method has been implemented and integrated into a fully automated image-based summarization system, which is accessible on the Web (www.intelligence.tuc.gr/websummarization).
  • Keywords
    Web sites; abstracting; document image processing; feature extraction; learning (artificial intelligence); Web link information; automatic Website summarization; corporate Web sites; image content; image extraction; image ranking; image-based abstraction; importance ranking; logo image; machine learning; occurrence frequency; trademark image; Abstracting methods; Applications; Content Analysis and Indexing; Indexing Methods; Information Storage and Retrieval
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2008.34
  • Filename
    4445672