• DocumentCode
    2594358
  • Title
    Image Extraction from Online Text Streams: A Straightforward Template Independent Approach without Training
  • Author
    Adam, George ; Bouras, Christos ; Poulopoulos, Vassilis
  • Author_Institution
    Comput. Eng. & Inf. Dept., Univ. of Patras, Patras, Greece
  • fYear
    2010
  • fDate
    20-23 April 2010
  • Firstpage
    609
  • Lastpage
    614
  • Abstract
    In this paper we present an efficient system that processes HTML pages in order to extract the useful images from them. The proposed mechanism is template independent and focuses on HTML pages that contain news articles from major portals and blogs. We define useful images as the pictures that are relevant to the news report. To extract the image objects of an article, we deconstruct the HTML page into its DOM model and apply a set of algorithms that clean and correct the HTML code, locate and characterize each node of the DOM model, and finally keep the nodes characterized as useful. The proposed mechanism is applied as a subsystem of peRSSonal, a web tool that obtains news articles from all over the world, processes them, and presents them back to end users in a personalized manner. The role of the mechanism is to feed peRSSonal's database with digital images for browsing and searching purposes. We present the basic algorithms and experimental results on the efficiency of the proposed implementation.
  • Keywords
    feature extraction; image retrieval; multimedia computing; object detection; text analysis; DOM model; HTML page; image object extraction; online text stream; peRSSonal Web tool; template independent approach; Blogs; Computer networks; Content based retrieval; Data mining; HTML; Informatics; Information retrieval; Portals; Streaming media; Web pages; image annotation; multimedia extraction; web information extraction; web mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications Workshops (WAINA), 2010 IEEE 24th International Conference on
  • Conference_Location
    Perth, WA
  • Print_ISBN
    978-1-4244-6701-3
  • Type
    conf
  • DOI
    10.1109/WAINA.2010.131
  • Filename
    5480617