• DocumentCode
    3389797
  • Title
    IEEE article data extraction from internet.
  • Author
    Pham, Nam ; Wilamowski, Bogdan M.
  • Author_Institution
    Auburn University, Department of Electrical and Computer Engineering, AL, U.S.A.
  • fYear
    2009
  • fDate
    16-18 April 2009
  • Firstpage
    251
  • Lastpage
    256
  • Abstract
    Article data extraction from the internet is a way to download and extract required data automatically from web servers. In this paper, we present a method called the Internet Robot, which extracts data directly from a web server using the Perl scripting language and its powerful regular expressions. Regular expressions are used extensively in this method to reduce the complexity of the program code and to increase downloading and extraction speed. The Internet Robot is a three-step process: data collection, data filtering and processing, and data presentation. The final result of this process is a set of HTML files, with all required data in the format shown in Fig. 1, presented under different links of a webpage as shown in Fig. 5. Its accuracy and speed make this method unique in processing and extracting data not only from the internet but also from an existing database.
  • Keywords
    Data mining; Databases; Information filtering; Information filters; Internet; Open source software; Robotics and automation; Service robots; Web pages; Web server;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Engineering Systems, 2009. INES 2009. International Conference on
  • Conference_Location
    Barbados
  • Print_ISBN
    978-1-4244-4111-2
  • Electronic_ISBN
    978-1-4244-4113-6
  • Type
    conf
  • DOI
    10.1109/INES.2009.4924771
  • Filename
    4924771
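
The three-step process named in the abstract (data collection, data filtering and processing, data presentation) can be illustrated with a minimal sketch. This is not the authors' program: the paper uses Perl, while the sketch below uses Python's `re` module, and the sample page, its HTML layout, the pattern `ARTICLE_RE`, and the function names `collect`, `filter_records`, and `present` are all assumptions made for illustration. The collection step returns a static string instead of contacting a web server.

```python
import html
import re

# Hypothetical page content standing in for a downloaded listing.
# The markup below is an assumption, not the layout of any real site.
SAMPLE_PAGE = """
<div class="article">
  <h3>IEEE article data extraction from internet</h3>
  <span class="authors">Pham, Nam; Wilamowski, Bogdan M.</span>
  <span class="pages">251-256</span>
</div>
<div class="article">
  <h3>A second &amp; hypothetical entry</h3>
  <span class="authors">Doe, Jane</span>
  <span class="pages">10-15</span>
</div>
"""

# One regular expression captures title, authors, and page range per record.
ARTICLE_RE = re.compile(
    r'<h3>(?P<title>.*?)</h3>\s*'
    r'<span class="authors">(?P<authors>.*?)</span>\s*'
    r'<span class="pages">(?P<first>\d+)-(?P<last>\d+)</span>',
    re.DOTALL,
)


def collect(page=SAMPLE_PAGE):
    """Step 1, data collection: here it only returns the static sample text."""
    return page


def filter_records(page):
    """Step 2, filtering and processing: pull the fields out with the regex."""
    records = []
    for m in ARTICLE_RE.finditer(page):
        records.append({
            "title": html.unescape(m.group("title").strip()),
            "authors": [a.strip() for a in m.group("authors").split(";")],
            "first_page": int(m.group("first")),
            "last_page": int(m.group("last")),
        })
    return records


def present(records):
    """Step 3, presentation: render the records as a small HTML list."""
    items = "\n".join(
        "<li>{} ({}), pp. {}-{}</li>".format(
            html.escape(r["title"]),
            html.escape("; ".join(r["authors"])),
            r["first_page"],
            r["last_page"],
        )
        for r in records
    )
    return "<ul>\n" + items + "\n</ul>"


records = filter_records(collect())
report = present(records)
```

A single compiled pattern keeps the filtering step short, which mirrors the abstract's point about regular expressions reducing code complexity. For arbitrary real-world HTML a proper parser is usually more robust than a regex.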