Title :
Automated Data Mining from Web Servers Using Perl Script
Author :
Neeli, Sandeep ; Govindasamy, Kannan ; Wilamowski, Bogdan M. ; Malinowski, Aleksander
Author_Institution :
Dept. of Electr. & Comput. Eng., Auburn Univ., Auburn, AL
Abstract :
Data mining from the Web is the process of extracting essential data from any web server. In this paper, we present a method called Ethernet Robot that extracts information from a web server using the Perl scripting language and processes the data using regular expressions. The procedure involves fetching, filtering, processing, and presenting the required data. The resulting HTML file, which contains the required data, is displayed to the user. Future enhancements to the Ethernet Robot include optimization to improve performance and customization for use as a sophisticated client-specific search agent.
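The four stages named in the abstract (fetching, filtering, processing, presentation) can be illustrated with a minimal sketch. This is not the authors' Ethernet Robot: the paper uses Perl (and lists wget among its keywords), while this sketch uses Python's standard library, and every function name, the table-cell pattern, and the sample page are hypothetical, chosen only to show the shape of such a pipeline.

```python
import html
import re
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Fetching: download a page and decode it as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def filter_matches(page, pattern):
    """Filtering: keep only the fragments captured by a regular expression."""
    return re.findall(pattern, page, flags=re.IGNORECASE | re.DOTALL)


def process(fragments):
    """Processing: strip nested tags, decode entities, collapse whitespace."""
    cleaned = []
    for fragment in fragments:
        text = re.sub(r"<[^>]+>", "", fragment)  # drop any inner markup
        text = html.unescape(text)               # e.g. &amp; becomes &
        text = " ".join(text.split())            # normalize whitespace
        if text:                                 # skip empty cells
            cleaned.append(text)
    return cleaned


def present(items, title="Extracted data"):
    """Presentation: build a small HTML report listing the items."""
    rows = "\n".join(f"    <li>{html.escape(item)}</li>" for item in items)
    return (
        "<html>\n"
        f"  <head><title>{html.escape(title)}</title></head>\n"
        "  <body>\n"
        "  <ul>\n"
        f"{rows}\n"
        "  </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


# Assumed pattern: the data of interest sits in table cells.
CELL_PATTERN = r"<td[^>]*>(.*?)</td>"

# Offline sample standing in for a fetched page, so no network is needed.
SAMPLE_PAGE = (
    "<table><tr><td>Temp: <b>21&deg;C</b></td>"
    "<td>  Humidity:\n 40% </td><td></td></tr></table>"
)

if __name__ == "__main__":
    report = present(process(filter_matches(SAMPLE_PAGE, CELL_PATTERN)))
    print(report)
```

To run it against a live server, the sample string would be replaced with `fetch(url)` for a URL of your choosing. Regular expressions are used here because the abstract names them; for irregular markup, a real HTML parser is usually more robust.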
Keywords :
Internet; Perl; data acquisition; data mining; hypermedia markup languages; local area networks; Ethernet robot; HTML; Web servers; automated data mining; client-specific search agent; data extraction; information extraction; perl scripting language; regular expressions; Data analysis; Data mining; Ethernet networks; Filtering; HTML; Machine learning; Pattern analysis; Web pages; Web server; Web sites; Data Extraction; Data Mining; Perl; Regular Expressions; wget;
Conference_Title :
Intelligent Engineering Systems, 2008. INES 2008. International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-2082-7
Electronic_ISBN :
978-1-4244-2083-4
DOI :
10.1109/INES.2008.4481293