Title :
Mining the Web for C/C++ code with Perl
Author :
Frenz, Christopher M.
Author_Institution :
Dept. of Comput. Eng. Technol., New York City Coll. of Technol., Brooklyn, NY, USA
Abstract :
The Web is one of the largest repositories of information ever compiled, and search techniques are therefore essential for navigating it and retrieving pertinent information. Search engines such as Google typically rely on keywords: Web pages containing the specified keywords are located and then ranked using an algorithm such as PageRank. While keywords are suitable for many search tasks, certain types of data cannot be readily searched using keywords alone. Regular expression based pattern matching enhances search capability by allowing a textual pattern to be specified and text to be matched against it. Regular expressions have been developed that identify common C/C++ code structures such as loops, conditionals, and functions. These regular expressions are integrated into a Perl program that performs a keyword-based search using the Yahoo search engine and then extracts any code elements in the results that match the patterns. Thus an algorithm or programming technique can be specified with keywords, the Yahoo search used to identify Web pages pertinent to those keywords, and the regular expressions used to identify and extract any C/C++ code found in the resulting Web pages. This technique would likely be of great benefit in creating specialized search capabilities for software developers.
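The extraction step described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the paper, and the patterns, the names `PATTERNS` and `find_code_structures`, and the sample page are all hypothetical; the paper's actual regular expressions are not given in this record. The sketch assumes the page text has already been retrieved and matches only the header of each construct, up to its opening brace.

```python
import re

# Hypothetical patterns (not the paper's): each one matches only the header
# of a construct, up to and including the opening brace.
PATTERNS = {
    "loop": re.compile(r"\b(?:for|while)\s*\([^{}]*\)\s*\{"),
    "conditional": re.compile(r"\bif\s*\([^{}]*\)\s*\{"),
    # A small, assumed set of return types; real code uses many more.
    "function": re.compile(
        r"\b(?:int|void|char|float|double|long|bool)[\s*]+\w+\s*\([^;{}()]*\)\s*\{"
    ),
}


def find_code_structures(text):
    """Return (kind, matched_text) pairs in order of appearance in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group(0)))
    hits.sort()  # order by position in the page
    return [(kind, snippet) for _, kind, snippet in hits]


# Stand-in for the text of a Web page returned by a keyword search.
SAMPLE_PAGE = """
<p>Here is one pass of a bubble sort:</p>
void sort(int *a, int n) {
    for (int i = 0; i < n - 1; i++) {
        if (a[i] > a[i + 1]) {
            swap(a, i);
        }
    }
}
"""

for kind, snippet in find_code_structures(SAMPLE_PAGE):
    print(kind, "->", snippet)
```

Patterns this simple will miss many valid constructs (for example, brace-less bodies or user-defined return types) and can match code-like text in prose, so they should be read as a starting point rather than a complete recognizer.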
Keywords :
C++ language; data mining; search engines; C/C++ code structures; Google; PageRank; Perl program; Web mining; Yahoo search engine; programming technique; search techniques; software developers; textual pattern; Application software; Cities and towns; Data mining; Educational institutions; Navigation; Pattern matching; Search engines; Telephony; Web pages; Web sites; search; text mining;
Conference_Title :
2010 Long Island Systems, Applications and Technology Conference (LISAT)
Conference_Location :
Farmingdale, NY
Print_ISBN :
978-1-4244-5548-5
Electronic_ISBN :
978-1-4244-5550-8
DOI :
10.1109/LISAT.2010.5478283