Abstract :
The author presents a custom tool that uses the Google Web search API (or a similar search service) to discover a list of Web pages matching a given topic; identify and extract trends and patterns from the text of those pages; and transform those trends and patterns into an understandable, useful, and well-organized information resource. The tool accomplishes these tasks using four main components. First, a search engine client discovers a list of relevant Web pages using the Google Web search API. An information extraction engine then mines concepts and associated text passages from these Web pages. Next, a clustering engine organizes the most significant concepts into a hierarchical taxonomy. Finally, a knowledge base generator uses this taxonomy to generate a hypertext knowledge base from the extracted concepts and text passages.
Keywords :
Internet; data mining; information retrieval; search engines; text analysis; Internet search engines; automated taxonomy generation; knowledge organization systems; text summarization; text-mining technologies; ultimate research assistant; Frequency; Simple object access protocol; Taxonomy; Uniform resource locators; Web pages; Web search; Web services; XML; IT systems; Online research
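
The four-component pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the author's implementation: the function names (`search`, `extract`, `build_taxonomy`, `generate_kb`) are hypothetical, the search client is a stand-in that filters an in-memory dictionary of pages instead of calling a real search API, concepts are approximated by single non-stopword terms, and the "clustering" is a simple two-level grouping by sentence co-occurrence.

```python
import re
from collections import defaultdict
from html import escape

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for"}


def search(topic, corpus):
    """Stand-in search client: URLs of pages whose text mentions the topic."""
    return [url for url, text in corpus.items() if topic.lower() in text.lower()]


def extract(urls, corpus):
    """Map each concept (a non-stopword term) to the (url, sentence) passages containing it."""
    passages = defaultdict(list)
    for url in urls:
        for sentence in re.split(r"(?<=[.!?])\s+", corpus[url]):
            words = set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS
            for word in words:
                passages[word].append((url, sentence))
    return dict(passages)


def rank(scores, n):
    """Top-n keys by descending score, ties broken alphabetically for determinism."""
    return [k for k, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def build_taxonomy(passages, top_n=2, children_per_parent=2):
    """Two-level taxonomy: frequent concepts become parents, co-occurring concepts their children."""
    parents = rank({c: len(p) for c, p in passages.items()}, top_n)
    taxonomy = {}
    for parent in parents:
        parent_sentences = {s for _, s in passages[parent]}
        co_occurrence = {
            c: sum(1 for _, s in p if s in parent_sentences)
            for c, p in passages.items()
            if c not in parents
        }
        co_occurrence = {c: n for c, n in co_occurrence.items() if n > 0}
        taxonomy[parent] = rank(co_occurrence, children_per_parent)
    return taxonomy


def generate_kb(taxonomy, passages):
    """Render the taxonomy as a linked HTML index followed by one section per concept."""
    parts = ["<ul>"]
    for parent, children in taxonomy.items():
        parts.append(f"<li><a href='#{parent}'>{parent}</a><ul>")
        parts.extend(f"<li><a href='#{c}'>{c}</a></li>" for c in children)
        parts.append("</ul></li>")
    parts.append("</ul>")
    for concept in [c for p, kids in taxonomy.items() for c in [p, *kids]]:
        parts.append(f"<h2 id='{concept}'>{concept}</h2>")
        for url, sentence in passages[concept]:
            parts.append(f"<p>{escape(sentence)} (<a href='{escape(url)}'>source</a>)</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical in-memory pages standing in for real search results.
    corpus = {
        "http://example.com/a": "Solar panels convert sunlight. Solar panels need inverters.",
        "http://example.com/b": "Solar inverters fail often.",
        "http://example.com/c": "Wind turbines spin.",
    }
    urls = search("solar", corpus)
    passages = extract(urls, corpus)
    print(generate_kb(build_taxonomy(passages, top_n=1), passages))
```

A real system would replace `search` with a call to an actual search service and fetch each page's text, and would likely use multi-word concept extraction and a proper hierarchical clustering method; the sketch only shows how the four stages hand data to one another.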