DocumentCode :
2840020
Title :
Converting Web pages into well-formed XML documents
Author :
Ouahid, H. ; Karmouch, A.
Author_Institution :
Multimedia & Mobile Agent Res. Lab., Ottawa, Ont., Canada
Volume :
1
fYear :
1999
fDate :
1999
Firstpage :
676
Abstract :
The work presented is part of a Web mining agent (WMA) system under development at our Multimedia and Mobile Agent Research Laboratory. The purpose of this system is to automatically extract specific information from Web pages and format the extracted information appropriately for further use. This requires resolving problems related to the disorganized nature of the Web, which may result from ill-formatted HTML-based Web pages. The desired information is extracted from the Web documents by applying a sequence of filters to them, each of which has a specific role. We discuss the filter that converts Web documents into well-formed XML documents. This conversion involves the following operations: (i) syntactic mapping of HTML to XML, (ii) resolving ambiguity introduced by HTML tagging rules, and (iii) handling errors that may occur due to improper usage of HTML by the authors. The paper presents an overview of the Web mining agent system, then gives the motivations for the conversion into XML, and finally discusses in detail the transformation process performed on the Web documents.
Keywords :
hypermedia markup languages; information resources; HTML tagging rules; HTML-based Web pages; Web mining agent system; Web pages conversion; World Wide Web; XML documents; ambiguity resolution; filters; syntactic mapping; Data mining; HTML; Information filtering; Information filters; Laboratories; Mobile agents; Multimedia systems; Web mining; Web pages; XML
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communications, 1999. ICC '99. 1999 IEEE International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-5284-X
Type :
conf
DOI :
10.1109/ICC.1999.768022
Filename :
768022