Title :
Understanding the Web Page Layout
Author :
Zhou, Minghong ; Li, Rubao ; Li, Wei
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci.
Abstract :
Web pages express their semantics not only through free texts but also through their layouts. While information is explicitly encoded in free texts, the layout implicitly reveals the semantic relationships among those free texts. In this paper, we propose a framework for mining the semantics implied by the layout. The core of our work is a new HTML document model, called the nested table model, which synthesizes the DOM model and the syntax of the HTML language. Using the nested table model, we formally define the relevancy of free texts, so that free texts can be grouped by their relevancy. Our experimental results indicate that the relevancy correctly reflects the semantics of the Web page layout.
Keywords :
Internet; data mining; hypermedia markup languages; DOM model; HTML document model; Web page layout; free text relevancy; nested table model; semantic mining; semantical relationships; Computers; Conferences; Context modeling; Data mining; HTML; Relational databases; Search engines; Temperature; Weather forecasting; Web pages;
Conference_Title :
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2702-7
DOI :
10.1109/ICDMW.2006.163