Title :
A Web Page Segmentation Algorithm Based on Iterated Dividing and Shrinking
Author :
Jiuxin, Cao ; Bo, Mao ; Junzhou, Luo
Author_Institution :
Southeast Univ., Nanjing
Abstract :
Based on image processing techniques and the special characteristics of web pages, a new web page segmentation algorithm, the Iterated Dividing and Shrinking Algorithm, is proposed. Image dividing conditions are introduced and the concept of a dividing zone is defined. On this basis, the web page is first transformed into an image, which is then divided into visually consistent sub-images by repeatedly shrinking and splitting. Experiments show that the algorithm is suitable for web page segmentation and performs well in terms of extensibility and performance.
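The abstract does not state the exact dividing conditions. The sketch below is a hypothetical illustration under one assumption: a dividing zone is a run of at least `min_gap` background-only rows or columns spanning the whole current region. Under that assumption the procedure is close to the classic recursive XY-cut. The names `shrink`, `split`, `segment`, and `min_gap` are illustrative and do not come from the paper. The input is assumed to be an already-rendered, binarized page image (a 2D list with 0 as background). Rendering the page and any test of visual consistency are not modeled.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
# A region is (top, left, bottom, right); bottom and right are exclusive.
Region = Tuple[int, int, int, int]


def shrink(img: Image, region: Region, bg: int = 0) -> Optional[Region]:
    """Crop a region to the bounding box of its non-background pixels (None if empty)."""
    top, left, bottom, right = region
    rows = [r for r in range(top, bottom) if any(img[r][c] != bg for c in range(left, right))]
    if not rows:
        return None
    cols = [c for c in range(left, right) if any(img[r][c] != bg for r in range(top, bottom))]
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def find_gaps(is_blank: List[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Return [start, end) runs of blank lines at least min_gap long (assumed dividing zones)."""
    gaps, start = [], None
    for i, blank in enumerate(is_blank + [False]):  # sentinel closes a trailing run
        if blank and start is None:
            start = i
        elif not blank and start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def pieces(gaps: List[Tuple[int, int]], length: int) -> List[Tuple[int, int]]:
    """Return the [start, end) spans lying between the gaps."""
    out, pos = [], 0
    for a, b in gaps:
        if a > pos:
            out.append((pos, a))
        pos = b
    if pos < length:
        out.append((pos, length))
    return out


def split(img: Image, region: Region, min_gap: int, bg: int = 0) -> List[Region]:
    """Split a shrunk region at horizontal dividing zones, else vertical ones."""
    top, left, bottom, right = region
    row_blank = [all(img[r][c] == bg for c in range(left, right)) for r in range(top, bottom)]
    gaps = find_gaps(row_blank, min_gap)
    if gaps:
        return [(top + a, left, top + b, right) for a, b in pieces(gaps, bottom - top)]
    col_blank = [all(img[r][c] == bg for r in range(top, bottom)) for c in range(left, right)]
    gaps = find_gaps(col_blank, min_gap)
    if gaps:
        return [(top, left + a, bottom, left + b) for a, b in pieces(gaps, right - left)]
    return []  # no dividing zone: the region is a final segment


def segment(img: Image, min_gap: int = 2, bg: int = 0) -> List[Region]:
    """Iteratively shrink and split until no region contains a dividing zone."""
    assert min_gap >= 1, "min_gap must be positive so every split makes progress"
    stack: List[Region] = [(0, 0, len(img), len(img[0]))]
    result: List[Region] = []
    while stack:
        region = shrink(img, stack.pop(), bg)
        if region is None:
            continue  # empty region: nothing to keep
        parts = split(img, region, min_gap, bg)
        if parts:
            stack.extend(parts)
        else:
            result.append(region)
    return sorted(result)


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [1, 0, 0, 1, 1],
    ]
    print(segment(page))  # -> [(0, 0, 2, 2), (0, 4, 2, 5), (4, 0, 5, 1), (4, 3, 5, 5)]
```

An explicit stack stands in for recursion, which mirrors the "iterated" shrink-then-split loop and avoids Python's recursion limit on deeply nested layouts. Raising `min_gap` (for example to 3 in the toy page above) suppresses the narrow gaps and returns the whole page as a single segment.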
Keywords :
Internet; document image processing; image segmentation; Web page segmentation algorithm; image dividing; iterated dividing and shrinking algorithm; Algorithm design and analysis; Computer networks; Computer science; HTML; Image processing; Image segmentation; Information security; Laboratories; Parallel processing; Web pages;
Conference_Title :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
DOI :
10.1109/NPC.2007.63