DocumentCode :
2699084
Title :
Postal Address Detection from Web Documents
Author :
Can, Lin ; Qian, Zhang ; Xiaofeng, Meng ; Wenyin, Liu
Author_Institution :
Sch. of Inf., Renmin Univ., Beijing
fYear :
2005
fDate :
8-9 April 2005
Firstpage :
40
Lastpage :
45
Abstract :
An approach to postal address detection from Web pages is proposed. The Web pages are first segmented into text blocks based on their visual similarity. The text content of each block then undergoes a recognition process that employs a syntactic approach, for which grammars covering almost all possible patterns of postal addresses are built. The results of our preliminary experiments on 44 Web pages with 56 true addresses show that our approach can detect postal addresses with high precision (89.3%) and a low false alarm rate (3.8%).
Keywords :
Web sites; document image processing; grammars; image recognition; text analysis; Web documents; postal address detection; text blocks; text recognition; Application software; Computer applications; Computer science; Data mining; HTML; Information retrieval; Machine learning; Partial response channels; Statistical analysis
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information Retrieval and Integration, 2005. WIRI '05. Proceedings. International Workshop on Challenges in
Conference_Location :
Tokyo
Print_ISBN :
0-7695-2414-1
Type :
conf
DOI :
10.1109/WIRI.2005.28
Filename :
1552994