Title :
Web sites thematic classification using hidden Markov models
Author :
Serradura, Lyonel ; Slimane, Mohamed ; Vincent, Nicole
Author_Institution :
Lab. d'Inf., Tours Univ., France
fDate :
2001
Abstract :
The amount of information available on the Internet keeps growing, and tools are needed to help extract the relevant pieces. We have developed a classification algorithm that addresses this problem for French text. It distinguishes web pages by classifying their text content into themes. The method, named STCoL (Supervised Thematic Corpus Learning), is built on hidden Markov models (HMMs). Once the themes are modeled with HMMs, STCoL can classify documents from different sources. The method is both efficient and robust.
Keywords :
Internet; classification; hidden Markov models; information resources; French; STCoL; Supervised Thematic Corpus Learning; classification algorithm; text content; thematic classification; web pages; Data mining; Law; Legal factors; Portals; Robustness; Waste materials
Conference_Titel :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953955