DocumentCode :
3101095
Title :
Multilingual hyperdocument recognition: a document mining approach
Author :
Nguyen, Tuan Dang ; Zreik, Khaldoun
Author_Institution :
PhD candidate in Computer Science, Univ. de Caen, France
fYear :
2004
fDate :
19-23 April 2004
Firstpage :
443
Lastpage :
444
Abstract :
This paper proposes a new distributive analysis approach for retrieving multilingual hyperdocuments. To determine the number of languages used in a Web site, the approach relies on a set of general and computational knowledge that is completely independent of any linguistic domain. The mining process comprises three main stages: preprocessing (hyperdocument vectoring), processing (clustering), and postprocessing (cluster pruning). A prototype of the system has been developed, and two clustering approaches have been tested with it on several international Web sites, with very high and fully satisfactory performance.
Keywords :
Web sites; computational linguistics; data mining; information retrieval; pattern clustering; Web site; cluster pruning; computational knowledge; data clustering; data mining process; distributive analysis; document mining approach; hyperdocument vectoring; language learning; multilingual hyperdocument recognition; multilingual hyperdocument retrieval; post processing; preprocessing; Clustering algorithms; Computer architecture; Data mining; Distributed computing; Machine learning; Machine learning algorithms; Performance evaluation; Prototypes; System testing; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN :
0-7803-8482-2
Type :
conf
DOI :
10.1109/ICTTA.2004.1307822
Filename :
1307822