Regional Information Center for Science and Technology - Multilingual hyperdocument recognition: a document mining approach

DocumentCode :

3101095

Title :

Multilingual hyperdocument recognition: a document mining approach

Author :

Nguyen, Tuan Dang ; Zreik, Khaldoun

Author_Institution :

Doctoral student in Computer Science, Univ. de Caen, France

fYear :

2004

fDate :

19-23 April 2004

Firstpage :

443

Lastpage :

444

Abstract :

This paper proposes a new distributive analysis approach for retrieving multilingual hyperdocuments. To determine how many languages a Web site involves, the approach uses a set of general and computational knowledge that is completely independent of the linguistic domains. The mining process has three main stages: preprocessing (hyperdocument vectoring), processing (clustering), and post-processing (cluster pruning). A prototype of the system has been developed, and two clustering approaches have been tested on several international Web sites, with performance reported as very high and completely satisfactory.
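
The three stages named in the abstract (vectoring, clustering, pruning) can be illustrated with a minimal Python sketch. The abstract does not say which features or which two clustering algorithms the authors used, so everything concrete here is an assumption: character trigram counts as the page vector, a simple leader clustering with cosine similarity, a similarity threshold of 0.2, and a minimum cluster size of 2. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def vectorize(text, n=3):
    """Preprocessing: map a page's text to character n-gram counts (assumed feature)."""
    s = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.2):
    """Processing: leader clustering (an assumed stand-in, not the paper's algorithm).

    Each vector joins the most similar existing cluster if the similarity to
    that cluster's summed vector reaches the threshold; otherwise it starts
    a new cluster. Returns lists of page indices.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [int, ...]}
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for candidate in clusters:
            sim = cosine(vec, candidate["centroid"])
            if sim >= best_sim:
                best, best_sim = candidate, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
        else:
            best["members"].append(idx)
            best["centroid"].update(vec)  # accumulate counts into the cluster vector
    return [c["members"] for c in clusters]


def prune(clusters, min_size=2):
    """Post-processing: drop clusters too small to count as a language (assumed rule)."""
    return [members for members in clusters if len(members) >= min_size]


def count_languages(pages, threshold=0.2, min_size=2):
    """Estimate the number of languages on a site as the number of surviving clusters."""
    vectors = [vectorize(page) for page in pages]
    return len(prune(cluster(vectors, threshold), min_size))
```

Calling `count_languages` on a list of page texts returns the number of clusters left after pruning. Both the threshold and the minimum cluster size are arbitrary here and would need tuning on real pages.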

Keywords :

Web sites; computational linguistics; data mining; information retrieval; pattern clustering; Web site; cluster pruning; computational knowledge; data clustering; data mining process; distributive analysis; document mining approach; hyperdocument vectoring; language learning; multilingual hyperdocument recognition; multilingual hyperdocument retrieval; post processing; preprocessing; Clustering algorithms; Computer architecture; Data mining; Distributed computing; Machine learning; Machine learning algorithms; Performance evaluation; Prototypes; System testing; Unsupervised learning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on

Print_ISBN :

0-7803-8482-2

Type :

conf

DOI :

10.1109/ICTTA.2004.1307822

Filename :

1307822

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3101095