DocumentCode
2347375
Title
SuperMatrix: A general tool for lexical semantic knowledge acquisition
Author
Broda, Bartosz ; Piasecki, Maciej
Author_Institution
Inst. of Appl. Inf., Wroclaw Univ. of Technol., Wroclaw
fYear
2008
fDate
20-22 Oct. 2008
Firstpage
345
Lastpage
352
Abstract
The paper presents the SuperMatrix system, designed as a general tool supporting the automatic acquisition of lexical semantic relations from corpora. The construction of the system is discussed, and examples of different applications are given to show its potential. The core of the system is the construction of co-incidence matrices from corpora written in any natural language, since the system works on UTF-8 encoded text and has a modular construction. SuperMatrix follows the general scheme of distributional methods. Many different matrix transformations and similarity computation methods were implemented in the system; as a result, the majority of existing measures of semantic relatedness were re-implemented. The system also supports evaluation of the extracted measures by tests originating from the idea of the WordNet-Based Synonymy Test. For Polish, SuperMatrix includes an implementation of a language of lexico-syntactic constraints that provides a means of shallow syntactic processing. SuperMatrix also processes multiword expressions, both as the lexical units being described and as elements of the description. Processing can be distributed, as a number of matrix operations were implemented. The system handles huge matrices.
Keywords
knowledge acquisition; matrix algebra; natural language processing; SuperMatrix system; UTF-8 encoding; WordNet based synonymy test; co-incidence matrices; lexical semantic knowledge acquisition; lexical units; lexico-syntactic constraints; matrix operation; matrix transformation; multiword expression; natural language; similarity computation method; syntactic processing; Computer science; Encoding; Informatics; Information technology; Knowledge acquisition; Modular construction; Natural languages; Paper technology; Strontium; System testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on
Conference_Location
Wisła
Print_ISBN
978-83-60810-14-9
Type
conf
DOI
10.1109/IMCSIT.2008.4747263
Filename
4747263