• DocumentCode
    538066
  • Title

    Parallel, massive processing in SuperMatrix—A general tool for distributional semantic analysis of corpus

  • Author

    Broda, Bartosz ; Jaworski, Damian ; Piasecki, Maciej

  • Author_Institution
    Inst. of Inf., Wroclaw Univ. of Technol., Wroclaw, Poland
  • fYear
    2010
  • fDate
    18-20 Oct. 2010
  • Firstpage
    373
  • Lastpage
    379
  • Abstract
    The paper presents an extended version of the SuperMatrix system, a general tool supporting automatic acquisition of lexical semantic relations from corpora. The extensions focus mainly on parallel processing of massive amounts of data. The construction of the system is discussed. Three distributed parts of the system are presented: distributed construction of co-incidence matrices from corpora, computation of the similarity matrix, and parallel solving of synonymy tests. An evaluation of the proposed approach to parallel processing is shown. Parallelization of the similarity matrix computation demonstrates almost linear speedup. The smallest improvements were achieved for the construction of matrices, as this process is mostly bound by reading huge amounts of data. Also, a few areas in which the functionality of SuperMatrix was improved are described.
  • Keywords
    matrix algebra; parallel processing; programming language semantics; text analysis; SuperMatrix system; automatic acquisition; co-incidence matrices; corpus; data processing; distributional semantic analysis; lexical semantic; similarity matrix computation; synonymy test; text corpora; Algorithm design and analysis; Clustering algorithms; Complexity theory; Context; Random access memory; Semantics; Sparse matrices
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
  • Conference_Location
    Wisla
  • ISSN
    2157-5525
  • Print_ISBN
    978-1-4244-6432-6
  • Type

    conf

  • DOI
    10.1109/IMCSIT.2010.5679915
  • Filename
    5679915