Title :
Distributed training for Conditional Random Fields
Author :
Lin, Xiaojun ; Zhao, Liang ; Yu, Dianhai ; Wu, Xihong
Author_Institution :
Key Lab. of Machine Perception & Intell., Speech & Hearing Res. Center, Peking Univ., Beijing, China
Abstract :
This paper proposes a novel distributed training method for Conditional Random Fields (CRFs) that utilizes clusters built from commodity computers. The method employs the Message Passing Interface (MPI) to handle large-scale data in two steps. First, the entire training set is divided into several small pieces, each of which can be handled by one node. Second, instead of adopting a root node to collect all features, a new criterion is used to split the whole feature set into non-overlapping subsets, ensuring that each node maintains the global information of one feature subset. Experiments are carried out on the task of Chinese word segmentation (WS) with large-scale data, and we observe significant reductions in both training time and memory usage while preserving performance.
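The two-step scheme described in the abstract can be sketched as follows. This is a pure-Python simulation of the communication pattern, not the paper's implementation: the real method uses MPI across nodes, and the abstract does not state the feature-splitting criterion, so a simple hash partition of feature names stands in as a placeholder here. The point of the sketch is the structure: each node processes only its own data shard, and global statistics are regrouped so that each node owns the complete counts for exactly one disjoint feature subset, rather than funneling everything through a root node.

```python
# Simulation of the abstract's two-step distributed scheme (no real MPI):
#   Step 1: shard the training data across nodes.
#   Step 2: partition the feature set into non-overlapping subsets, so each
#           node ends up holding the GLOBAL counts for its own subset only.
# The hash-based partition below is an illustrative assumption, not the
# paper's actual splitting criterion.
from collections import Counter

NUM_NODES = 4

def feature_owner(feature, num_nodes=NUM_NODES):
    """Assign each feature to exactly one owning node (placeholder criterion)."""
    return hash(feature) % num_nodes

def local_counts(shard):
    """Step 1: a node counts feature occurrences in its own data shard only."""
    counts = Counter()
    for sentence_features in shard:
        counts.update(sentence_features)
    return counts

def distributed_feature_counts(shards):
    """Step 2: regroup local counts so node k holds global counts for subset k.

    In the MPI version this would be an all-to-all exchange instead of a
    gather at a root node; here the exchange is simulated in-process.
    """
    per_node = [local_counts(s) for s in shards]
    owned = [Counter() for _ in range(NUM_NODES)]
    for counts in per_node:
        for feature, n in counts.items():
            owned[feature_owner(feature)][feature] += n
    return owned
```

Two properties make this regrouping sufficient for CRF training: every feature's global count lands on exactly one node (the subsets are non-overlapping), and no node ever needs the full feature set in memory, which is where the reported space savings come from.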
Keywords :
message passing; natural language processing; Chinese word segmentation; conditional random fields; distributed training method; message passing interface; Accuracy; Equations; Variable speed drives; Chinese word segmentation; Distributed strategy; conditional random fields; large-scale data; natural language processing;
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587803