DocumentCode :
3703552
Title :
Learning better while sending less: Communication-efficient online semi-supervised learning in client-server settings
Author :
Han Xiao;Shou-De Lin;Mi-Yen Yeh;Phillip B. Gibbons;Claudia Eckert
Author_Institution :
Technische Universität München, 85748 Garching, Germany
fYear :
2015
Firstpage :
1
Lastpage :
10
Abstract :
We consider a novel distributed learning problem: a server receives potentially unlimited data from clients in a sequential manner, but only a small initial fraction of these data are labeled. Because communication bandwidth is expensive, each client is limited to sending the server only a small (high-priority) fraction of the unlabeled data it generates, and the server is limited in the amount of prioritization hints it sends back to the client. The goal is for the server to learn a good model of all the client data from the labeled and unlabeled data it receives. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. However, previous approaches are not designed for the client-server setting and hold no promise of reducing communication costs. We present a novel framework for solving this learning problem in an effective and communication-efficient manner. On the server side, our solution combines two diverse learners working collaboratively, yet in distinct roles, on the partially labeled data stream. A compact, online graph-based semi-supervised learner predicts labels for the unlabeled data arriving from the clients; samples from this model serve as ongoing training for a linear classifier. On the client side, our solution prioritizes data based on an active-learning metric that favors instances that are close to the classifier's decision hyperplane yet far from each other. To reduce communication, the server sends the classifier's weight vector to the client only periodically. Experimental results on real-world data sets show that this particular combination of techniques outperforms other approaches and, in particular, often outperforms (communication-expensive) approaches that send all the data to the server.
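Illustrative sketch (not from the paper): the client-side prioritization described in the abstract can be approximated as below. The function name, the exact scoring rule, and the greedy selection order are assumptions for illustration; the paper's precise metric may differ. A client would apply this to each buffered batch of unlabeled points, using the weight vector most recently broadcast by the server.

```python
import numpy as np

def prioritize(batch, w, budget):
    """Illustrative sketch (assumed names/scoring, not the paper's exact rule).

    Rank a client's unlabeled batch and return the `budget` samples to send
    to the server: favor points close to the decision hyperplane defined by
    the server's weight vector `w` (uncertainty), while greedily preferring
    points far from those already selected (diversity).
    """
    # Uncertainty: a small |w . x| / ||w|| means x lies near the hyperplane.
    margins = np.abs(batch @ w) / (np.linalg.norm(w) + 1e-12)
    selected = []
    candidates = list(range(len(batch)))
    while candidates and len(selected) < budget:
        best, best_score = None, -np.inf
        for i in candidates:
            # Diversity: distance to the closest already-selected point.
            if selected:
                div = min(np.linalg.norm(batch[i] - batch[j]) for j in selected)
            else:
                div = 1.0
            # High priority = near the hyperplane AND far from prior picks.
            score = div / (margins[i] + 1e-12)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return batch[selected]
```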
Keywords :
"Servers","Semisupervised learning","Distributed databases","Bandwidth","Labeling","Data models","Cameras"
Publisher :
ieee
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8272-4
Type :
conf
DOI :
10.1109/DSAA.2015.7344833
Filename :
7344833