Presented in U of T Engineering Research Conference 2023
Federated learning has emerged as an important paradigm in modern distributed machine learning.
However, different from conventional distributed learning, the clients in federated learning are
placed in a wild environment where clients do not have consensus over data, systems, privacy and others.
What's more, the heterogeneity problems in federated learning can greatly affect performance.
Personalized federated learning is a method to customize the models on each client and try to
find efficient and effective ways to let there clients share particular knowledge and personalize, so as to achieve the best performance over local data.
In this paper, we go through several representative personalized federated learning methods, particularly over three types: global model,
local customized model and AutoML based methods. We fairly compare these methods on board with consideration of effectiveness,
feasibility and ubiquitousness. We also do some experiments to show that some methods are not as effective as they were claimed in the original papers.
Then we delve deeper into the federated neural architecture search based methods and find out it is a promising direction to solve the heterogeneity problems despite of several drawbacks.
On the basis of that, we give out several directions for future work about how to improve current federated NAS based methods and make them promising.
We also give out discussion about future work on designing better personalized federated learning methods.
Published in 41st IEEE International Conference on Distributed Computing Systems Slides Code
Federated Learning (FL) framework enables training over distributed datasets while keeping the
data local. However, it is difficult to customize a model fitting for all unknown local data. A pre-determined model is most likely to lead to slow convergence or low accuracy, especially when the distributed data is non-i.i.d.. To
resolve the issue, we propose a model searching method in the federated learning scenario, and the method automatically searches a model structure fitting for the unseen local data.We novelly design a reinforcement learningbased framework
that samples and distributes sub-models to the participants and updates its model selection policy by maximizing the reward. In practice, the model search algorithm takes a long time to converge, and hence we adaptively assign sub-models
to participants according to the transmission condition. We further propose delay-compensated synchronization to mitigate loss over late updates to facilitate convergence. Extensive experiments show that our federated model search
algorithm produces highly accurate models efficiently, particularly on non-i.i.d. data.
Published in 22nd IEEE International Conference on Data Mining
We focus on the privacy-preserving
problem in split learning n this work. In vanilla split learning, a neural network is split to different devices to be trained, risking leaking the private training data in the process. We novelly propose a patch shuffling scheme on
transformers to preserve training data privacy, yet without degrading the overall model performance. Formal privacy guarantee is provided and we further introduce the batch shuffling and the spectral shuffling schemes to enhance the
guarantee. We show through experiments that our methods successfully defend the black-box, white-box, and adaptive attacks in split learning, with superior performance over baselines, and are efficient to deploy with negligible overhead
compared to the vanilla split learning.
Published in Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 4, Article 188 / UbiComp 2022 Code
Empowered by machine learning, edge devices including smartphones, wearable, and IoT devices have become growingly intelligent, raising conflicts with the limited resource. On-device model personalization is particularly hard as training
models on edge devices is highly resource-intensive. In this work, we propose a novel training pipeline across the edge and the cloud, by taking advantage of the powerful cloud while keeping data local at the edge. Highlights of the
design incorporate the parallel execution enabled by our feature replay, reduced communication cost by our error-feedback feature compression, as well as the context-aware deployment decision engine. Working as an integrated system,
the proposed pipeline training framework not only significantly speeds up training, but also incurs little accuracy loss or additional memory/energy overhead. We test our system in a variety of settings including WiFi, 5G, household
IoT, and on different training tasks such as image/text classification, image generation, to demonstrate its advantage over the state-of-the-art. Experimental results show that our system not only adapts well to, but also draws on
the varying contexts, delivering a practical and efficient solution to edge-cloud model training.
Published in 40th IEEE International Conference on Distributed Computing Systems
While deep neural networks (DNNs) have led to a paradigm shift, its exorbitant computational requirement has always been a roadblock in its
deployment to the edge, such as wearable devices and smartphones. Hence a hybrid edge-cloud computational framework is proposed to transfer part of the computation to the cloud, by naively partitioning the DNN operations under the
constant network condition assumption. However, realworld network state varies greatly depending on the context, and DNN partitioning only has limited strategy space. In this paper, we explore the structural flexibility of DNN to fit
the edge model to varying network contexts and different deployment platforms. Specifically, we designed a reinforcement learning-based decision engine to search for model transformation strategies in response to a combined objective
of model accuracy and computation latency. The engine generates a context-aware model tree so that the DNN can decide the model branch to switch to at runtime. By the emulation and field experimental results, our approach enjoys a
30%~50% latency reduction while retaining the model accuracy.
Distributed Machine Learning Systems
Distributed Deep Learning Benchmarking
With widespread advances in machine learning, the scalability of deep learning models is increasing and the development of distributed deep learning, it is common to perform distributed training to speed up the training process. To help better instruct the
desgin of distributed data parallel (DDP) training system, in this project, we evaluated different DDP algorithms (All Reduce, Parameter Server) using various neural networks (CNN, transformers) with different frameworks (torch, Ray,
Hoplite) under different environments. Besides, we also tested over other possible factors including settings of hyperparameters, sharing usage of CPU, and communication backend. We constructed different GPU and CPU configurations over
AWS and slurm cluster to evaluate the performance including latency, variability and accuracy to instruct possible improvements when designing new DDP system. we have several suggestions: 1 We should try to shorten the iterations and
use as few iterations as possible to let the training converge. Because training for too many iterations can possibly lead to the spiking of latency. 2 We should use good schedule algorithms to assign the tasks on the distributed servers.
Let as many resources as possible of the same tasks be placed on the same machine or close to each other. And place different tasks separately. 3 We should have a good fencing design of the data center to minimize the effects of CPU
usage. 4 We should design an appropriate checkpointing period, to tradeoff between checkpointing overhead and possible loss caused by the fault happened during training.
Combining digital real-time navigation, robotic arm motion and grinding, a minimally invasive tooth extraction system with digital navigation is established. Combining the dynamic navigator, UR5 robotic arm and dental power unit, through the ICP point
cloud registration method, we can automatically achieve registration on UR5 robotic arm. We performed registration on various dental models, and introduced the dynamic navigator to guide the robotic arm to extract teeth. It is
able to simulate surgical operations on the model.