Research Interests

My research interests lie primarily in distributed machine learning for mobile computing, with an emphasis on high performance, strong privacy preservation, low energy consumption, and low resource usage. More specifically, I work on developing efficient, privacy-preserving, and scalable deep learning architectures that span multiple mobile and edge devices, including the following topics: AutoML, privacy-preserving ML, federated learning, distributed deep learning, and context-aware architectures.


With the development of deep learning, new applications have emerged, and I believe mobile devices need a new architecture to meet future applications and demands. I aim to build efficient, lightweight mobile systems that span large numbers of network-connected devices, which cooperate to complete real industrial applications.


In my research experience, I prefer to combine theory and experiment. I first derive algorithms after investigating work related to the problems I am interested in. Once the algorithms are settled, I run systematic experiments over various variables and then make incremental refinements to the algorithms.

Research Projects

Federated Learning

Survey on Personalized Federated Learning

Presented at the U of T Engineering Research Conference 2023
Federated learning has emerged as an important paradigm in modern distributed machine learning. Unlike conventional distributed learning, however, clients in federated learning operate in the wild, without consensus over data, systems, privacy, and more. Moreover, the resulting heterogeneity can greatly degrade performance. Personalized federated learning customizes the model on each client and seeks efficient and effective ways for clients to share relevant knowledge while personalizing, so as to achieve the best performance on local data.
In this paper, we survey several representative personalized federated learning methods across three categories: global-model methods, locally customized models, and AutoML-based methods. We compare these methods fairly in terms of effectiveness, feasibility, and ubiquity, and run experiments showing that some methods are not as effective as originally claimed. We then delve deeper into methods based on federated neural architecture search and find that, despite several drawbacks, they are a promising direction for addressing heterogeneity. On this basis, we propose several directions for improving current federated NAS methods and discuss future work on designing better personalized federated learning methods.
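One of the simplest personalization recipes in the "local customized model" family — fine-tuning a FedAvg global model on each client's own data — can be sketched on a toy linear-regression task. All data, client shifts, and hyperparameters below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_client(shift):
    # Each client has a different true weight vector (non-i.i.d. data).
    X = rng.normal(size=(50, 3))
    w_true = np.array([1.0, -1.0, 0.5]) + shift
    return X, X @ w_true

def local_sgd(w, X, y, lr=0.01, steps=20):
    # Plain gradient descent on the client's local squared loss.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

clients = [make_client(s) for s in (-0.5, 0.0, 0.5)]
w_global = np.zeros(3)

# FedAvg: each round, clients train locally and the server averages.
for _ in range(10):
    w_global = np.mean([local_sgd(w_global, X, y) for X, y in clients], axis=0)

# Personalization: each client fine-tunes the global model on its own data.
personalized = [local_sgd(w_global, X, y, steps=100) for X, y in clients]
for (X, y), w_p in zip(clients, personalized):
    assert loss(w_p, X, y) <= loss(w_global, X, y)
```

Because the clients' optima differ, the single global model is a compromise, and a few local fine-tuning steps strictly improve each client's local loss — the basic motivation for personalization.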




Federated Model Search via Reinforcement Learning

Published in 41st IEEE International Conference on Distributed Computing Systems Slides Code
The Federated Learning (FL) framework enables training over distributed datasets while keeping the data local. However, it is difficult to customize a model that fits all the unknown local data: a pre-determined model is likely to lead to slow convergence or low accuracy, especially when the distributed data are non-i.i.d. To resolve this issue, we propose a model search method for the federated learning scenario that automatically searches for a model structure fitting the unseen local data. We design a novel reinforcement learning-based framework that samples and distributes sub-models to the participants and updates its model selection policy by maximizing the reward. In practice, the model search algorithm takes a long time to converge, so we adaptively assign sub-models to participants according to the transmission conditions. We further propose delay-compensated synchronization to mitigate the loss from late updates and facilitate convergence. Extensive experiments show that our federated model search algorithm produces highly accurate models efficiently, particularly on non-i.i.d. data.
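The policy-update idea can be illustrated with a bandit-style REINFORCE sketch, where a noisy scalar reward stands in for the validation feedback reported by participants. The candidate set and reward values below are made up for illustration; the actual framework searches real model structures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rewards of four candidate sub-models; in the real system
# these come from participants evaluating the sampled structure.
true_reward = np.array([0.60, 0.72, 0.85, 0.68])
logits = np.zeros(4)     # policy parameters over candidates
baseline = 0.0           # moving-average baseline reduces gradient variance

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(500):
    probs = softmax(logits)
    a = rng.choice(4, p=probs)                 # sample a sub-model
    r = true_reward[a] + rng.normal(0, 0.05)   # noisy reward from clients
    baseline = 0.9 * baseline + 0.1 * r
    grad = -probs
    grad[a] += 1.0                             # grad of log pi(a)
    logits += 0.1 * (r - baseline) * grad      # REINFORCE update

best = int(np.argmax(softmax(logits)))         # policy concentrates here
```

Maximizing expected reward drives the policy toward the sub-model with the best (noisy) feedback, which is the core loop the framework runs at federated scale.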




Edge-Cloud Computing

Privacy-Preserving Split Learning via Patch Shuffling over Transformers

Published in 22nd IEEE International Conference on Data Mining Code Slides
In this work, we focus on the privacy-preserving problem in split learning. In vanilla split learning, a neural network is split across different devices for training, risking leakage of the private training data in the process. We propose a novel patch shuffling scheme on transformers that preserves the privacy of the training data without degrading the overall model performance. We provide a formal privacy guarantee and further introduce batch shuffling and spectral shuffling schemes to strengthen it. Experiments show that our methods successfully defend against black-box, white-box, and adaptive attacks in split learning, outperform the baselines, and are efficient to deploy, with negligible overhead compared to vanilla split learning.
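The intuition behind patch shuffling can be checked with a minimal sketch: self-attention without positional encodings is permutation-equivariant, so shuffling patch tokens scrambles the transmitted activations while pooled features are unchanged. Toy dimensions below; this is not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(tokens):
    # Single-head self-attention (no positional encoding). It is
    # permutation-equivariant: shuffling input rows shuffles output
    # rows by the same permutation.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

tokens = rng.normal(size=(16, 8))   # 16 patch embeddings, dim 8
perm = rng.permutation(16)          # client-side patch shuffle

out_plain = attention(tokens)
out_shuffled = attention(tokens[perm])

# Mean pooling over patches gives identical features either way, so the
# task head is unaffected while the transmitted activations are scrambled.
assert np.allclose(out_plain.mean(axis=0), out_shuffled.mean(axis=0))
```

An attacker who intercepts the shuffled activations sees patches in a random order, while the server-side model, which only consumes permutation-invariant pooled features, is unaffected.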




Context-Aware Compilation of DNN Training Pipelines across Edge and Cloud

Published in Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 4, Article 188 / UbiComp 2022 Code
Empowered by machine learning, edge devices including smartphones, wearables, and IoT devices have become increasingly intelligent, which conflicts with their limited resources. On-device model personalization is particularly hard, as training models on edge devices is highly resource-intensive. In this work, we propose a novel training pipeline across the edge and the cloud that takes advantage of the powerful cloud while keeping data local at the edge. Highlights of the design include parallel execution enabled by our feature replay, reduced communication cost from our error-feedback feature compression, and a context-aware deployment decision engine. Working as an integrated system, the proposed pipeline training framework not only significantly speeds up training but also incurs little accuracy loss and little additional memory or energy overhead. We test our system in a variety of settings, including WiFi, 5G, and household IoT, and on different training tasks such as image/text classification and image generation, to demonstrate its advantage over the state of the art. Experimental results show that our system not only adapts well to but also draws on the varying contexts, delivering a practical and efficient solution to edge-cloud model training.
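A generic error-feedback compressor — the general idea behind, though not necessarily identical to, our error-feedback feature compression — can be sketched with top-k sparsification: the entries dropped in one step are folded back into the next, so the compression error telescopes instead of accumulating:

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_compress(x, k):
    # Keep only the k largest-magnitude entries (a common sparsifier).
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

dim, k = 100, 10
error = np.zeros(dim)            # residual carried across iterations
total_sent = np.zeros(dim)
total_true = np.zeros(dim)

for _ in range(200):
    feature = rng.normal(size=dim)       # edge-side activation this step
    corrected = feature + error          # fold in the previous residual
    sent = topk_compress(corrected, k)   # what actually goes to the cloud
    error = corrected - sent             # remember what was dropped
    total_sent += sent
    total_true += feature

# The sums telescope: what the cloud has received differs from the truth
# only by the current residual, even though each message is 90% sparse.
assert np.allclose(total_true - total_sent, error)
```

Without the feedback term, the dropped 90% of each message would be lost for good and the error would grow with the number of steps; with it, the accumulated transmission stays within one residual of the truth.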




Context-Aware Deep Model Compression for Edge Cloud Computing

Published in 40th IEEE International Conference on Distributed Computing Systems
While deep neural networks (DNNs) have led to a paradigm shift, their exorbitant computational requirements have been a roadblock to deployment at the edge, e.g., on wearable devices and smartphones. Hybrid edge-cloud frameworks have therefore been proposed to transfer part of the computation to the cloud, naively partitioning the DNN operations under the assumption of a constant network condition. However, real-world network state varies greatly with context, and DNN partitioning alone offers only a limited strategy space. In this paper, we exploit the structural flexibility of DNNs to fit the edge model to varying network contexts and different deployment platforms. Specifically, we design a reinforcement learning-based decision engine that searches for model transformation strategies with respect to a combined objective of model accuracy and computation latency. The engine generates a context-aware model tree, so that at runtime the DNN can decide which model branch to switch to. In emulation and field experiments, our approach achieves a 30%~50% latency reduction while retaining model accuracy.
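A much-simplified flavor of the runtime branch decision — scoring each branch of the model tree by accuracy minus weighted end-to-end latency under the current network context — might look like the following. All branch profiles and the trade-off weight are hypothetical, not from the paper:

```python
# Hypothetical model tree: each branch is a compressed variant with a
# profiled accuracy, on-device compute time, and upload size.
branches = {
    "full":    {"acc": 0.92, "edge_ms": 80, "upload_kb": 400},
    "pruned":  {"acc": 0.89, "edge_ms": 45, "upload_kb": 180},
    "distill": {"acc": 0.85, "edge_ms": 20, "upload_kb": 60},
}

def pick_branch(bandwidth_kbps, lam=0.0005):
    # Combined objective: accuracy minus weighted latency, where the
    # transmission time depends on the current network context.
    def score(b):
        tx_ms = b["upload_kb"] * 8 / bandwidth_kbps * 1000
        return b["acc"] - lam * (b["edge_ms"] + tx_ms)
    return max(branches, key=lambda name: score(branches[name]))
```

On a fast link the full model wins (`pick_branch(100_000)` returns `"full"`), while on a slow link the heavily compressed branch does (`pick_branch(1_000)` returns `"distill"`) — the same context-dependent switch the model tree enables at runtime.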




Distributed Machine Learning Systems

Distributed Deep Learning Benchmarking

With widespread advances in machine learning, deep learning models keep growing in scale, and distributed training has become the common way to speed up the training process. To better inform the design of distributed data parallel (DDP) training systems, in this project we evaluated different DDP algorithms (All-Reduce, Parameter Server) using various neural networks (CNNs, transformers) with different frameworks (torch, Ray, Hoplite) under different environments. We also tested other possible factors, including hyperparameter settings, shared CPU usage, and the communication backend. We constructed different GPU and CPU configurations on AWS and a Slurm cluster to evaluate performance in terms of latency, variability, and accuracy, and to inform possible improvements when designing new DDP systems. We make several suggestions:

1. Keep the number of training iterations as small as convergence allows, since training for too many iterations can lead to latency spikes.

2. Use good scheduling algorithms to assign tasks to the distributed servers: place as many resources of the same task as possible on the same machine or close to each other, and place different tasks separately.

3. Design the data center with good fencing to minimize the effects of CPU contention.

4. Choose an appropriate checkpointing period to trade off checkpointing overhead against the work lost to faults during training.
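For the checkpointing trade-off in the last suggestion, a standard first-order answer is Young's approximation, which balances checkpoint overhead against the expected work lost on failure. The cost and MTBF numbers below are only examples:

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    # Young's first-order approximation for the checkpoint interval that
    # balances checkpoint overhead against expected lost work on failure:
    # T_opt = sqrt(2 * checkpoint_cost * MTBF)
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# e.g. a 30 s checkpoint and a 24 h mean time between failures:
interval = young_interval(30, 24 * 3600)   # about 38 minutes
```

Checkpointing more often than this wastes time writing checkpoints; less often, and the expected recomputation after a fault dominates.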




Navigating Tooth Extraction via UR5

Combining digital real-time navigation, robotic arm motion, and grinding, we built a minimally invasive tooth extraction system with digital navigation. Integrating a dynamic navigator, a UR5 robotic arm, and a dental power unit, we achieve automatic registration on the UR5 robotic arm via the ICP point cloud registration method. We performed registration on various dental models and introduced the dynamic navigator to guide the robotic arm in extracting teeth; the system is able to simulate surgical operations on a model.
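The ICP registration step can be sketched in a few lines: alternate nearest-neighbour matching with the closed-form Kabsch/SVD rigid alignment. Synthetic point clouds below; the real system registers dental model scans:

```python
import numpy as np

rng = np.random.default_rng(0)

def best_rigid_transform(src, dst):
    # Kabsch/SVD solution for the rigid transform aligning matched pairs.
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    cur = src.copy()
    for _ in range(iters):
        # Nearest-neighbour correspondence (brute force for clarity).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur

def nn_cost(a, b):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return d2.min(1).mean()

# Synthetic "model" cloud and a slightly rotated/translated copy of it.
model = rng.normal(size=(60, 3))
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
moved = model @ R_true.T + np.array([0.1, -0.05, 0.2])

aligned = icp(model, moved)
# ICP's alternating structure never increases the matching cost.
assert nn_cost(aligned, moved) <= nn_cost(model, moved)
```

With a small initial misalignment like this, the alternation typically converges to the exact pose; with larger offsets, ICP needs a rough initial guess, which the dynamic navigator provides in practice.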