Some interesting projects

You can visit my GitHub to see some open questions and projects. Think of them as a blog with open-source code, discussing interesting questions related to my research. I would like to thank my friends, teachers, and colleagues for helping me complete these projects. All of us contributed equally to the projects listed here.

Machine Learning

How to Construct a Good Model
Source Code
Dixi Yao
In this project, I implement a range of machine learning models, including basic linear models, non-linear models, and neural networks. In each part, I extend the analysis beyond the basic conclusions we can derive. After implementing these models, I use visualization methods to diagnose them and to examine the parameters they learn, and I run small experiments to verify my hypotheses and conclusions. Beyond gaining a deeper understanding of machine learning, I arrive at some understanding and conclusions of my own.

Human Understanding of Machine Representations
Source Code
Dixi Yao, Zhanda Zhu, Hongjie Fang, Haoran Zhao*
In this project, we find that a network works as a two-stage process: it first gradually transforms human knowledge into machine knowledge, such as low-dimensional vectors, through convolutional or residual blocks, and then transforms this machine knowledge back into human knowledge, such as probabilities. A good architecture is one that can successfully transform images into features for machine processing and then reconstruct human knowledge from that machine knowledge.

Recognize FlyExpress in PyTorch

Dixi Yao, Jiuzhang Wang
Fast regulated network over variable sets of features with loss annealing
Source Code
Annotating gene expression images of Drosophila embryos is a meaningful and interesting task. However, because of the complexity and variety of gene expression patterns, it is also a difficult problem. In this paper, we propose a novel architecture combining CNN and RNN models: the CNN extracts features and the RNN learns from sequences. We also propose novel optimization methods, including applying different loss functions and multiple tempering methods. We evaluate our method on the open FlyExpress dataset, where our model reaches 95.9% AUC, a 64.77% macro F1 score, and a 65.93% micro F1 score.
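The two F1 variants reported above average differently: macro F1 computes an F1 score per annotation term and then averages them, so every term counts equally, while micro F1 pools true positives, false positives, and false negatives over all terms before computing a single score, so frequent terms weigh more. The sketch below assumes binary multi-label indicator lists (one row per image, one column per annotation term); the function names are illustrative and this is not the project's evaluation code.

```python
from typing import List, Sequence, Tuple

Labels = Sequence[Sequence[int]]  # rows = images, columns = annotation terms (0/1)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def term_counts(y_true: Labels, y_pred: Labels, j: int) -> Tuple[int, int, int]:
    """Count TP, FP and FN for annotation term (column) j."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t[j] and p[j]:
            tp += 1
        elif p[j]:
            fp += 1
        elif t[j]:
            fn += 1
    return tp, fp, fn


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Average of the per-term F1 scores (every term counts equally)."""
    n_terms = len(y_true[0])
    scores = [f1(*term_counts(y_true, y_pred, j)) for j in range(n_terms)]
    return sum(scores) / n_terms


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Single F1 over counts pooled across all terms (frequent terms dominate)."""
    counts: List[Tuple[int, int, int]] = [
        term_counts(y_true, y_pred, j) for j in range(len(y_true[0]))
    ]
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


# Toy example: 3 images, 3 annotation terms
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
print(round(macro_f1(y_true, y_pred), 4))  # 0.5556
print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```

For larger label matrices, scikit-learn's `f1_score` with `average="macro"` or `average="micro"` should compute the same two quantities.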


Towards Hybrid Fuzzing with Multi-level Coverage Tree and Reinforcement Learning in Greybox Fuzzing

Dixi Yao, Kai Shen, Xiaochong Wei
Coverage-guided greybox fuzzing is considered the state-of-the-art testing technique for vulnerability detection. Promising as it is, we identify two challenges that must still be resolved for better performance. First, code coverage metrics should be more informative, so that they can distinguish between different program executions. Second, seed scoring algorithms should be carefully designed to better balance seed exploration and seed exploitation. In this paper, we propose a two-fold hybrid solution to these challenges: we use a multi-level coverage tree to exploit several coverage metrics efficiently, and we apply state-of-the-art reinforcement learning algorithms to score seeds intelligently. We evaluate our work on a lightweight JSON parser benchmark, implemented for experiments under low computation budgets, and the results show the advantages of our approach in terms of unique crashes, unique hangs, and total covered paths.
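The exploration/exploitation trade-off in seed scoring can be illustrated with a multi-armed bandit in which each seed is an arm. The sketch below uses the classic UCB1 rule as a stand-in; it is not the reinforcement learning algorithm used in this project, and the reward signal (for example, 1.0 when a mutated input reaches new coverage) is a hypothetical choice. `UCB1SeedScheduler` and its methods are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class SeedStats:
    pulls: int = 0       # times this seed was chosen for mutation
    reward: float = 0.0  # cumulative reward (hypothetical: new coverage found)


class UCB1SeedScheduler:
    """Illustrative seed scheduler using the UCB1 bandit rule (not the project's RL agent)."""

    def __init__(self, c: float = math.sqrt(2)) -> None:
        self.c = c  # exploration strength
        self.stats: Dict[str, SeedStats] = {}
        self.total_pulls = 0

    def add_seed(self, seed_id: str) -> None:
        self.stats.setdefault(seed_id, SeedStats())

    def score(self, seed_id: str) -> float:
        s = self.stats[seed_id]
        if s.pulls == 0:
            return math.inf  # always try an unexplored seed first
        mean = s.reward / s.pulls  # exploitation: average reward so far
        bonus = self.c * math.sqrt(math.log(self.total_pulls) / s.pulls)  # exploration
        return mean + bonus

    def select(self) -> str:
        # max() returns the first seed among equal scores (insertion order)
        return max(self.stats, key=self.score)

    def update(self, seed_id: str, reward: float) -> None:
        s = self.stats[seed_id]
        s.pulls += 1
        s.reward += reward
        self.total_pulls += 1


sched = UCB1SeedScheduler()
for seed in ("seed_a", "seed_b"):
    sched.add_seed(seed)
first = sched.select()     # "seed_a": unexplored seeds score +inf, ties go to the first seed
sched.update(first, 1.0)   # hypothetical: this seed found new coverage
second = sched.select()    # "seed_b": still unexplored
sched.update(second, 0.0)  # no new coverage
print(sched.select())      # seed_a
```

The exploration bonus shrinks as a seed is mutated more often, so seeds that keep paying off are exploited while rarely tried seeds still get revisited.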

Optimization over Client Selection in Efficient Federated Learning

Dixi Yao, Kai Wang

Source Code
Federated learning has emerged as a branch of machine learning in which multiple clients solve problems in a decentralized fashion: each client trains a local model on its local dataset, and the master server aggregates the updates from selected clients into a global update. Efficiently selecting which clients participate in global aggregation is therefore crucial to the trade-off between performance and communication efficiency in federated learning. In this project, we formulate client selection as a convex optimization problem whose solution gives the optimal selection. Experimental results on a benchmark dataset show that our convex-optimization-based client selection method achieves the same performance as the FedAvg baseline while taking significantly less time, improving communication efficiency in federated learning.
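Whatever selection rule is used, the server then aggregates updates from the selected clients. The sketch below shows FedAvg-style aggregation, in which each selected client's parameters are weighted by its share of the selected clients' training samples. The convex optimization formulation from this project is not reproduced: the `selected` list is simply passed in, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence


def fedavg_aggregate(
    client_params: Dict[str, List[float]],
    num_samples: Dict[str, int],
    selected: Sequence[str],
) -> List[float]:
    """Weighted average of the selected clients' parameter vectors.

    Each selected client k contributes with weight n_k / n, where n is the
    total number of training samples held by the selected clients.
    """
    if not selected:
        raise ValueError("at least one client must be selected")
    total = sum(num_samples[c] for c in selected)
    if total == 0:
        raise ValueError("selected clients hold no training samples")
    dim = len(client_params[selected[0]])
    global_params = [0.0] * dim
    for c in selected:
        weight = num_samples[c] / total
        for i, value in enumerate(client_params[c]):
            global_params[i] += weight * value
    return global_params


params = {"c1": [1.0, 2.0], "c2": [3.0, 4.0], "c3": [100.0, 100.0]}
samples = {"c1": 10, "c2": 30, "c3": 5}
# c3 is not selected in this round, so its parameters are ignored
print(fedavg_aggregate(params, samples, ["c1", "c2"]))  # [2.5, 3.5]
```

Leaving a client out of `selected` means its update does not need to be communicated in that round, which is the communication side of the trade-off described above.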

Data Science

Extending Structural Entropy to Weighted and Directed Graphs under Communities

Dixi Yao
Source Code
IoT security is becoming increasingly important, and we identify the problem of device detection attacks in IoT networks. To counter such attacks, we propose the concept of a safety index. Traditional methods based on graph isomorphism and graph homomorphism cannot establish this index well, so we instead define the safety index using the structural entropy of a weighted and directed graph, extending structural entropy to IoT networks. Based on this index, we propose an efficient deception algorithm. Experiments on different networks demonstrate the efficiency and strong performance of our deceptor.
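As background for the safety index, the sketch below computes one common formulation of partition-based (two-level) structural entropy for an undirected, weighted graph whose nodes are grouped into communities. It covers only the undirected case; the directed extension and the safety index built on it in this project are not reproduced, and the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight), undirected


def partition_structural_entropy(
    edges: Iterable[Edge], community: Dict[Hashable, Hashable]
) -> float:
    """Two-level structural entropy of an undirected weighted graph.

    H = - sum_j sum_{i in X_j} (d_i / vol(G)) * log2(d_i / vol(X_j))
        - sum_j (g_j / vol(G)) * log2(vol(X_j) / vol(G))
    where d_i is the weighted degree of node i, vol(X) the total degree of
    the nodes in X, and g_j the total weight of edges leaving community X_j.
    """
    degree: Dict[Hashable, float] = defaultdict(float)
    cut: Dict[Hashable, float] = defaultdict(float)  # g_j per community
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        if community[u] != community[v]:
            cut[community[u]] += w
            cut[community[v]] += w

    vol_g = sum(degree.values())
    vol: Dict[Hashable, float] = defaultdict(float)
    for node, d in degree.items():
        vol[community[node]] += d

    h = 0.0
    # Node level: uncertainty of locating a node inside its own community
    for node, d in degree.items():
        if d > 0:
            h -= (d / vol_g) * math.log2(d / vol[community[node]])
    # Community level: weighted by the edge weight g_j crossing the community boundary
    for j, vol_j in vol.items():
        if cut[j] > 0:
            h -= (cut[j] / vol_g) * math.log2(vol_j / vol_g)
    return h


# Two triangles joined by a single bridge edge (2, 3), all weights 1
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
two_communities = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_community = {n: "A" for n in range(6)}
print(partition_structural_entropy(edges, two_communities))
print(partition_structural_entropy(edges, one_community))
```

With a single community, the value reduces to the one-dimensional structural entropy of the degree distribution; the two-triangle partition yields a lower value, reflecting that it captures the graph's community structure.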

Mentor Relate

Dixi Yao, Chumen Liang, Hongkun Hao
MentoRelate: A Mentor Recommendation System Based on Non-Generalized Word Model and Word2Vec
Source Code
Undergraduates have difficulty finding suitable mentors in their own colleges, because existing search engines do not support accurate academic retrieval of professors. In this paper, we propose a recommendation model based on a Non-Generalized Word (NGW) model and Word2Vec embeddings. We define NGWs as keywords with relatively low affinity to academic words. By applying an attention mechanism to NGWs rather than to other keywords, our model better detects mentors who focus on the specific subfields indicated by the keywords, rather than recommending mentors based on general and vague subject information. We then build a recommendation system on the proposed model. To evaluate it, we conduct experiments on professor information from Shanghai Jiao Tong University. The system reaches a hit rate (HR) of 82% and also responds quickly. To the best of our knowledge, we are the first to pose this problem and propose a solution.
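The ranking and evaluation steps can be sketched under simplified assumptions: keyword embeddings (for example, from Word2Vec) are given as plain vectors, each query keyword carries a hypothetical relevance score (which a model like the one described might set higher for NGWs), softmax attention over those scores forms the query vector, and mentors are ranked by cosine similarity. HR@k is taken here to mean the fraction of queries whose intended mentor appears in the top k results. All names are illustrative; this is not the MentoRelate implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attended_query(keyword_vecs: Sequence[Vector], keyword_scores: Sequence[float]) -> List[float]:
    """Attention-weighted average of the query keyword embeddings."""
    weights = softmax(keyword_scores)
    dim = len(keyword_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, keyword_vecs)) for i in range(dim)]


def rank_mentors(query: Vector, mentor_vecs: Dict[str, Vector]) -> List[str]:
    """Mentor ids sorted by cosine similarity to the query, best first (stable on ties)."""
    return sorted(mentor_vecs, key=lambda m: cosine(query, mentor_vecs[m]), reverse=True)


def hit_rate_at_k(rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int) -> float:
    """Fraction of queries whose target mentor appears in the top k."""
    hits = sum(1 for ranking, target in zip(rankings, targets) if target in ranking[:k])
    return hits / len(targets)


# Toy 2-d embeddings: axis 0 ~ a specific subfield, axis 1 ~ a general subject
keywords = [[1.0, 0.0], [0.0, 1.0]]
mentors = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 1.0]}
equal = rank_mentors(attended_query(keywords, [0.0, 0.0]), mentors)
focused = rank_mentors(attended_query(keywords, [5.0, 0.0]), mentors)  # attend to keyword 0
print(equal, focused)                                  # ['B', 'A', 'C'] ['A', 'B', 'C']
print(hit_rate_at_k([equal, focused], ["A", "A"], 1))  # 0.5
```

In the toy example, spreading attention evenly favors the generalist mentor "B", while concentrating attention on the specific keyword promotes the subfield specialist "A", which is the behavior the NGW attention is meant to encourage.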

Engineering projects


Dixi Yao, Letian Peng, Hongxu Li
Full-mark project
Source Code
In terms of benefits, the project meets the needs of the many people who must use SolidWorks but lack suitable physical machines to run it. This is especially true at our school, where students on the engineering platforms, in the School of Design, and in the School of Mechanical and Power Engineering all need SolidWorks.
In terms of cost, with the development of computer and Internet technology, not only SolidWorks but other software used in teaching and in everyday industry no longer has to be installed on physical machines for students to use. This reduces capital and resource costs: converting the physical machines with application software in the Student Innovation Center into a cloud service would bring large resource savings.

Simple Simulated Five-Stage Pipeline CPU

Dixi Yao
Full-mark project
Source Code
In less than one month, I learned a new language (a hardware description language) from scratch and designed a five-stage pipeline CPU that simulates a 32-bit CPU. I also added some special features, such as better branch prediction and support for more 32-bit instructions.
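The project does not specify which branch prediction scheme it uses. As an illustration of a common improvement over static prediction in a five-stage pipeline, the sketch below models a classic 2-bit saturating-counter predictor in Python rather than in a hardware description language; the table size, the PC indexing (which assumes word-aligned 32-bit instructions), and the class name are assumptions.

```python
from typing import List


class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters (illustrative model).

    Counter values: 0 = strongly not taken, 1 = weakly not taken,
                    2 = weakly taken,       3 = strongly taken.
    """

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries
        self.table: List[int] = [1] * entries  # every entry starts weakly not taken

    def _index(self, pc: int) -> int:
        # Assuming word-aligned 32-bit instructions, the two low PC bits are always 0
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        """Predict taken when the counter is in one of the two 'taken' states."""
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop branch taken 9 times then not taken once, executed twice
predictor = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 2
mispredictions = 0
for taken in outcomes:
    if predictor.predict(0x40) != taken:
        mispredictions += 1
    predictor.update(0x40, taken)
print(mispredictions)  # 3
```

On this loop pattern the predictor mispredicts 3 of 20 branches, whereas a 1-bit predictor starting from the same not-taken state mispredicts 4, because a single loop exit only weakens the 2-bit counter instead of flipping its prediction.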

V-Guard applied in Supply Chain Management

Dixi Yao, Kai Shen, Xiaochong Wei
Source Code
Supply chains move products and services from origin to consumer through sourcing, manufacturing, packaging, transportation, and distribution. Tracking goods and services is critical for identifying bottlenecks and inefficiencies, ensuring timely delivery, maintaining customer satisfaction, improving security, and complying with regulations. By monitoring the movement of products or services, companies can optimize their operations, mitigate risks such as theft or fraud, and gain valuable insights into their supply chains.
We propose a solution for tracking goods and services in supply chain systems using V-Guard, which is designed for V2X networks and fits supply chains well because trucks are the main means of transportation in these systems.

Other Open Questions