Some interesting projects
You can refer to my GitHub to see some open questions and projects.
You can view these as a blog with open-source code for discussing interesting questions related to my research.
I would like to thank my friends, teachers and colleagues for helping me complete these projects. We contributed equally
to all projects here.
How to Construct a Good Model Source Code
In this project, I implement different machine learning models
including basic linear models, non-linear models and neural
networks. In each part, I add some extensions beyond
the basic conclusions we can derive. After implementing these
models, I also use some visualization methods to diagnose the
models and examine the parameters they learn.
Some small experiments are conducted to verify my guesses
and conclusions. Besides gaining a deeper understanding of
machine learning, I form some of my own understanding and
Human understanding of machine's representation Source Code
Dixi Yao, Zhanda Zhu, Hongjie Fang, Haoran Zhao*
In this project, we find that a network acts through the following process: first
transforming human knowledge into machine knowledge,
such as low-dimensional vectors, gradually through convolutional
blocks or residual blocks, then transforming the machine
knowledge back into human knowledge, such as probabilities.
The reason behind a good architecture is that a network with such an
architecture can successfully transform images into features
for machine processing and reconstruct human knowledge
from machine knowledge.
Recognize FlyExpress in PyTorch
Dixi Yao, Jiuzhang Wang
Fast regulated network over variable sets of features
with loss annealing Source Code
Annotation of gene expression images of Drosophila
embryos is a meaningful and interesting task. However, because of
the complexity and variety of gene expressions, it is also a
difficult problem. In this paper, we propose a novel model
architecture combining CNN and RNN models. We use a CNN
model to extract features and an RNN model to learn knowledge
in sequences. Apart from that, we also propose novel
optimization methods, including applying different loss functions
and multiple tempering methods. We then evaluate our method
on the open dataset FlyExpress. The experiments show that our
model can reach 95.9% AUC, a 64.77% macro F1 score and a
65.93% micro F1 score.
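For readers unfamiliar with the two F1 variants reported above, here is a minimal sketch of how macro and micro F1 differ for multi-label annotation. The toy labels and the function names (`f1`, `macro_micro_f1`) are assumptions made for illustration; this is not the project's evaluation code.

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """y_true and y_pred are lists of 0/1 label vectors, one per image."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, so rare labels weigh as much as common ones.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    return macro, micro


# Toy example: 4 images, 3 hypothetical annotation terms.
y_true = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

In this toy case the rare second label is always predicted wrongly, which pulls the macro score well below the micro score.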
Dixi Yao, Kai Shen, Xiaochong Wei
Coverage-guided greybox fuzzing is considered the state-of-the-art testing technique in vulnerability detection. For
this promising technique, we argue that two challenges still need to be resolved for
better performance. On the one hand, code coverage metrics
should be more informative to distinguish between various
program executions. On the other hand, seed scoring algorithms should be well designed to better balance
seed exploration and seed exploitation.
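To make the exploration/exploitation balance concrete, the sketch below scores seeds with the classic UCB1 bandit rule. This is a swapped-in textbook technique chosen for illustration, not the scoring algorithm used in this work; the reward definition (for example, new coverage found per mutation round) and the names `ucb1_score` and `pick_seed` are assumptions.

```python
import math


def ucb1_score(total_reward, times_chosen, total_rounds, c=math.sqrt(2)):
    """UCB1: average reward (exploitation) plus an uncertainty bonus (exploration)."""
    if times_chosen == 0:
        return math.inf  # seeds never tried are always tried first
    exploit = total_reward / times_chosen
    explore = c * math.sqrt(math.log(total_rounds) / times_chosen)
    return exploit + explore


def pick_seed(stats, total_rounds):
    """stats maps seed name -> (total_reward, times_chosen); returns the best-scoring seed."""
    return max(stats, key=lambda s: ucb1_score(stats[s][0], stats[s][1], total_rounds))


# Hypothetical statistics: reward could be the number of new coverage hits.
stats = {"seed_a": (8.0, 10), "seed_b": (1.0, 10)}
print(pick_seed(stats, total_rounds=20))  # seed_a: same uncertainty, higher average reward

stats["seed_c"] = (0.0, 0)
print(pick_seed(stats, total_rounds=20))  # seed_c: never tried, so it is explored first
```

A seed that has been chosen rarely gets a larger bonus, so it is eventually retried even if its early rewards were low.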
In this paper, we propose a two-fold hybrid solution to
tackle these unresolved challenges. We leverage a multi-level
coverage tree to take advantage of various coverage metrics
efficiently. Meanwhile, state-of-the-art reinforcement
learning algorithms are leveraged to score the seeds intelligently. Our work is evaluated on a
lightweight JSON parser benchmark, which is implemented
for experiments under low computation budgets. More importantly, the evaluation reveals the superiority of our approach on the
basis of unique crashes, unique hangs, and total covered
Dixi Yao, Kai Wang
Federated learning has emerged as a branch of machine learning in which multiple clients solve problems in a decentralized fashion:
clients learn local models on their local datasets, and the master server aggregates the updates from selected clients into global updates.
Therefore, efficiently selecting the clients that participate in global aggregation is crucial when considering the trade-off between performance and communication efficiency in federated learning.
In this project, we formulate client selection as a convex optimization problem and solve it to obtain the optimal selection.
Experimental results on the benchmark dataset show that our convex-optimization-based client selection method achieves the same performance as the FedAvg baseline model while taking significantly less time,
improving the communication efficiency of federated learning.
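As background for the aggregation step described above, here is a minimal sketch of FedAvg-style weighted averaging over a subset of clients. The selection itself is passed in as a plain list; the convex optimization used in the project to choose that subset is not reproduced here, and the client names, weights, and the function name `fedavg` are illustrative assumptions.

```python
def fedavg(updates):
    """Weighted average of client weight vectors.

    updates: list of (weights, num_samples) pairs from the selected clients.
    Each client's weights count in proportion to its local dataset size.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Hypothetical clients: (local model weights, local dataset size).
clients = {
    "c1": ([1.0, 2.0], 10),
    "c2": ([3.0, 4.0], 30),
    "c3": ([100.0, 100.0], 5),
}
selected = ["c1", "c2"]  # placeholder for the output of a client selection step
global_weights = fedavg([clients[name] for name in selected])
print(global_weights)  # [2.5, 3.5]
```

Only the selected clients upload their updates, so a smaller selection directly reduces communication per round.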
Extend Structure Entropy of Graph to Weighted and Directed Graph under communities
IoT security is becoming more and more important,
and we identify the problem of device detection attacks in IoT
networks. To solve the anti-device-detection problem, we propose
the concept of a safety index. Traditional methods based on graph
isomorphism and graph homomorphism cannot establish
the index well. As a result, we propose a safety index based on
the structural entropy of a weighted and directed graph, where we
extend structural entropy to IoT networks. Based on that,
we propose an efficient deception algorithm. The experiments
on different networks show the efficiency and outstanding performance
of our deceptor.
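For background, the sketch below computes the standard one-dimensional structural entropy of an unweighted, undirected graph, H = -sum_i (d_i / 2m) * log2(d_i / 2m), where d_i is the degree of node i and m is the number of edges. It does not implement the weighted and directed extension or the safety index proposed in this project; the function name and the toy graphs are assumptions for illustration.

```python
import math
from collections import Counter


def structural_entropy_1d(edges):
    """One-dimensional structural entropy of an undirected, unweighted graph.

    edges: list of (u, v) pairs without duplicates.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    volume = sum(degree.values())  # equals 2m
    return -sum((d / volume) * math.log2(d / volume) for d in degree.values())


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # every node has degree 2
star = [(0, 1), (0, 2), (0, 3)]           # one hub, three leaves
print(structural_entropy_1d(cycle))  # 2.0
print(structural_entropy_1d(star))
```

The cycle, where all nodes look alike, has the larger entropy; the star, dominated by one hub, has a lower value.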
Dixi Yao, Chumen Liang, Hongkun Hao
MentoRelate: A Mentor Recommendation System
Based on Non-Generalized Word Model and
Word2Vec Source Code
Undergraduates have difficulty searching for
suitable mentors in their own colleges, because existing search
engines do not support accurate academic retrieval for professors.
In this paper, we propose a recommendation model based on
the Non-Generalized Word (NGW) model and Word2Vec embedding.
We define keywords with relatively low affinity with academic
words as NGWs. By applying an attention mechanism to NGWs
rather than other keywords, our model can better detect mentors
who focus on the subfields indicated by keywords, rather
than recommending mentors based on general and vague subject
information. We then develop a recommendation system based
on the proposed model. For evaluation, we conduct experiments to
validate the effectiveness of our recommendation system based
on the professor information of Shanghai Jiao Tong University.
Reaching an HR of 82%, our system also responds
quickly. To the best of our knowledge, we are the first to
propose the problem and give a solution.
Dixi Yao, Letian Peng, Hongxu Li
Full mark project Source Code
In terms of benefit, it can meet the needs of the many people who need to use SolidWorks but do not have an adequate physical environment.
In our school in particular, all students from engineering platforms, students from the School of Design and
students in the School of Mechanical and Power Engineering need to use SolidWorks.
In terms of cost, this applies beyond SolidWorks: with the development of computer and Internet technology, both teaching and
everyday industry can move away from installing software on physical machines for students to use, reducing capital and resource costs.
Converting a physical machine with application software in the Student Innovation Center into a cloud service would be very useful and would save substantial resources.
Simple Simulated Five Level Pipeline CPU
Full mark project Source Code
In less than one month, I learned a new language (a hardware description language) and designed a five-level pipeline
CPU that simulates a 32-bit CPU.
I also added some special features, such as better branch prediction and support for more 32-bit instructions.
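As an illustration of what a simple branch prediction scheme looks like, here is a behavioral sketch of a 2-bit saturating counter predictor, a common textbook design. It is written in Python rather than a hardware language and is not the predictor implemented in this project; the class and function names are assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self):
        self.state = 1  # start at "weakly not taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then not taken once on exit, run twice.
loop_branch = ([True] * 9 + [False]) * 2
print(accuracy(loop_branch))  # 0.85
```

Because a single loop exit only moves the counter from "strongly taken" to "weakly taken", the predictor still guesses correctly when the loop is entered again.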
Dixi Yao, Kai Shen, Xiaochong Wei
Supply chains move products/services from origin to consumer through sourcing, manufacturing, packaging, transportation, and distribution. The tracking of goods and services is critical to identify bottlenecks and inefficiencies, ensure timely delivery, maintain customer satisfaction, improve security, and comply with regulations. By monitoring the movement of products or services, companies can optimize their operations, mitigate potential risks such as theft or fraud, and gain valuable insights into their supply chain.
We propose a solution for tracking goods and services in supply chain systems using V-Guard, which is designed for V2X networks and fits supply chains well, since trucks are the main means of transportation in such systems.
Other Open Questions