Rui Shu 束 锐

I was a member of the RAISE Lab (Real-world Artificial Intelligence for Software Engineering) at North Carolina State University, under the supervision of Dr. Tim Menzies. My research interests include machine learning and security. I passed my Ph.D. defense in December 2021. Before this, I earned my master's degree from Peking University in 2014 and my bachelor's degree from Beijing Jiaotong University in 2010.


I worked on projects in machine learning model hyperparameter optimization, adversarial machine learning, generative adversarial networks, and semi-supervised learning. I also collaborated on an OpenMRS vulnerability detection project.

Before joining the software engineering research lab, I worked in the systems research lab, which focused on Docker security issues, including identifying security vulnerabilities in Docker images and detecting security anomalies in Docker containers using machine learning techniques.


A Study of Real-World Tensorflow Bugs
Kuang Gong, Jingzhu He, Yuchen Ji, Rui Shu
Under Submission, 2022

This work is an empirical study of recent real-world TensorFlow bugs.


Hyperparameter Optimization on Semi-supervised Learning
Rui Shu, Tianpei Xia, Huy Tu, Laurie Williams, Tim Menzies
Empirical Software Engineering (EMSE) (Under Submission), 2022

The goal of this work is to help security practitioners train useful security classification models when little labeled training data but plenty of unlabeled training data is available.


Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue
Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies
Proceedings of the 19th International Conference on Mining Software Repositories (MSR), 2022

The goal of this work is to help security practitioners address class imbalance issues in software security data and build better prediction models with resampled datasets. We introduce an approach called Dazzle, which is an optimized version of conditional Wasserstein Generative Adversarial Networks with gradient penalty (cWGAN-GP).


Omni: Automated Ensemble with Unexpected Models against Adversarial Evasion Attack
Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies
Empirical Software Engineering (EMSE), 2021

The goal of this work is to help security practitioners and researchers build more robust models against adversarial evasion attacks through the use of ensemble learning. We propose an approach called OMNI, the main idea of which is to explore methods that create an ensemble of "unexpected models", i.e., models whose control hyperparameters have a large distance to the hyperparameters of an adversary's target model, with which we then make an optimized weighted ensemble prediction.

This paper was also accepted at ICSE 2022 as a Journal First paper.


Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application
Sarah Elder, Nusrat Zahan, Rui Shu, Monica Metro, Val Kozarev, Tim Menzies, Laurie Williams
Empirical Software Engineering (EMSE) (Under Revision), 2021

The goal of this research is to assist managers and other decision-makers on software projects in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based web application called OpenMRS.


Predicting Project Health for Open Source Projects (using the DECART Hyperparameter Optimizer)
Tianpei Xia, Wei Fu, Rui Shu, Tim Menzies
Empirical Software Engineering (EMSE), 2021

Software developed on public platforms is a source of data that can be used to make predictions about those projects. While the activity of a single developer may be random and hard to predict, when large groups of developers work together on software projects, the resulting behavior can be predicted with good accuracy. To demonstrate this, we use 78,455 months of data from 1,628 GitHub projects to make various predictions about the current status of those projects (as of April 2020). We find that traditional estimation algorithms make many mistakes. Algorithms like k-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART) have high error rates (usually more than 50% wrong, sometimes over 130% wrong, median values). But that error rate can be greatly reduced using the DECART hyperparameter optimizer. DECART is a differential evolution (DE) algorithm that tunes the CART data mining system to the particular details of a specific project.


Structuring a Comprehensive Software Security Course Around the OWASP Application Security Verification Standard
Sarah Elder, Nusrat Zahan, Val Kozarev, Rui Shu, Tim Menzies, Laurie Williams
43rd International Conference on Software Engineering, Joint Track on Software Engineering Education and Training (ICSE-JSEET), 2021

The goal of this paper is to aid software engineering educators in designing a comprehensive software security course by sharing an experience running a software security course for the eleventh time.


Sequential Model Optimization for Software Effort Estimation
Tianpei Xia, Rui Shu, Xipeng Shen, Tim Menzies
IEEE Transactions on Software Engineering (TSE), 2020

Many methods have been proposed to estimate how much effort is required to build and maintain software. Much of that research assumes a "classic" waterfall-based approach rather than contemporary agile projects. Also, much of that work tries to recommend a single method--an approach that makes the dubious assumption that one method can handle the diversity of software project data. To address these drawbacks, we apply a configuration technique called "ROME" (Rapid Optimizing Methods for Estimation), which uses sequential model-based optimization to find what combination of techniques works best for a particular data set. In this paper, we test this method using data from 1161 classic waterfall projects and 446 contemporary agile projects (from GitHub).


How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization)
Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies
Empirical Software Engineering (EMSE), 2020

The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports. Our proposed method, called Swift, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, Swift uses a technique called epsilon-dominance that learns how to avoid operations which do not significantly improve performance.

This paper was also accepted at ESEC/FSE 2021 as a Journal First paper.


A Study of Security Vulnerabilities on Docker Hub
Rui Shu, Xiaohui Gu, William Enck
Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy (CODASPY), 2017

In this paper, we study the state of security vulnerabilities in Docker Hub images. We create a scalable Docker image vulnerability analysis (DIVA) framework that automatically discovers, downloads, and analyzes both official and community images on Docker Hub. Using our framework, we have studied 356,218 images and made several findings.

This paper was also featured in the morning paper and on ACM's official Twitter.


A Study of Security Isolation Techniques
Rui Shu, Peipei Wang, Sigmund A Gorski III, Benjamin Andow, Adwait Nadkarni, Luke Deshotels, Jason Gionta, William Enck, Xiaohui Gu
ACM Computing Surveys (CSUR), 2016

This article seeks to understand existing security isolation techniques by systematically classifying different approaches and analyzing their properties. We provide a hierarchical classification structure for grouping different security isolation techniques.

Professional Services

Computers & Security, 2022 - Reviewer

ASE (Automated Software Engineering, 2022) - Reviewer

TDSC (Transactions on Dependable and Secure Computing, 2021) - Reviewer

TETCI (IEEE Transactions on Emerging Topics in Computational Intelligence, 2021) - Reviewer

IST (Information and Software Technology, 2019, 2021) - Reviewer

TOSEM (Transactions on Software Engineering and Methodology, 2020) - Reviewer

APSys (ACM SIGOPS Asia-Pacific Workshop on Systems, 2018) - Reviewer

TON (IEEE/ACM Transactions on Networking, 2017-2018) - Reviewer

IC2E (IEEE International Conference on Cloud Engineering, 2017) - Reviewer

TPDS (IEEE Transactions on Parallel and Distributed Systems, 2016-2017) - Reviewer

Teaching Assistant

Automated Software Engineering (CSC 591 & 791) - Fall 2019 (NCSU)

Introduction to Computing - Java (CSC 116) - Spring & Summer 2019 (NCSU)

Operating System Principles (CSC 501) - Fall 2018 (NCSU)

Computer Organization and Assembly Language (CSC 236) - Fall 2017 (NCSU)

Design and Analysis of Algorithms (CSC 505) - Spring 2015 (NCSU)

Discrete Mathematics for Computer Scientists (CSC 226) - Spring 2015 (NCSU)

Programming Concepts - Java (CSC 216) - Fall 2014 (NCSU)

Introduction to Information Technology - Fall 2012 (PKU)

Website template courtesy of Jon Barron. Last updated on March 21st, 2022.