Mary Lou Soffa
University of Virginia · Computer Science
Active 1977–2024
About
Mary Lou Soffa is a professor with a distinguished career in computer science, focusing on areas related to software testing, performance optimization, and computer architecture. Her research encompasses the development of testing frameworks for neural networks using deep generative models, as well as addressing processor over-provisioning on large-scale multi-core platforms. Throughout her career, she has supervised numerous students and post-doctoral researchers, contributing to advancements in dynamic binary parallelization, resource contention mitigation in warehouse-scale computers, and fault detection frameworks. Her work is characterized by a strong emphasis on improving the reliability, performance, and efficiency of computing systems, and she has been actively involved in mentoring the next generation of computer scientists.
Research topics
- Computer Science
- Artificial Intelligence
- Data Mining
- Machine Learning
- Operating system
- Embedded system
- Programming language
Selected publications
2024-04-12 · 12 citations
article · Open access · Senior author
Deep neural networks (DNN) are being used in a wide range of applications including safety-critical systems. Several DNN test generation approaches have been proposed to generate fault-revealing test inputs. However, the existing test generation approaches do not systematically cover the input data distribution to test DNNs with diverse inputs, and none of the approaches investigate the relationship between rare inputs and faults. We propose cit4dnn, an automated black-box approach to generate DNN test sets that are feature-diverse and that comprise rare inputs. cit4dnn constructs diverse test sets by applying combinatorial interaction testing to the latent space of generative models and formulates constraints over the geometry of the latent space to generate rare and fault-revealing test inputs. Evaluation on a range of datasets and models shows that cit4dnn generated tests are more feature diverse than the state-of-the-art, and can target rare fault-revealing testing inputs more effectively than existing methods.
Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing
ACM Transactions on Software Engineering and Methodology · 2022 · 20 citations
Senior author · Corresponding
- Computer Science
- Machine Learning
Testing deep neural networks (DNNs) has garnered great interest in the recent years due to their use in many applications. Black-box test adequacy measures are useful for guiding the testing process in covering the input domain. However, the absence of input specifications makes it challenging to apply black-box test adequacy measures in DNN testing. The Input Distribution Coverage (IDC) framework addresses this challenge by using a variational autoencoder to learn a low dimensional latent representation of the input distribution, and then using that latent space as a coverage domain for testing. IDC applies combinatorial interaction testing on a partitioning of the latent space to measure test adequacy. Empirical evaluation demonstrates that IDC is cost-effective, capable of detecting feature diversity in test inputs, and more sensitive than prior work to test inputs generated using different DNN test generation methods. The findings demonstrate that IDC overcomes several limitations of white-box DNN coverage approaches by discounting coverage from unrealistic inputs and enabling the calculation of test adequacy metrics that capture the feature diversity present in the input space of DNNs.
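The idea of measuring combinatorial coverage over a partitioned latent space can be illustrated with a minimal sketch. It is not the IDC implementation: the function names, the use of the same interval edges for every latent dimension, and the restriction to 2-way (pairwise) combinations are assumptions made here for illustration, and the latent vectors are passed in directly rather than produced by a variational autoencoder.

```python
from itertools import combinations


def bin_index(value, edges):
    """Return the index of the interval containing value, given sorted interior edges."""
    return sum(value >= e for e in edges)


def pairwise_coverage(latent_vectors, edges):
    """Fraction of all 2-way bin combinations hit by the given latent vectors.

    Assumption: every latent dimension is partitioned with the same edges.
    """
    if not latent_vectors:
        return 0.0
    dims = len(latent_vectors[0])
    bins = len(edges) + 1
    covered = set()
    for z in latent_vectors:
        idx = [bin_index(v, edges) for v in z]
        # Record which pair of bins this input hits for each pair of dimensions.
        for i, j in combinations(range(dims), 2):
            covered.add((i, j, idx[i], idx[j]))
    total = (dims * (dims - 1) // 2) * bins * bins
    return len(covered) / total
```

With two latent dimensions split at 0.0, there are four possible bin pairs, so a test set whose latent codes fall in only two of them scores 0.5. Adding inputs that land in the remaining combinations raises the score, which is the sense in which such a measure rewards feature diversity.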
Message from the Program Chairs
2021-02-27
article · Open access · 1st author · Corresponding
We are pleased to welcome you to CGO 2021, the first virtual CGO Conference. In addition, the Program Committee was virtual due to the worldwide infection rate of the coronavirus. On behalf of the Program Committee, we are pleased to present an exciting and stimulating program for the 2021 International Symposium on Code Generation and Optimization Conference.
Artifact: Distribution-Aware Testing of Neural Networks Using Generative Models
2021-05-01
article · Senior author
The artifact used for the experimental evaluation of Distribution-Aware Testing of Neural Networks Using Generative Models is publicly available on GitHub and is reusable. The artifact consists of Python scripts, trained deep neural network model files, and the data required for running the experiments. It is also provided as a VirtualBox VM image for reproducing the paper results. Users should be familiar with the VirtualBox software and the Linux platform to reproduce or reuse the artifact.
Distribution-Aware Testing of Neural Networks Using Generative Models
2021-05-01 · 3 citations
preprint · Open access · Senior author
The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today given the increasing number of critical applications being deployed with DNNs. The need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, there have been a number of research efforts focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they are generating are valid, and thus invalid inputs are produced. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by the DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics. To overcome the inclusion of invalid inputs in testing, we propose a technique to incorporate the valid input space of the DNN model under test in the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.
Testing deep neural networks (keynote)
2020-11-15
article · 1st author · Corresponding
The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today given the increasing number of critical applications being deployed with DNNs. The need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, there have been a number of research efforts focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they are generating are valid, and thus invalid inputs are produced. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by the DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics. To overcome the inclusion of invalid inputs in testing, we propose a technique to incorporate the valid input space of the DNN model under test in the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.
A Language for Autonomous Vehicles Testing Oracles
arXiv (Cornell University) · 2020-06-17 · 1 citation
preprint · Open access
Testing autonomous vehicles (AVs) requires complex oracles to determine if the AV's behavior conforms with specifications and humans' expectations. Available open source oracles are tightly embedded in the AV simulation software and are developed and implemented in an ad hoc way. We propose a domain specific language that enables defining oracles independent of the AV solutions and the simulator. A testing analyst can encode safety, liveness, timeliness and temporal properties in our language. To show the expressiveness of our language we implement three different types of available oracles. We find that the same AV solutions may be ranked significantly differently across existing oracles, thus existing oracles do not evaluate AVs in a consistent manner.
Is rust used safely by software developers?
2020 · 57 citations
Senior author · Corresponding
- Computer Science
- Operating system
Rust, an emerging programming language with explosive growth, provides a robust type system that enables programmers to write memory-safe and data-race free code. To allow access to a machine's hardware and to support low-level performance optimizations, a second language, Unsafe Rust, is embedded in Rust. It contains support for operations that are difficult to statically check, such as C-style pointers for access to arbitrary memory locations and mutable global variables. When a program uses these features, the compiler is unable to statically guarantee the safety properties Rust promotes. In this work, we perform a large-scale empirical study to explore how software developers are using Unsafe Rust in real-world Rust libraries and applications. Our results indicate that software engineers use the keyword unsafe in less than 30% of Rust libraries, but more than half cannot be entirely statically checked by the Rust compiler because of Unsafe Rust hidden somewhere in a library's call chain. We conclude that although the use of the keyword unsafe is limited, the propagation of unsafeness offers a challenge to the claim of Rust as a memory-safe language. Furthermore, we recommend changes to the Rust compiler and to the central Rust repository's interface to help Rust software developers be aware of when their Rust code is unsafe.
ESEC/FSE 2019 - A Statistics-based Performance Testing Methodology for Cloud Applications
Figshare · 2019-01-01
article · Open access · Senior author
These are the experimental result data sets for the ESEC/FSE paper “A Statistics-based Performance Testing Methodology for Cloud Applications”, including source code and dataset. For details, please refer to Install and Readme.
A statistics-based performance testing methodology for cloud applications
2019-08-09 · 60 citations
article · Senior author
The low cost of resource ownership and flexibility have led users to increasingly port their applications to the clouds. To fully realize the cost benefits of cloud services, users usually need to reliably know the execution performance of their applications. However, due to the random performance fluctuations experienced by cloud applications, the black box nature of public clouds and the cloud usage costs, testing on clouds to acquire accurate performance results is extremely difficult. In this paper, we present a novel cloud performance testing methodology called PT4Cloud. By employing non-parametric statistical approaches of likelihood theory and the bootstrap method, PT4Cloud provides reliable stop conditions to obtain highly accurate performance distributions with confidence bands. These statistical approaches also allow users to specify intuitive accuracy goals and easily trade between accuracy and testing cost. We evaluated PT4Cloud with 33 benchmark configurations on Amazon Web Service and Chameleon clouds. When compared with performance data obtained from extensive performance tests, PT4Cloud provides testing results with 95.4% accuracy on average while reducing the number of test runs by 62%. We also propose two test execution reduction techniques for PT4Cloud, which can reduce the number of test runs by 90.1% while retaining an average accuracy of 91%. We compared our technique to three other techniques and found that our results are much more accurate.
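To give a feel for how a bootstrap-based stop condition can work, here is a minimal sketch. It is a simplification and not PT4Cloud's actual procedure: it builds a percentile bootstrap confidence interval for the median run time only, whereas the abstract describes stop conditions over whole performance distributions. The function names, the `rel_width` accuracy goal, and the default parameters are all assumptions introduced for illustration.

```python
import random
import statistics


def bootstrap_ci(samples, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat over the observed samples."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(samples, k=n)) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def should_stop(samples, rel_width=0.05, **kwargs):
    """Stop testing once the interval width is within rel_width of the median.

    Hypothetical accuracy goal; chosen here only to show the trade-off
    between accuracy and the number of test runs.
    """
    lo, hi = bootstrap_ci(samples, **kwargs)
    center = statistics.median(samples)
    if center == 0:
        return hi == lo
    return (hi - lo) / abs(center) <= rel_width
```

A tester would append each new measurement to `samples` and call `should_stop` after every batch of runs. A looser `rel_width` ends testing sooner at the price of a wider interval, which mirrors the accuracy versus cost trade-off the abstract mentions.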
Recent grants
Collaborative Research: CSR--AES--Debugging Dynamic Code Modifications
NSF · $110k · 2005–2007
NSF · $566k · 2008–2013
NSF · $262k · 2010–2014
NSF · $312k · 2016–2020
Frequent coauthors
- Rajiv Gupta · University of California, Riverside · 74 shared
- Bruce R. Childers · University of Pittsburgh · 30 shared
- Jack W. Davidson · 19 shared
- Jason Mars · 17 shared
- David A. Berson · Intel (United Kingdom) · 14 shared
- Wei Wang · 14 shared
- Atif M. Memon · Apple (United States) · 14 shared
- Rastislav Bodík · Google (United States) · 13 shared
Labs
Research in software engineering, computer architecture, and parallel computing
Awards & honors
- Fellow of the Association for Computing Machinery (ACM)
- Fellow of The Institute of Electrical and Electronic Enginee…
- Ken Kennedy Award (2012)
- Anita Borg Technical Leadership Award (2011)
- ACM SIGSOFT Influential Educator Award (2014)