Concolic Program Repair

Automated Program Repair, Program Synthesis, Symbolic Execution

Automated program repair reduces the manual effort in fixing program errors. However, existing repair techniques modify a buggy program such that it passes given tests. Such repair techniques do not discriminate between correct patches and patches that overfit the available tests and break untested but desired functionality. We attempt to solve this problem with a novel solution that make use of the duality of space exploration in Input Space and Program Space. We implemented our technique in the form of a tool called CPR and evaluated its efficacy in reducing the patch space by discarding overfitting patches from a pool of plausible patches. Similar to Cardio-Pulmonary Resuscitation (CPR) does to a patient, our tool CPR resuscitate or recover programs via appropriate fixes. In this work, we therefore propose and implement an integrated approach for detecting and discarding overfitting patches by exploiting the relationship between the patch space and input space. We leverage concolic path exploration to systematically traverse the input space (and generate inputs), while systematically ruling out significant parts of the patch space. Given a long enough time budget, this approach allows a significant reduction in the pool of patch candidates, as shown by our experiments.

Publication

Concolic Program Repair

42nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2021, 16 pages

Ridwan Shariffdeen, Yannic Noller, Lars Grunske, Abhik Roychoudhury
(One-line Abstract) Detecting and discarding over-fitting patches via systematic co-exploration of the patch space and input space
Note: Our artifact was awarded the reusable, functional and available badges at PLDI'21
Note: Our paper is now available on ACM digital library, link

Approach

Given the buggy program, a repair budget (time limit, iteration count), the fault location(s), a user specification, the language components for the synthesis, a failing test-case and optionally, a set of initial functional(passing) test cases, CPR generate a refined set of patches that are less over-fitting. The user specification defines the desired behavior of the repaired program (in addition to satisfying the given test cases). If initial tests are available, we assume that at least one failing test is available, which our method seeks to repair, apart from making sure that the user provided specification holds for all paths traversed via concolic exploration. Finally, CPR produces a ranked set of patches based on the explored input space.

Tool

Implemented in Python and C++, this github repository contains the source for CPR and the scripts for the data sets, including test suites, that we used for our experiments.

Dependencies

LLVM - 6.0
Clang - 6.0
Klee - Concolic Support Enabled (Github Repo )
Python - 3.7

Source Files

Github Repository

Docker Image

Dockerhub Repository
Docker image only with the tool can be obtained using the tag 16.04
Docker image with scripts to reproduce the experiments can be obtained using the tag experiments-cpr

Evaluation Subjects

We evaluate our implementation on several benchmarks to illustrate the capabilities of CPR. We make use of three benchmarks:

Security Vulnerabilities: ExtractFix
Logical Programs: SVCOMP
Test-Driven Repair: ManyBugs

Benchmark	Program	Repository/Link
ExtractFix	LibTIFF	Repo
ExtractFix	Binutils	Repo
ExtractFix	LibXML2	Repo
ExtractFix	LibJPEG	Repo
ExtractFix	FFMpeg	Repo
ExtractFix	Jasper	Repo
ExtractFix	Coreutils	Repo
ManyBugs	LibTIFF	Repo
ManyBugs	GZip	Repo
SVCOMP	Insertion Sort	Link
SVCOMP	Linear Search	Link
SVCOMP	String	Link
SVCOMP	Eureka	Link
SVCOMP	Nested Loop	Link
SVCOMP	Sum	Link
SVCOMP	Bubble Sort	Link
SVCOMP	Unique List	Link
SVCOMP	Standard Array	Link
SVCOMP	Recursive Addition	Link

Evaluation Results

We evaluate CPR on three benchmarks for repairing security vulnerabilities, logical errors and test-failures. We analyse the reduction ratio of the input space in terms of the number of paths we discard due to infeasibility and the reduction ratio of the patch space in terms of the number of patches removed due to overfitting. A gist of the results from our evaluation is given below, for full set of details please refer the paper.

#	Program	Bug ID	Paths			Patches
#	Program	Bug ID	Explored	Discarded	Prune Ratio	Synthesised	Removed	Refine Ratio
1	LibTIFF	CVE-2016-5321	67	77	15%	174	70	40%
2	LibTIFF	CVE-2016-9273	10	2	2%	260	119	46%
3	LibTIFF	CVE-2016-3623	102	21	12%	130	30	23%
4	Binutils	CVE-2018-10372	25	1	0.1%	74	35	47%
5	LibXML2	CVE-2012-5134	80	271	10%	260	129	48%
6	LibJPEG	CVE-2018-19664	84	26	4%	130	130	0%
7	Jasper	CVE-2016-8691	69	7	9%	260	164	63%

Artifacts

We provide a replication package for our experiments with CPR on the selected benchmarks in a Docker container, which consists of setup scripts/configuration files to re-run the experiments. The Docker environment consists of all dependencies for CPR, KLEE, and other subjects in the benchmarks. For each experiment, we also provide the logs and other artifacts (i.e., generated patches, rankings) inside an archive named 'results.tar.gz'.

You can access the replication package via Dockerhub at this link (use the tag experiments-cpr) by using the following command:

docker pull rshariffdeen/cpr:experiments-cpr

You can also download the source to build the image on your own using this link, and the following command:

docker build -t experiments-cpr .

For readability, we further provide the list of patches and their details for each benchmark in the following pages

Concolic Program Repair

Publication

Concolic Program Repair

42nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2021, 16 pages

Approach

Tool

Dependencies

Source Files

Docker Image

Evaluation Subjects

Evaluation Results

Artifacts

People

Principal Investigators

Developers

Contributors