When Automated Program Repair Meets Regression Testing—An Extensive Study on Two Million Patches
Abstract
1 Introduction
2 Background and Motivation
2.1 RTS
2.2 APR
2.3 RTS in Patch Validation
APR | Time | RTS | APR | Time | RTS | APR | Time | RTS |
---|---|---|---|---|---|---|---|---|
PAR [31] | 2013 | No | jGenProg [53] | 2016 | No | jKali [53] | 2016 | No |
jMutRepair [53] | 2016 | No | DynaMoth [16] | 2016 | No | xPAR [36] | 2016 | No |
HDRepair [36] | 2016 | No | NPEFix [15] | 2017 | No | ACS [80] | 2017 | No |
Genesis [47] | 2017 | No | jFix/S3 [35] | 2017 | No | ELIXIR [65] | 2017 | No |
JAID [12] | 2017 | No | ssFix [79] | 2017 | No | SimFix [25] | 2018 | No |
Cardumen [54] | 2018 | No | SketchFix [23] | 2018 | Stmt | LSRepair [44] | 2018 | No |
SOFix [46] | 2018 | No | CapGen [72] | 2018 | Class | ARJA [85] | 2018 | Stmt |
GenProg-A [85] | 2018 | Stmt | Kali-A [85] | 2018 | Stmt | RSRepair-A [85] | 2018 | Stmt |
SequenceR [14] | 2019 | No | kPAR [41] | 2019 | No | DeepRepair [74] | 2019 | No |
PraPR [18] | 2019 | Stmt | Hercules [66] | 2019 | No | GenPat [24] | 2019 | No |
AVATAR [42] | 2019 | No | TBar [43] | 2019 | No | Nopol [81] | 2019 | No |
ConFix [32] | 2019 | No | FixMiner [33] | 2020 | No | CoCoNuT [52] | 2020 | No |
DLFix [38] | 2020 | No | CURE [27] | 2021 | No | Recoder [89] | 2021 | No |
Reward [82] | 2022 | No | DEAR [39] | 2022 | No | AlphaRepair [78] | 2022 | No |
ARJA-e [86] | 2020 | No | VarFix [76] | 2021 | No |
3 Study Design
3.1 Preliminaries
3.1.1 Patch Validation Matrix.
3.1.2 Studied RTS Strategies.
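The strategy names used in the tables (\(RTS_{no}\), \(RTS_{class}\), \(RTS_{method}\), \(RTS_{stmt}\)) differ in the granularity at which a test is matched against the code a patch modifies. A minimal sketch of coverage-based selection at these granularities follows. It assumes per-test coverage is available as sets of (class, method, line) triples; the names `project`, `select_tests`, `coverage`, and `modified` are illustrative and are not taken from the study's implementation.

```python
from typing import Dict, Set, Tuple

# A covered or modified code element: (class name, method name, line number).
# This representation is an assumption made for illustration only.
Element = Tuple[str, str, int]


def project(element: Element, granularity: str) -> tuple:
    """Truncate an element to the requested granularity."""
    cls, method, line = element
    if granularity == "class":
        return (cls,)
    if granularity == "method":
        return (cls, method)
    if granularity == "stmt":
        return (cls, method, line)
    raise ValueError(f"unknown granularity: {granularity}")


def select_tests(coverage: Dict[str, Set[Element]],
                 modified: Set[Element],
                 granularity: str) -> Set[str]:
    """Return the tests whose coverage overlaps the modified elements.

    With granularity "no", every test is selected (no RTS).
    """
    if granularity == "no":
        return set(coverage)
    changed = {project(e, granularity) for e in modified}
    return {
        test
        for test, covered in coverage.items()
        if any(project(e, granularity) in changed for e in covered)
    }


# Hypothetical coverage data for four tests and a patch touching one line.
coverage = {
    "t1": {("A", "f", 1)},
    "t2": {("A", "f", 2)},
    "t3": {("A", "g", 5)},
    "t4": {("B", "h", 1)},
}
modified = {("A", "f", 1)}

for level in ("no", "class", "method", "stmt"):
    print(level, sorted(select_tests(coverage, modified, level)))
```

Finer granularities select subsets of what coarser ones select, which matches the monotone decrease in #Test from \(RTS_{class}\) to \(RTS_{stmt}\) visible in the result tables.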
3.1.3 Efficiency Measurement.
3.2 Research Questions (RQs)
3.3 Benchmark
3.4 Studied APR Systems
3.5 Implementation Details
3.6 Threat to Validity
4 Results Analysis
4.1 RQ1: Revisiting APR Efficiency
Subject | RTS Strategy | PraPR | SimFix | AVATAR | kPar | TBar | FixMiner | Dynamoth | ACS | Arja | Kali-A | GenProg-A | RSRepair-A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lang | #Patch | 381.27 | 275.98 | 4.63 | 4.79 | 4.76 | 4.56 | 0.16 | 1.00 | 522.97 | 26.92 | 628.85 | 289.23 |
Lang | #Test (\(RTS_{no}\)) | 2,117.61 | 8,802.36 | 321.41 | 163.19 | 437.52 | 233.15 | 1,256.86 | 1,140.80 | 68,448.64 | 799.43 | 86,910.25 | 10,945.44 |
Lang | #Test (\(RTS_{class}\)) | 447.31 | 291.91 | 19.48 | 18.30 | 24.63 | 18.88 | 25.71 | 64.40 | 1,571.25 | 59.54 | 2,746.19 | 620.69 |
Lang | #Test (\(RTS_{method}\)) | 414.15 | 244.86 | 12.11 | 11.89 | 12.63 | 11.88 | 2.86 | 1.80 | 1,196.83 | 51.54 | 1,376.81 | 556.39 |
Lang | #Test (\(RTS_{stmt}\)) | 412.59 | 244.79 | 11.44 | 11.59 | 11.96 | 11.50 | 2.00 | 1.20 | 1,176.53 | 50.63 | 1,369.64 | 534.58 |
Math | #Patch | 1,483.12 | 469.45 | 5.84 | 5.12 | 5.01 | 4.76 | 0.23 | 1.00 | 804.09 | 10.11 | 858.19 | 411.58 |
Math | #Test (\(RTS_{no}\)) | 11,879.77 | 13,723.01 | 168.78 | 237.05 | 640.65 | 239.38 | 2,042.74 | 3,071.33 | 104,804.90 | 705.64 | 73,174.10 | 9,661.55 |
Math | #Test (\(RTS_{class}\)) | 1,711.16 | 679.05 | 12.92 | 46.66 | 34.66 | 9.47 | 50.79 | 241.33 | 4,828.21 | 22.00 | 5,231.08 | 828.93 |
Math | #Test (\(RTS_{method}\)) | 1,573.21 | 403.17 | 8.20 | 7.25 | 8.18 | 7.03 | 32.53 | 14.83 | 2,105.64 | 15.97 | 1,610.38 | 498.28 |
Math | #Test (\(RTS_{stmt}\)) | 1,557.41 | 379.67 | 7.81 | 6.86 | 7.25 | 6.23 | 27.00 | 6.33 | 1,382.11 | 14.10 | 1,419.85 | 463.63 |
Time | #Patch | 1,466.54 | 396.50 | 2.50 | 2.00 | 2.23 | 1.08 | 0.23 | 1.00 | 335.92 | 13.23 | 380.62 | 107.46 |
Time | #Test (\(RTS_{no}\)) | 7,782.19 | 3,754.42 | 1,065.00 | 4.73 | 359.09 | 490.00 | 1,303.67 | 3,894.00 | 34,310.82 | 724.00 | 141,456.00 | 2,652.50 |
Time | #Test (\(RTS_{class}\)) | 2,513.88 | 658.67 | 232.55 | 4.73 | 80.82 | 107.38 | 537.33 | 2,042.00 | 7,337.91 | 325.70 | 29,401.40 | 635.00 |
Time | #Test (\(RTS_{method}\)) | 1,848.62 | 372.08 | 9.73 | 4.73 | 6.55 | 5.25 | 8.67 | 52.00 | 3,822.82 | 55.30 | 10,596.50 | 413.90 |
Time | #Test (\(RTS_{stmt}\)) | 1,808.12 | 368.08 | 9.73 | 4.73 | 6.55 | 5.25 | 8.67 | 27.00 | 1,781.91 | 38.50 | 1,903.60 | 291.50 |
Chart | #Patch | 784.88 | 588.21 | 5.00 | 4.56 | 4.40 | 4.48 | 0.40 | - | 627.76 | 42.36 | 729.72 | 333.16 |
Chart | #Test (\(RTS_{no}\)) | 5,517.96 | 2,459.42 | 272.84 | 107.61 | 367.74 | 102.12 | 1,640.80 | - | 202,236.87 | 2,035.27 | 104,090.08 | 9,552.46 |
Chart | #Test (\(RTS_{class}\)) | 1,053.80 | 529.25 | 8.26 | 8.00 | 8.79 | 7.12 | 77.30 | - | 27,627.48 | 133.45 | 5,694.38 | 758.38 |
Chart | #Test (\(RTS_{method}\)) | 868.24 | 485.96 | 6.89 | 6.89 | 6.37 | 6.59 | 16.70 | - | 6,853.48 | 61.45 | 2,645.12 | 450.67 |
Chart | #Test (\(RTS_{stmt}\)) | 863.96 | 485.92 | 6.89 | 6.89 | 6.37 | 6.59 | 16.30 | - | 2,356.57 | 56.82 | 1,746.42 | 386.71 |
Closure | #Patch | 14,725.00 | 511.03 | 2.96 | 2.61 | 2.94 | 0.42 | - | - | - | - | - | - |
Closure | #Test (\(RTS_{no}\)) | 86,651.19 | 7,995.31 | 342.78 | 378.73 | 478.09 | 2.89 | - | - | - | - | - | - |
Closure | #Test (\(RTS_{class}\)) | 47,374.48 | 3,229.28 | 140.81 | 154.94 | 197.02 | 2.89 | - | - | - | - | - | - |
Closure | #Test (\(RTS_{method}\)) | 28,476.71 | 922.37 | 6.17 | 9.90 | 6.67 | 2.89 | - | - | - | - | - | - |
Closure | #Test (\(RTS_{stmt}\)) | 21,825.07 | 662.63 | 4.49 | 3.92 | 4.40 | 2.89 | - | - | - | - | - | - |
Mockito | #Patch | 2,307.92 | - | 1.67 | 2.00 | 1.81 | 1.67 | - | - | - | - | - | - |
Mockito | #Test (\(RTS_{no}\)) | 3,753.19 | - | 267.75 | 41.27 | 200.27 | 4.62 | - | - | - | - | - | - |
Mockito | #Test (\(RTS_{class}\)) | 2,857.81 | - | 97.56 | 26.00 | 146.47 | 4.62 | - | - | - | - | - | - |
Mockito | #Test (\(RTS_{method}\)) | 2,555.61 | - | 7.62 | 5.80 | 48.00 | 4.62 | - | - | - | - | - | - |
Mockito | #Test (\(RTS_{stmt}\)) | 2,547.64 | - | 7.38 | 5.80 | 7.33 | 4.62 | - | - | - | - | - | - |
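To read the #Test rows as relative savings, one option is to compare each RTS strategy's test-execution count with the \(RTS_{no}\) count for the same subject and APR system. The sketch below does this for the Lang/PraPR column of the table above. The helper name `reduction` is illustrative, and the result is a simple ratio of the tabulated values; it is not claimed to be the way the percentages in later tables were computed.

```python
def reduction(baseline: float, selected: float) -> float:
    """Fraction of test executions saved relative to running all tests."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - selected / baseline


# #Test values for PraPR on Lang, copied from the table above.
lang_prapr = {"no": 2117.61, "class": 447.31, "method": 414.15, "stmt": 412.59}

for strategy in ("class", "method", "stmt"):
    saved = reduction(lang_prapr["no"], lang_prapr[strategy])
    print(f"RTS_{strategy}: {saved:.2%} fewer test executions than RTS_no")
```

Because the table entries are already aggregated values, a ratio computed from them can differ from a reduction that is computed per bug or per patch and then averaged.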
4.2 RQ2: Overall Impact of RTS Strategies
Subject | RTS Strategy | PraPR | SimFix | AVATAR | kPar | TBar | FixMiner | Dynamoth | ACS | Arja | Kali-A | GenProg-A | RSRepair-A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lang | \(RTS_{class}\) | 33.33% | 35.66% | 17.92% | 10.57% | 21.48% | 14.80% | 69.99% | 56.37% | 36.67% | 22.16% | 26.11% | 30.12% |
Lang | \(RTS_{method}\) | 34.51% | 36.31% | 18.35% | 10.95% | 22.05% | 15.22% | 71.31% | 59.94% | 38.16% | 22.50% | 26.68% | 31.17% |
Lang | \(RTS_{stmt}\) | 34.55% | 36.31% | 18.40% | 10.99% | 22.09% | 15.26% | 71.36% | 59.97% | 38.17% | 22.52% | 26.70% | 31.20% |
Math | \(RTS_{class}\) | 48.91% | 52.45% | 5.46% | 5.36% | 19.91% | 8.63% | 92.17% | 90.59% | 38.66% | 16.44% | 28.77% | 31.73% |
Math | \(RTS_{method}\) | 50.41% | 55.81% | 6.27% | 6.30% | 21.08% | 8.99% | 93.06% | 98.86% | 40.16% | 16.79% | 30.16% | 32.44% |
Math | \(RTS_{stmt}\) | 50.50% | 55.89% | 6.28% | 6.31% | 21.11% | 9.05% | 93.33% | 99.63% | 40.42% | 16.85% | 30.35% | 32.68% |
Time | \(RTS_{class}\) | 31.25% | 29.42% | 7.14% | 0.00% | 7.15% | 9.83% | 19.62% | 47.56% | 19.59% | 17.90% | 7.97% | 16.26% |
Time | \(RTS_{method}\) | 40.12% | 35.60% | 9.06% | 0.00% | 9.06% | 12.45% | 33.15% | 98.66% | 21.41% | 28.27% | 9.31% | 17.38% |
Time | \(RTS_{stmt}\) | 40.31% | 35.60% | 9.06% | 0.00% | 9.06% | 12.45% | 33.15% | 99.31% | 22.60% | 29.01% | 9.93% | 18.01% |
Chart | \(RTS_{class}\) | 58.50% | 33.16% | 15.68% | 5.49% | 20.88% | 5.85% | 85.96% | - | 53.39% | 40.87% | 34.93% | 39.33% |
Chart | \(RTS_{method}\) | 62.45% | 34.67% | 15.76% | 5.55% | 21.02% | 5.88% | 89.10% | - | 59.10% | 43.78% | 38.81% | 42.02% |
Chart | \(RTS_{stmt}\) | 62.52% | 34.67% | 15.76% | 5.55% | 21.02% | 5.88% | 89.12% | - | 59.67% | 44.10% | 39.22% | 42.45% |
Closure | \(RTS_{class}\) | 31.73% | 20.27% | 3.22% | 3.79% | 3.90% | 0.00% | - | - | - | - | - | - |
Closure | \(RTS_{method}\) | 46.18% | 30.51% | 5.28% | 6.56% | 7.38% | 0.00% | - | - | - | - | - | - |
Closure | \(RTS_{stmt}\) | 51.80% | 32.01% | 5.30% | 6.65% | 7.42% | 0.00% | - | - | - | - | - | - |
Mockito | \(RTS_{class}\) | 18.03% | - | 21.10% | 2.77% | 5.49% | 0.00% | - | - | - | - | - | - |
Mockito | \(RTS_{method}\) | 21.17% | - | 30.69% | 6.43% | 14.82% | 0.00% | - | - | - | - | - | - |
Mockito | \(RTS_{stmt}\) | 21.30% | - | 30.72% | 6.43% | 19.67% | 0.00% | - | - | - | - | - | - |
Average | \(RTS_{class}\) | 36.96% | 34.19% | 11.75% | 4.66% | 13.13% | 6.52% | 66.93% | 64.84% | 37.08% | 24.34% | 24.45% | 29.36% |
Average | \(RTS_{method}\) | 42.47% | 38.58% | 14.23% | 5.96% | 15.90% | 7.09% | 71.66% | 85.82% | 39.71% | 27.83% | 26.24% | 30.75% |
Average | \(RTS_{stmt}\) | 43.50% | 38.90% | 14.25% | 5.99% | 16.73% | 7.11% | 71.74% | 86.30% | 40.22% | 28.12% | 26.55% | 31.09% |
4.3 RQ3: Impact on Different Patches
4.3.1 Impact on Patches of Different Fixing Capabilities.
APR | \(\mathbb{P}_{P2F}\), \(RTS_{class}\) | \(\mathbb{P}_{P2F}\), \(RTS_{method}\) | \(\mathbb{P}_{P2F}\), \(RTS_{stmt}\) | \(\mathbb{P}_{\checkmark}\), \(RTS_{class}\) | \(\mathbb{P}_{\checkmark}\), \(RTS_{method}\) | \(\mathbb{P}_{\checkmark}\), \(RTS_{stmt}\) |
---|---|---|---|---|---|---|
PraPR | 73.24% | 85.74% | 87.95% | 87.33% | 97.18% | 97.88% |
SimFix | 82.88% | 99.11% | 99.15% | 84.50% | 96.77% | 98.22% |
AVATAR | 81.16% | 98.95% | 99.17% | 82.27% | 99.36% | 99.53% |
kPar | 66.33% | 98.07% | 98.58% | 83.47% | 99.31% | 99.80% |
TBar | 68.34% | 98.63% | 99.13% | 76.07% | 95.39% | 99.55% |
FixMiner | 85.02% | 96.90% | 99.14% | 93.28% | 99.71% | 99.80% |
Dynamoth | 94.75% | 96.71% | 97.22% | 87.50% | 99.19% | 99.27% |
ACS | - | - | - | 98.34% | 99.66% | 99.95% |
Arja | 90.09% | 95.56% | 97.28% | 93.58% | 99.15% | 99.44% |
Kali | 88.76% | 97.97% | 99.30% | 84.30% | 99.00% | 99.59% |
GenProg | 88.85% | 96.84% | 99.05% | 94.28% | 99.33% | 99.57% |
RSRepair | 92.83% | 97.94% | 99.42% | 95.79% | 99.11% | 99.58% |
4.3.2 Impact on Patches of Different Fixing Scopes.
Subject | \(RTS_{class}\), \(\mathbb{P}_{MC}\) | \(RTS_{class}\), \(\mathbb{P}_{SC}\) | \(RTS_{method}\), \(\mathbb{P}_{MM}\) | \(RTS_{method}\), \(\mathbb{P}_{SM}\) | \(RTS_{stmt}\), \(\mathbb{P}_{MS}\) | \(RTS_{stmt}\), \(\mathbb{P}_{SS}\) |
---|---|---|---|---|---|---|
Lang | 21.83% | 27.36% | 21.83% | 27.86% | 15.25% | 25.75% |
Math | 26.18% | 28.82% | 17.67% | 30.35% | 20.45% | 29.75% |
Time | 9.85% | 13.26% | 20.78% | 16.49% | 7.53% | 14.77% |
Chart | 32.92% | 30.99% | 27.95% | 35.14% | 19.27% | 33.64% |
Closure | - | 11.30% | - | 19.48% | 55.06% | 17.34% |
Mockito | - | 9.48% | - | 14.62% | - | 15.62% |
4.4 RQ4: Impact with the Full Matrix
APR | \(RTS_{no}\) | \(RTS_{class}\) | \(RTS_{method}\) | \(RTS_{stmt}\) |
---|---|---|---|---|
PraPR | 20,018,923.45 | 12,495,932.50 | 9,958,460.34 | 9,600,813.31 |
SimFix | 70,588.42 | 11,408.54 | 1,483.51 | 1,039.05 |
AVATAR | 11,065.76 | 2,821.98 | 1,219.36 | 591.52 |
kPar | 9,903.78 | 2,435.97 | 1,088.50 | 509.45 |
TBar | 10,256.57 | 2,856.96 | 1,237.31 | 522.12 |
FixMiner | 6,441.49 | 740.07 | 144.00 | 32.86 |
Dynamoth | 596.65 | 114.32 | 17.75 | 16.86 |
ACS | 93.46 | 5.12 | 1.44 | 0.71 |
Arja | 1,435,025.26 | 177,282.34 | 32,882.52 | 23,861.27 |
Kali | 54,751.70 | 3,999.24 | 915.59 | 755.44 |
GenProg | 1,585,441.03 | 199,088.14 | 33,824.91 | 23,840.29 |
RSRepair | 668,634.75 | 48,785.24 | 7,650.52 | 5,131.89 |
4.5 RQ5: Test Selection \(+\) Prioritization
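APR | \(RTS_{no}\), \(TP_{APR}\) (%) | \(RTS_{no}\), \(RTP_{tot}\) (%) | \(RTS_{no}\), \(RTP_{add}\) (%) | \(RTS_{class}\), \(TP_{base}\) (%) | \(RTS_{class}\), \(TP_{APR}\) (%) | \(RTS_{class}\), \(RTP_{tot}\) (%) | \(RTS_{class}\), \(RTP_{add}\) (%) | \(RTS_{method}\), \(TP_{base}\) (%) | \(RTS_{method}\), \(TP_{APR}\) (%) | \(RTS_{method}\), \(RTP_{tot}\) (%) | \(RTS_{method}\), \(RTP_{add}\) (%) | \(RTS_{stmt}\), \(TP_{base}\) (%) | \(RTS_{stmt}\), \(TP_{APR}\) (%) | \(RTS_{stmt}\), \(RTP_{tot}\) (%) | \(RTS_{stmt}\), \(RTP_{add}\) (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|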
PraPR | 14.53 | 5.08 | 15.21 | 36.96 | 41.44 | 35.44 | 40.38 | 42.47 | 44.55 | 41.13 | 44.95 | 43.50 | 44.82 | 42.91 | 44.35 |
SimFix | 20.51 | \(-\)33.17 | 14.21 | 34.19 | 36.76 | 14.99 | 36.02 | 38.58 | 39.25 | 20.86 | 38.74 | 38.90 | 39.33 | 37.20 | 39.03 |
AVATAR | 0.98 | 1.44 | 4.31 | 11.75 | 11.28 | 11.48 | 10.47 | 14.23 | 14.24 | 14.24 | 14.31 | 14.25 | 14.26 | 14.25 | 14.30 |
kPar | \(-\)1.75 | \(-\)0.83 | 0.66 | 4.66 | 3.81 | 3.93 | 4.21 | 5.96 | 5.98 | 5.99 | 4.43 | 5.99 | 6.00 | 6.01 | 5.90 |
TBar | \(-\)1.08 | \(-\)1.49 | 2.79 | 13.13 | 11.96 | 11.66 | 13.86 | 15.90 | 15.90 | 15.89 | 15.44 | 16.73 | 16.73 | 16.72 | 16.14 |
FixMiner | \(-\)1.44 | \(-\)0.62 | 0.08 | 6.52 | 6.54 | 6.51 | 6.07 | 7.09 | 7.10 | 7.10 | 6.64 | 7.11 | 7.11 | 7.11 | 8.53 |
Dynamoth | 2.07 | 1.49 | 5.44 | 66.93 | 67.01 | 66.95 | 66.89 | 71.66 | 71.71 | 71.67 | 71.65 | 71.74 | 71.79 | 71.74 | 71.77 |
ACS | 0.00 | \(-\)33.16 | \(-\)4.12 | 64.84 | 64.84 | 64.42 | 65.93 | 85.82 | 85.82 | 85.85 | 85.97 | 86.30 | 86.30 | 86.32 | 86.32 |
Arja | 20.58 | 5.33 | 13.30 | 37.08 | 39.33 | 37.71 | 39.71 | 39.71 | 40.72 | 40.01 | 40.92 | 40.22 | 40.79 | 40.48 | 40.62 |
Kali | 6.80 | 6.90 | 10.29 | 24.34 | 25.81 | 25.29 | 26.16 | 27.83 | 28.10 | 27.87 | 28.98 | 28.12 | 28.17 | 28.13 | 28.14 |
GenProg | 14.03 | 7.85 | 11.51 | 24.45 | 25.79 | 24.57 | 25.87 | 26.24 | 26.62 | 26.11 | 27.14 | 26.55 | 26.66 | 26.49 | 26.56 |
RSRepair | 14.31 | 9.00 | 10.95 | 29.36 | 30.47 | 29.71 | 31.08 | 30.75 | 31.04 | 30.79 | 32.02 | 31.09 | 31.13 | 31.06 | 31.11 |
5 Discussion
6 Related Work
7 Conclusion and Future Work
References
Publisher
Association for Computing Machinery
New York, NY, United States