Reproducing Agentless-1.5 Results on SWE-bench Lite #39

Open
GCVulnerability opened this issue Nov 12, 2024 · 1 comment

GCVulnerability commented Nov 12, 2024

Thanks for improving Agentless. However, I can't reproduce the performance reported in the technical report using the code you provided.
After generating all the files in the 'repair_sample_1' through 'repair_sample_4' folders, I could not generate the 'all_preds.jsonl' file from all 40 samples while they were kept separately in the 4 folders. So I merged the output sample files and renamed them from 'output_0_normalized.jsonl' to 'output_39_normalized.jsonl'. After merging, I ran 'rerank.py' and generated 'all_preds.jsonl'.

Using the gpt-4o-08-06 model and following the instructions in 'readme_swebench.md', I only got a 26% pass rate (78/300) on SWE-bench Lite. Moreover, even when I use all the intermediate results you provided in release 1.5 and only run 'rerank.py', I still only achieve a pass rate of 29.67% (89/300).

Is my use of the 40 samples across the 4 folders incorrect? And how can I reach the 32% pass rate that you submitted to SWE-bench, starting from your intermediate results?
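
For reference, the figures quoted above can be converted between pass rates and resolved-instance counts with a few lines of Python. This is only a sketch of the arithmetic: it takes the total of 300 instances from the post and assumes the submitted 32% figure is exact. The helper name `pass_rate` is made up for illustration.

```python
TOTAL = 300  # number of instances, as used in the post (78/300, 89/300)


def pass_rate(resolved, total=TOTAL):
    """Return the pass rate in percent, rounded to two decimals."""
    return round(100 * resolved / total, 2)


own_run = pass_rate(78)            # full pipeline run described in the post
released_rerank = pass_rate(89)    # rerank.py only, on the released results

# Assumption: the submitted result is exactly 32% of 300 instances.
target_resolved = round(0.32 * TOTAL)
gap = target_resolved - 89         # instances still missing after reranking

print(own_run, released_rerank, target_resolved, gap)
```

Under that assumption, the gap between the reranked released results and the submitted number is a handful of instances rather than a large discrepancy.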

@brutalsavage
Contributor

Hi @GCVulnerability

You should not merge the output sample files together. Instead, pass all four sample folders to the rerank script as a comma-separated list, like this:

python agentless/repair/rerank.py --patch_folder results/swe-bench-lite/repair_sample_1/,results/swe-bench-lite/repair_sample_2/,results/swe-bench-lite/repair_sample_3/,results/swe-bench-lite/repair_sample_4 \
                                  --num_samples 40 \
                                  --deduplicate \
                                  --regression \
                                  --reproduction
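
If it helps to script this, the same invocation can be assembled in Python so that the four folders are always joined into a single `--patch_folder` value. This is a minimal sketch: the function name `build_rerank_command` is hypothetical, the flags and paths are copied from the command above, and the command is only printed here, not executed.

```python
import shlex


def build_rerank_command(folders, num_samples=40):
    """Build the argv list for rerank.py with one comma-joined --patch_folder.

    Hypothetical helper; the flags are the ones shown in the command above.
    """
    return [
        "python", "agentless/repair/rerank.py",
        "--patch_folder", ",".join(folders),
        "--num_samples", str(num_samples),
        "--deduplicate",
        "--regression",
        "--reproduction",
    ]


# Folder paths exactly as written in the command above.
folders = [
    "results/swe-bench-lite/repair_sample_1/",
    "results/swe-bench-lite/repair_sample_2/",
    "results/swe-bench-lite/repair_sample_3/",
    "results/swe-bench-lite/repair_sample_4",
]

cmd = build_rerank_command(folders)
print(shlex.join(cmd))  # inspect the command before running it yourself
```

To actually run it from the repository root, the list could be passed to `subprocess.run(cmd, check=True)`.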
