Thanks for improving Agentless. However, I can't reproduce the performance mentioned in the technical report based on the code you provided.
After generating all the files in the 'repair_samples_1' through 'repair_samples_4' folders, I could not generate the 'all_preds.jsonl' file from the 40 samples while they were still split across the 4 folders. So I merged the output sample files and renamed them 'output_0_normalized.jsonl' through 'output_39_normalized.jsonl'. After merging, I ran 'rerank.py' and generated 'all_preds.jsonl'.
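The merge-and-rename step can be sketched roughly as follows. This is only an illustration under stated assumptions: that each of the 4 folders holds its own files named 'output_0_normalized.jsonl', 'output_1_normalized.jsonl', and so on, and that the samples are renumbered consecutively in folder order. The function name `merge_sample_folders` is hypothetical and not part of Agentless.

```python
import re
import shutil
from pathlib import Path

# Matches per-folder sample files such as output_3_normalized.jsonl
SAMPLE_RE = re.compile(r"output_(\d+)_normalized\.jsonl")


def merge_sample_folders(folders, merged_dir):
    """Copy sample files from several folders into one folder,
    renumbering them consecutively (hypothetical helper, not Agentless code)."""
    merged_dir = Path(merged_dir)
    merged_dir.mkdir(parents=True, exist_ok=True)

    next_index = 0
    written = []
    for folder in folders:
        # Collect (original index, path) pairs so files sort numerically,
        # i.e. output_2 comes before output_10.
        samples = []
        for path in Path(folder).iterdir():
            match = SAMPLE_RE.fullmatch(path.name)
            if match:
                samples.append((int(match.group(1)), path))

        for _, src in sorted(samples):
            dst = merged_dir / f"output_{next_index}_normalized.jsonl"
            shutil.copyfile(src, dst)
            written.append(dst)
            next_index += 1
    return written
```

With 10 samples in each of the 4 folders, this would produce 'output_0_normalized.jsonl' through 'output_39_normalized.jsonl' in the merged folder, which is the layout I then passed to 'rerank.py'.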
Using the gpt-4o-08-06 model and following the instructions in 'readme_swebench.md', I only got a 26% pass rate (78/300) on SWE-Bench-lite. Moreover, even when I use all the intermediate results you provided in release 1.5 and only run 'rerank.py', I still only achieve a pass rate of 29.67% (89/300).
Is my use of 40 samples in 4 folders incorrect? And how can I reach the 32% pass rate that you submitted to SWE-bench when starting from your intermediate results?