Term paper is ready at docs/final/main/pdf
After I reran the experiments, it turned out that the original results were not very accurate. The new results reveal the randomness of my algorithm. I will investigate this further before the presentation.
| Metric | Vanilla GPT-4 | Without Assertion | With Assertion |
|---|---|---|---|
| pass@1 | 83.5 | 86.5 | 83.5 |
| pass@3 | 80.48 | 92.07 | 93.29 |
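
For reference, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c is the number that pass. The sketch below assumes that estimator; it is not necessarily how the numbers in the table were produced, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget (1 <= k <= n).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over all problems, as a percentage."""
    scores = [pass_at_k(n, c, k) for c in correct_counts]
    return 100.0 * sum(scores) / len(scores)
```

Because the samples are drawn stochastically, repeating the whole evaluation several times and reporting the mean and spread of `mean_pass_at_k` would show how much of the difference between columns is due to run-to-run variation.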