SirGolan@lemmy.sdf.org to Technology@lemmy.ml • ChatGPT gets code questions wrong 52% of the time
1 year ago

Wait a second here… I skimmed the paper and the GitHub repo and didn't find an answer to a very important question: is this GPT-3.5 or GPT-4? There's a huge difference in code quality between the two, so either they made a giant accidental omission or they're being intentionally misleading. Please correct me if I missed where they specified it. I'm assuming they were using GPT-3.5, in which case these results are about what you'd expect.

On the HumanEval benchmark, GPT-4 gets 67%, and that goes up to 90% with Reflexion prompting. GPT-3.5 gets 48.1%, which is exactly what this paper is reporting. (source)
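To make the point concrete: the model is an explicit parameter on every API call, so whichever one the authors used should be trivial to report. A minimal sketch with the OpenAI Python SDK (1.x-style client; the prompt and key handling here are placeholders, not anything from the paper):

```python
# Minimal sketch: the model under test is always named explicitly in the request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # swap for "gpt-4" -- the whole comparison hinges on this string
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)

print(response.choices[0].message.content)
```

So "which string went in that field" is the one detail the paper can't really leave out.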
Funny story… I switched to Home Assistant from custom software I wrote when I realized I was reverse engineering the MyQ API for the 5th time and really didn't feel like doing it a 6th. Just ordered some ratgdos.