I watched this as a YouTube Short a week ago, where they asked a few models whether to walk or drive to the car wash.
They have a few more funny "ask AI" shorts.
10 tests per model seems like far too few, and they should give confidence intervals…
the 10/10 vs. 8/10 gap is just as likely due to chance as to any real difference. But some people will definitely use this to justify model choice.
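For anyone who wants to check the 10/10 vs. 8/10 point themselves, here is a rough sketch using only the Python standard library. It assumes each of the 10 runs is an independent pass/fail trial, which may not hold for the actual test setup. The function names `wilson_interval` and `fisher_exact_two_sided` are my own, not from the article, and the Wilson interval and Fisher's exact test are just one reasonable choice for samples this small.

```python
from math import comb, sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a pass rate (z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def fisher_exact_two_sided(a, b, n):
    """Two-sided Fisher exact p-value for a/n passes vs. b/n passes.

    Assumes both models were run the same number of times (n each).
    """
    total = a + b  # total passes across both models

    def prob(k):
        # Probability that the first model gets k of the `total` passes,
        # given the row and column totals (hypergeometric distribution).
        return comb(n, k) * comb(n, total - k) / comb(2 * n, total)

    observed = prob(a)
    lo, hi = max(0, total - n), min(n, total)
    # Sum every outcome that is at most as likely as the observed one.
    return sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    for passes in (10, 8):
        low, high = wilson_interval(passes, 10)
        print(f"{passes}/10: 95% interval roughly {low:.2f} to {high:.2f}")
    print(f"Fisher exact p-value: {fisher_exact_two_sided(10, 8, 10):.2f}")
```

Under those assumptions the intervals come out to roughly 0.72 to 1.00 for 10/10 and 0.49 to 0.94 for 8/10, so they overlap heavily, and the p-value is about 0.47, which is nowhere near the usual 0.05 cutoff.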
Did this say whether the reasoning models get this right more often than the others? I was curious about that but missed it if it was mentioned.