In their human choice benchmarks it was chosen only 59% of the time over 4o. That’s a 15-20x cost increase for a 9-point edge over a coin flip.
Those charts are hilarious: wow, it gives the right answer 62.5% of the time and only makes up completely false answers 37.1% of the time! It’s like Russian roulette, but worse!