ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

@misk@sopuli.xyz · 10 months ago

@thefartographer@lemm.ee · 10 months ago

Atari game programmed to know chess moves: knight to B4

Chat-GPT: many Redditors have credited Chesster A. Pawnington with inventing the game when he chased the queen across the palace before crushing the king with a castle tower. Then he became the king and created his own queen by playing “The Twist” and “Let’s Twist Again” at the same time.

oce 🐆 · 10 months ago

A PE teacher got absolutely wrecked by a former Olympic sprinter at a sprint competition.

@thefartographer@lemm.ee · 10 months ago

Change “PE teacher” to “stack of health magazines” and it’s a more accurate equivalence.

Chozo · 10 months ago

Well… yeah. That’s not what LLMs do. That’s like saying “A leafblower got absolutely wrecked by 1998 Dodge Viper in beginner’s drag race”. It’s only impressive if you don’t understand what a leafblower is.

@misk@sopuli.xyz · edit-2 10 months ago

People write code with LLMs. A programming language is just a language specialised for precise logic. That’s what “AI” is advertised to be good at. How can it do that and not the other?

TimeSquirrel · 10 months ago

It’s not very good at it though, if you’ve ever used it to code. It automates and eases a lot of mundane tasks, but still requires a LOT of supervision and domain knowledge to keep it from going off the rails or hallucinating code that’s either full of bugs or will never work. It’s not a “prompt and forget” thing, not by a long shot. It’s just an easier way to steal code it picked up from Stack Overflow and GitHub.

As a human, I know to check how much data is going into a fixed-size buffer and to bail out if it exceeds the buffer’s size. The LLM will have no qualms about putting buffer overflow vulnerabilities all over your shit because it doesn’t care; it only wants to fulfill the prompt and get something to work.

@misk@sopuli.xyz · 10 months ago

I’m not saying it’s good at coding, I’m saying it’s specifically advertised as being very good at it.

@MagicShel@lemmy.zip · 10 months ago

“Precise logic” is specifically what AI is not any good at whatsoever.

AI might be able to write a program that beats an A2600 in chess, but it should not be expected to win at chess itself.

@misk@sopuli.xyz · edit-2 10 months ago

I shall await the moment when AI pretends to be as confident about admitting it can’t do something as it is about the opposite, because right now working that out somehow looks like my job.

Wytch · 10 months ago

This article makes ChatGPT sound like a deranged blowhard, blaming everything but its own ineptitude for its failure.

So yeah, that tracks.

@Showroom7561@lemmy.ca · 10 months ago

In a quite unexpected turn of events, it is claimed that OpenAI’s ChatGPT “got absolutely wrecked on the beginner level” while playing Atari Chess.

Who the hell thought this was “unexpected”?

What’s next? ChatGPT vs. Microwave to see which can make instant oatmeal the fastest? 😂

@valgarf@discuss.tchncs.de · 10 months ago

Considering how much heat the servers probably generate, ChatGPT might have a decent chance in that competition 😁

@Showroom7561@lemmy.ca · 10 months ago

Air-fried oatmeal, FTW!

Arthur Besse · edit-2 10 months ago

This article buries the lede so much that many readers probably miss it completely: the important takeaway here, which is clearer in The Register’s version of the story, is that ChatGPT cannot actually play chess at all:

“Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were.”

To actually use an LLM as a chess engine without the kind of manual intervention that this person did, I think you would need to combine it with some other software to automate continuing to ask it for a different next move every time it suggests an invalid one. And, if you did that, it would still tend to lose, even to much older chess engines than Atari’s Video Chess.

(Note: numerous people have done this; you can play chess against something based on ChatGPT, and if you’re any good at chess you can win.)
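
For anyone curious what that "other software" could look like, here is a minimal sketch of the retry loop, assuming the legal moves for the position are already known. Everything named here is hypothetical: `suggest_move` stands in for whatever call asks the model for a move, and the fake model below just replays canned answers so the example runs without any API.

```python
from typing import Callable, Iterable, Optional


def ask_until_legal(
    suggest_move: Callable[[list[str]], str],
    legal_moves: Iterable[str],
    max_attempts: int = 5,
) -> Optional[str]:
    """Keep asking a move source until it returns a legal move.

    suggest_move is a hypothetical stand-in for the LLM call. It receives
    the moves rejected so far, so a real prompt could mention them.
    Returns None if every attempt was illegal.
    """
    legal = set(legal_moves)
    rejected: list[str] = []
    for _ in range(max_attempts):
        move = suggest_move(rejected).strip()
        if move in legal:
            return move
        rejected.append(move)  # remember the bad suggestion and ask again
    return None


def make_fake_model(answers: list[str]) -> Callable[[list[str]], str]:
    """Fake 'model' for demonstration: replays canned answers in order."""
    queue = iter(answers)
    return lambda rejected: next(queue)


if __name__ == "__main__":
    legal = ["e2e4", "d2d4", "g1f3"]
    model = make_fake_model(["e2e5", "a1a8", "g1f3"])
    print(ask_until_legal(model, legal))
```

Note that this only filters out illegal moves. It does nothing to make the legal move that gets through a good one, which is why the result would still tend to lose.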

@MagicShel@lemmy.zip · 10 months ago

You probably could train an AI to play chess and win, but it wouldn’t be an LLM.

In fact, let’s go see…

Stockfish: Open-source and regularly ranks at the top of computer chess tournaments. It uses advanced alpha-beta search and a neural network evaluation (NNUE).
Leela Chess Zero (Lc0): Inspired by DeepMind’s AlphaZero, it uses deep reinforcement learning and plays via a neural network with Monte Carlo tree search.
AlphaZero: Developed by DeepMind, it reached superhuman levels using reinforcement learning and defeated Stockfish in high-profile matches (though not under perfectly fair conditions).

Hmm. Neural networks and reinforcement learning. So non-LLM AI.

you can play chess against something based on chatgpt, and if you’re any good at chess you can win

You don’t even have to be good. You can just flat out lie to ChatGPT because fiction and fact are intertwined in language.

“You can’t put me in check because your queen can only move 1d6 squares in a single turn.”

@Michal@programming.dev · 10 months ago

A simple calculator will also beat it at math.