Grandmaster-level chess without search (github.com/google-deepmind)
353 points by lawrenceyan 1 day ago | 156 comments





OT: what's the state of the art in non-GM level computer chess?

Say I want to play chess with an opponent that is at about the same skill level as me, or perhaps I want to play with an opponent about 100 rating points above me for training.

Most engines let you dumb them down by cutting search depth, but that usually doesn't work well. Sure, you end up beating them about half the time if you cut the search down enough but it generally feels like they were still outplaying you for much of the game and you won because they made one or two blunders.

What I want is a computer opponent that plays at a level of my choosing but plays a game that feels like that of a typical human player of that level.

Are there such engines?


Maia does this reasonably well! You can play against it on Lichess. I have gotten a few "feels like a human" moments when playing against it - for example, getting it to fall into a trap that could trick a human but would easily be seen by a traditional search algorithm. It's not adjustable but there are a few different versions with different ratings (although it's not a very wide range).

https://www.maiachess.com/

https://lichess.org/@/maia1


Piggy-backing off this - does anyone know of a quick way to evaluate the maia weights from python or js for a single board state? I'm trying to hack something together with my own search func intended for human play and I can't quite figure it out from the cpp in Lc0.
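One workaround, if you just need something quick (a sketch, assuming the lc0 binary is on your PATH; the weights path is hypothetical): drive lc0 as a UCI engine from python-chess with the search capped at a single node, which essentially returns the raw policy head's top move.

    import chess
    import chess.engine

    # Run lc0 as a UCI engine with Maia weights; nodes=1 means no real
    # search, so the move returned is essentially the policy's choice.
    engine = chess.engine.SimpleEngine.popen_uci("lc0")
    engine.configure({"WeightsFile": "maia-1500.pb.gz"})  # hypothetical path

    board = chess.Board()
    result = engine.play(board, chess.engine.Limit(nodes=1))
    print(result.move)
    engine.quit()

Getting the full per-move distribution (rather than just the top move) is harder over UCI; lc0 can emit per-move statistics as info strings, but parsing those is engine-specific.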

I built something like this. It works as long as you're not too high-rated: chessmate.ai. Once players get higher rated it is more difficult to predict their moves because you need to model their search process, not just their intuitive move choice. It's also possible to train on one player's games only so that it is more personalized.

It uses a similar approach to Maia but with a different neural network, so it has somewhat better move-matching performance. And on top of that it has an expectation maximization algorithm so that the bot will try to exploit your mistakes.


Really nice work! The tabs other than "play" don't seem to be working, but I was able to try some novelty openings and it certainly felt like it was responding with human moves. It would be great to have the ability to go back/forth moves to try out different variations.

I'm curious how you combined Stockfish with your own model - but no worries if you're keeping the secret sauce a secret. All the best to you in building out this app!


I'm happy you enjoyed it! There are definitely a few rough edges, yes.

Since the whole thing is executed in the browser (including the model) there aren't a ton of secrets for me to keep. Essentially it is expectation maximization: the bot tries to find the move with the highest value. What is "value"? Essentially, it is the dot product between the probability distribution coming out of the model and the centipawn evaluations from Stockfish.

In other words if the model thinks you will blunder with high probability, it will try to steer you towards making that mistake.
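A minimal sketch of that expected-value loop, assuming a hypothetical predict_reply_probs(board) that returns the model's reply-probability dict for the side to move (this is my reconstruction of the idea, not the actual chessmate.ai code):

    import chess
    import chess.engine

    def exploiting_move(board, engine, predict_reply_probs):
        # For each candidate move, take the dot product of the predicted
        # reply distribution with Stockfish's evals, then play the move
        # with the highest expected value. (Sketch: slow, and it ignores
        # game-over edge cases.)
        us = board.turn
        best_move, best_value = None, float("-inf")
        for move in board.legal_moves:
            board.push(move)
            value = 0.0
            for reply, prob in predict_reply_probs(board).items():
                board.push(reply)
                info = engine.analyse(board, chess.engine.Limit(depth=10))
                value += prob * info["score"].pov(us).score(mate_score=100000)
                board.pop()
            board.pop()
            if value > best_value:
                best_move, best_value = move, value
        return best_move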


Thank you for the explanation! I completely understand the rough edges; I have some rough projects out there myself. Would it be alright if I add a link to your site to our chess club's list of online resources?

I can also make a note of it privately and check back in with you in the future. I found it pretty remarkable that it played a human-like response to some niche openings - I actually ended up checking against Stockfish and it played different moves, which is pretty neat.


Hello! I built a Chess AI also named Chessmate in high school (2005) that made it into a Minecraft mod 10 years later: http://petrustheron.com/posts/chessmate.html

Java source code here: https://github.com/theronic/chessmate


"Sure, you end up beating them about half the time if you cut the search down enough but it generally feels like they were still outplaying you for much of the game and you won because they made one or two blunders."

That is what winning in chess is. Minimising blunders.


“The winner of a game is the one who makes the second-to-last blunder.”

(Also this has come up in computer chess; it's more important to improve the quality of your worst moves than your average moves)


A long time ago I had the Fritz engine from ChessBase. It had a sparring feature where, if you maintained good play, it would serve up a tactical puzzle in the middle of the game. It could either warn you or not. If you didn't play solidly enough, you would just lose.

As far as I can tell, they got rid of this feature. It was the only computer opponent that felt real. Like it made a human mistake when put under pressure, rather than just playing like a computer and randomly deciding to play stupid.


No, not with adjustable rating. The best human-like engine is fairymax, but its Elo is estimated at between 1700 and 2000.

As a complete amateur I'd love a chess engine that can point out my mistakes, explain its own moves, and suggest better moves, in human terms. Like, this move would pin the white bishop. I don't know if this would be AI based or simply a search-based engine that scores in abstracted ways such as giving points for tactics like pins and forks, controlling the center, protecting the king, etc. (Typical chess search just scores based on piece and position and looks ahead farther than a human brain can handle.)
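Some of that vocabulary is already available in off-the-shelf libraries; for instance, python-chess can detect absolute pins directly, which is enough for a toy "explain this position" pass (one tactic only; forks, skewers, hanging pieces and so on would each need their own check):

    import chess

    def describe_pins(board: chess.Board) -> list[str]:
        # Report every absolutely pinned piece in plain English.
        notes = []
        for square, piece in board.piece_map().items():
            if board.is_pinned(piece.color, square):
                side = "White" if piece.color == chess.WHITE else "Black"
                notes.append(f"{side}'s {chess.piece_name(piece.piece_type)} "
                             f"on {chess.square_name(square)} is pinned")
        return notes

    # Nimzo-Indian after 1.d4 Nf6 2.c4 e6 3.Nc3 Bb4: the c3 knight is pinned.
    board = chess.Board(
        "rnbqk2r/pppp1ppp/4pn2/8/1bPP4/2N5/PP2PPPP/R1BQKBNR w KQkq - 2 4")
    print(describe_pins(board))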

KataGo has a special model weights release with human-like play at various Elo: https://github.com/lightvector/KataGo/blob/master/docs/Analy...

You can see in the release notes a few screenshot examples where a particular move changes likelihood as you get to higher-level play: https://github.com/lightvector/KataGo/releases/tag/v1.15.0


What’s your rating? Have you tried gpt4o?

It’s supposedly good up to about 1300, and on top of that the ability to prompt can make the style of play somewhat tunable, e.g. aggressive, defensive, etc.


GPT is incapable of playing a full game of chess without making lots of illegal moves, in my experience.

Even correcting for those, it's still horrendously bad at the game. Where are you getting this 1300 number?


Do you know if there are any interfaces to play against gpt4o? Or is it just typing in algebraic moves back and forth?

It prints an ASCII board, and then yes, you type algebraic moves back and forth and it updates the board for you each turn.

How would you even go about making a model that can simulate a human chess skillset? (Saying "levels" implies that chess skill is a scalar value, while it is more reasonable to think of it as a tree of skills where your abilities might be higher or lower depending on the specific skill branch.)

Take millions of games from human players of a certain rating only as your training data?

In the context of this thread (“non-GM level computer chess”, which I read as also excluding International Master, FIDE Master, and Candidate Master (https://en.wikipedia.org/wiki/Grandmaster_(chess))), I think it’s more important to not have too good a learning algorithm.

Even 10,000 such games may already contain way more tactics than a player at the targeted level can detect and apply. If so, a learning algorithm that detects and remembers all of them will already be better than the target level.


Exactly. Level x (whatever scalar thing the user meant by that) doesn't quite work out for the reason you outlined. Level-x players have different tactics, and someone who can use all of them will likely be better than most if not all of those players. I got downvoted for saying that. Maybe I didn't phrase it as well as you did.

Yeah, but won't it also be learning from the mistakes and missed tactics too? (Assuming its reward function is telling it to predict the human's move, rather than actually trying to win.)

condition the move on ELO while training
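In sketch form, that just means prepending a bucketed rating token to each training example so a single model can be conditioned on strength at inference time (roughly the Maia idea; the token offset and bucketing below are made-up values for illustration):

    RATING_TOKEN_BASE = 2000  # hypothetical offset into the token vocabulary

    def make_example(player_elo: int, position_tokens: list[int]) -> list[int]:
        # Bucket 600-2400 Elo into 100-point bins -> rating tokens 2006..2024.
        bucket = min(max(player_elo, 600), 2400) // 100
        return [RATING_TOKEN_BASE + bucket] + position_tokens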

You are assuming that's going to be a reliable proxy, what would make you think that?

I wonder if lowering the 'temperature' level (how rigidly they select the max-value completion) on these types of models would achieve this? Might be tricky to tune, as the outcome is likely non-linear.
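For reference, a bare-bones version of temperature sampling over a policy's move scores (toy numbers; note the direction: temperature near 0 approaches always-best, large values approach uniform random):

    import numpy as np

    def sample_move(moves, scores, temperature=1.0):
        # Softmax over the scores, sharpened or flattened by temperature.
        logits = np.asarray(scores, dtype=float) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(moves, p=probs)

    moves = ["e2e4", "d2d4", "g1f3", "b1c3"]
    scores = [0.40, 0.35, 0.20, 0.05]  # hypothetical policy scores
    print(sample_move(moves, scores, temperature=2.0))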

Stockfish is the classic answer, though I’m not sure how well it’s graded. Someone must have a “Stockfish strength”-to-ELO mapping.
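Stockfish does ship a built-in strength-to-Elo mapping: the UCI_LimitStrength and UCI_Elo options (the supported Elo range depends on the version). A minimal python-chess sketch, assuming a stockfish binary on your PATH:

    import chess
    import chess.engine

    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    # Ask Stockfish to deliberately weaken itself to a target Elo.
    engine.configure({"UCI_LimitStrength": True, "UCI_Elo": 1500})

    board = chess.Board()
    result = engine.play(board, chess.engine.Limit(time=0.1))
    print(result.move)
    engine.quit()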

It's not getting an engine to play in the right rating range that is the problem. It's getting it to play like a human would play in that rating range.

The average rating of tournament chess players in the US is around USCF 1550. I'm not sure what their FIDE rating would be. FIDE ratings are usually 50-100 points lower than USCF ratings but that's based on comparing people that have both ratings which for the most part are strong masters and above.

A human with a USCF 1550 rating will typically be mostly making moves that are suboptimal in a variety of ways: piece coordination, king safety, planning, pawn structure, development, search, and more. Basically they are around 1550 at nearly everything. There will be variations of course. A particular player might be worse at king safety and better at tactics for instance, but it will be something like they handle king safety like a 1400 and tactics like a 1700.

With a GM-level engine turned down to 1550 you tend to see aspects of GM-level play still in its game. If you are a 1550 playing against it, it doesn't feel like playing the kind of opponent you'd face if you entered a local chess tournament and got paired with another 1450-1650 player.

It feels like you are playing someone with a totally different approach to chess than you who just happens to lose or draw to you about the same amount as a 1450-1650 human.


What I find fascinating is how bad human beings are at chess. Now that we have engines we're finally able to analyze every game ever played, and they show us everything in chess that we're blind to. Their ability to never blunder in 1 or 3 moves is admirable, and better than most players, to say nothing of their ability to make you play out the longest possible chain before checkmate. What I found most insulting is that when I played against the best bot I could beat, it gave up its rook for free.

I'm currently trying to build one, fwiw.

Cool! I've been wondering for a while if it wouldn't be possible to use lichess games at various ratings to learn the typical mistakes.

I'm also curious about whether it would be possible to mimic certain playing styles. Two beginners can have the same rating, but one might lose because they have a weak opening and the other because they mess up the endgame, for example.

Random mistakes don't mimic human play very well.


Exactly. My eventual goal is to be able to emulate any single player with a public game history. Maybe even flag moves that don't look human but happen to be top Stockfish moves as possible cheating.

My current chess engine already hangs its queen sometimes and walks into forks. I'm still experimenting with how to improve personalization.


GPT-3.5-turbo-instruct has a peak Elo of around 1800 (but of course can be prompted to play with less skill) and is as human like as you'll get at that level.

> [...] feels like they were still outplaying you for much of the game and you won because they made one or two blunders.

That's why I don't like winning in multiplayer games. Usually when you win, you either feel like the opponent just played comically badly on a sufficient number of occasions, or that they played well but in a few instances you got unduly lucky and it could have gone either way. Very rarely do you get the desired feeling that the opponent played well but you just played a little better overall, so your win is deserved. It almost always seems like it's not that you are winning but that the opponent is losing. And none of that is about AI. Making an AI that lets you win symmetrical games satisfyingly and teaches you with your losses in a satisfying manner would be a billion dollar business. I don't think it can be done without some serious psychology research.


It would be hilarious if they downgraded the engine by making it more aggressive. For example: it needs to play n points' worth of aggressive moves that are not the best moves. After screwing up the position by n, it can go back to playing the best moves again.

Otherwise you wouldn't really be learning anything useful. You would end up with an opening vocabulary that good players would easily punish. If you play crappy gambits leading to positions you know well, the better players will think highly of you.

Best way to learn is to play the hardest possible engines and just take back moves when it becomes evident you've screwed up.


It doesn't seem that difficult to pull off - take one of the existing engines, get the top y moves, choose randomly. For each level down increase y by 1.

No, it doesn't work at all. Human mistakes are not at all like computer mistakes. Like -- blundering a piece in a 1-2 move combination will straight up never show up in the stockfish move tree, no matter what you set `y` to.

It doesn't work that way. There are many positions with lots of moves that are reasonable, but many others with only 1-2 sensible moves. It would make lots of obvious blunders that an amateur human would never make.

Also attention. Lower level human players are more likely to make a move close to their own/their opponent's recent move. They're focused on one area of the board.

Basic computer opponents on the other hand can make moves all over the place. They look at the board state holistically. This can be very frustrating to play against as a human who has enough problems just thinking their way through some subset of the board, but is thrown off by the computer again and again.

It's not that bad in chess at least (compared to Go), but still something worth keeping in mind if you're trying to make an AI that is fun to play against as an amateur.


Seems this might still have the problem of moves being either extremely good or extremely bad depending on how many good moves are found, rather than playing at a consistent level. Or for example in a degenerate case where there are only two moves and one leads to mate, the computer will be picking randomly.

I don't know enough about chess, but surely amateurs don't play at a consistent level. Like, I play a little and I'm quite certain some of my moves are awful (20%?), some are the same move Magnus makes (10%?), and most are somewhere on the spectrum in between (70%?).

Better: take one of the existing engines, sort the moves from the best to the worst, if the top move has score S, randomly choose among the moves with score >= 0.9*S.

You can simulate a better/worse player by increasing/decreasing the factor: 1 plays as well as the chosen engine can do, 0 is typing random (yet valid) moves on the keyboard.
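A rough python-chess rendering of that rule, using Stockfish's MultiPV mode to get several scored candidate moves (note that the >= 0.9*S cutoff misbehaves when S is negative, so this sketch falls back to the top move in that case):

    import random
    import chess
    import chess.engine

    def softened_move(board, engine, factor=0.9, multipv=10):
        # Score the top `multipv` moves, then pick randomly among those
        # within `factor` of the best score (mover's point of view).
        infos = engine.analyse(board, chess.engine.Limit(depth=12),
                               multipv=multipv)
        scores = [i["score"].pov(board.turn).score(mate_score=100000)
                  for i in infos]
        best = scores[0]
        candidates = [i["pv"][0] for i, s in zip(infos, scores)
                      if s >= factor * best] or [infos[0]["pv"][0]]
        return random.choice(candidates)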


Yup this is better.

Your engine would only mate once it had y options to mate.

Any time it had an option to mate it would have a chance to mate. It chooses randomly amongst the top n; it doesn't take the nth.

Or y chances. They did say it’d pick randomly. Still not great though, if a bit less funny.

I did a talk about this! (And also wrote about my talk here [1].) This paper is a great example of knowledge distillation. It's less of a paper about chess and more about how complicated non-linear search functions - complete with whatever tuning experts can prepare - can be distilled into a (quasi-linear, if it's a standardized input like chess) transformer model.

[1]: https://hlfshell.ai/posts/deepmind-grandmaster-chess-without...


I think the vs. humans result should be taken with a huge grain of salt. These are blitz games, and their engine's Elo was far higher against humans than against other bots. So it's likely that time was a factor, where humans are likely to flag (run out of time) or blunder in low-time situations.

It’s still very cool that they could learn a very good eval function that doesn’t require search. I would’ve liked the authors to throw out the games where the Stockfish fallback kicked in though. Even for a human, mate in 2 vs mate in 10 is the difference between a win and a draw/loss on time.

I also would’ve liked to see a head to head with limited search depth Stockfish. That would tell us approximately how much of the search tree their eval function distilled.


The reason the time (blitz) games make sense is because the distilled functionality is of a 50ms Stockfish eval function. The engine likely would perform worse as only the human would benefit from the additional time.

As for the limited search tree, I like the idea! I think it's tough to measure, since the time it takes to perform search at various depths varies wildly based on the complexity of the position. I feel like you would have to compile a dataset of specific positions identified to require significant depth of search to find a "good" move.


My point is that if the computer never flags it will have an inherent advantage in low time controls. If not, why not just test it in hyperbullet games? Games where humans flag in a drawn or winning position need to be excluded, otherwise it’s unclear what this is even measuring.

And limited depth games would not have been difficult to run. You can run a limited search Stockfish on a laptop using the UCI protocol: https://github.com/official-stockfish/Stockfish/wiki/UCI-%26...
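With python-chess (which speaks UCI under the hood), a depth-limited Stockfish is a few lines; the limit below corresponds to sending "go depth 6" over the protocol:

    import chess
    import chess.engine

    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    board = chess.Board()
    # Cap search depth instead of time; python-chess sends "go depth 6".
    info = engine.analyse(board, chess.engine.Limit(depth=6))
    print(info["score"], info["pv"][0])
    engine.quit()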


But the headline is “GM-level performance without search”, not “computer beats human at mouse movement speed contest”.

If anyone is looking to get into chess neural nets, I highly recommend this repo - https://github.com/sgrvinod/chess-transformers

It uses paradigmatic PyTorch with easy to read code, and the architecture is similar to the current best performing chess neural nets.


https://lczero.org/blog/2024/02/how-well-do-lc0-networks-com...

The authors of the best neural-network chess engine wrote about this DeepMind publication.


LC0 hasn't been the best neural network chess engine since Stockfish added NNUE in 2020.

That isn’t quite accurate. Stockfish's NNUE was trained on Leela-evaluated positions.

> Generally considered to be the strongest GPU engine, it continues to provide open data which is essential for training our NNUE networks. They released version 0.31.1 of their engine a few weeks ago, check it out!

The main difference is that Stockfish targets the CPU while Leela targets the GPU. That Stockfish is able to be competitive with Leela is of course impressive.

https://lichess.org/@/StockfishNews/blog/stockfish-17-is-her...


True, but since Stockfish uses a way, way smaller network, I still prefer to think of Stockfish as the traditional engine.

But the gigantic synthetic dataset used for training is created with plenty of traditional search. So it is all a bit silly, but I guess cool nonetheless...

It's a knowledge distillation. You can then use this smaller, more efficient model instead of the larger one.

Or maybe it is just memorizing a very large number of games.

They address the possibility of memorization in the PDF:

> This effect cannot be explained by memorization since < 1.41% of the initial puzzle board states appear in our training set.


Seems more like a 'compression' of the large number of games, or even like an approximate 'index' of the database

Is this network smaller than Stockfish's, and by what metric?

If anything it demonstrates the limits of NNs. A human brain can learn based on far fewer examples.

Nature's evolution algorithm took millions of years to find the architecture and the base model, which then takes decades to be fine-tuned to be able to form this opinion.

Searched only once. If this can be applied to other knowledge with this efficiency, we're onto something.

Isn't generating the training data by running stockfish on all the board positions for all the games just encoding the search tree into the transformer model?

So increasing the number of parameters to the model would allow it to encode more of the search tree and give better performance, which doesn't seem all that interesting.


How could it be possible to encode a search tree like this, though?

Imagine you collected a billion unique, feasible board positions (all positions is intractable, but most possible positions are impractical) and the best next move for each. That "best next move" is the result of a tree search.

Now use a transformer to "compress" that information into its model. It sounds like that is approximately what is going on here. Certainly, the model is likely to generalize some aspects of the data (just like LLMs do). But for the most part, the model encodes the information from the Stockfish evaluation.

(This is just my guess of what we are seeing.)


Exactly. The title says "without search" but the paper says "without explicit search". Having a system learn to play chess at grandmaster level without any search for play or training would be far more impressive; it seems pretty obvious that what this does would work.

I believe GM and chess author (and all-round lovely fellow) Matthew Sadler rigged up Leela Zero to effectively play off intuition and do very little or no search for training games. He could usually beat it, but not always. Think it might have been in The Silicon Road to Chess Improvement.

He also has very entertaining YouTube videos about the kind of wild opening discoveries Leela comes up with when contempt is set really high (i.e. it wants to maximally avoid draws), combined with his 2700+ commentary on them.

I mean for lczero you can just set the max depth to 1 ply, for example.

Oh, maybe it was that simple!

This repository provides an implementation of our paper Grandmaster-Level Chess Without Search. https://arxiv.org/abs/2402.04494

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.


If you solve chess then you have a tree that is too large for us to currently compute (about 10^80 although my memory may be way off). Annotating that tree with win / loss / draw would allow an optimal player without search. The two obvious approaches to compression / optimization are to approximate the tree, and to approximate the annotations. How well those two approaches would work depends a lot on the structure of the tree.

This result seems to tell us less about the power of the training approach (in absolute terms) and more about how amenable the chess game tree is to those two approaches (in relative terms). What I would take away is that a reasonable approximation of that tree can be made in 270M words of data.


Note that the exact version of this technique is used in chess for the endgame, referred to as a tablebase. Chess is solved once there are 7 pieces on the board, in an 18.4TB database, described here: https://lichess.org/@/lichess/blog/7-piece-syzygy-tablebases...

Makes me wonder what % of games end with <=7 pieces

At a high level, almost all of them that aren't draws. If you have pieces in the right places, you shouldn't be checkmateable. Without blunders, wins occur after a small advantage is followed by slightly favorable trades into a winning position.

https://arxiv.org/abs/2402.04494

> Board states s are encoded as FEN strings which we convert to fixed-length strings of 77 characters where the ASCII-code of each character is one token. A FEN string is a description of all pieces on the board, whose turn it is, the castling availability for both players, a potential en passant target, a half-move clock and a full-move counter. We essentially take any variable-length field in the FEN string, and convert it into a fixed-length sub-string by padding with ‘.’ if needed. We never flip the board; the FEN string always starts at rank 1, even when it is the black’s turn. We store the actions in UCI notation (e.g., ‘e2e4’ for the well-known white opening move). To tokenize them we determine all possible legal actions across games, which is 1968, sort them alphanumerically (case-sensitive), and take the action’s index as the token, meaning actions are always described by a single token (all details in Section A.1).
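A sketch of that encoding (the digit expansion and per-field widths here are my assumptions, chosen only so the total comes to the paper's 77 characters; Section A.1 of the paper has the real details):

    def encode_fen(fen: str) -> str:
        placement, turn, castling, ep, halfmove, fullmove = fen.split(" ")
        # Expand digit runs ("3" -> "...") and drop the "/" separators so
        # the board part is always exactly 64 characters.
        board = "".join("." * int(c) if c.isdigit() else c
                        for c in placement.replace("/", ""))
        # Pad the variable-length fields with '.' as described above.
        fixed = (board + turn + castling.ljust(4, ".") + ep.ljust(2, ".")
                 + halfmove.ljust(3, ".") + fullmove.ljust(3, "."))
        assert len(fixed) == 77
        return fixed  # token ids are then the ASCII codes: [ord(c) for c in fixed]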

I am starting to notice a pattern in these papers - Writing hyper-specific tokenizers for the target problem.

How would this model perform if we made a small change to the rules of chess and continued using the same tokenizer? If we find we need to rewrite the tokenizer for every problem variant, then I argue this is just ordinary programming in a very expensive disguise.


// personal opinion: I think machine learning as it currently stands is widely overhyped

How is this the top comment?

> I am starting to notice a pattern in these papers - Writing hyper-specific tokenizers for the target problem.

This is merely expressing what they consider as part of a game state, which is entirely needed for what they set out to do.

> I argue this is just ordinary programming

"Ordinary programming" (what does that mean?) for such a task implies extraordinary chess intuition, capable of conjuring rules and heuristics for the task of comparing two game states and saying which one is "better" (what does better mean?).

> How would this model perform if we made a small change to the rules of chess and continued using the same tokenizer?

If by "small change" you are implying i.e. removing the ability to castle, then sure, the tokenizer would need to be rewritten. At the same time, the entire training dataset would need to be changed, such that the games are valid under your new ruleset. How is this controversial or unexpected?

It feels like you are expecting that state of the art technology allows us to input an arbitrary ruleset and the mighty computer immediately plays an arbitrary game optimally. Unfortunately, this is not the case, but that does not take anything away from this paper.


I don't know much about this space, but it seems like this could be solved by leaving a good number of empty tokens that you would only start using when the need arises. Or leave tokens which you can use together to combine anything for various edge cases. Because if you have all the characters as tokens you can combine them into anything.

If you change to the rules, you have a different game than chess.

Since there is no training data for that game, I don't know how you'd get this kind of AI to do anything.


What I would love is an engine that thinks more like a human. Presumably, since this uses Stockfish-annotated games, it basically ends up thinking like a computer. Thinking like a human would be awesome for game reviews, to walk through things to note in different positions (tuned to my Elo).

Or a model whose performance is measured via its efficiency of learning--in other words, how many games does it need to play to learn to play at X level? The reason Magnus Carlsen is impressive is because he's reached his ability level in chess under enormous time and computation constraints, compared to a computer. His efficiency of learning is extraordinary compared to that of any chess engine.

Or even the opposite end of the spectrum - with extremely limited resources (memory/program size/computation time): https://rlc-chess.com/ - like demoscene programs. Capable 1kb chess programs exist!

It's somewhat telling that they chose Stockfish as the oracle and not AlphaZero.

It's well established that SF today is stronger. Btw, this is a pretty bullish case for open-source recreations of RL advancements catching up with a ~2-year lag. Hopefully AlphaProof is also reproduced in that timeframe.

Stockfish has been stronger than AlphaZero or any other chess engine for quite some time.

Another interesting tidbit, both engines share (or at least, shared) the same lead developer.


To be fair though, DeepMind hasn't been working to improve AlphaZero for chess playing. Their focus is a bit different from that of the dedicated volunteers working on Stockfish (+ it's been a few years).

From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?


As far as anyone knows, 3.5-turbo-instruct is the best chess playing (certainly it was at the time of the paper) LLM. About 1800 Elo and < 0.1% Illegal move rate. It's unclear why it was so much better than 4 (lack of RLHF?, Data?) and I don't know if anyone has bothered to test 4o similarly but it was pretty big news online at the time.

OA definitely purposefully trained its chess strength

I'm sure they did, but there's no reason to believe they pretrained it on chess any more than 4, so there's some speculation that the post-training processes mess things up. Turbo-instruct does not go through RLHF, for instance.

I forget the rough adjustment factors, but it is worth noting that lichess Elo is not the same as chess.com or FIDE. I think lichess is typically ~300 points above chess.com.

This implies the model is around 2500 blitz vs humans. As blitz Elo ratings are often much higher than those in classical time controls, 2500 on chess.com places it firmly in the 'good but not great' level.

I am very curious to know whether the model suffers from the same eval problems vs the well known "anti-bot" openings that stockfish is susceptible to at limited search depths.


> I think lichess is typically ~300 points above chess.com.

Yeah, no. They are two different rating systems (not ELO incidentally) with different curves, there isn't a fixed difference you can apply. At the high end of the scale lichess ratings are below, not above, chess.com ratings. E.g. Magnus Carlsen is 3131 blitz on lichess [0], 3294 blitz on chess.com [1].

This website [2] tries to translate between the sites, and figures that a 2925 lichess blitz rating (the closest on the website to the 2895 reported in the paper) translates to 3000 on chess.com.

[0] Multiple accounts but this is the one I found with the most blitz games: https://lichess.org/@/DrNykterstein/perf/blitz

[1] https://www.chess.com/member/magnuscarlsen

[2] https://chessgoals.com/rating-comparison/#lichesschesscom


Elo. Not an acronym; it's named for its inventor, Arpad Emmerich Elo.

https://en.wikipedia.org/wiki/Elo_rating_system


Cool, thanks for the information! I did not realize the curves were that different.

3000 definitely bumps it up, but it would still be worse than the top humans. Stockfish is better by a lot.


Excellent work but I suggest a slightly different title:

"What would Stockfish Do?"

A more appropriate title, because Stockfish is a search-based system and DeepMind's approach wouldn't work without it.

Oh, btw, this is (yet another) neurosymbolic system of the "compiling system 2 to system 1" type.


Would it be feasible to create a complete lookup table of 'best' moves for all given board configurations? I'm not sure how to determine the total number of configurations. Not the same as a tablebase, just a single next move rather than sequence to checkmate.

It wouldn't be competitive against top tier players and AI, but I wouldn't be surprised if it could beat me. 'Instantly' knowing the next move would be a cool trick.


There are more possible chess games than there are atoms in the universe. It can't be solved by brute force.

There's a lot of chess configs, but there's a LOT of atoms in the observable universe. I suspect there's a few in the unobservable universe too.

Chess configs = 4.8 x 10^44, Atoms > 10^70

https://tromp.github.io/chess/chess.html https://physics.stackexchange.com/questions/47941/dumbed-dow...

You might be able to pull off a low-resolution lookup table. Take some big but manageable number N (e.g. 10^10) and calculate the maximally even distribution of those points over the total space of chessboard configurations. Then make a lookup table for those configs. In play, for configs not in the table, interpolate between the nearest points in the table.


I didn't say chess positions, I said chess games. That number has a lower-bound of 10^120.

https://en.wikipedia.org/wiki/Shannon_number


But that's not the relevant thing if we're talking about storing a best move per possible position.

Unless you’ve calculated every line to a forced win or draw you don’t actually know the objective evaluation of a position and so you can’t determine “best move”. That’s what a tablebase is.

Yes, I figured that he would need a tablebase anyway. But that's still a few bits per position, the number of possible games doesn't come in to it.

That is basically what a neural network based chess engine is. The function the neural network is encoding is logically equivalent to "probability this move is the best for this board state".

The resolution isn't great, and adding search to that can be used to develop an implicit measure of how accurate the function is (i.e., the probability that the move suggested in a position remains unchanged after searching the move tree for better alternatives).


The amount of data that would be required for a lookup table for all best moves for every board configuration would be infeasible.

They have managed to create one for 7 pieces. Last update on trying to get to 8 piece database: https://www.chess.com/blog/Rocky64/eight-piece-tablebases-a-...


Yup, and it looks like a complete tablebase from the start of the game won't ever be feasible.

> From May to August 2018 Bojun Guo generated 7-piece tables. The 7-piece tablebase contains 423,836,835,667,331 unique legal positions in about 18 Terabytes.


Almost halfway there ;)

There are 32 pieces on the board at the start of the game.

Associated discussion on the paper:

Grandmaster-Level Chess Without Search

https://news.ycombinator.com/item?id=39301944


This is missing a makefile to automate the manual installation steps

What are some of the go-to books/articles for computer chess? I like the game and have a decent understanding of the basics, so studying algorithms based on the game would be a good opportunity for me to learn conventional algos, but also RL/ML/MCTS etc. Also, I wonder what the go-to codebase is these days.

https://www.chessprogramming.org/

Is the go-to portal. From there, you can dig deeper into many relevant topics.


Stockfish and Leela are the two best engines and they’re both on GitHub. Although I will say they are hyperoptimized to win.

You can also ask any noob questions on their Discords.


You can actually get solid performance with pretrained chat models: https://raw.sh/posts/chess_puzzles

On lichess puzzles, gpt4o with the compiled prompt is around 70%; I think the 270M transformer is around 95%.


Playing chess with a 20TB cerebellum

I wonder if you could creatively combine this model with search algorithms to advance the state of the art in computer chess? I wouldn't be surprised to see such a bot pop up on tcec in a couple years.

The advantage of this flavor of engine is that it might make parallel position evaluation extremely efficient. Calculate 1024 leaf positions and batch them to the model, take the top 10% and explore their sub-trees either via further GPU batching or minimax eval.

NNUE already tries to distill a subtree eval into a neural net, but it’s optimized for CPU rather than GPU.
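As a sketch, the batching idea is essentially a beam search where the evaluation call is vectorized; expand and evaluate_batch below are hypothetical stand-ins for move generation and the GPU model:

    import numpy as np

    def batched_beam_search(positions, expand, evaluate_batch,
                            keep_frac=0.1, depth=3):
        # Expand every frontier position, score all children in one big
        # model call, keep the best fraction, and repeat.
        frontier = list(positions)
        for _ in range(depth):
            children = [c for p in frontier for c in expand(p)]
            if not children:
                break
            scores = np.asarray(evaluate_batch(children))  # one GPU batch
            k = max(1, int(len(children) * keep_frac))     # e.g. top 10%
            frontier = [children[i] for i in np.argsort(scores)[::-1][:k]]
        return frontier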


As a game player I want to play an opponent that behaves like a human. Otherwise I’m always looking for the flaw in the design that I can exploit, which wins me the game but is less fun.

What you’re discussing sounds like intuition with checking, which is pretty close to how humans with a moderate degree of skill behave. I haven’t known enough Chess or Go masters to have any claim on how they think. But most of us don’t want an opponent at that level and if we did, we would certainly find a human, or just play against ourselves.


The issue is that humans and computers don't evaluate board positions in the same way. A computer will analyze every possible move, and then every possible response to each of those moves, etc. Human grandmasters will typically only analyze a handful of candidate moves, and a few possible replies to those moves. This means human search is much narrower and shallower.

If you want a computer that plays like a human, you will probably need to imitate the way that a human thinks about the game. This means for example thinking about the interactions between pieces and the flow of the game rather than stateless evaluations.


Grandparent was suggesting the hybrid approach where you select a handful of good candidate positions and then explore them (DFS) as far as possible. Which is pretty much how humans work.

The thing is, classical chess (unlike e.g. Go) is essentially "solved" when run on computers capable of extreme depth. Modern chess engines play essentially flawlessly.

The developers of stockfish and lc0 (and the many weaker engines around) would disagree, we've seen their strength improve considerably over the last few years.

Currently there's a very interesting war between small neural networks on the CPU with high search depth alpha-beta pruning (stockfish NNUE) and big neural networks on a GPU with Monte Carlo search and lower depth (lc0).

So, while machines beating humans is "solved", chess is very far from solved (just ask the guys who have actually solved chess endgames with 8 or less pieces).


> ask the guys who have actually solved chess endgames with 8 or less pieces

Source?


Stockfish and lc0 would always draw if they were not put in unbalanced starting positions; each starting position is replayed with colors swapped in the next game to make it fair.

In classical controls (what TCEC mainly uses), yes. They can play pretty exciting bullet chess without a forced opening though.

Chess is not “solved”. Solved doesn’t mean computers can beat humans, it means for any chess board position we can tell whether white wins, black wins, or the game is drawn with perfect play. We would know if the starting position was drawn, for example.

No computers now or in the foreseeable future will be capable of solving chess. It has an average branching factor over 30 and games can be over 100 moves.


There's strongly solved and weakly solved; the latter only needs to be unbeatable from the starting position. Now, it's definitely not probable that SF is unbeatable from the initial position, but honestly it's not impossible. The drawing margin is pretty big for engines.

We really have no way to know this. But I would be very surprised if modern chess engines didn't regularly blunder into losing (from the perspective of a hypothetical 32-piece tablebase) positions, and very very surprised if modern chess engines perfectly converted tablebase-winning positions.

The fact that TCEC games aren’t all draws suggests that computers aren’t perfect. Stockfish loses to Leela sometimes for example.

TCEC games are deliberately played from imbalanced opening positions. The draw rate would be much higher for the top participants if this wasn't forced. However, I agree that engines are not perfect. I have heard this claim many times before a new engine came along that showed just how beatable the state-of-the-art engines still were at the time.

Most TCEC starting positions are borderline lost

We do know this: there are many positions (primarily sharp middlegame ones) where SF/lc0 will significantly change their evaluation as they go deeper. This problem gets better the more time they spend on one position, but it's an inevitable consequence of the horizon effect, and it's why (except for 8 pieces or less) chess is far from solved.

Far from strongly solved, but I would wager current SF will not lose half of its white games against any future engine.

Ofc they do, but the more interesting question for weak solving is whether they do in mainline positions (like the mainline Berlin, mainline Petroff, etc.) where you can hold equality in many ways and engines are printing 0.0 everywhere.

not only blunder into losing positions, but also blunder from winning positions into draws

Even in human chess, people sometimes mistake draw frequency as reflecting both sides playing optimally, but there are many games where a winning advantage slips away into a draw.


This is accurate for endgames only. In complicated positions, there is still room for improvement - the recent game of lc0 vs Stockfish where lc0 forced a draw against an impending checkmate is a good example. There is currently no way for a chess engine searching a massive game tree to see how an innocuous pawn move enables a forced stalemate 40 moves down the line.

Honestly, SF plays better in middlegame positions on average, I would guess. I think there's usually a bigger draw margin in middlegames.

I think you are correct, as I recall there is a set of middlegame chess puzzles where Stockfish outperformed the other chess engines by a wide margin - I can't find the link as it was years ago. Not sure how the state of the art has progressed since then. But I do believe the "horizon effect" plays a role in whether an engine decides to forcibly draw a game, which afforded Stockfish + NNUE a distinct advantage (at least at the time).

Compared to humans, yes, but between themselves in TCEC progress continues. TCEC has AIs play both sides of random openings, rather than sticking to chess's initial position. The same happens in checkers amongst humans, where opening positions are randomized.

Slightly off topic but I built https://chessladders.com

Nice! I built https://www.chesspuzzlebot.com/ (Puzzles against stockfish)

What I like about this is that it implies you can build heuristics good enough to make it to GM level. This is great because I find calculating moves a headache.

The thing is, the heuristic computed by a huge network might be insanely complex and doing all kinds of calculations. It's just that it's one function call, so we ignore all those calculations. It's not immediately obvious that deploying a transformer to solve for the next best move means that a human mind can avoid difficult calculations and just play by gut.

There's just too much wordplay going on with "heuristic"


Arguably intuition and gut feeling are also insanely complex systems doing all kinds of calculations.

Sure, but apples and oranges have a lot in common too, and if we call everything fruit, it's unnecessarily hard to differentiate them ... is all I'm saying.

OK but when will someone make grandmaster-level search without chess?

This paper is so dumb. So you modeled the output of Stockfish? Stockfish does use simulation or self-play or search. So you've outsourced search and don't do it yourself, so you can claim to be "without search".

I think what they say is valid. It's the same way ChatGPT can answer questions Google could answer, without doing any search the way Google does.

Do you understand how ML models are trained?

They built a dumber clone of Stockfish, and they call it ‘zero’ for some reason. What is the meaning behind ‘zero’ anyways? It used to refer to zero-shot, but now it seems like it’s just a marketing term.

I assume you’re referring to AlphaZero and Leela Chess Zero.

AlphaZero was the successor to AlphaGo. AZ was notable because unlike AG, it used zero human games to learn to play: it just played games against itself. Typically in “supervised” machine learning you take human data and train a model to imitate it. AZ used zero human data to learn.

Leela Chess Zero started out as an open source copy of AZ but it’s probably better than AZ now.


I assume it's 'zero' turns lookahead/search, i.e. only look at the current board state.

Zero doesn't mean zero shot learning. It was coined by Deepmind for AlphaGo Zero where they used zero human input into the training data. It was trained entirely by playing against itself.

And they gave Sir Demis a Nobel Prize!


