GeminiChess, an LLM built engine

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

GeminiChess, an LLM built engine

Post by glav »

I spent most of the weekend getting an LLM to build a legal-move UCI chess engine end-to-end. Using Gemini 2.5-pro, I asked it to complete the first two milestones:

Step 1 — Core engine (C++ bitboards):
I prompted the model to produce a C++ bitboard engine that plays fully legal chess and speaks UCI. The model split the work across 151 responses.

Step 2 — Basic heuristics & time management:
I then asked Gemini to add a first pass of heuristics (its own suggestions) and simple time management. This took ~300 additional responses and resulted in the current program build. I’ve uploaded the source and the Linux/Windows binaries (it should compile on other platforms as well).

Automation pipeline:
Both steps were fully automated by a driver script that checked each LLM output for:
- successful compile,
- runs without crashing,
- basic UCI compliance,
- passing a small test suite.
About 15% of submissions were rejected by these checks (early in the run, acceptance may have been higher). The pipeline advanced or halted based on these criteria—no manual edits in the loop.
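For the curious, a gate of this kind can be sketched in a few lines of Python. The file names, compiler flags, and exact checks below are my assumptions for illustration, not taken from the actual driver script:

```python
# Hypothetical sketch of a driver-script gate: compile, run without
# crashing, and pass a basic UCI handshake check.
import subprocess
from typing import Optional

def uci_handshake_ok(engine_output: str) -> bool:
    """Basic UCI compliance: the engine must answer 'uci' with 'uciok'
    and 'isready' with 'readyok'."""
    return "uciok" in engine_output and "readyok" in engine_output

def compile_ok(source: str, binary: str) -> bool:
    """Reject the submission if g++ cannot build it."""
    result = subprocess.run(["g++", "-O2", "-o", binary, source],
                            capture_output=True)
    return result.returncode == 0

def run_ok(binary: str, timeout: float = 5.0) -> Optional[str]:
    """Run the engine, feed it the UCI handshake, and return its output,
    or None if it crashes or hangs."""
    try:
        result = subprocess.run([binary], input="uci\nisready\nquit\n",
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None

def accept_submission(source: str, binary: str = "./candidate") -> bool:
    """The full gate; the real script reportedly also ran a small
    test suite on positions, which is omitted here."""
    if not compile_ok(source, binary):
        return False
    output = run_ok(binary)
    return output is not None and uci_handshake_ok(output)
```

A rejected submission simply triggers a re-prompt of the LLM, so no human needs to be in the loop.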

Current strength & next steps:
Right now it plays around fairymax strength—so nothing groundbreaking yet—but I believe there’s headroom for the LLM to keep improving it. I’m planning a third step to add more heuristics and better code structure.

Compute & cost notes:
I used over 95% of Google’s starter API credit to get this far, and the run stopped before the model could fully finish step 2. I still have some runway, but I may switch to more economical LLMs or explore running locally (which would in any case require a GPU with decent VRAM).

I’m torn between pride and embarrassment 😄 — but mostly curious what this community thinks. Feedback, testing ideas, or pitfalls I should watch for would be hugely appreciated!

smatovic
Posts: 3330
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: GeminiChess, an LLM built engine

Post by smatovic »

Nice.

I am impressed that Gemini can now do Bitboards.

The evaluation looks less advanced than selective search.

There is another thread going on:

YATT - Yet Another Turing Test
viewtopic.php?t=83919

--
Srdja
smatovic
Posts: 3330
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: GeminiChess, an LLM built engine

Post by smatovic »

glav wrote: Tue Sep 23, 2025 1:30 am [...]
I’m torn between pride and embarrassment 😄 — but mostly curious what this community thinks. Feedback, testing ideas, or pitfalls I should watch for would be hugely appreciated!
[...]
Re: YATT - Yet Another Turing Test
viewtopic.php?p=982124#p982124
smatovic wrote: Mon Aug 18, 2025 9:24 pm [...]
To reach top-engine level the AI will probably need a test framework feedback loop - update mental model, ponder, come up with ideas, implement idea, test idea (with ~10K games in self-play for 10 Elo steps and ~100K games for 1 Elo steps), commit or reject idea, repeat.
[...]
Re: ChatGPT usage in computer chess?
viewtopic.php?p=981272#p981272
j.t. wrote: Mon Jul 21, 2025 6:38 pm
smatovic wrote: Mon Jul 21, 2025 2:22 pm - Can it contribute to Stockfish?
No, it would need to load a mental model of Stockfish chess engine, then ponder, then come up with ideas, then implement ideas, then test ideas (with ~10K games selfplay) then decide to commit changes or update mental model. <- This pipeline is currently not present, especially testing, but I think doable in general. Ofc, the whole process of mental model and ideas can be done by brute force method, just try every possible permutation of code generation, driven by some NN heuristic.

I'm working a bit on this currently. No LTC passer as of yet, unfortunately; however, there have been a few STC passers (out of roughly 500 potential LLM patches):
- https://tests.stockfishchess.org/tests/ ... 4f6388c891
- https://tests.stockfishchess.org/tests/ ... 2d74b172ae
- https://tests.stockfishchess.org/tests/ ... 2d74b1559d
Some of the patches I read and the reasoning how they came up with them are quite reasonable.

Of course all the git and OpenBench submitting part in the pipeline is not done by the LLM itself (yet?), that's all implemented in a series of python scripts. The fishtest submitting of successful OpenBench tests is done fully manually by me.
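The accept/reject gating described in this sub-thread (resolving a patch's Elo with thousands of self-play games before committing) is essentially a sequential probability ratio test. A simplified trinomial version, assuming a fixed draw ratio rather than fishtest's fancier pentanomial model, might look like this:

```python
# Simplified SPRT over game results, testing H1 (elo = elo1) against
# H0 (elo = elo0). The fixed draw ratio is an assumption for illustration.
import math

def sprt_llr(wins: int, draws: int, losses: int,
             elo0: float, elo1: float, draw_ratio: float = 0.4) -> float:
    """Log-likelihood ratio of H1 vs H0 after the given results."""
    def probs(elo):
        s = 1.0 / (1.0 + 10.0 ** (-elo / 400.0))  # expected score
        return s - draw_ratio / 2.0, 1.0 - s - draw_ratio / 2.0
    pw0, pl0 = probs(elo0)
    pw1, pl1 = probs(elo1)
    # The draw term cancels because the draw ratio is the same
    # under both hypotheses.
    return wins * math.log(pw1 / pw0) + losses * math.log(pl1 / pl0)

def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Stop when the LLR crosses a Wald bound, else keep playing games."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"   # patch passes
    if llr <= lower:
        return "accept H0"   # patch fails
    return "continue"
```

With bounds [0, 10] Elo, a clearly positive result such as 6000 wins / 4000 draws / 5000 losses crosses the upper bound, while a balanced early score keeps the test running, which is why small-Elo patches need so many games.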
--
Srdja
Werewolf
Posts: 2030
Joined: Thu Sep 18, 2008 10:24 pm

Re: GeminiChess, an LLM built engine

Post by Werewolf »

glav wrote: Tue Sep 23, 2025 1:30 am I spent most of the weekend getting an LLM to build a legal-move UCI chess engine end-to-end. Using Gemini 2.5-pro, I asked it to complete the first two milestones:
[...]


I did exactly this with ChatGPT 5 Pro - also in bitboard, written in C. Did debugging take you ages by any chance?
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

smatovic wrote: Tue Sep 23, 2025 9:22 am Nice.
[...]
Thanks for your positive comments and for drawing my attention to the already existing projects. I really learned a lot from those links.
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

Werewolf wrote: Tue Sep 23, 2025 5:11 pm I did exactly this with ChatGPT 5 Pro - also in bitboard, written in C.
Nice. Would you happen to have an executable of your program? I was unable to compile it myself.
Werewolf wrote: Tue Sep 23, 2025 5:11 pm Did debugging take you ages by any chance?
Not really. Although the model submitted several wrong answers (it hallucinated, produced non-compilable code or code that crashed or failed the tests, and a couple of times even produced header files with a '.hh' (!) extension), the driver script was merciless in rejecting these proposals and asking for new ones. It was not too stressful: once the run started, I never interacted directly with the LLM, though I was watching closely.
User avatar
flok
Posts: 596
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: GeminiChess, an LLM built engine

Post by flok »

glav wrote: Tue Sep 23, 2025 1:30 am I spent most of the weekend getting an LLM to build a legal-move UCI chess engine end-to-end. Using Gemini 2.5-pro, I asked it to complete the first two milestones:
[...]
Interesting!
It is amazing that it even compiles; I did not expect that.
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: GeminiChess, an LLM built engine

Post by Dann Corbit »

Threads do not work for me.
Changing thread count does nothing for the NPS
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

Dann Corbit wrote: Wed Sep 24, 2025 2:20 pm Threads do not work for me.
Changing thread count does nothing for the NPS
Thanks Dann. You are absolutely right. At the start of the project I gave Gemini a very broad prompt, and the LLM put the Threads option in the code as a placeholder (apparently this is fairly standard for Gemini), but then I used up all my starting grant and the API stopped working *, leaving me on my own. I should probably remove it, but the problem is that I don't know how to! :D

* I am still working with the Gemini API, but I have now taken a more conservative approach: instead of leaving the model almost completely free to explore any path, I am allowing only a limited number of runs with a very targeted request. I will post an update on this later.
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

smatovic wrote: Tue Sep 23, 2025 9:22 am Nice.
[...]


Thanks for your help, Srdja. I took to heart your suggestion to focus on the evaluation rather than the search. I examined some GeminiChess games and indeed found a few weak spots in its evaluation. Accordingly, I wrote a prompt asking the LLM to fix some of them (see below). In answer I got a well-thought-out work plan from Gemini, which I accepted almost wholly. However, this time I wanted to keep costs under control, so I parametrized the driver script to perform a maximum of 50 iterations. To keep a long story short, at iteration #19 Gemini called it a day, claiming to have applied all the patches needed to comply with my prompt. I checked the new code against some test positions I hadn't shown to the model, and it indeed passed them. Moreover, the code, which I enclose, seems to yield a stronger engine, now consistently beating fairymax and probably playing at a level of about 2000 Elo. Even this time the driver script refused some LLM submissions, but things went mostly smoothly. As always, any feedback would be greatly appreciated.
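For anyone who wants to turn match results against fairymax into a rating estimate, the usual logistic model gives the Elo difference directly from the match score. The 75% score in the example below is a made-up placeholder, not a measured result:

```python
# Elo difference implied by a match score (fraction of points won,
# counting a draw as half a point), via the standard logistic model.
import math

def elo_diff(score: float) -> float:
    """Elo advantage over the opponent implied by `score` in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# e.g. scoring 75% of the points would put the engine roughly
# 191 Elo above its opponent:
print(round(elo_diff(0.75)))  # -> 191
```

Note this gives a difference relative to the opponent, so an absolute "about 2000" figure still depends on what rating one assigns to fairymax.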


########################### Prompt ######################################

I went through some games played by a Gemini LLM-built chess engine (see the enclosed source code) and I found some possible improvements. Could you please give me your opinion on them?

1) Not much willingness to castle. Here is an example position where the engine (Black) should have castled but played Rg8 instead:
[d]4k2r/p1pn1pp1/Prn1p2p/3qP3/3PN3/R4PBP/6P1/Q2R2K1 b k - 2 38

2) The engine seems to like keeping its pawns on their starting squares, since they are moved with great parsimony. I suspect the PST could be giving too high a premium for keeping them in place. Could you please report the PST for pawns?

Please do not write any code yet. Just suggest a work plan to fix the above issues.
########################

[pgn][Event "?"]
[Site "?"]
[Date "2025.09.25"]
[Round "1"]
[White "fairymax"]
[Black "GeminiChess"]
[Result "0-1"]
[ECO "A04"]
[GameDuration "00:11:50"]
[GameEndTime "2025-09-25T11:09:14.237 CEST"]
[GameStartTime "2025-09-25T10:57:23.856 CEST"]
[Opening "Reti Opening"]
[PlyCount "104"]
[TimeControl "40/180+3"]

1. Nf3 {+0.11/9 8.3s} e6 {-1.87/11 6.2s} 2. d4 {+0.05/9 7.9s} c5 {-1.96/11 6.2s}
3. c4 {+0.03/8 3.9s} cxd4 {-1.42/12 6.2s} 4. Nxd4 {-0.12/9 11s}
Bb4+ {-1.43/10 6.2s} 5. Nc3 {0.00/8 4.3s} Bxc3+ {-1.14/12 6.3s}
6. bxc3 {+0.16/9 6.0s} Nf6 {-1.42/11 6.3s} 7. Nb5 {+0.65/10 6.2s}
Qa5 {-1.43/12 6.3s} 8. Bf4 {+0.99/10 6.6s} O-O {-1.51/12 6.4s}
9. Bc7 {+1.68/10 4.3s} b6 {-1.81/15 6.4s} 10. Bd6 {+1.86/10 4.2s}
Nc6 {-1.93/13 6.5s} 11. Bxf8 {+1.96/10 9.3s} Kxf8 {-1.88/12 6.5s}
12. Qb3 {+1.87/10 15s} Ne4 {-1.54/14 6.5s} 13. f3 {+1.84/10 6.3s}
a6 {-1.66/15 6.6s} 14. fxe4 {+1.40/11 4.6s} axb5 {-1.23/17 6.6s}
15. h4 {+1.32/11 8.7s} Nd4 {-0.50/16 6.7s} 16. Qb2 {+1.32/11 6.0s}
bxc4 {-0.52/16 6.7s} 17. O-O-O {+1.27/11 8.2s} Nc6 {-0.33/13 6.7s}
18. Rd2 {+0.87/10 4.1s} Qe5 {+0.15/13 6.8s} 19. e3 {+0.86/9 4.1s}
b5 {+0.27/13 6.8s} 20. Rh3 {+0.36/10 9.8s} b4 {+1.40/16 6.9s}
21. Rf2 {-0.34/10 4.8s} bxc3 {+2.07/13 6.9s} 22. Qa1 {-0.39/10 9.4s}
Na5 {+4.28/15 7.0s} 23. Qb1 {-1.15/10 5.6s} Rb8 {+4.07/16 7.1s}
24. Qa1 {-2.36/12 8.5s} Bb7 {+4.16/14 7.3s} 25. Rhf3 {-2.70/11 4.3s}
f6 {+5.39/14 7.2s} 26. Rf4 {-3.68/11 4.6s} Bxe4 {+7.39/15 7.3s}
27. Rxe4 {-4.07/10 4.1s} Qxe4 {+9.05/16 7.3s} 28. Rc2 {-8.42/12 4.9s}
Qxe3+ {+9.73/17 7.4s} 29. Kd1 {-8.60/13 9.5s} Rb5 {+9.71/17 7.5s}
30. Qxc3 {-11.42/14 5.7s} Rb1+ {+11.47/18 7.6s} 31. Rc1 {-11.73/14 5.4s}
Qxc3 {+13.10/18 7.7s} 32. Rxb1 {-10.89/17 4.1s} Qd4+ {+13.25/18 7.8s}
33. Kc2 {-10.92/17 5.6s} Qe4+ {+13.57/18 7.9s} 34. Kb2 {-11.07/16 6.7s}
c3+ {+13.87/20 8.0s} 35. Kxc3 {-12.47/16 4.3s} Qxb1 {+13.57/17 8.2s}
36. Be2 {-11.11/11 7.2s} Qxa2 {+14.27/16 8.4s} 37. Bf3 {-11.35/11 4.5s}
Qb3+ {+18.37/18 8.6s} 38. Kd2 {-13.44/12 7.9s} Nc4+ {+18.66/18 8.8s}
39. Ke1 {-14.02/12 11s} Qe3+ {+18.76/18 9.1s} 40. Be2 {-14.71/12 11s}
Qg1+ {+19.12/18 9.5s} 41. Bf1 {-15.26/12 4.9s} Ne3 {+18.89/16 6.5s}
42. Ke2 {-14.70/12 9.2s} Nxf1 {+18.89/14 6.5s} 43. h5 {-15.41/12 16s}
Qxg2+ {+18.96/12 6.5s} 44. Kd3 {-21.80/13 9.2s} Qf3+ {+19.41/11 6.6s}
45. Kc4 {-21.78/12 6.0s} Qxh5 {+19.47/11 6.6s} 46. Kd4 {-M14/11 9.1s}
Qg4+ {+25.82/12 6.6s} 47. Kd3 {-M12/11 12s} d5 {+M11/13 6.7s}
48. Kc2 {-M10/13 4.2s} Qb4 {+M1/13 6.7s} 49. Kc1 {-M8/28 2.8s}
Qc3+ {+M7/13 6.7s} 50. Kb1 {-M6/28 0.086s} Nd2+ {+M5/14 6.8s}
51. Ka2 {-M4/28 0.052s} Qc2+ {+M3/14 6.8s} 52. Ka1 {-M2/28 0.15s}
Nb3# {+M1/15 6.9s, Black mates} 0-1
[/pgn]