GeminiChess, an LLM built engine

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

GeminiChess, an LLM built engine

Post by glav »

I spent most of the weekend getting an LLM to build a legal-move UCI chess engine end-to-end. Using Gemini 2.5-pro, I asked it to complete the first two milestones:

Step 1 — Core engine (C++ bitboards):
I prompted the model to produce a C++ bitboard engine that plays fully legal chess and speaks UCI. The model split the work across 151 responses.

Step 2 — Basic heuristics & time management:
I then asked Gemini to add a first pass of heuristics (its own suggestions) and simple time management. This took ~300 additional responses and resulted in the current program build. I’ve uploaded the source and the Linux/Windows binaries (it should compile on other platforms as well).

Automation pipeline:
Both steps were fully automated by a driver script that checked each LLM output for:
- successful compile,
- runs without crashing,
- basic UCI compliance,
- passing a small test suite.
About 15% of submissions were rejected by these checks (early in the run, acceptance may have been higher). The pipeline advanced or halted based on these criteria—no manual edits in the loop.
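For the curious, a gate of this kind can be sketched in a few lines of Python. The file names, compiler flags, and exact checks below are my assumptions for illustration, not taken from the actual driver script:

```python
# Hypothetical sketch of a driver-script gate: compile, run without
# crashing, and pass a basic UCI handshake check.
import subprocess
from typing import Optional

def uci_handshake_ok(engine_output: str) -> bool:
    """Basic UCI compliance: the engine must answer 'uci' with 'uciok'
    and 'isready' with 'readyok'."""
    return "uciok" in engine_output and "readyok" in engine_output

def compile_ok(source: str, binary: str) -> bool:
    """Reject the submission if g++ cannot build it."""
    result = subprocess.run(["g++", "-O2", "-o", binary, source],
                            capture_output=True)
    return result.returncode == 0

def run_ok(binary: str, timeout: float = 5.0) -> Optional[str]:
    """Run the engine, feed it the UCI handshake, and return its output,
    or None if it crashes or hangs."""
    try:
        result = subprocess.run([binary], input="uci\nisready\nquit\n",
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None

def accept_submission(source: str, binary: str = "./candidate") -> bool:
    """The full gate; the real script reportedly also ran a small
    test suite on positions, which is omitted here."""
    if not compile_ok(source, binary):
        return False
    output = run_ok(binary)
    return output is not None and uci_handshake_ok(output)
```

A rejected submission simply triggers a re-prompt of the LLM, so no human needs to be in the loop.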

Current strength & next steps:
Right now it plays around fairymax strength—so nothing groundbreaking yet—but I believe there’s headroom for the LLM to keep improving it. I’m planning a third step to add more heuristics and better code structure.

Compute & cost notes:
I used over 95% of Google’s starter API credit to get this far, and the run stopped before the model could fully finish step 2. I still have some runway, but I may switch to more economical LLMs or explore running locally (which would in any case require a GPU with decent VRAM).

I’m torn between pride and embarrassment 😄 — but mostly curious what this community thinks. Feedback, testing ideas, or pitfalls I should watch for would be hugely appreciated!

smatovic
Posts: 3330
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: GeminiChess, an LLM built engine

Post by smatovic »

Nice.

I am impressed that Gemini can now do Bitboards.

The evaluation looks less advanced than selective search.

There is another thread going on:

YATT - Yet Another Turing Test
viewtopic.php?t=83919

--
Srdja
smatovic
Posts: 3330
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: GeminiChess, an LLM built engine

Post by smatovic »

glav wrote: Tue Sep 23, 2025 1:30 am [...]
I’m torn between pride and embarrassment 😄 — but mostly curious what this community thinks. Feedback, testing ideas, or pitfalls I should watch for would be hugely appreciated!
[...]
Re: YATT - Yet Another Turing Test
viewtopic.php?p=982124#p982124
smatovic wrote: Mon Aug 18, 2025 9:24 pm [...]
To reach top-engine level the AI will probably need a test framework feedback loop - update mental model, ponder, come up with ideas, implement idea, test idea (with ~10K games in self-play for 10 Elo steps and ~100K games for 1 Elo steps), commit or reject idea, repeat.
[...]
Re: ChatGPT usage in computer chess?
viewtopic.php?p=981272#p981272
j.t. wrote: Mon Jul 21, 2025 6:38 pm
smatovic wrote: Mon Jul 21, 2025 2:22 pm - Can it contribute to Stockfish?
No, it would need to load a mental model of Stockfish chess engine, then ponder, then come up with ideas, then implement ideas, then test ideas (with ~10K games selfplay) then decide to commit changes or update mental model. <- This pipeline is currently not present, especially testing, but I think doable in general. Ofc, the whole process of mental model and ideas can be done by brute force method, just try every possible permutation of code generation, driven by some NN heuristic.

I'm working a bit on this currently. No LTC passer as of yet, unfortunately; however, there have been a few STC passers (out of roughly 500 potential LLM patches):
- https://tests.stockfishchess.org/tests/ ... 4f6388c891
- https://tests.stockfishchess.org/tests/ ... 2d74b172ae
- https://tests.stockfishchess.org/tests/ ... 2d74b1559d
Some of the patches I read and the reasoning how they came up with them are quite reasonable.

Of course all the git and OpenBench submitting part in the pipeline is not done by the LLM itself (yet?), that's all implemented in a series of python scripts. The fishtest submitting of successful OpenBench tests is done fully manually by me.
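The accept/reject gating described in this sub-thread (resolving a patch's Elo with thousands of self-play games before committing) is essentially a sequential probability ratio test. A simplified trinomial version, assuming a fixed draw ratio rather than fishtest's fancier pentanomial model, might look like this:

```python
# Simplified SPRT over game results, testing H1 (elo = elo1) against
# H0 (elo = elo0). The fixed draw ratio is an assumption for illustration.
import math

def sprt_llr(wins: int, draws: int, losses: int,
             elo0: float, elo1: float, draw_ratio: float = 0.4) -> float:
    """Log-likelihood ratio of H1 vs H0 after the given results."""
    def probs(elo):
        s = 1.0 / (1.0 + 10.0 ** (-elo / 400.0))  # expected score
        return s - draw_ratio / 2.0, 1.0 - s - draw_ratio / 2.0
    pw0, pl0 = probs(elo0)
    pw1, pl1 = probs(elo1)
    # The draw term cancels because the draw ratio is the same
    # under both hypotheses.
    return wins * math.log(pw1 / pw0) + losses * math.log(pl1 / pl0)

def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Stop when the LLR crosses a Wald bound, else keep playing games."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"   # patch passes
    if llr <= lower:
        return "accept H0"   # patch fails
    return "continue"
```

With bounds [0, 10] Elo, a clearly positive result such as 6000 wins / 4000 draws / 5000 losses crosses the upper bound, while a balanced early score keeps the test running, which is why small-Elo patches need so many games.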
--
Srdja
Werewolf
Posts: 2030
Joined: Thu Sep 18, 2008 10:24 pm

Re: GeminiChess, an LLM built engine

Post by Werewolf »

glav wrote: Tue Sep 23, 2025 1:30 am I spent most of the weekend getting an LLM to build a legal-move UCI chess engine end-to-end. Using Gemini 2.5-pro, I asked it to complete the first two milestones:
[...]


I did exactly this with ChatGPT 5 Pro - also in bitboard, written in C. Did debugging take you ages by any chance?
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

smatovic wrote: Tue Sep 23, 2025 9:22 am Nice.
[...]
Thanks for your positive comments and for drawing my attention to the already existing projects. I really learned a lot from those links.
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

Werewolf wrote: Tue Sep 23, 2025 5:11 pm I did exactly this with ChatGPT 5 Pro - also in bitboard, written in C.
Nice. Would you happen to have an executable of your program? I was unable to compile it myself.
Werewolf wrote: Tue Sep 23, 2025 5:11 pm Did debugging take you ages by any chance?
Not really. Although the model submitted several wrong answers (it hallucinated, produced non-compilable code or code that crashed or failed the tests, and a couple of times even produced header files with a '.hh' (!) extension), the driver script was merciless in rejecting these proposals and asking for new ones. It was not too stressful: once the run started, I never interacted directly with the LLM, though I was watching closely.
User avatar
flok
Posts: 596
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: GeminiChess, an LLM built engine

Post by flok »

glav wrote: Tue Sep 23, 2025 1:30 am I spent most of the weekend getting an LLM to build a legal-move UCI chess engine end-to-end. Using Gemini 2.5-pro, I asked it to complete the first two milestones:
[...]
Interesting!
It is amazing that it even compiles; I did not expect that.
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: GeminiChess, an LLM built engine

Post by Dann Corbit »

Threads do not work for me.
Changing thread count does nothing for the NPS
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

Dann Corbit wrote: Wed Sep 24, 2025 2:20 pm Threads do not work for me.
Changing thread count does nothing for the NPS
Thanks Dann. You are absolutely right. At the start of the project I gave Gemini a very broad prompt, and the LLM put the Threads option in the code as a placeholder (apparently this is fairly standard for Gemini), but then I used up all my starting grant and the API stopped working *, leaving me on my own. I should probably remove it, but the problem is that I don't know how to! :D

* I am still working with the Gemini API, but I have now taken a more conservative approach: instead of leaving the model almost completely free to explore any path, I am allowing only a limited number of runs with a very targeted request. I will post an update on this later.
glav
Posts: 72
Joined: Sun Apr 07, 2019 1:10 am
Full name: Giovanni Lavorgna

Re: GeminiChess, an LLM built engine

Post by glav »

smatovic wrote: Tue Sep 23, 2025 9:22 am Nice.
[...]


Thanks for your help, Srdja. I took to heart your suggestion to focus on the evaluation rather than the search. I examined some GeminiChess games and indeed found a few weak spots in its evaluation. Accordingly, I wrote a prompt asking the LLM to fix some of them (see below). In answer I got a well-thought-out work plan from Gemini, which I accepted almost wholly. However, this time I wanted to keep costs under control, so I parametrized the driver script to perform a maximum of 50 iterations. To keep a long story short, at iteration #19 Gemini called it a day, claiming to have applied all the patches needed to comply with my prompt. I checked the new code against some test positions I hadn't shown to the model, and it indeed passed them. Moreover, the code, which I enclose, seems to yield a stronger engine, now consistently beating fairymax and probably playing at a level of about 2000 Elo. Even this time the driver script refused some LLM submissions, but things went mostly smoothly. As always, any feedback would be greatly appreciated.
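For anyone who wants to turn match results against fairymax into a rating estimate, the usual logistic model gives the Elo difference directly from the match score. The 75% score in the example below is a made-up placeholder, not a measured result:

```python
# Elo difference implied by a match score (fraction of points won,
# counting a draw as half a point), via the standard logistic model.
import math

def elo_diff(score: float) -> float:
    """Elo advantage over the opponent implied by `score` in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# e.g. scoring 75% of the points would put the engine roughly
# 191 Elo above its opponent:
print(round(elo_diff(0.75)))  # -> 191
```

Note this gives a difference relative to the opponent, so an absolute "about 2000" figure still depends on what rating one assigns to fairymax.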


########################### Prompt ######################################

I went through some games played by a Gemini LLM-built chess engine (see the enclosed source code) and I found some possible improvements. Could you please give me your opinion on them?

1) Not much willingness to castle. Here is an example position where the engine (Black) should have castled but played Rg8 instead:
[d]4k2r/p1pn1pp1/Prn1p2p/3qP3/3PN3/R4PBP/6P1/Q2R2K1 b k - 2 38

2) The engine seems to like keeping its pawns on their starting squares, since they are moved with great parsimony. I suspect the PST could be giving too high a premium for keeping them in place. Could you please report the PST for pawns?

Please do not write any code yet. Just suggest a work plan to fix the above issues.
########################

[pgn][Event "?"]
[Site "?"]
[Date "2025.09.25"]
[Round "1"]
[White "fairymax"]
[Black "GeminiChess"]
[Result "0-1"]
[ECO "A04"]
[GameDuration "00:11:50"]
[GameEndTime "2025-09-25T11:09:14.237 CEST"]
[GameStartTime "2025-09-25T10:57:23.856 CEST"]
[Opening "Reti Opening"]
[PlyCount "104"]
[TimeControl "40/180+3"]

1. Nf3 {+0.11/9 8.3s} e6 {-1.87/11 6.2s} 2. d4 {+0.05/9 7.9s} c5 {-1.96/11 6.2s}
3. c4 {+0.03/8 3.9s} cxd4 {-1.42/12 6.2s} 4. Nxd4 {-0.12/9 11s}
Bb4+ {-1.43/10 6.2s} 5. Nc3 {0.00/8 4.3s} Bxc3+ {-1.14/12 6.3s}
6. bxc3 {+0.16/9 6.0s} Nf6 {-1.42/11 6.3s} 7. Nb5 {+0.65/10 6.2s}
Qa5 {-1.43/12 6.3s} 8. Bf4 {+0.99/10 6.6s} O-O {-1.51/12 6.4s}
9. Bc7 {+1.68/10 4.3s} b6 {-1.81/15 6.4s} 10. Bd6 {+1.86/10 4.2s}
Nc6 {-1.93/13 6.5s} 11. Bxf8 {+1.96/10 9.3s} Kxf8 {-1.88/12 6.5s}
12. Qb3 {+1.87/10 15s} Ne4 {-1.54/14 6.5s} 13. f3 {+1.84/10 6.3s}
a6 {-1.66/15 6.6s} 14. fxe4 {+1.40/11 4.6s} axb5 {-1.23/17 6.6s}
15. h4 {+1.32/11 8.7s} Nd4 {-0.50/16 6.7s} 16. Qb2 {+1.32/11 6.0s}
bxc4 {-0.52/16 6.7s} 17. O-O-O {+1.27/11 8.2s} Nc6 {-0.33/13 6.7s}
18. Rd2 {+0.87/10 4.1s} Qe5 {+0.15/13 6.8s} 19. e3 {+0.86/9 4.1s}
b5 {+0.27/13 6.8s} 20. Rh3 {+0.36/10 9.8s} b4 {+1.40/16 6.9s}
21. Rf2 {-0.34/10 4.8s} bxc3 {+2.07/13 6.9s} 22. Qa1 {-0.39/10 9.4s}
Na5 {+4.28/15 7.0s} 23. Qb1 {-1.15/10 5.6s} Rb8 {+4.07/16 7.1s}
24. Qa1 {-2.36/12 8.5s} Bb7 {+4.16/14 7.3s} 25. Rhf3 {-2.70/11 4.3s}
f6 {+5.39/14 7.2s} 26. Rf4 {-3.68/11 4.6s} Bxe4 {+7.39/15 7.3s}
27. Rxe4 {-4.07/10 4.1s} Qxe4 {+9.05/16 7.3s} 28. Rc2 {-8.42/12 4.9s}
Qxe3+ {+9.73/17 7.4s} 29. Kd1 {-8.60/13 9.5s} Rb5 {+9.71/17 7.5s}
30. Qxc3 {-11.42/14 5.7s} Rb1+ {+11.47/18 7.6s} 31. Rc1 {-11.73/14 5.4s}
Qxc3 {+13.10/18 7.7s} 32. Rxb1 {-10.89/17 4.1s} Qd4+ {+13.25/18 7.8s}
33. Kc2 {-10.92/17 5.6s} Qe4+ {+13.57/18 7.9s} 34. Kb2 {-11.07/16 6.7s}
c3+ {+13.87/20 8.0s} 35. Kxc3 {-12.47/16 4.3s} Qxb1 {+13.57/17 8.2s}
36. Be2 {-11.11/11 7.2s} Qxa2 {+14.27/16 8.4s} 37. Bf3 {-11.35/11 4.5s}
Qb3+ {+18.37/18 8.6s} 38. Kd2 {-13.44/12 7.9s} Nc4+ {+18.66/18 8.8s}
39. Ke1 {-14.02/12 11s} Qe3+ {+18.76/18 9.1s} 40. Be2 {-14.71/12 11s}
Qg1+ {+19.12/18 9.5s} 41. Bf1 {-15.26/12 4.9s} Ne3 {+18.89/16 6.5s}
42. Ke2 {-14.70/12 9.2s} Nxf1 {+18.89/14 6.5s} 43. h5 {-15.41/12 16s}
Qxg2+ {+18.96/12 6.5s} 44. Kd3 {-21.80/13 9.2s} Qf3+ {+19.41/11 6.6s}
45. Kc4 {-21.78/12 6.0s} Qxh5 {+19.47/11 6.6s} 46. Kd4 {-M14/11 9.1s}
Qg4+ {+25.82/12 6.6s} 47. Kd3 {-M12/11 12s} d5 {+M11/13 6.7s}
48. Kc2 {-M10/13 4.2s} Qb4 {+M1/13 6.7s} 49. Kc1 {-M8/28 2.8s}
Qc3+ {+M7/13 6.7s} 50. Kb1 {-M6/28 0.086s} Nd2+ {+M5/14 6.8s}
51. Ka2 {-M4/28 0.052s} Qc2+ {+M3/14 6.8s} 52. Ka1 {-M2/28 0.15s}
Nb3# {+M1/15 6.9s, Black mates} 0-1
[/pgn]