YATT - Yet Another Turing Test

smatovic
Full name: Srdja Matovic

Re: YATT - Yet Another Turing Test

Post by smatovic »

Let me cross-post this from another thread:

Re: ChatGPT usage in computer chess?
viewtopic.php?p=981757#p981757
smatovic wrote: Mon Jul 21, 2025 2:22 pm
- Can it contribute to Stockfish?
No. It would need to load a mental model of the Stockfish chess engine, then ponder, come up with ideas, implement them, test them (with ~10K games of self-play), and then decide whether to commit the changes or update its mental model. This pipeline is currently not present, especially the testing part, but I think it is doable in general. Of course, the whole process of building a mental model and generating ideas could also be done by brute force: just try every possible permutation of code generation, driven by some NN heuristic.

j.t. wrote: Mon Jul 21, 2025 6:38 pm
I'm working a bit on this currently. No LTC passer as of yet, unfortunately; however, there have been a few STC passers (out of roughly 500 potential LLM patches):
- https://tests.stockfishchess.org/tests/ ... 4f6388c891
- https://tests.stockfishchess.org/tests/ ... 2d74b172ae
- https://tests.stockfishchess.org/tests/ ... 2d74b1559d
Some of the patches I read, and the reasoning behind them, are quite reasonable.

Of course, the git and OpenBench submission part of the pipeline is not done by the LLM itself (yet?); that is all implemented in a series of Python scripts. Submitting successful OpenBench tests to fishtest is done fully manually by me.

j.t. wrote: Wed Aug 06, 2025 2:23 pm
Finally a test also passed LTC: https://github.com/official-stockfish/S ... /pull/6210

Hugging Face offers a new tool called Hugging Face Skills:

We Got Claude to Fine-Tune an Open Source LLM
https://huggingface.co/blog/hf-skills-training

It lets you fully automate LLM jobs. Maybe an idea or blueprint for an LLM-driven Stockfish dev cycle?
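
To make the idea a bit more concrete, here is a rough Python sketch of the propose, test, decide loop discussed above. Everything in it is an assumption for illustration: `MatchResult`, `propose_patch`, `run_selfplay` and `dev_cycle` are hypothetical names, not part of Stockfish, fishtest, OpenBench or Hugging Face Skills. The acceptance rule is also a simplification: it uses a fixed-length match with a normal-approximation error margin on the Elo estimate, whereas fishtest decides with SPRT.

```python
import math
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Outcome of a self-play match, from the patched engine's point of view."""
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def score(self) -> float:
        return (self.wins + 0.5 * self.draws) / self.games


def elo_diff(score: float) -> float:
    """Logistic Elo difference implied by a score fraction."""
    eps = 1e-9
    score = min(max(score, eps), 1.0 - eps)  # keep log10 well defined
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(result: MatchResult, z: float = 1.96):
    """Return (elo, margin) from a normal approximation of the score."""
    n, s = result.games, result.score
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (result.wins * (1.0 - s) ** 2
           + result.draws * (0.5 - s) ** 2
           + result.losses * (0.0 - s) ** 2) / n
    se = math.sqrt(var / n)
    low, high = elo_diff(s - z * se), elo_diff(s + z * se)
    return elo_diff(s), (high - low) / 2.0


def dev_cycle(propose_patch, run_selfplay, notes, rounds):
    """Hypothetical loop: propose a patch, test it, keep it or learn from it."""
    accepted = []
    for _ in range(rounds):
        patch = propose_patch(notes)      # e.g. an LLM call (assumption)
        result = run_selfplay(patch)      # e.g. ~10K games of self-play
        elo, margin = elo_with_margin(result)
        if elo - margin > 0:              # simplified gate, not SPRT
            accepted.append(patch)
        notes.append((patch, elo, margin))  # crude "update mental model"
    return accepted
```

In a real setup `propose_patch` would wrap the LLM and `run_selfplay` would submit a test and wait for its result; the `notes` list stands in for whatever context is fed back to the model on the next round.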

--
Srdja