Which LLM does understand chess best?
-
BeyondCritics
- Posts: 416
- Joined: Sat May 05, 2012 2:48 pm
- Full name: Oliver Roese
Which LLM does understand chess best?
I am interested in using LLMs to work with chess related content, e.g. chess history, important games, basic chess strategy. As of January 2026, is there a model you could suggest? Does it even work?
-
FireDragon761138
- Posts: 84
- Joined: Sun Dec 28, 2025 7:25 am
- Full name: Aaron Munn
Re: Which LLM does understand chess best?
BeyondCritics wrote: ↑Tue Jan 27, 2026 3:10 pm I am interested in using LLMs to work with chess related content, e.g. chess history, important games, basic chess strategy. As of January 2026, is there a model you could suggest? Does it even work?
I've used ChatGPT and Claude, particularly with retrieval-augmented generation (Projects, custom GPTs based on classic texts like Silman's or Stean's books). DeepSeek is also surprisingly good, and it's completely free for the moment.
It's educational and amusing to just paste in a game that an engine has analyzed. I recommend using a more positional chess engine for the initial analysis (Dragon, Komodo, Lc0, etc.), since it helps cue the language model towards positional features. Stockfish isn't as good for this, as it tends to tilt the model towards tactics that don't necessarily translate well into strategic concepts. This approach has been a big help to me in transitioning to a more positional style. The LLMs can also be instructed to balance their own judgement with the engine output, which is helpful in identifying play that is practically good rather than merely engine-approved.
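A minimal sketch of that workflow, assuming the engine-annotated game is already available as PGN text. The helper name `build_coach_prompt` and the wording of the instructions are made up for illustration and not taken from any particular tool:

```python
def build_coach_prompt(annotated_pgn: str, engine_name: str) -> str:
    """Wrap an engine-annotated PGN in instructions for a language model.

    Hypothetical helper: the instruction text is only one example of asking
    the model to weigh engine output against practical considerations.
    """
    instructions = (
        f"The game below was analyzed with {engine_name}.\n"
        "Explain the main positional ideas for each side.\n"
        "Treat the engine evaluations as evidence, but also say which moves "
        "are practically easier or harder for a human to play.\n"
        "Do not invent moves that are not in the game or the engine lines."
    )
    return f"{instructions}\n\n{annotated_pgn.strip()}\n"


if __name__ == "__main__":
    # Placeholder PGN fragment; a real game would carry the engine's comments.
    pgn = "1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 *"
    print(build_coach_prompt(pgn, "Lc0"))
```

The resulting text can be pasted into any chat interface; which instructions work best is something to experiment with.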
-
towforce
- Posts: 12963
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
- Full name: Graham Laight
Re: Which LLM does understand chess best?
BeyondCritics wrote: ↑Tue Jan 27, 2026 3:10 pm I am interested in using LLMs to work with chess related content, e.g. chess history, important games, basic chess strategy. As of January 2026, is there a model you could suggest? Does it even work?
General advice: get the best chatbot you can afford. However, different chatbots very clearly have different strengths, so you'll need to experiment to find out which one is best at this specific task. I'm guessing you were hoping that somebody else had already done that.
Human chess is partly about tactics and strategy, but mostly about memory
-
ehenkes
- Posts: 23
- Joined: Sun Feb 14, 2021 4:58 pm
- Location: Germany
- Full name: Erhard Henkes
Re: Which LLM does understand chess best?
BeyondCritics wrote: ↑Tue Jan 27, 2026 3:10 pm I am interested in using LLMs to work with chess related content, e.g. chess history, important games, basic chess strategy. As of January 2026, is there a model you could suggest? Does it even work?
I use ChatGPT-5.4 for explaining positions and the static and dynamic data related to them. The process is:
A modified Stockfish 15.1 dev (which has the classical evaluation besides NNUE and is modified to export these data) --> Chess Cockpit --> prompt to a web chat:
Example: position after the moves 1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6
The system is explained and available (open source code, binaries) in my e-book (links at the end of this post).
ehenkes wrote: ↑Tue Apr 14, 2026 2:16 pm
My Chess Cockpit sends the following prompt and data files (the prompt in the program is in German and is shown here in English translation; the answer can be in different languages). ChatGPT's answer follows after the data files.
Code:
IMPORTANT: This prompt is intended for ChatGPT.
You are a chess coach and explain engine output for humans. You must NOT invent any moves of your own.

Fixed meanings:
- cp is the centipawn score from the point of view of the SIDE TO MOVE (see NextToMove). Positive = advantage for the side to move, negative = advantage for the opponent.
- wdl A B C means expected results from the point of view of the SIDE TO MOVE: A=Win, B=Draw, C=Loss (in per mille). Example: wdl 29 970 1 ~ 2.9% win, 97.0% draw, 0.1% loss.
- If NextToMove=black and score cp -17, that is good for White (Black is slightly worse).
- wdl is NOT an indication of search depth.

NNUE derived piece values (nnueDiff.txt):
- nnueDiff.txt contains, per square, a piece identifier and a value as in the ASCII board "NNUE derived piece values".
- These values are NOT classical material values.
- Meaning: position-dependent marginal contribution of the respective piece to the current NNUE eval.
- Computation idea: the engine briefly removes a piece (except the king) from the board, re-evaluates, and forms v = base - eval.
- Technical note: base/eval are computed internally with a side-to-move normalization. For the output in nnueDiff.txt, however, the empirically observed sign stability is authoritative.
- FOR THE INTERPRETATION, however, the following is empirically stable: white pieces typically have positive values, black pieces negative ones.
- IMPORTANT: A negative value for a BLACK piece is normally NOT a minus for Black, but marks the (position-dependent) contribution of this black piece in this position. The more negative (farther from 0), the stronger/more load-bearing this black piece is. Avoid formulations such as "burdens", "inhibited", "weighs on the black position".
- Reading rule: White = larger positive values can indicate an above-average effect. Black = more strongly negative values can indicate an above-average effect.
- Short rule for nnueDiff: White-plus and Black-minus are both STRENGTHS of their respective side, not weaknesses.
- Caution: raw values are material-driven (queen/rooks otherwise always appear "most important").
- Therefore use the concept of "NOTABILITY" for explanations: compare the value with a typical expectation range for the same piece type. For Black: "notable" usually means more strongly negative than the rough reference range. For White: "notable" usually means more strongly positive than the rough reference range. Call a piece 'notable' if it lies clearly outside or at the edge of this range.
- Specifically for notable PAWNS, the justification must concretely name 1 main role: king shield, central lever for pawn advance, or hinge-square control/structural anchor.
- Heuristic reference (rough expectation frame, adjustable): White: P ~ +0.5..+1.2, N ~ +3.5, B ~ +3.6..+4.0, R ~ +4.6..+5.0, Q ~ +6.9. Black: P ~ -0.5..-1.2, N ~ -3.5, B ~ -3.6..-4.0, R ~ -4.6..-5.0, Q ~ -6.9 (more strongly negative = often stronger black contribution).
- The number scaling in nnueDiff.txt is identical to the board display (format_cp_compact).

Candidate moves (hard definition):
- Take from dynamicOutput.txt the FIRST moves after "pv" from MultiPV 1-3.
- Exactly these three first moves are the only candidates.
- Name no other moves.
- MultiPV 1 is always the best candidate according to the engine ranking.

Note on output.txt:
- "Spielzug" can be the last move played.
- "NextToMove" is decisive for the perspective.
- Interpret cp and wdl strictly from this perspective.

Hard rules:
1) Name no moves other than the engine candidates from MultiPV 1-3.
2) If MultiPV is present, use the three first moves directly after "pv".
3) Use legalMoves.txt only as a safety net for legality.
4) No general encyclopedia-style explanations.
5) Refer to the concrete values.
6) Only interpret the values from nnueDiff.txt; do not reinterpret them as material values.
7) nnueDiff ranking: determine for each side the top 3 MOST NOTABLE pieces relative to their piece type (not by raw maximum values).
8) For black pieces, a more strongly negative nnueDiff value must NOT be phrased as a weakness or burden.
9) For notable pawns from nnueDiff.txt, the explanation must explicitly name 1 main role: king shield, central lever for pawn advance, or hinge-square control/structural anchor.

Task:
- List the 3 candidate moves from MultiPV 1-3 with cp and wdl.
- For each candidate, explain the plan in 1-2 sentences from the point of view of the SIDE TO MOVE.
- A short comparison in 2-3 sentences.
- A short conclusion in at most 4 sentences. Address the classical evaluation terms, but only King Safety, Mobility, Pawn Structure, Threats, Space. No speculation. Mention the value of the classical PSQ score.
- Additionally name the top 3 MOST NOTABLE white and top 3 MOST NOTABLE black pieces from nnueDiff.txt. Basis: comparison of the value with the typical expectation range of the respective piece type. Give square, piece and value for each, and briefly explain the deviation in 1 sentence. Use the heuristic reference ranges as a rough frame, no exact formulas.
============================================================
[dynamicOutput.txt]
============================================================
info depth 34 seldepth 42 multipv 1 score cp 21 wdl 17 981 2 nodes 183126659 nps 1727626 hashfull 629 tbhits 0 time 105999 pv e1g1 f6e4 f1e1 e4d6 f3e5 f8e7 b5f1 c6e5 e1e5 e8g8 d2d4 e7f6 e5e1 f8e8 c2c3 e8e1 d1e1 d6e8 c1f4 d7d5 b1d2 c8f5 e1e3 h7h6 a1e1 c7c6 e3g3 f6g5 h2h4 g5f4 g3f4 f5e6 e1e5 e8f6
info depth 33 seldepth 43 multipv 2 score cp 20 wdl 16 982 2 nodes 183126659 nps 1727626 hashfull 629 tbhits 0 time 105999 pv d2d3 f8c5 c2c3 d7d5 e4d5 d8d5 b5c4 d5d6 e1g1 a7a5 d1e2 c8e6 c4b5 e8g8 c1g5 f6d7 b1d2 e6d5 g5e3 c5e3 f2e3 d7f6 b5c6 b7c6 e3e4 d5e6 d3d4 f6d7 d2c4 e6c4 e2c4 c6c5 a1d1 e5d4 c3d4 c5d4 f3d4 d6b6
info depth 33 seldepth 35 multipv 3 score cp 11 wdl 10 987 3 nodes 183126659 nps 1727626 hashfull 629 tbhits 0 time 105999 pv b1c3 c6d4 f3d4 e5d4 e4e5 d4c3 e5f6 d8f6 d2c3 f8c5 d1e2 f6e7 a2a4 e7e2 e1e2 e8g8 h1d1 f8e8 e2f1 a7a5 c1f4 c7c6 b5d3 d7d6 h2h4 c8e6 d1e1 h7h6 f2f3 h6h5 c3c4 d6d5
============================================================
[chessboard.txt]
============================================================
 +---+---+---+---+---+---+---+---+
 | r |   | b | q | k | b |   | r | 8
 +---+---+---+---+---+---+---+---+
 | p | p | p | p |   | p | p | p | 7
 +---+---+---+---+---+---+---+---+
 |   |   | n |   |   | n |   |   | 6
 +---+---+---+---+---+---+---+---+
 |   | B |   |   | p |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   | P |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   | N |   |   | 3
 +---+---+---+---+---+---+---+---+
 | P | P | P | P |   | P | P | P | 2
 +---+---+---+---+---+---+---+---+
 | R | N | B | Q | K |   |   | R | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: r1bqkb1r/pppp1ppp/2n2n2/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 4 4
Key: 3EDF54C08589B482
Checkers:
============================================================
[output.txt]
============================================================
Spielzug: 4
Phase of the Game (128 ...0): 128
PSQ_Score: -0.183
King Safety White: 0.34
King Safety Black: 0.36
Mobility White: -0.0888
Mobility Black: -0.00254
Space White: 0.284
Space Black: 0.284
Threats White: 0.223
Threats Black: 0.381
Passed White: 0
Passed Black: 0
Pawns White: 0.107
Pawns Black: 0.107
NNUE evaluation: 0.0102
Final evaluation: 0.0127
Next Ply: 7
FEN: r1bqkb1r/pppp1ppp/2n2n2/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 4 4
NextToMove: white
============================================================
[legalMoves.txt]
============================================================
a2a3 b2b3 c2c3 d2d3 g2g3 h2h3 a2a4 b2b4 c2c4 d2d4 g2g4 h2h4 b1a3 b1c3 f3g1 f3d4 f3h4 f3e5 f3g5 b5f1 b5e2 b5d3 b5a4 b5c4 b5a6 b5c6 h1f1 h1g1 d1e2 e1f1 e1e2 e1g1
============================================================
[nnueDiff.txt]
============================================================
NNUE_DPV_BEGIN
a1 R +4.44
a2 P +0.35
a7 p -0.68
a8 r -4.87
b1 N +3.58
b2 P +0.86
b5 B +4.34
b7 p -1.07
c1 B +3.95
c2 P +0.98
c6 n -3.94
c7 p -0.89
c8 b -4.56
d1 Q +7.14
d2 P +0.58
d7 p -0.75
d8 q -8.35
e4 P +1.34
e5 p -1.25
f2 P +0.88
f3 N +4.05
f6 n -4.41
f7 p -0.93
f8 b -4.19
g2 P +1.25
g7 p -1.02
h1 R +4.85
h2 P +0.61
h7 p -0.41
h8 r -4.91
NNUE_DPV_END
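The "candidate moves" rule in the prompt (first move after "pv" for MultiPV 1-3, with legalMoves.txt as a safety net) can also be checked mechanically before anything is sent to the chat. What follows is only a rough sketch in Python, not the Chess Cockpit code; the names `Candidate`, `parse_candidates` and `wdl_percent` are hypothetical, and the regular expression assumes info lines that carry both "score cp" and "wdl", as in the output above:

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    multipv: int
    move: str    # first move after "pv", in UCI coordinate notation
    cp: int      # centipawns from the side to move's point of view
    wdl: tuple   # (win, draw, loss) in per mille, side to move


# Assumes "multipv N score cp X wdl A B C ... pv <moves>"; mate scores and
# lines without wdl do not match and are skipped.
INFO_RE = re.compile(
    r"multipv (\d+) score cp (-?\d+) wdl (\d+) (\d+) (\d+)\b.*?\bpv (\S+)"
)


def parse_candidates(dynamic_output: str, legal_moves: str) -> list:
    """Return the MultiPV 1-3 candidates whose first move is in the legal list."""
    legal = set(legal_moves.split())
    found = {}
    for line in dynamic_output.splitlines():
        m = INFO_RE.search(line)
        if not m:
            continue
        n, cp, w, d, l, move = m.groups()
        if int(n) <= 3 and move in legal:
            # A later line for the same multipv index replaces the earlier one.
            found[int(n)] = Candidate(int(n), move, int(cp),
                                      (int(w), int(d), int(l)))
    return [found[k] for k in sorted(found)]


def wdl_percent(wdl: tuple) -> tuple:
    """Convert per-mille win/draw/loss values to percentages."""
    return tuple(x / 10 for x in wdl)
```

Called with the three info lines and the legalMoves.txt contents shown above, `parse_candidates` yields e1g1, d2d3 and b1c3 in that order, and `wdl_percent((17, 981, 2))` gives 1.7% win, 98.1% draw, 0.2% loss.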
Candidates from MultiPV 1 to 3, from White’s point of view:
e1g1 with cp +21, wdl 17 / 981 / 2
White castles kingside and immediately improves king safety. The plan is clear: move the king out of the center, activate the rook, and then continue with development and central pressure.
d2d3 with cp +20, wdl 16 / 982 / 2
White first stabilizes the center and keeps the position compact. This is a quieter plan than castling: support e4, complete development, and only then bring the king to safety.
b1c3 with cp +11, wdl 10 / 987 / 3
White develops the knight with direct influence over the center and the d5 and e4 squares. The plan is active piece development, but the engine evaluates this continuation a bit more cautiously than the first two.
The comparison is very close between e1g1 and d2d3. Both are practically equal, but e1g1 is the engine’s clear number one candidate and immediately addresses the most urgent developmental task, namely king safety. b1c3 remains playable, but it falls somewhat behind because White develops a piece without yet resolving the king issue.
As for the classical evaluation terms of the starting position, the PSQ Score is -0.183, so in classical PSQ terms that is a small positional point against White. In King Safety, White at 0.34 is slightly behind Black at 0.36, so e1g1 fits the position very well. In Mobility, White at -0.0888 is worse than Black at -0.00254. Space is equal at 0.284 to 0.284, Pawn Structure is also equal at 0.107 to 0.107, and in Threats Black has the stronger value with 0.381 against 0.223. That supports the engine’s tendency: White should primarily improve safety and coordination rather than aim for anything speculative.
The top 3 most notable white pieces from nnueDiff.txt, relative to their own piece type:
g2, Pawn, +1.25
For a white pawn this is at the upper edge of, or slightly above, the rough expectation range, so it stands out; its main role here is king shield.
e4, Pawn, +1.34
This is clearly above the usual pawn range and therefore very notable; its main role is a central lever for pawn advance.
f3, Knight, +4.05
A white knight is roughly expected around +3.5, so this is clearly above average; the knight is especially active and structurally important here.
The top 3 most notable black pieces from nnueDiff.txt, relative to their own piece type:
d8, Queen, -8.35
For a black queen this is much more negative than the rough reference around -6.9; that indicates an especially strong position-dependent contribution for Black.
f6, Knight, -4.41
For a black knight this is clearly more negative than the rough reference around -3.5; the knight is therefore notably effective for the black position.
e5, Pawn, -1.25
For a black pawn this sits at the strong end of the expectation range; its main role is a central lever for pawn advance.
Conclusion: The engine sees White as slightly better, but only by a very small margin. e1g1 is the best candidate because it improves king safety immediately and makes the position more harmonious. d2d3 follows the same general idea in a quieter way. b1c3 is sensible development, but it remains somewhat behind the two more structurally solid choices in the engine ranking.
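For comparison, the "notability" idea from the prompt can be approximated without an LLM. The sketch below is an assumption-laden illustration, not part of Chess Cockpit: it scores a piece by how far the absolute value exceeds the upper bound of the rough reference range for its type, which is only one possible reading of the prompt. The names `REFERENCE`, `parse_nnue_diff` and `top_notable` are hypothetical.

```python
# Rough reference ranges (absolute values), copied from the prompt's heuristic.
REFERENCE = {
    "P": (0.5, 1.2),
    "N": (3.5, 3.5),
    "B": (3.6, 4.0),
    "R": (4.6, 5.0),
    "Q": (6.9, 6.9),
}


def parse_nnue_diff(text: str) -> list:
    """Parse 'square piece value' triples, ignoring the BEGIN/END markers."""
    tokens = [t for t in text.split() if not t.startswith("NNUE_DPV_")]
    return [
        (tokens[i], tokens[i + 1], float(tokens[i + 2]))
        for i in range(0, len(tokens) - 2, 3)
    ]


def top_notable(entries: list, white: bool, n: int = 3) -> list:
    """Rank one side's pieces by how far |value| exceeds the reference range.

    Assumption: notability = |value| minus the upper bound of the range for
    that piece type. Kings are not expected in the input (the engine does not
    remove them), so they have no reference entry.
    """
    scored = []
    for square, piece, value in entries:
        if piece.isupper() != white:
            continue  # uppercase letters are white pieces, lowercase black
        upper = REFERENCE[piece.upper()][1]
        scored.append((abs(value) - upper, square, piece, value))
    scored.sort(reverse=True)
    return [(sq, pc, val) for _, sq, pc, val in scored[:n]]
```

On the nnueDiff.txt data above, this particular scoring ranks f3, b5 and d1 first for White and d8, f6 and c8 first for Black, so it agrees with ChatGPT on f3, d8 and f6 but not on the pawns; the prompt deliberately asks for a rough frame rather than an exact formula, so such differences are to be expected.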
https://www.amazon.de/Chess-Thinking-Ag ... B0GSH54B9F (US-engl.)
https://www.amazon.de/Schachcoach-Schac ... B0GM7M5Y21 (German)
My idea for the future of this "chess coach" is to make more and more use of the AI's abilities.