I wonder if chess is a draw with less ranks

jkominek
Posts: 120
Joined: Tue Sep 04, 2018 5:33 am
Full name: John Kominek

Re: I wonder if chess is a draw with less ranks

Post by jkominek »

A couple of adjunct points.

1) I re-partitioned my Chess324 games under Armageddon scoring, grouping together the castling rights of KQ/q, KQ/-, K/q, and K/-. Thus a single match between players consists of 4 x 648 = 2592 games. The total stats came out to 5067320 / 10238400 = 49.5% White wins, with ordo reporting a White advantage = -11.15 +/- 0.21 Elo. This makes for a strikingly balanced competition in computer play, with the benefit that the engines are entirely responsible for all piece development. By construction, no more draw death!
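
The arithmetic in point 1 can be reproduced with a short sketch. The input numbers are the ones quoted above; the logistic score-to-Elo conversion and the binomial error bar are assumptions added for illustration, and the names (`RIGHTS_SETTINGS`, `naive_elo`) are hypothetical. This is not ordo's method, which presumably estimates White advantage jointly with the players' ratings, so the naive Elo figure is not expected to match the -11.15 reported above.

```python
import math

# Figures quoted in the post.
GAMES_PER_RIGHTS_SETTING = 648
RIGHTS_SETTINGS = ["KQ/q", "KQ/-", "K/q", "K/-"]
WHITE_WINS = 5_067_320
TOTAL_GAMES = 10_238_400

games_per_match = len(RIGHTS_SETTINGS) * GAMES_PER_RIGHTS_SETTING
white_score = WHITE_WINS / TOTAL_GAMES


def naive_elo(score):
    """Logistic conversion of a pooled score to an Elo difference (assumption)."""
    return 400 * math.log10(score / (1 - score))


# Binomial standard error of the pooled score; under Armageddon scoring
# every game is decisive, so each game is treated as a win/loss trial.
std_err = math.sqrt(white_score * (1 - white_score) / TOTAL_GAMES)

# Half-width of a 95% interval, converted to the Elo scale.
elo_low = naive_elo(white_score - 1.96 * std_err)
elo_high = naive_elo(white_score + 1.96 * std_err)
elo_half_width = (elo_high - elo_low) / 2

print(games_per_match)
print(f"{white_score:.1%}")
print(f"{naive_elo(white_score):+.2f} +/- {elo_half_width:.2f}")
```

Under these assumptions the pooled score corresponds to roughly -3.5 Elo, with a 95% half-width of about 0.2 Elo.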

2) I did not elaborate on my spicy comment that Komodo Dragon's Monte Carlo evaluations are "suspicious crap that I can't take seriously." I'll illustrate what caught me by surprise. From a 1 million node search:

Code:

position fen rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1
setoption name Use MCTS value true
setoption name MCTS Hash value 32768
info string MCTS Hash table size is now 32768 meg
setoption name MultiPV value 6
...
info multipv 1 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 score cp 108 tbhits 0 pv d2d4 d7d5 c2c4 c7c6 g1f3 e7e6 b1c3 g8f6 e2e3 f6e4 f3d2 f7f5 d2e4 f5e4 f2f3 e4f3 d1f3 d8f6 e3e4 f6f3 g2f3 f8e7 h2h4 h8f8 c4d5 e6d5 e4d5 b8d7 c1g5 d7f6 e1d2 e8d8 a1e1 a7a6 d5c6 b7c6 d2c1 h7h6 g5f4 f6d5 c3d5 c6d5 f4c7 d8d7 c7e5 g7g5 f1g2 d7d8 e5c7 d8d7 h4g5 e7g5 f3f4 d7c7
info multipv 2 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 score cp 81 tbhits 0 pv g1f3 d7d5 d2d4 c8f5 c2c4 e7e6 d1b3 b8c6 c1d2 a8b8 e2e3 a7a6 b1c3 g8f6 a1c1 f8b4 f1e2 h7h5 e1g1 e8f8 a2a3 b4c3 d2c3 f5g4 h2h3 f6e4 g1h1 h8h6 c3e1
info multipv 3 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 score cp 63 tbhits 0 pv c2c4 e7e5 g2g3 g8f6 f1g2 f8c5
info multipv 4 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 score cp 57 tbhits 0 pv e2e4 c7c5 g1f3 b8c6 f1b5 e7e6
info multipv 5 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 score cp 58 tbhits 0 pv e2e3 c7c6 g1f3 d7d5 c2c4 c8g4
info multipv 6 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 score cp 46 tbhits 0 pv g2g3 c7c5 g1f3 b8c6
stop
info nodes 1039545
info depth 29 seldepth 69 time 2799028 nodes 1039545 nps 371 score cp 181 tbhits 0 hashfull 863 pv d2d4 d7d5 c2c4 c7c6 g1f3 e7e6 b1c3 g8f6 e2e3 f6e4 f3d2 f7f5 d2e4 f5e4 f2f3 e4f3 d1f3 d8f6 e3e4 f6f3 g2f3 f8e7 h2h4 h8f8 c4d5 e6d5 e4d5 b8d7 c1g5 d7f6 e1d2 e8d8 a1e1 a7a6 d5c6 b7c6 d2c1 h7h6 g5f4 f6d5 c3d5 c6d5 f4c7 d8d7 c7e5 g7g5 f1g2 d7d8 e5c7 d8d7 h4g5 e7g5 f3f4 d7c7
bestmove d2d4 ponder d7d5
quit

At this point in building its search tree, Komodo Dragon 3.3 reported 1.d4 with an evaluation of 108, which is consistent with an alpha-beta search run for the same amount of time. Yet when I stopped the search, Komodo's final evaluation was given as 181. Why the jump? To me this does not make mathematical sense. Note that the PV is unchanged.
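
To make the comparison concrete, the two evaluations of 1.d4 can be pulled out of the log mechanically. The sketch below tokenises UCI `info` lines; the helper name `parse_info` is hypothetical, the PVs are truncated copies of the lines above, and mate scores are deliberately ignored. It only measures the discrepancy and says nothing about its cause.

```python
# Truncated copies of three lines from the log above (PVs shortened).
LOG_LINES = [
    "info string MCTS Hash table size is now 32768 meg",
    "info multipv 1 depth 29 seldepth 69 time 2797416 nodes 1039051 nps 371 "
    "score cp 108 tbhits 0 pv d2d4 d7d5 c2c4",
    "info depth 29 seldepth 69 time 2799028 nodes 1039545 nps 371 "
    "score cp 181 tbhits 0 hashfull 863 pv d2d4 d7d5 c2c4",
]


def parse_info(line):
    """Return multipv index, centipawn score and first PV move, or None."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    if "score" not in tokens or "pv" not in tokens:
        return None  # e.g. "info string ..." lines
    score_at = tokens.index("score")
    if tokens[score_at + 1] != "cp":
        return None  # mate scores are out of scope for this sketch
    multipv = None
    if "multipv" in tokens:
        multipv = int(tokens[tokens.index("multipv") + 1])
    return {
        "multipv": multipv,
        "cp": int(tokens[score_at + 2]),
        "first_move": tokens[tokens.index("pv") + 1],
    }


records = [r for r in map(parse_info, LOG_LINES) if r is not None]

# The line emitted during the search carries "multipv 1"; the line
# emitted after "stop" has no multipv field at all.
during_search = next(r for r in records if r["multipv"] == 1)
after_stop = next(r for r in records if r["multipv"] is None)

jump_cp = after_stop["cp"] - during_search["cp"]
print(during_search["first_move"], during_search["cp"], after_stop["cp"], jump_cp)
```

For the lines shown, the first move is the same in both records and the reported score differs by 73 centipawns.
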
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: I wonder if chess is a draw with less ranks

Post by lkaufman »

jkominek wrote: Sun Jan 18, 2026 11:31 pm
lkaufman wrote: Sun Jan 18, 2026 4:15 pm
If "Black cannot castle" gives evals around 1.2, it is clearly unsuitable for Armageddon; a 75 to 25 expected score is totally unacceptable. That's why I proposed "Black cannot castle short", which should be much closer to 1.0. Even if your fast stats suggest it's not as good a fit as no Black castling, I would trust the eval more.

The WDL models of Stockfish, Leela, and others, including I presume Komodo Dragon, are based on computer-computer self-play measurements at bullet time controls. They are a guide, but not one we should over-reach with.

The question in my mind, and one I am interested in your opinion on, is: how do you know? By that I mean, how do you know that computer results translate directly to equivalent human results without further empirical studies? Absent abundant evidence from human play, I treat the measurements I've made (and yours) as suggestive but not definitive.

What stands out to me as the confounding variable is the ability of human grandmasters to semi-reliably secure a draw when needed, in this case by Black. We also know that computer rating lists are dilated with respect to FIDE ratings, e.g. as expressed in Komodo Dragon's UCI Elo settings, a fact I attribute to humans' greater tendency to play for draws, especially when the players have a large rating separation (versus engines fighting it out for the win for many more moves).

Breaking it down:
i) Under Armageddon rules the ideal opening position scores 50% between equally rated players.
ii) Modern engines are calibrated such that an evaluation of 100 centipawns equates to a 50% win/draw ratio in computer-computer play. This puts us in the ballpark but should not be considered definitive.
iii) Observation: human grandmasters have the ability to successfully play for a draw when needed. Also, placing White in a must-win situation adds clutch-play pressure.
iv) Together this tilts the advantage further in Black's favor.
v) To compensate, the opening position needs to favor White more strongly than what computer WDL models consider balanced.
vi) This argues for a value somewhat above 100 centipawns being the target point. A value of 120 may seem too high but should not be ruled out as suitable in human play.
vii) Likewise, I would not rule out a position with an evaluation of 80 either, because maybe I'm wrong about statement iv).
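
The range discussed in points ii) to vii) can be explored with a toy model. The sketch below assumes a logistic curve for White's Armageddon score, anchored at 50% for 100 centipawns (point ii) and at 75% for 120 centipawns (the "around 1.2" and "75 to 25" figures quoted above). The logistic form, the function name `white_armageddon_score`, and the `balance_shift_cp` parameter standing in for a possible human effect are all hypothetical; this is not the WDL model of any engine.

```python
import math

BALANCE_CP = 100.0   # eval at which White is assumed to score 50%
REF_CP = 120.0       # reference eval from the quoted "around 1.2"
REF_SCORE = 0.75     # reference score from the quoted "75 to 25"

# Solve 1 / (1 + exp(-(REF_CP - BALANCE_CP) / SCALE_CP)) = REF_SCORE.
SCALE_CP = (REF_CP - BALANCE_CP) / math.log(REF_SCORE / (1 - REF_SCORE))


def white_armageddon_score(eval_cp, balance_shift_cp=0.0):
    """Toy estimate of White's expected Armageddon score.

    balance_shift_cp moves the 50% point upward, modelling the idea
    that humans playing Black may hold draws more reliably than engines.
    """
    x = (eval_cp - BALANCE_CP - balance_shift_cp) / SCALE_CP
    return 1.0 / (1.0 + math.exp(-x))


for cp in (80, 100, 120):
    engine_like = white_armageddon_score(cp)
    shifted = white_armageddon_score(cp, balance_shift_cp=20.0)
    print(cp, round(engine_like, 2), round(shifted, 2))
```

With these anchors the curve is steep: a 20-centipawn shift of the balance point is enough to turn a 75% position into a 50% one, which is why the choice between 80, 100, and 120 matters so much if the model is even roughly right.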

In short I advocate for a broader range of consideration (appreciation of uncertainty) until thoroughly put to the test.

I'm limited by not having a database of human Armageddon games at my disposal. Do you have a collection of games to study from your connections to either chess.com or Lichess? Maybe you have analysis that you could share. Even if only for the standard opening position it would provide a useful point of comparison.

More ideal, of course, would be a large experiment on either of those playing sites to study variant castling rights.

I wouldn't put any weight on the idea that human grandmasters have the ability to play for a draw when needed. Sure, they have memorized many defenses for Black, like the Berlin, with the goal of reaching a drawable ending, but this is irrelevant for play with alternate rules such as limited castling rights, which throw all opening theory out the window. I don't know whether Armageddon play with such rules is more favorable for one color or the other with human players than with equal-strength bots, but I suspect there is very little difference.

Also, I am interested in Armageddon rules that are as fair as possible both for human grandmasters and for engines (which effectively means for human correspondence play); if there is some difference in the eval margin needed, I would advocate an intermediate value. But as I recall, the difference between the eval needed for 2800 engines and for 3600 engines is very small. It is probably roughly as difficult to find "only" winning moves as "only" drawing moves, though this is highly position-dependent.

I don't have an Armageddon database. For computer games with various castling rights and Armageddon rules, Stephan Pohl's website should be the place to look; he did a lot of work on that some years ago.