Seer 2.0.0

chrisw · Post by **chrisw** » Sat May 01, 2021 2:42 pm

connor_mcmonigle wrote: ↑Fri Apr 30, 2021 5:13 pm
chrisw wrote: ↑Fri Apr 30, 2021 1:33 pm
connor_mcmonigle wrote: ↑Thu Apr 29, 2021 10:50 pm I've released a new minor version of Seer which hopefully resolves the cyclical TM issues people were experiencing: https://github.com/connormcmonigle/seer ... tag/v2.0.1. More information can be found in the release description. Sorry testers.

The TM problems seemed to only affect Windows which explains why I didn't experience any issues in my testing. Guenther has confirmed that Seer no longer loses on time in his limited testing.

chrisw wrote: ↑Wed Apr 28, 2021 11:16 am Cutechess, 40/10 games. Note the timing problems and plenty of "illegal move" terminations when Seer2 is actually winning (I assume because of time problems). I had to delete many PGNs because of talkchess post limits; the ones deleted came from the middle part of the run. The final part of the run will contain 40+ unterminated games because of my interrupting CuteChess.

...
Can you confirm that the TM issues are now resolved? Thanks in advance.
500 40/10 games without timing issues says it's fixed.

First impressions from a quick look at the games: Seer2 has a very aggressive style, possibly a little premature. E.g. it likes king attacks (a lot), but maybe launches them a little too quickly, often giving up material or a pawn. Sometimes this works and sometimes not.

To delve a little deeper, I set up a game in 15 and watched. The first game immediately shows the king side "attack" tendency. At first I thought this was premature, with Seer not yet developed, White getting the centre, etc. In fact at one point it turned into something looking quite dangerous, but the defences were adequate. Very coffee-house style, which is really nice. Maybe that comes from the training set? Humans on Lichess?

[pgn][Event "My Tournament"]
[Site "?"]
[Date "2021.04.30"]
[Round "1"]
[White "Corona-Virus-Chess-1.018"]
[Black "seer_znver201"]
[Result "1-0"]
[ECO "A05"]
[GameDuration "00:39:39"]
[GameEndTime "2021-04-30T12:53:20.606 Romance Summer Time"]
[GameStartTime "2021-04-30T12:13:40.774 Romance Summer Time"]
[Opening "Reti Opening"]
[PlyCount "93"]
[Termination "adjudication"]
[TimeControl "40/900+2"]

1. Nf3 {book} Nf6 {book} 2. c4 {book} d6 {book} 3. Nc3 {book} Nbd7 {book}
4. g3 {book} e5 {book} 5. Bg2 {book} c6 {book} 6. e4 {book} Be7 {-0.21/19 19s}
7. O-O {+0.14/25 24s} h5 {-0.19/20 19s} 8. h3 {+0.18/25 24s} Nh7 {+0.17/21 20s}
9. d4 {+0.19/29 24s} g5 {+0.29/21 20s} 10. d5 {+0.36/27 24s} g4 {-0.12/21 20s}
11. hxg4 {+0.36/29 25s} hxg4 {-0.22/21 20s} 12. Nh2 {+0.28/30 26s}
Nhf6 {-0.16/20 20s} 13. Nxg4 {+0.66/31 27s} Nxg4 {-0.96/23 95s}
14. Qxg4 {+0.61/30 26s} Kf8 {-0.53/21 23s} 15. Qe2 {+0.66/29 24s}
Bg5 {-0.67/23 87s} 16. f4 {+0.88/26 24s} exf4 {-0.98/22 79s}
17. gxf4 {+0.91/28 28s} Bh4 {-1.66/21 16s} 18. Rf3 {+0.58/26 29s}
Nf6 {-1.29/19 16s} 19. e5 {+0.63/30 27s} Ng4 {-0.65/21 23s}
20. e6 {+0.58/29 28s} Bf6 {-1.62/19 16s} 21. Rd3 {+0.55/29 32s}
Qb6+ {-1.55/21 33s} 22. Kf1 {+0.48/29 24s} Nh2+ {-1.15/23 73s}
23. Ke1 {+0.51/24 2.1s} fxe6 {-0.69/20 64s} 24. dxc6 {+0.59/29 28s}
bxc6 {-0.28/23 56s} 25. Ne4 {+0.51/28 29s} Be7 {-0.40/23 48s}
26. Nxd6 {+0.49/30 25s} Ba6 {0.00/22 41s} 27. Be3 {+0.87/31 30s}
Qa5+ {-0.92/20 12s} 28. Qd2 {+1.00/31 29s} Qh5 {-0.71/19 8.4s}
29. Qc3 {+1.00/32 31s} Rd8 {-0.13/19 8.6s} 30. Bc5 {+1.13/30 33s}
Rh7 {-1.79/22 36s} 31. Ne4 {+1.48/31 29s} Rxd3 {-2.82/23 28s}
32. Bxe7+ {+1.72/28 25s} Rxe7 {-2.37/18 5.9s} 33. Qxd3 {+1.66/29 29s}
e5 {-2.93/19 22s} 34. f5 {+2.74/29 30s} Bc8 {-4.71/20 15s}
35. Qd8+ {+3.64/33 33s} Re8 {-3.79/18 3.6s} 36. Qf6+ {+3.95/35 30s}
Kg8 {-7.06/22 11s} 37. Qxc6 {+4.18/36 37s} Qh4+ {-4.62/19 3.7s}
38. Kd2 {+4.03/34 34s} Rd8+ {-1.92/16 2.3s} 39. Kc3 {+4.58/37 32s}
Bxf5 {-5.22/16 2.1s} 40. Rg1 {+4.80/36 29s} Kf8 {-3.80/17 2.0s}
41. Qc5+ {+5.44/32 29s} Qe7 {-13.55/24 35s} 42. Qf2 {+5.64/31 28s}
Qf7 {-14.42/23 17s} 43. Qh4 {+8.02/30 27s} Rd4 {-15.83/26 48s}
44. Qh8+ {+8.56/32 28s} Qg8 {-16.17/27 75s} 45. Qh6+ {+8.87/33 27s}
Qg7 {-19.95/24 25s} 46. Qxh2 {+9.07/33 25s} Bg4 {-16.20/24 70s}
47. Bf3 {+12.70/33 29s, White wins by adjudication: user decision} 1-0

[/pgn]

Second game: Seer2 attacking again, one pawn down, then another pawn, then another, but it has the centre, the bishop pair and development. Looks good. It trades down to QRPPP vs RRBNPPPPP, with Seer alternating between seeing a draw and seeing +3. Mine thinks draw. And draw it is ...

Very nice, and really quite a coffee-house style, albeit anecdotally on two games only. Like it.
....
Thanks for testing chris! I'm quite happy to see I've resolved the TM issues. What's CoronavirusChess' estimated strength and what was the result of the match? Seer is expected to underperform at STC somewhat, but in my testing at 1m+1s against a variety of opponents it looks to be around 3180-3220 elo CCRL.

CoronaVirusChess has a blitz rating on the nightmare.nl server of 2777, from the monthly blitz tourneys. Strength is of the same order as Arasan, Wasp and similar, I think. It seems to generally hold its own against the HCEs (not including SF of course), but can’t handle the NNUEs.
The 500 fast test games (40/10) with Seer 2.0.1 came out at about a 52% winrate to mine, but that could go either way after so few games, so maybe roughly equal? The draw rate was 21%, iirc, which I think is quite low and might reflect something or other (both engines a bit manic?).
I think I still have the pgns of the longer time control games, five altogether, again iirc.
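
As a rough check on the "could go either way" remark, here is a minimal sketch of the error bar on a 500-game match score. It assumes independent games with fixed win/draw/loss probabilities and takes the 52% score and 21% draw rate quoted above at face value (both were given from memory). The function names are illustrative only.

```python
import math

def score_interval(score, draw_rate, games, z=1.96):
    """Approximate confidence interval for a match score.

    Assumes independent games with fixed win/draw/loss probabilities,
    which ignores opening and colour effects.
    """
    win = score - draw_rate / 2.0
    loss = 1.0 - win - draw_rate
    if win < 0 or loss < 0:
        raise ValueError("score and draw_rate are inconsistent")
    # Per-game variance of the result (1, 0.5 or 0 points).
    var = (win * (1.0 - score) ** 2
           + draw_rate * (0.5 - score) ** 2
           + loss * score ** 2)
    half_width = z * math.sqrt(var / games)
    return score - half_width, score + half_width

def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

low, high = score_interval(score=0.52, draw_rate=0.21, games=500)
print(f"score interval: {low:.3f} .. {high:.3f}")
print(f"Elo interval:   {elo_from_score(low):+.0f} .. {elo_from_score(high):+.0f}")
```

Under these assumptions the interval straddles 50%, so a zero Elo difference cannot be ruled out from this run alone.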

I believe the attacking style might have something to do with the distribution of positions Seer was trained on (maybe an excessive number of imbalanced positions), but the labels assigned to the positions are completely independent of the originating games' outcomes.

Yes, the labels are more absolutely accurate, by definition, at least at the low man-count training iterations, but the actual positions are likely more manic. And then each iteration adds labels from the manic world above itself, and so on. It’s progressive and it must have an effect, I’d have thought.

I think Seer tends to be much looser with material because the playout continuations to the <N man positions it's trained on can sometimes go up to 30 moves deep, which means that the network tends to look more for long term advantages than immediate compensation. This definitely backfires sometimes, though.

Using a shallow search as a target biases the network towards looking for immediate compensation. Training a network on the WDL outcome of partially randomized self play games also induces a "short term bias" as positions requiring precise, long continuations (such as a king attack or tricky endgame), where a random move can't be afforded, will be undervalued.

I’m not sure I understood that last bit! A question though. You said you used 0.7 billion training positions; I guess the collective 3-4-5-6 bin for the initial training would have 50 million or so of those? Unless I misunderstood, the Seer NN is using king-piece-square as per nodchip-SF NNUE, but with fewer accumulator neurons and a different structure above that (which kind of cross-checks with the 60 MB weights file size). Is 50 million enough to train a big initial NNUE, or is the intention to increase the position mass for future runs?

mclane · Post by **mclane** » Sat May 01, 2021 3:26 pm

If it plays coffeehouse chess, i can only congratulate you for making it this way.

connor_mcmonigle · Post by **connor_mcmonigle** » Sat May 01, 2021 6:50 pm

chrisw wrote: ↑Sat May 01, 2021 2:42 pm CoronaVirusChess has a blitz rating on the nightmare.nl server of 2777, from the monthly blitz tourneys. Strength is of the same order as Arasan, Wasp and similar, I think. It seems to generally hold its own against the HCEs (not including SF of course), but can’t handle the NNUEs.
The 500 fast test games (40/10) with Seer 2.0.1 came out at about a 52% winrate to mine, but that could go either way after so few games, so maybe roughly equal? The draw rate was 21%, iirc, which I think is quite low and might reflect something or other (both engines a bit manic?).
I think I still have the pgns of the longer time control games, five altogether, again iirc.

I believe the attacking style might have something to do with the distribution of positions Seer was trained on (maybe an excessive number of imbalanced positions), but the labels assigned to the positions are completely independent of the originating games' outcomes.

Yes, the labels are more absolutely accurate, by definition, at least at the low man-count training iterations, but the actual positions are likely more manic. And then each iteration adds labels from the manic world above itself, and so on. It’s progressive and it must have an effect, I’d have thought.

I think Seer tends to be much looser with material because the playout continuations to the <N man positions it's trained on can sometimes go up to 30 moves deep, which means that the network tends to look more for long term advantages than immediate compensation. This definitely backfires sometimes, though.

Using a shallow search as a target biases the network towards looking for immediate compensation. Training a network on the WDL outcome of partially randomized self play games also induces a "short term bias" as positions requiring precise, long continuations (such as a king attack or tricky endgame), where a random move can't be afforded, will be undervalued.
I’m not sure I understood that last bit! A question though. You said you used 0.7 billion training positions; I guess the collective 3-4-5-6 bin for the initial training would have 50 million or so of those? Unless I misunderstood, the Seer NN is using king-piece-square as per nodchip-SF NNUE, but with fewer accumulator neurons and a different structure above that (which kind of cross-checks with the 60 MB weights file size). Is 50 million enough to train a big initial NNUE, or is the intention to increase the position mass for future runs?

Awesome. So not too shabby a result given Coronavirus Chess is so competitive. Thanks.

My dataset was sampled from a distribution that is biased towards imbalanced/manic positions, and that is likely relevant to Seer's style, but I don't know that it fully explains it. We could conceive of a set of labels, even for a dataset consisting exclusively of highly imbalanced/manic positions, such that the network exhibits a less aggressive playing style.

Here's my rough theory:
To clarify what I meant by "short term bias", consider a delicate winning position requiring 15 precise half moves to convert. Training with the eval of a low depth search is unlikely to result in the correct evaluation for this type of position, as the search simply won't be deep enough (multiple iterations of TD learning aren't feasible because TD learning quickly becomes unstable). Training on the game's outcome might seem like a solution, but with random move insertion, precise lines are impossible to follow. Either training strategy results in the position getting underestimated. Less concretely, both strategies result in a preference for "wider" positions with many good moves. If I remember correctly, Stoofvlees was trained on self play games quite similar to Lc0, but with no or very minimal random move insertion (temperature). While this is only a sample size of two, my theory is that training on the results of long, precise playouts results in a sharper playing style and less of a preference between "narrow" and "wide" position types.
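
To put a rough number on the random move insertion argument, here is a minimal sketch. It assumes each move of the precise line is independently replaced by a random move with probability `eps`, and that any such replacement spoils the line. Both are simplifying assumptions made for illustration and do not describe any particular engine's data generation.

```python
def line_survival_probability(precise_moves, eps):
    """Chance that none of `precise_moves` moves is replaced by a random one."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    return (1.0 - eps) ** precise_moves

# The 15 half move example from the text, for a few hypothetical rates.
for eps in (0.01, 0.05, 0.10):
    p = line_survival_probability(15, eps)
    print(f"eps={eps:.2f}: precise line followed in {p:.1%} of playouts")
```

The survival probability decays geometrically with the length of the line, which is one way to see why "narrow" positions would end up undervalued when labels come from randomized game outcomes.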

You are correct about the architecture. Seer uses a 160x2 asymmetric feature transformer (first layer) with HalfKA input features. The topology of the subnetwork above the feature transformer is fairly different. The network also differs as it predicts 3 scalars which are softmaxed to get WDL probabilities.
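
The last step can be sketched as follows: three raw outputs are passed through a softmax to give probabilities that sum to one. This is a minimal illustration in plain Python, not Seer's code. The (win, draw, loss) ordering and the `expected_score` helper are assumptions made for the example.

```python
import math

def wdl_from_logits(logits):
    """Softmax over three raw outputs -> (win, draw, loss) probabilities.

    The (win, draw, loss) ordering is an assumption for this sketch.
    """
    if len(logits) != 3:
        raise ValueError("expected exactly three logits")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)

def expected_score(wdl):
    """Collapse WDL probabilities to a single expected score in [0, 1]."""
    win, draw, _loss = wdl
    return win + 0.5 * draw

wdl = wdl_from_logits([1.2, 0.3, -0.8])
print("WDL:", wdl, "expected score:", expected_score(wdl))
```

A search that needs a single scalar can use something like the expected score, while the separate draw probability remains available.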

There were about 20M <=6 man positions in my training set iirc. Perhaps using more positions would have produced better results. Given there are massively more <=32 man positions relative to <=6 man positions, 20M seemed adequate to produce a network with near perfect endgame play in my testing. I also use dropout regularization and factorization, both of which eliminate a lot of the issues associated with training on so few positions.

connor_mcmonigle · Post by **connor_mcmonigle** » Sat May 01, 2021 6:56 pm

mclane wrote: ↑Sat May 01, 2021 3:26 pm If it plays coffeehouse chess, i can only congratulate you for making it this way.

I don't know whether it is possible to get away with playing coffeehouse style chess when you're >3000 elo, but Seer definitely explores the boundary between sound and unsound in many games, for better or worse.

Guenther · Post by **Guenther** » Sat May 01, 2021 7:27 pm

Rebel wrote: ↑Wed Apr 28, 2021 8:40 am Division Three

40 moves in 2 minutes

No. Engine              1     2     3     4     5     6     7     8  Score  Games  Perc 
-----------------------------------------------------------------------------------------
 1 Seer 2.0.0        xxxx  52.0  61.5  62.0  70.5  60.0  64.0  66.5  436.5 / 700 (62.36%)
 2 Weiss 1.3         48.0  xxxx  61.5  68.5  53.5  58.0  68.5  71.5  429.5 / 700 (61.36%)
 3 Topple 0.8.0      38.5  38.5  xxxx  52.5  48.5  52.5  54.5  60.0  345.0 / 700 (49.29%)
 4 Counter 3.7       38.0  31.5  47.5  xxxx  56.0  56.5  47.0  60.0  336.5 / 700 (48.07%)
 5 Seer 1.2.1        29.5  46.5  51.5  44.0  xxxx  51.0  52.5  57.5  332.5 / 700 (47.50%)
 6 Cheng 4.41        40.0  42.0  47.5  43.5  49.0  xxxx  51.5  58.5  332.0 / 700 (47.43%)
 7 FabChess 1.16     36.0  31.5  45.5  53.0  47.5  48.5  xxxx  56.0  318.0 / 700 (45.43%)
 8 Cheng 4.40        33.5  28.5  40.0  40.0  42.5  41.5  44.0  xxxx  270.0 / 700 (38.57%)

Nice progress Connor!

Thanks for the games Ed.

Note that version 2.0.1 should do even better in your tournament, because it fixes the time control issues with moves-per-session controls that you experienced.
I checked the games now and, as expected, 2.0.0 had dozens of illegal moves (29): when very low on time, the bug makes it repeat its previous move.

Two examples (column style and number tag added by script).
You can see how low on time it already was around moves 68-70, and ofc it didn't manage to get to move 80.
The second example demonstrates that it doesn't only happen in already lost positions,
but also in already won positions.

[Date "2021.04.28"]
[Round "5"]
[Number "8529"]
[White "Cheng_4.41"]
[Black "Seer_2.0.0"]
[Result "1-0"]

[Termination "illegal move"]
[TimeControl "40/150"]

62. d7 {+5.02/13 1.9} f1=Q {-2.58/14 1.4}
63. d8=Q+ {+5.24/12 2.2} Nd5 {-6.69/14 0.96}
64. Rf7 {+5.15/14 5.2} Qc1+ {-3.63/14 0.65}
65. Kg4 {+5.42/14 3.4} Rxb7 {-2.64/11 0.42}
66. Nf3+ {+5.72/14 3.4} Ke4 {-2.91/12 0.26}
67. Ng5+ {+5.72/14 3.4} Kd4 {-4.33/12 0.17}
68. Rxb7 {+6.32/13 2.9} Qg1+ {-1.91/11 0.11}
69. Kh5 {+6.15/17 3.4} Qxh2+ {-7.00/10 0.049}
70. Kg6 {+6.55/16 3.4} Qc2+ {-7.83/9 0.031}
71. Kg7 {+7.15/16 2.9} Qg2 {-9.17/4 0.004}
72. Kf8 {+10.70/14 3.0} Kd3 {-9.79/9 0.026}
73. Rd7 {+119.31/17 3.0} Qf2+ {-6.64/4 0.004}
74. Kg8 {+147.12/17 3.6}
{Black makes an illegal move: g2f2}
1-0

[Date "2021.04.28"]
[Round "69"]
[Number "8977"]
[White "Cheng_4.41"]
[Black "Seer_2.0.0"]
[Result "1-0"]

[Termination "illegal move"]
[TimeControl "40/150"]

65. Ra8 {-3.03/15 3.4} Ng3+ {+11.27/11 0.061}
66. Rxg3+ {-3.03/2 0} Rxg3 {+14.04/14 0.59}
67. Ne4+ {-3.51/20 3.1} Kf4 {+13.92/15 0.38}
68. Nxg3 {-3.68/21 3.0} Kxg3 {+15.14/10 0.024}
69. Ra3+ {-3.72/21 3.4} Kf4 {+15.44/15 0.22}
70. Rd3 {-3.72/21 3.7} Rb7 {+16.19/13 0.12}
71. Kh2 {-3.97/19 3.7} Kxe5 {+16.37/11 0.038}
72. Re3+ {-4.25/18 3.7} Kf5 {+17.76/10 0.056}
73. Kg3 {-4.20/18 3.7} h5 {+17.81/9 0.026}
74. Kh4 {-4.15/19 3.4} Rb1 {+19.75/8 0.014}
75. Rh3 {-4.34/19 3.3}
{Black makes an illegal move: b7b1}
1-0

Other illegal moves came from Bersek 3.30 (3) and FoxSEE 7.8 (3). Two time losses when ProDeo 3.0 refused to move after book end.
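
For readers unfamiliar with moves-per-session controls such as 40/150, here is a minimal sketch of a per-move time budget that keeps a small reserve for every move still owed before the next control. It is not Seer's time manager and not the fix in 2.0.1. The function name, the 50 ms overhead and the 1 ms floor are hypothetical values chosen for illustration.

```python
def allocate_move_time(remaining_ms, moves_to_go, overhead_ms=50, min_ms=1):
    """Time budget in ms for the next move under a moves-per-session control.

    overhead_ms is a hypothetical per-move reserve for GUI/engine latency.
    """
    if moves_to_go < 1:
        raise ValueError("moves_to_go must be at least 1")
    # Hold back the overhead for every move still to be played this session.
    usable = max(remaining_ms - overhead_ms * moves_to_go, 0)
    return max(usable // moves_to_go, min_ms)

# 40/150: 150 seconds for 40 moves.
remaining = 150_000
for moves_to_go in range(40, 0, -1):
    budget = allocate_move_time(remaining, moves_to_go)
    remaining -= budget + 50  # assume the full budget plus overhead is spent
print("left on the clock at the control:", remaining, "ms")
```

A budget like this only limits how long the search runs. An engine still needs a legal fallback move for the case where the search is cut off almost immediately.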

Also I wonder a bit why you changed the tc a few times?
I see games with 40/120, 40/150 and 120+2.

Rebel · Post by **Rebel** » Sat May 01, 2021 9:40 pm

Guenther wrote: ↑Sat May 01, 2021 7:27 pm Also I wonder a bit why you changed the tc a few times?
I see games with 40/120, 40/150 and 120+2.

It runs on 2 PCs: the 3.6 GHz one runs 40/120, the 3.2 GHz one 40/150.

The 120+2 time control is only with Lc0 games due to the TC bug in Lc0.

I will run Seer again with version 2.0.1.

And thanks for the debug work

connor_mcmonigle · Post by **connor_mcmonigle** » Sat May 01, 2021 9:43 pm

Rebel wrote: ↑Sat May 01, 2021 9:40 pm
Guenther wrote: ↑Sat May 01, 2021 7:27 pm Also I wonder a bit why you changed the tc a few times?
I see games with 40/120, 40/150 and 120+2.
It runs on 2 PCs: the 3.6 GHz one runs 40/120, the 3.2 GHz one 40/150.

The 120+2 time control is only with Lc0 games due to the TC bug in Lc0.

I will run Seer again with version 2.0.1.

And thanks for the debug work

Thanks Ed and my apologies for the buggy release.

carldaman · Post by **carldaman** » Sat May 01, 2021 10:26 pm

connor_mcmonigle wrote: ↑Sat May 01, 2021 6:56 pm
mclane wrote: ↑Sat May 01, 2021 3:26 pm If it plays coffeehouse chess, i can only congratulate you for making it this way.
I don't know whether it is possible to get away with playing coffeehouse style chess when you're >3000 elo, but Seer definitely explores the boundary between sound and unsound in many games, for better or worse.

But then again, no human player is 3000+ Elo OTB, so we'll take any coffeehouse chess that approaches or surpasses that level.

For more on the subject:

http://talkchess.com/forum3/viewtopic.p ... se#p785035

Rebel · Post by **Rebel** » Sun May 02, 2021 3:17 pm

connor_mcmonigle wrote: ↑Sat May 01, 2021 9:43 pm
Rebel wrote: ↑Sat May 01, 2021 9:40 pm
Guenther wrote: ↑Sat May 01, 2021 7:27 pm Also I wonder a bit why you changed the tc a few times?
I see games with 40/120, 40/150 and 120+2.
It runs on 2 PCs: the 3.6 GHz one runs 40/120, the 3.2 GHz one 40/150.

The 120+2 time control is only with Lc0 games due to the TC bug in Lc0.

I will run Seer again with version 2.0.1.

And thanks for the debug work
Thanks Ed and my apologies for the buggy release.

No problem at all, bugs make us better programmers. Seer 2.0.1 results at the end of the GRL.

Rebel · Post by **Rebel** » Thu May 06, 2021 12:15 pm

From the new GRL:

  # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     LOS    W    D    L  DRAWS
  16 Seer 2.0.1       :  3097.1   27.2   476.0     700    68      95  375  202  123   29% 
  21 Seer 2.0.0       :  3029.8   28.9   552.0     900    61      52  425  254  221   28%

I think you owe Guenther a good bottle of wine
