Time handicap tournament. LC0 and SF11

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Time handicap tournament. LC0 and SF11

Post by Alayan »

The number of plies in an opening is much less important than the exit position. A long line can produce a high elo-spread because it forces engines into complications; conversely, a short line like the Slav Exchange will have a poor elo-spread. Obviously, forced wins and forced threefold repetitions lower the elo-spread and are pretty much pointless.

Different books will have different elo-spreads, but there is no standard for what the "correct" elo-spread is. As soon as you do something other than pure start-position testing (which is also flawed because it explores only a tiny subset of lines and of the engines' abilities), you're introducing subjective choices.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Time handicap tournament. LC0 and SF11

Post by mmt »

Guenther wrote: Tue Feb 18, 2020 1:58 pm This is wrong and will defeat the whole purpose of the test. Bad openings cannot be cured by playing them for both sides.
That is a widely spread but illogical opinion. The only thing it does is help the suspected weaker program push its score
further toward equality.
No, at worst a weaker opening results in one win for each side, which has the same effect as playing drawish openings. I don't think any of these openings are lost, and the initial score shown by the program is not the truth (or it'd be 0.0 or mate in N). How the program plays weaker openings from both sides is important. If you only play openings that are as even as possible, your tests will give you too many draws and you won't learn much about how the program plays.
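To make the "one win for each side" point concrete, here is a minimal Python sketch, assuming each opening is played once with each colour and results are recorded from one engine's point of view (1 = win, 0.5 = draw, 0 = loss). The function names are hypothetical, not taken from any testing tool.

```python
from collections import Counter

def pair_total(as_white: float, as_black: float) -> float:
    """Points one engine scores over a single opening played with both colours."""
    return as_white + as_black

def summarize_pairs(pairs):
    """Return (counts of each pair total, overall score fraction) for a list of pairs."""
    totals = [pair_total(w, b) for w, b in pairs]
    counts = Counter(totals)
    score = sum(totals) / (2 * len(totals))
    return counts, score

# A lopsided line that White always wins adds the same 1 point as two draws.
print(pair_total(1, 0) == pair_total(0.5, 0.5))  # True

counts, score = summarize_pairs([(1, 0), (0.5, 0.5), (1, 0.5)])
print(dict(counts), round(score, 3))
```

Either way, a 1-1 pair says nothing about which engine is stronger, just like a pair of draws; the posters disagree on how much that matters.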
Guenther wrote: Tue Feb 18, 2020 1:58 pm Also, long openings generally make these statistical tests unreliable for various reasons.
I noticed, e.g., 120+ and 140+ games with early threefold repetitions before move 30! in your PGN files (same effect: pushing the weaker one towards equality).
Sometimes even directly after the book ends. IMO long books should be abolished entirely for serious statistical tests.
Guenther wrote: Tue Feb 18, 2020 1:58 pm No, see above; I would use a 6-12 (at most!) ply opening file.
I think the opposite is true. Leaving openings too early results in repeated games, which messes up the stats, and is unrealistic because it won't happen in real games.
Guenther wrote: Tue Feb 18, 2020 1:58 pm Moreover, I now have some doubts about how to set the TC at all. Most programs now have too-clever time management, and I noticed that
in crucial positions the program that should use half of the time sometimes actually used more time than the other.

Just out of curiosity, I ran my own match until today between SF and SFx2 (half time) on my slow hardware with 1 CPU each, at a very fast TC with fixed time per move: 1move/0.5s vs. 1move/0.25s (128MB) in cutechess-cli with a 6-ply general book, and the difference was around 160 rating points.
(I still need to calculate the average middlegame depth to compare.)

Even here I find artifacts of asymmetric time usage, and I am not sure how much noise this adds to the outcome.
I don't understand the problem you're seeing. A program doesn't get more time than it has unless something is really broken on both sides.
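For reference, the ~160 rating points mentioned above can be converted to an expected score with the usual logistic Elo formula. This is only a sketch: rating tools such as Ordo or BayesElo fit their own models, so their numbers will not match it exactly.

```python
import math

def score_from_elo(diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (inverse of score_from_elo)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

print(round(score_from_elo(160), 3))  # 0.715
print(round(elo_from_score(0.75)))    # 191
```

So a 160-point gap corresponds to the stronger side scoring roughly 71.5% in the match.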
Guenther wrote: Tue Feb 18, 2020 1:58 pm There is also an effect I completely forgot about (though it is very rare) that we talked about here long ago.
Sometimes it is even a disadvantage to see further than your opponent, because you see more and more how much worse your position might become,
and you defend against something the other side won't see at all, playing suboptimally against the time-handicapped program until it even wins.
Yes, of course, it can be negative to see further (unless you reach a mate). It's just normal behavior that happens sometimes with all programs.
Guenther wrote: Tue Feb 18, 2020 1:58 pm This and the asymmetric time usage explain most of the wins the handicapped version can get against the non-handicapped one.
(plus bad book lines, of course)
Yes, that's the whole idea of these time-handicapped self-play tests. One thing I'm hoping to see is how and where the Elo (Ordo) curve flattens for SF and LC0.
Guenther wrote: Tue Feb 18, 2020 1:58 pm Moreover there is contempt in SF too, which also will influence this test...
Contempt is set 0.
Guenther
Posts: 4610
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Time handicap tournament. LC0 and SF11

Post by Guenther »

mmt wrote: Wed Feb 19, 2020 2:20 am
Guenther wrote: Tue Feb 18, 2020 1:58 pm This is wrong and will defeat the whole purpose of the test. Bad openings cannot be cured by playing them for both sides.
That is a widely spread but illogical opinion. The only thing it does is help the suspected weaker program push its score
further toward equality.
(1) No, at worst a weaker opening results in one win for each side, which has the same effect as playing drawish openings. I don't think any of these openings are lost, and the initial score shown by the program is not the truth (or it'd be 0.0 or mate in N). How the program plays weaker openings from both sides is important. If you only play openings that are as even as possible, your tests will give you too many draws and you won't learn much about how the program plays.
Guenther wrote: Tue Feb 18, 2020 1:58 pm Also, long openings generally make these statistical tests unreliable for various reasons.
I noticed, e.g., 120+ and 140+ games with early threefold repetitions before move 30! in your PGN files (same effect: pushing the weaker one towards equality).
Sometimes even directly after the book ends. IMO long books should be abolished entirely for serious statistical tests.
Guenther wrote: Tue Feb 18, 2020 1:58 pm No, see above; I would use a 6-12 (at most!) ply opening file.
(2) I think the opposite is true. Leaving openings too early results in repeated games, which messes up the stats, and is unrealistic because it won't happen in real games.
Guenther wrote: Tue Feb 18, 2020 1:58 pm Moreover, I now have some doubts about how to set the TC at all. Most programs now have too-clever time management, and I noticed that
in crucial positions the program that should use half of the time sometimes actually used more time than the other.

Just out of curiosity, I ran my own match until today between SF and SFx2 (half time) on my slow hardware with 1 CPU each, at a very fast TC with fixed time per move: 1move/0.5s vs. 1move/0.25s (128MB) in cutechess-cli with a 6-ply general book, and the difference was around 160 rating points.
(I still need to calculate the average middlegame depth to compare.)

Even here I find artifacts of asymmetric time usage, and I am not sure how much noise this adds to the outcome.
(3) I don't understand the problem you're seeing. A program doesn't get more time than it has unless something is really broken on both sides.

...snip...
You did not understand most of what I had written; I don't know why I should invest my precious time in explaining more.
BTW, do you ever look at the games? Just a few corrections to your wrong assumptions above.

(1) I can show you dozens of examples of lost openings out of that opening file.

(2) This is complete nonsense; if the opening file contains enough lines, there are no repeated games.
(Actually, it is even funny that you claimed this, as you had several repeated games in your test matches;
it seems your opening file, besides containing lost positions, also has just 761 openings, which puts some pressure
on randomization for 500 start positions.)
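To put a rough number on that pressure: if the tournament manager picked each of the 500 start positions uniformly at random, with replacement, from 761 lines (an assumption; how a given tool draws openings depends on its settings), the expected number of distinct lines follows the standard occupancy formula n(1 - (1 - 1/n)^k). The sketch below computes it and cross-checks it with a small simulation; the function names are made up for illustration.

```python
import random

def expected_distinct(n_lines: int, n_draws: int) -> float:
    """Expected number of distinct lines when drawing uniformly with replacement."""
    return n_lines * (1.0 - (1.0 - 1.0 / n_lines) ** n_draws)

def simulate_distinct(n_lines: int, n_draws: int, trials: int = 500, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    # Assumption: each start position is drawn independently and uniformly.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n_lines) for _ in range(n_draws)})
    return total / trials

n, k = 761, 500
distinct = expected_distinct(n, k)
print(f"expected distinct lines: {distinct:.1f}, repeated draws: {k - distinct:.1f}")
print(f"simulated distinct lines: {simulate_distinct(n, k):.1f}")
```

Under that assumption only about 367 of the 500 draws are distinct lines, so roughly a quarter of them repeat a line already used.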

(3) You don't understand; probably you never looked at the games at all. Even if you get twice the time over the whole game on average,
time can be accumulated completely differently (and will be, also depending on the exact type of TC).

Some games will have only a very few crucial moments that decide the outcome, and it is not nice if the side with half of the time
suddenly spends more time here than the other (which for whatever reason now has less time saved, or did not grasp
the crucial situation) for a very few moves and wins just because of this.
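One way to check this on actual games is to total the time and depth each side reports in its move comments. The appendix games carry comments like {+2.69/29 7}; the sketch below assumes that means score/depth followed by seconds spent, which is an assumption about the GUI's output format, not something stated in the thread. It is a rough Python sketch that only handles plain SAN movetext.

```python
import re
from statistics import mean

# Tokens: brace comments, move numbers like "12." or "12...", or anything else.
TOKEN = re.compile(r"\{[^}]*\}|\d+\.(?:\.\.)?|[^\s{}]+")
# Assumed comment format: {score/depth seconds}, e.g. {+2.69/29 7}
COMMENT = re.compile(r"\{([+-]?\d+(?:\.\d+)?)/(\d+)\s+(\d+(?:\.\d+)?)s?\}")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def side_stats(movetext: str) -> dict:
    """Total seconds and mean search depth per side, from annotated movetext."""
    times = {"white": [], "black": []}
    depths = {"white": [], "black": []}
    to_move = "white"   # side whose move is expected next
    last_mover = None   # side that played the most recent move
    for tok in TOKEN.findall(movetext):
        if tok.startswith("{"):
            m = COMMENT.fullmatch(tok)
            if m and last_mover:
                depths[last_mover].append(int(m.group(2)))
                times[last_mover].append(float(m.group(3)))
        elif tok.endswith("..."):
            to_move = "black"
        elif tok.endswith("."):
            to_move = "white"
        elif tok in RESULTS:
            continue
        else:
            # Any other token is treated as a SAN move by the side to move.
            last_mover = to_move
            to_move = "black" if to_move == "white" else "white"
    return {
        side: {
            "seconds": sum(times[side]),
            "mean_depth": mean(depths[side]) if depths[side] else None,
        }
        for side in times
    }

print(side_stats("12. e5 {+2.69/29 7} Bb7 {-3.02/28 4}"))
```

On that move pair from round 31 of the appendix, this reports 7 s at depth 29 for White and 4 s at depth 28 for Black (again assuming the trailing number is seconds). Averaging over a middlegame window only would need the move numbers kept, which this sketch does not do.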

I guess you'll try an Ovyron now, but don't hold your breath waiting for another answer from me.

Appendix (1): some examples of why SF/2 could win due to lopsided openings in your test

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.12"]
[Round "31"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "B99"]
[Opening "Sicilian"]
[Time "05:11:27"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "94"]

1. e4 c5
2. Nf3 d6
3. d4 cxd4
4. Nxd4 Nf6
5. Nc3 a6
6. Bg5 e6
7. f4 Be7
8. Qf3 Qc7
9. O-O-O Nbd7
10. Bd3 h6
11. Bh4 b5
12. e5 {+2.69/29 7} Bb7 {-3.02/28 4}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.12"]
[Round "139"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "B99"]
[Opening "Sicilian"]
[Time "11:07:09"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "104"]

1. e4 c5
2. Nf3 d6
3. d4 cxd4
4. Nxd4 Nf6
5. Nc3 a6
6. Bg5 e6
7. f4 Be7
8. Qf3 Qc7
9. O-O-O Nbd7
10. Bd3 h6
11. Bh4 b5
12. e5 {+2.35/30 7} Bb7 {-2.77/27 4}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.13"]
[Round "369"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "A40"]
[Opening "Englund Gambit"]
[Time "12:26:26"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "132"]

1. d4 e5
2. dxe5 Nc6
3. Nf3 Qe7
4. Bg5 {+1.68/23 2} f6 {-2.23/26 7}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.13"]
[Round "443"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "E94"]
[Opening "King's Indian"]
[Time "17:08:26"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "108"]

1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
4. e4 d6
5. Nf3 O-O
6. Be2 e5
7. O-O Nbd7
8. Be3 c6
9. d5 {+1.29/23 1} cxd5 {-1.61/29 16}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.13"]
[Round "449"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "B01"]
[Opening "Scandinavian"]
[Time "17:28:49"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "82"]

1. e4 d5
2. exd5 Qxd5
3. Nc3 Qd6
4. d4 Nf6
5. Nf3 a6
6. g3 Bg4 {-1.56/27 11}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.13"]
[Round "473"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "E97"]
[Opening "King's Indian"]
[Time "19:04:05"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "136"]

1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
4. e4 d6
5. Nf3 O-O
6. Be2 e5
7. O-O Nc6
8. d5 Ne7
9. b4 Nh5
10. Re1 f5
11. Ng5 Nf6 {-1.00/26 5}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "585"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "C41"]
[Opening "Philidor"]
[Time "04:41:52"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "110"]

1. e4 e5
2. Nf3 d6
3. d4 f5
4. Bc4 {+2.32/21 3} Nc6 {-3.07/27 16}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "609"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "C63"]
[Opening "Spanish"]
[Time "06:01:56"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "172"]

1. e4 e5
2. Nf3 Nc6
3. Bb5 f5
4. d3 {+0.53/20 1} fxe4 {-1.30/22 2}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "645"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "B92"]
[Opening "Sicilian"]
[Time "08:25:51"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "148"]

1. e4 c5
2. Nf3 d6
3. d4 cxd4
4. Nxd4 Nf6
5. Nc3 a6
6. Be2 e5
7. Nb3 Be7
8. Be3 O-O
9. g4 Be6
10. g5 Nfd7 {-1.08/24 2}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "679"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "B03"]
[Opening "Alekhine"]
[Time "10:50:49"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "108"]

1. e4 Nf6
2. e5 Nd5
3. d4 d6
4. c4 Nb6
5. f4 dxe5
6. fxe5 c5
7. d5 e6
8. Nc3 exd5
9. cxd5 c4
10. d6 {+1.77/21 1} Nc6 {-2.27/24 3}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "721"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "C61"]
[Opening "Spanish"]
[Time "13:44:52"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "106"]

1. e4 e5
2. Nf3 Nc6
3. Bb5 Nd4
4. Nxd4 exd4
5. O-O c6 {-1.19/24 2}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "805"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "C11"]
[Opening "French"]
[Time "18:41:29"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "98"]

1. e4 e6
2. d4 d5
3. Nc3 Nf6
4. e5 Nfd7
5. f4 c5
6. Nf3 Nc6
7. Be3 Qb6
8. Na4 Qa5+
9. c3 cxd4
10. b4 Nxb4 {-1.20/26 3}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.14"]
[Round "853"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "B03"]
[Opening "Alekhine"]
[Time "21:24:53"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "122"]

1. e4 Nf6
2. e5 Nd5
3. d4 d6
4. c4 Nb6
5. f4 dxe5
6. fxe5 c5
7. d5 e6
8. Nc3 exd5
9. cxd5 c4
10. d6 {+1.47/20 2} Nc6 {-2.16/25 7}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.15"]
[Round "915"]
[White "SF 2"]
[Black "SF"]
[Result "1-0"]
[ECO "A02"]
[Opening "Bird Opening"]
[Time "01:26:12"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "130"]

1. f4 e5
2. fxe5 d6 {-1.10/24 4}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.12"]
[Round "112"]
[White "SF"]
[Black "SF 2"]
[Result "0-1"]
[ECO "C47"]
[Opening "Four Knights"]
[Time "09:43:05"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "111"]

1. e4 e5
2. Nf3 Nc6
3. Nc3 Nf6
4. Nxe5 Nxe5 {+1.69/22 1}

[Event "SF vs SF half q5"]
[Site "MAIN"]
[Date "2020.02.13"]
[Round "462"]
[White "SF"]
[Black "SF 2"]
[Result "0-1"]
[ECO "C37"]
[Opening "KGA"]
[Time "18:32:01"]
[TimeControl "20+2"]
[Termination "normal"]
[PlyCount "123"]

1. e4 e5
2. f4 exf4
3. Nf3 g5
4. Bc4 g4 {+1.03/23 1}
5. Ne5 {-1.39/27 9} Qh4+ {+1.36/18 0}
https://rwbc-chess.de

trollwatch:
Talkchess nowadays is a joke - it is full of trolls/idiots/people stuck in the pleistocene > 80% of the posts fall into this category...

Re: Time handicap tournament. LC0 and SF11

Post by Alayan »

Guenther wrote: Wed Feb 19, 2020 8:29 am (3) You don't understand; probably you never looked at the games at all. Even if you get twice the time over the whole game on average,
time can be accumulated completely differently (and will be, also depending on the exact type of TC).

Some games will have only a very few crucial moments that decide the outcome, and it is not nice if the side with half of the time
suddenly spends more time here than the other (which for whatever reason now has less time saved, or did not grasp
the crucial situation) for a very few moves and wins just because of this.
Time management does what it can with what it has. There is nothing to fix there; how the engine deals with its time is its own business, and this doesn't taint the comparison of how the version with more time does against the one with less time. Each manages its time just as it would against any other opponent.

That being said, I'd avoid too high an increment-to-base ratio.
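To see what the increment-to-base ratio does to the budget: under a base+increment control, one side's total clock over a game of N moves is base + N × increment, so with a large increment most of the time arrives move by move rather than up front. A Python sketch using the 20+2 control from the games quoted in this thread (read, as in the PGN TimeControl tag, as 20 s base plus 2 s per move); the 10+1 "half" control is only a hypothetical example of scaling both parts.

```python
def time_budget(base_s: float, inc_s: float, moves: int) -> float:
    """Total clock time available to one side over `moves` moves."""
    return base_s + inc_s * moves

def increment_share(base_s: float, inc_s: float, moves: int) -> float:
    """Fraction of that total budget that comes from the increment."""
    return inc_s * moves / time_budget(base_s, inc_s, moves)

for moves in (20, 40, 80):
    full = time_budget(20, 2, moves)   # 20+2, as in the appendix games
    half = time_budget(10, 1, moves)   # hypothetical 10+1 "half" control
    share = increment_share(20, 2, moves)
    print(f"{moves} moves: {full:.0f}s vs {half:.0f}s, ratio {full / half:.1f}, "
          f"increment share {share:.2f}")
```

Halving both parts keeps the total budget ratio at exactly 2 for any game length; how each engine distributes that budget across moves is still up to its own time management, which is what the posts above are arguing about.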

Re: Time handicap tournament. LC0 and SF11

Post by Guenther »

Alayan wrote: Wed Feb 19, 2020 8:48 am
Guenther wrote: Wed Feb 19, 2020 8:29 am (3) You don't understand; probably you never looked at the games at all. Even if you get twice the time over the whole game on average,
time can be accumulated completely differently (and will be, also depending on the exact type of TC).

Some games will have only a very few crucial moments that decide the outcome, and it is not nice if the side with half of the time
suddenly spends more time here than the other (which for whatever reason now has less time saved, or did not grasp
the crucial situation) for a very few moves and wins just because of this.
Time management does what it can with what it has. There is nothing to fix there; how the engine deals with its time is its own business, and this doesn't taint the comparison of how the version with more time does against the one with less time. Each manages its time just as it would against any other opponent.
Yawn, yes, but you can choose time controls that guarantee more or less the exact time proportions you want to test.

Re: Time handicap tournament. LC0 and SF11

Post by mmt »

Guenther wrote: Wed Feb 19, 2020 8:29 am You did not understand most of what I had written; I don't know why I should invest my precious time in explaining more.
BTW, do you ever look at the games? Just a few corrections to your wrong assumptions above.
Yep, that's the answer when you don't have an argument.
Guenther wrote: Wed Feb 19, 2020 8:29 am (1) I can show you dozens of examples of lost openings out of that opening file.
First, even if there were some losing ones, it doesn't matter much, as I explained, and it is probably beneficial. Second, go for it. Let's see the winning lines.
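If one wanted to check the "lost openings" claim mechanically, a crude first pass is to flag lines where the first out-of-book evaluation is already large. A minimal Python sketch; the 1.0-pawn threshold is an arbitrary, hypothetical cut-off, and the sample values are White's first evaluations from rounds 31, 369 and 609 of the appendix earlier in the thread. An eval is only the engine's opinion, not proof that a line is lost, which is exactly what is being disputed here.

```python
def flag_lopsided(first_evals: dict, threshold: float = 1.0) -> list:
    """Rounds whose first out-of-book eval is at least `threshold` pawns in magnitude.

    abs() is used so it does not matter from which side's point of view the eval is given.
    The default threshold is a hypothetical cut-off, not an established standard.
    """
    return sorted(rnd for rnd, ev in first_evals.items() if abs(ev) >= threshold)

# White's first evals after book exit, taken from the appendix games.
sample = {31: 2.69, 369: 1.68, 609: 0.53}
print(flag_lopsided(sample))       # [31, 369]
print(flag_lopsided(sample, 0.5))  # [31, 369, 609]
```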

Guenther wrote: Wed Feb 19, 2020 8:29 am (2) This is complete nonsense; if the opening file contains enough lines, there are no repeated games.
(Actually, it is even funny that you claimed this, as you had several repeated games in your test matches;
it seems your opening file, besides containing lost positions, also has just 761 openings, which puts some pressure
on randomization for 500 start positions.)
You've missed the point. If you _leave the opening book early_, you'll obviously have fewer positions than if you stay with it longer.
Guenther wrote: Wed Feb 19, 2020 8:29 am (3) You don't understand; probably you never looked at the games at all. Even if you get twice the time over the whole game on average,
time can be accumulated completely differently (and will be, also depending on the exact type of TC).

Some games will have only a very few crucial moments that decide the outcome, and it is not nice if the side with half of the time
suddenly spends more time here than the other (which for whatever reason now has less time saved, or did not grasp
the crucial situation) for a very few moves and wins just because of this.
Sorry, but this makes absolutely no sense. If the engine wastes time where it shouldn't have wasted time, it's its own fault. Who cares if it's nice or not.