OKE - Opening Knowledge Engines

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OKE - Opening Knowledge Engines

Post by Rebel »

http://rebel13.nl/2-moves-new.html (6,728 positions) vs. the 2-moves-epd (404 positions). Not much different.

Remarkable - Lc0 still tops the list at 1 and 2 seconds while doing only 30-35 NPS. I also noticed that even at 100ms Lc0 would top the list.

Remarkable, and that's putting it mildly.

Next: two lists, for 4-5 and 6 moves.
90% of coding is debugging, the other 10% is writing bugs.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: OKE - Opening Knowledge Engines

Post by Dann Corbit »

Rebel wrote: Tue Jun 11, 2019 9:38 pm
Laskos wrote: Tue Jun 11, 2019 9:17 pm Well Ed, why bother? We have the excellent "Strategic Test Suite" which "consists of series of themed test suites designed to evaluate chess engine's long term understanding of strategical and positional concepts" (self-description). Dann and his Rybka were the main contributors to it. Here are some current standings on "strategical and positional concepts" (have no patience testing more engines):

Code:

Stuck to the solution from 1s to 2s of thinking
4 i7 cores at 3.80GHz
RTX 2070 GPU

Houdini 1.5a        score=1339/1500 [averages on correct positions: depth=8.4 time=0.15 nodes=1320734]
Komodo 13.02        score=1319/1500 [averages on correct positions: depth=10.6 time=0.16 nodes=1064120]
Stockfish dev       score=1284/1500 [averages on correct positions: depth=10.8 time=0.18 nodes=1111414]  
Texel 1.07          score=1241/1500 [averages on correct positions: depth=8.9 time=0.22 nodes=1379498]
Arasan 21.0         score=1195/1500 [averages on correct positions: depth=9.5 time=0.24 nodes=1017178]
Lc0 v21.2 ID42524   score=1177/1500 [averages on correct positions: depth=3.8 time=0.14 nodes=1580]
Fruit 2.1           score= 993/1500 [averages on correct positions: depth=5.2 time=0.23 nodes=505132]
I guess your suite might converge to STS as standings go. Good!
You can't compare STS with OKE. STS was created with the help of the top engines of that time; all the Rybka derivatives will top the STS list :D, not only Houdini 1.5.
I am given too much credit. The bulk of the work was by Swaminathan. I just pointed engines at it and let them pound. Of course the hour per position (and there were far more positions rejected than accepted) can literally be reproduced in one second with modern hardware and software. My current estimate is that ten percent of the STS positions are wrong. Be that as it may, I am still proud of the achievement. I did a calculation recently. From start to end we averaged one position per day.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: OKE - Opening Knowledge Engines

Post by Dann Corbit »

I should mention that I always used the three strongest engines (not just Rybka). By the end of the test set Rybka was long retired.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OKE - Opening Knowledge Engines

Post by Rebel »

Laskos wrote: Tue Jun 11, 2019 9:59 pm Well, after seeing your methodology of building the suite, I think I will stick to my opening suite, which I didn't appreciate too much; maybe I lack the sense of being an "expert" (I always feel like a patzer at chess).

The opening suite was built manually and pretty slowly almost three years ago, before any Lc0, and the boost of confidence came with Lc0 too:

Code:

4 i7 cores at 3.80GHz
RTX 2070 GPU

Openings1000.epd test-suite (stuck to the solution from 1s to 2s of thinking):

Lc0 v21.2 ID42524  757/1000
Lc0 v21.2 ID32930  727/1000
Lc0 v21.2 ID11261  723/1000
Stockfish_dev      574/1000
Houdini 6.03       558/1000
Komodo 13.02       556/1000
Xiphos 0.5         513/1000
Booot 6.3.1        494/1000
Andscacs 0.95      484/1000
Laser 1.7          480/1000
Ethereal 11.25     467/1000
Texel 1.07         419/1000
Rodent III         376/1000
Fruit 2.1          348/1000
BikJump 2.01       276/1000
Predateur 2.2.1    265/1000
Here is the suite:
http://s000.tinyupload.com/?file_id=854 ... 6503996473

The number of positions is 1000, 5 times the same 200 positions, for less noise. My methodology hardly allows for more than 200 positions in some reasonable amount of time (several weekends spent).
Thanks for sharing. We are definitely pioneering in different areas. My focus is strictly on the first 6 moves. In your 200, many positions have already finished the development phase or are close to it. Nevertheless, the results are very interesting. With ChessPartner -> Analyze EPD I ran your 200 positions with some engines.

i7 cores at 2.80GHz,
one second per move.

Code:

Engine: Lc0 v0.21.2-rc1         125/200
Engine: Stockfish 10             93/200
Engine: Ethereal 11.25           86/200
Engine: Senpai 1.0               78/200
Engine: Mephisto Gideon          76/200
Engine: Xiphos 0.5               72/200
Engine: Laser 1.6                71/200
Engine: Rebel Century            66/200
Engine: Sting SF 9.6             66/200
Engine: Texel 1.06a45            65/200
Engine: Rodent III               64/200
Engine: Rybka 4.1                61/200
I would say it's hard to believe that a 2400-2500 rated engine (Gideon) can outperform many 3000+ Elo rated engines without admitting that something is missing in modern engines. The pattern seen with OKE remains, except for Rodent III.
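
For anyone without ChessPartner, a rough command-line equivalent of such an Analyze EPD run can be sketched with python-chess. This is only a minimal sketch: the engine path and the EPD file name below are placeholders, not anything from the actual test.

Code:

import chess
import chess.engine

ENGINE_PATH = "./stockfish"       # placeholder: any UCI engine binary
EPD_FILE = "openings200.epd"      # placeholder: one EPD record per line

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
hits = total = 0
with open(EPD_FILE) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        board = chess.Board()
        ops = board.set_epd(line)      # sets the position and returns the EPD opcodes
        bm = ops.get("bm")             # list of "best move" Move objects, if present
        if not bm:
            continue
        total += 1
        result = engine.play(board, chess.engine.Limit(time=1.0))  # one second per move
        if result.move in bm:
            hits += 1
print(f"{hits}/{total}")
engine.quit()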
90% of coding is debugging, the other 10% is writing bugs.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: OKE - Opening Knowledge Engines

Post by Dann Corbit »

The engines that are interesting here are those that outkicked their coverage. Rodent and Gideon stand out like sore thumbs. How do they differ in opening eval?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OKE - Opening Knowledge Engines

Post by Rebel »

Dann Corbit wrote: Wed Jun 12, 2019 9:01 am
Rebel wrote: Tue Jun 11, 2019 9:38 pm
Laskos wrote: Tue Jun 11, 2019 9:17 pm Well Ed, why bother? We have the excellent "Strategic Test Suite" which "consists of series of themed test suites designed to evaluate chess engine's long term understanding of strategical and positional concepts" (self-description). Dann and his Rybka were the main contributors to it. Here are some current standings on "strategical and positional concepts" (have no patience testing more engines):

Code:

Stuck to the solution from 1s to 2s of thinking
4 i7 cores at 3.80GHz
RTX 2070 GPU

Houdini 1.5a        score=1339/1500 [averages on correct positions: depth=8.4 time=0.15 nodes=1320734]
Komodo 13.02        score=1319/1500 [averages on correct positions: depth=10.6 time=0.16 nodes=1064120]
Stockfish dev       score=1284/1500 [averages on correct positions: depth=10.8 time=0.18 nodes=1111414]  
Texel 1.07          score=1241/1500 [averages on correct positions: depth=8.9 time=0.22 nodes=1379498]
Arasan 21.0         score=1195/1500 [averages on correct positions: depth=9.5 time=0.24 nodes=1017178]
Lc0 v21.2 ID42524   score=1177/1500 [averages on correct positions: depth=3.8 time=0.14 nodes=1580]
Fruit 2.1           score= 993/1500 [averages on correct positions: depth=5.2 time=0.23 nodes=505132]
I guess your suite might converge to STS as standings go. Good!
You can't compare STS with OKE. STS was created with the help of the top engines of that time; all the Rybka derivatives will top the STS list :D, not only Houdini 1.5.
I am given too much credit. The bulk of the work was by Swaminathan. I just pointed engines at it and let them pound. Of course the hour per position (and there were far more positions rejected than accepted) can literally be reproduced in one second with modern hardware and software. My current estimate is that ten percent of the STS positions are wrong. Be that as it may, I am still proud of the achievement. I did a calculation recently. From start to end we averaged one position per day.
You did the best you could, and likely (or hopefully) many engines profited from it, and starters still can. Whatever OKE will become, surely it will be outdated 5-10 years later. My hope is on Lc0.
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OKE - Opening Knowledge Engines

Post by Rebel »

Dann Corbit wrote: Wed Jun 12, 2019 9:22 am The engines that are interesting here are those that outkicked their coverage. Rodent and Gideon stand out like sore thumbs. How do they differ in opening eval?
For Gideon, one reason would be, as already stated earlier: no reductions, no futility pruning, etc. --> no loss in the quality of eval.
90% of coding is debugging, the other 10% is writing bugs.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: OKE - Opening Knowledge Engines

Post by Dann Corbit »

Rebel wrote: Wed Jun 12, 2019 9:33 am
Dann Corbit wrote: Wed Jun 12, 2019 9:22 am The engines that are interesting here are those that outkicked their coverage. Rodent and Gideon stand out like sore thumbs. How do they differ in opening eval?
For Gideon, one reason would be, as already stated earlier: no reductions, no futility pruning, etc. --> no loss in the quality of eval.
That does not explain it for me.
Quite the opposite.
Early on, there are not many striking tactical shots to filter out. And why does the enormous depth increase not overshadow those few tactical oversights that do occur?

I think there is something that does not meet the eye.
Tragically, we cannot study what Lc0 does, since it is a black box that spits out brilliant chess moves. And its Turk keeps mum.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: OKE - Opening Knowledge Engines

Post by PK »

Rodent's score might well be an artifact caused by the fact that it likes fianchettoes and finds/completes them more often than other engines.

Other opening-related bits and pieces are a penalty for developing the queen before the minor pieces (but it is quite low and unlikely to influence the result), some code for pawn chains from the King's Indian Defence/Attack (unlikely to kick in on move 4), and piece/square table asymmetry causing the poor little mouse to dislike a pawn on c2 (this one can have some influence).
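
Purely as an illustration of what such terms look like in code (this is not Rodent's actual implementation, and the values are made up), a fianchetto bonus and an early-queen penalty for White could be sketched like this with python-chess:

Code:

import chess

FIANCHETTO_BONUS = 15        # made-up centipawn values, illustration only
EARLY_QUEEN_PENALTY = 10

def white_opening_terms(board: chess.Board) -> int:
    score = 0
    # Completed kingside fianchetto: bishop on g2 sheltered by a pawn on g3.
    if (board.piece_at(chess.G2) == chess.Piece(chess.BISHOP, chess.WHITE)
            and board.piece_at(chess.G3) == chess.Piece(chess.PAWN, chess.WHITE)):
        score += FIANCHETTO_BONUS
    # Count minor pieces still sitting on their home squares.
    minors_at_home = 0
    for sq in (chess.B1, chess.C1, chess.F1, chess.G1):
        piece = board.piece_at(sq)
        if (piece is not None and piece.color == chess.WHITE
                and piece.piece_type in (chess.KNIGHT, chess.BISHOP)):
            minors_at_home += 1
    # Penalize a queen developed before the minors are out.
    queen_left_home = (any(board.pieces(chess.QUEEN, chess.WHITE))
                       and board.piece_at(chess.D1) != chess.Piece(chess.QUEEN, chess.WHITE))
    if queen_left_home and minors_at_home:
        score -= EARLY_QUEEN_PENALTY * minors_at_home
    return score

The real terms in any engine are of course tuned and interact with the rest of the eval; this only shows the shape of the idea.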
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: OKE - Opening Knowledge Engines

Post by Laskos »

Rebel wrote: Wed Jun 12, 2019 9:16 am
Laskos wrote: Tue Jun 11, 2019 9:59 pm Well, after seeing your methodology of building the suite, I think I will stick to my opening suite, which I didn't appreciate too much; maybe I lack the sense of being an "expert" (I always feel like a patzer at chess).

The opening suite was built manually and pretty slowly almost three years ago, before any Lc0, and the boost of confidence came with Lc0 too:

Code:

4 i7 cores at 3.80GHz
RTX 2070 GPU

Openings1000.epd test-suite (stuck to the solution from 1s to 2s of thinking):

Lc0 v21.2 ID42524  757/1000
Lc0 v21.2 ID32930  727/1000
Lc0 v21.2 ID11261  723/1000
Stockfish_dev      574/1000
Houdini 6.03       558/1000
Komodo 13.02       556/1000
Xiphos 0.5         513/1000
Booot 6.3.1        494/1000
Andscacs 0.95      484/1000
Laser 1.7          480/1000
Ethereal 11.25     467/1000
Texel 1.07         419/1000
Rodent III         376/1000
Fruit 2.1          348/1000
BikJump 2.01       276/1000
Predateur 2.2.1    265/1000
Here is the suite:
http://s000.tinyupload.com/?file_id=854 ... 6503996473

The number of positions is 1000, 5 times the same 200 positions, for less noise. My methodology hardly allows for more than 200 positions in some reasonable amount of time (several weekends spent).
Thanks for sharing. We are definitely pioneering in different areas. My focus is strictly on the first 6 moves. In your 200, many positions have already finished the development phase or are close to it. Nevertheless, the results are very interesting. With ChessPartner -> Analyze EPD I ran your 200 positions with some engines.

i7 cores at 2.80GHz,
one second per move.

Code:

Engine: Lc0 v0.21.2-rc1         125/200
Engine: Stockfish 10             93/200
Engine: Ethereal 11.25           86/200
Engine: Senpai 1.0               78/200
Engine: Mephisto Gideon          76/200
Engine: Xiphos 0.5               72/200
Engine: Laser 1.6                71/200
Engine: Rebel Century            66/200
Engine: Sting SF 9.6             66/200
Engine: Texel 1.06a45            65/200
Engine: Rodent III               64/200
Engine: Rybka 4.1                61/200
I would say it's hard to believe that a 2400-2500 rated engine (Gideon) can outperform many 3000+ Elo rated engines without admitting that something is missing in modern engines. The pattern seen with OKE remains, except for Rodent III.
Yes, Mephisto Gideon seems to perform well here too, but is not ranked as high as in your test:

Code:

4 i7 cores at 3.80GHz
RTX 2070 GPU

Openings1000.epd test-suite (stuck to the solution from 1s to 2s of thinking):

Lc0 v21.2 ID42524  757/1000
Lc0 v21.2 ID32930  727/1000
Lc0 v21.2 ID11261  723/1000
Stockfish_dev      574/1000
Houdini 6.03       558/1000
Komodo 13.02       556/1000
Xiphos 0.5         513/1000
Booot 6.3.1        494/1000
Andscacs 0.95      484/1000
Laser 1.7          480/1000
Ethereal 11.25     467/1000
Texel 1.07         419/1000
Mephisto Gideon    387/1000
Rodent III         376/1000
Fruit 2.1          348/1000
BikJump 2.01       276/1000
Predateur 2.2.1    265/1000
Maybe the testing methodology is a bit different; I am using a somewhat peculiar one in Polyglot.

Yes, my 200 positions are well into the openings, but only there did I find in databases some clear-cut positional "shots" or "decisions" (I have a single "bm" entry in each EPD line, so it's either a hit or a miss). In your case, focusing on at most the 6 initial moves and giving these moves "grades" of positional play is beyond my competency. Anyway, databases of strong human games are important in your case too, and engine analysis is useful only to trim out unwanted tactics. Positionally, aside from maybe Lc0 case by case, no engine can help in these early opening moves (no matter how long the engine "thinks").
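
Roughly, this hit-or-miss criterion ("stuck to the solution from 1s to 2s") can be approximated with two fixed-time searches per position. Below is a minimal python-chess sketch; the engine path and the example EPD line are placeholders, and the real Polyglot-based run certainly differs in its details (a single continuous search, for one, would be closer).

Code:

import chess
import chess.engine

def stuck_on_solution(engine, board, bm_moves):
    # Hit only if the engine picks a "bm" move after 1 second of thinking
    # and is still on a "bm" move after 2 seconds.
    for seconds in (1.0, 2.0):
        result = engine.play(board, chess.engine.Limit(time=seconds))
        if result.move not in bm_moves:
            return False
    return True

# Example with a single EPD record (engine path and position are placeholders):
engine = chess.engine.SimpleEngine.popen_uci("./lc0")
board = chess.Board()
ops = board.set_epd("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4;")
print(stuck_on_solution(engine, board, ops["bm"]))
engine.quit()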