Demolito 20180301 released

CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: Demolito 20180301 released

Post by CMCanavessi »

Cool, now I can finally replace Discocheck 5.2.1, as this version should be much stronger. Let's see if it can hold its place in the super league of my tournament (Discocheck was promoted twice but got demoted right away each time; too weak).
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Demolito 20180301 released

Post by lucasart »

Graham Banks wrote:Thanks Lucas. Just in time for my next tournament. :)
Thanks. I've noticed that Demolito always underperforms in CCRL 40/40 compared to what my local (ultra-bullet) testing indicates. I think the reason is hash pressure: 256 MB of hash is very low for such a long TC. This version finally improves the hash replacement scheme and should perform better under hash pressure. Let's wait and see if this gets confirmed.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Demolito 20180301 released

Post by tpoppins »

lucasart wrote:I've noticed that Demolito always under performs in CCRL 40/40 compared to what my local (ultra bullet) testing indicates. I think the reason is the hash pressure: 256mb hash is very low for such a long tc.
I agree. Perhaps we could review the single-core 40/40 testing procedure and raise the hash size to at least 512 MB. I would recommend 1024 MB, but then 4CPU testing would need to be upped to 4096 MB, which might be a problem for testers running systems with 8GB RAM or less.

Another reason for Demolito's results could be that I did both 4CPU gauntlets for the 2017-08-26 version and in retrospect they were rather top-heavy vs. Demolito, for the simple reason that I didn't have that many engines under 3000 Elo installed at the time, viz:

CCRL 40/4 Rating List - Custom engine selection

              Engine                Elo   +    -   Score  AvOp  Games
Demolito 2017-08-26 64-bit 4CPU    3002  +26  -26  34.1% +111.9   522

CCRL 40/40 Rating List - Custom engine selection

              Engine                Elo   +    -   Score  AvOp  Games
Demolito 2017-08-26 64-bit 4CPU    2975  +21  -22  40.6%  +61.0   709
Note the average opponents' Elo (AvOp). The new version will meet a more balanced lineup.
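
To make the AvOp point concrete, here is a minimal sketch that converts a score percentage into the rating gap it implies. It assumes the plain logistic Elo formula; the rating tool behind the lists may use a different model, and the function name is only illustrative.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Rating difference (engine minus opposition) implied by a score
    fraction, under the logistic Elo model (an assumption here)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score percentages taken from the two tables above.
for label, score in (("40/4", 0.341), ("40/40", 0.406)):
    diff = elo_diff_from_score(score)
    print(f"{label}: {diff:+.1f} Elo relative to the average opponent")
```

Under that assumption the two scores correspond to roughly -114 and -66 Elo, close to the AvOp gaps of +111.9 and +61.0 shown in the tables.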
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Demolito 20180301 released

Post by lucasart »

tpoppins wrote:
lucasart wrote:I've noticed that Demolito always under performs in CCRL 40/40 compared to what my local (ultra bullet) testing indicates. I think the reason is the hash pressure: 256mb hash is very low for such a long tc.
I agree. Perhaps we could review the single-core 40/40 testing procedure and raise the hash size to at least 512 MB. I would recommend 1024 MB, but then 4CPU testing would need to be upped to 4096 MB, which might be a problem for testers running systems with 8GB RAM or less.
512 would be a good improvement already.
1024 should be OK with 8GB RAM (that leaves 4GB = half the RAM for the OS and other processes).

Regarding 4 CPU testing, the main problem of CCRL 40/40 is that none of the testers do any 4 CPU vs. 1 CPU. So you just have a 1 CPU list and a 4 CPU list, with no link between the two. It's therefore impossible to compare 4 CPU and 1 CPU Elo on this list, as the data points Ordo can use to infer this are extremely few, and also very old.

You guys need to run some 4 CPU vs. 1 CPU. At least a few pivot points to tie the two lists together at the various levels. For example, you could do matches like:
* Stockfish 9 - 1 CPU vs. Shredder 13 - 4 CPU
* Shredder 13 - 1 CPU vs. Nirvana 2.4 - 4 CPU
* Nirvana 2.4 - 1 CPU vs. Arasan 20.3 - 4 CPU
* and a few more down the list in this vein
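
The chain of pivot matches above could be generated with a short script. This is only a sketch: `pivot_matches` is a made-up helper name, and it assumes the engine list is already sorted from strongest to weakest.

```python
def pivot_matches(engines_by_strength: list[str]) -> list[tuple[str, str]]:
    """Pair each engine at 1 CPU with the next engine down the list at 4 CPU."""
    return [
        (f"{stronger} - 1 CPU", f"{weaker} - 4 CPU")
        for stronger, weaker in zip(engines_by_strength, engines_by_strength[1:])
    ]

# Engines named in the list above, strongest first (assumed ordering).
engines = ["Stockfish 9", "Shredder 13", "Nirvana 2.4", "Arasan 20.3"]
for one_cpu, four_cpu in pivot_matches(engines):
    print(f"* {one_cpu} vs. {four_cpu}")
```

Each engine appears once on the 1 CPU side and once on the 4 CPU side (except the two ends), which is what ties neighbouring rating levels together.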
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Demolito 20180301 released

Post by tpoppins »

lucasart wrote:
tpoppins wrote:
lucasart wrote:I've noticed that Demolito always under performs in CCRL 40/40 compared to what my local (ultra bullet) testing indicates. I think the reason is the hash pressure: 256mb hash is very low for such a long tc.
I agree. Perhaps we could review the single-core 40/40 testing procedure and raise the hash size to at least 512 MB. I would recommend 1024 MB, but then 4CPU testing would need to be upped to 4096 MB, which might be a problem for testers running systems with 8GB RAM or less.
512 would be a good improvement already.
1024 should be ok with 8GB RAM (that leaves 4GB = half the ram for the OS and other processes).
Yes, this would work for 1CPU 40/40 testing, even allowing two concurrent games on such a system.
However, CCRL rules stipulate 4x hash for 4CPU tests, so even a single game would consume 8GB of RAM, thus ruling out any 4CPU testing on boxes with <=8GB of RAM.
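
The arithmetic behind this, as a rough sketch. It assumes both engines in a game keep their full hash allocated at the same time and ignores everything else the engines and the OS need; the function name is just for illustration.

```python
def ram_per_game_mb(hash_1cpu_mb: int, cpus: int, engines_per_game: int = 2) -> int:
    """Hash memory for one game, assuming hash scales linearly with the CPU
    count (the 4x-for-4CPU rule) and both engines hold their hash at once."""
    return hash_1cpu_mb * cpus * engines_per_game

for hash_mb in (256, 512, 1024):
    one_cpu = ram_per_game_mb(hash_mb, cpus=1)
    four_cpu = ram_per_game_mb(hash_mb, cpus=4)
    print(f"{hash_mb} MB base hash: {one_cpu} MB per 1CPU game, {four_cpu} MB per 4CPU game")
```

With a 1024 MB base hash, two concurrent 1CPU games need 2 x 2048 = 4096 MB, while one 4CPU game alone needs 8192 MB.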
lucasart wrote:Regarding 4 CPU testing, the main problem of CCRL 40/40 is that none of the testers do any 4 CPU vs. 1 CPU. So you just have a 1 CPU list and a 4 CPU list, with no link between the two.
Actually, there is such a link: Strelka 5.5 64-bit. That's our anchor that ties the single-core and the 4CPU lists together, in blitz and in 40/40. Of its 200+ opponents on that 40/40 page, more than 50 are 4CPU entries. I've also been including an occasional non-SMP engine or two other than Strelka in various 4CPU gauntlets lately, as Gabor points out; initially this was prompted by purely pragmatic reasons (not having to track down and install SMP engines in a specific rating range), but later it occurred to me that such "cross-pollination" may help both lists (1CPU and 4CPU) by balancing them out.
lucasart wrote:You guys need to run some 4 CPU vs. 1 CPU. At least a few pivot points to tie the two lists together at the various levels. For example, you could do matches like:
* Stockfish 9 - 1 CPU vs. Shredder 13 - 4 CPU
* Shredder 13 - 1 CPU vs. Nirvana 2.4 - 4 CPU
* Nirvana 2.4 - 1 CPU vs. Arasan 20.3 - 4 CPU
* and a few more down the list in this vein
An excellent idea! So far what I described doing above has been rather random; your proposal puts it on a systematic basis and should provide some interesting individual results, possibly leading to some surprises. Not to mention the beneficial effect on both lists in the long run -- but that will take months to produce and may not be easy to measure.

Thank you for the suggestions, Lucas.
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Demolito 20180301 released

Post by tpoppins »

SzG wrote:PS. Anyway, a 4CPU list can be obtained by adding 100 Elo to each of the engines on the 1CPU list. :wink:
Simply brilliant, Gabor! We should just dispense with 4CPU lists altogether and add an extra column to the 1CPU lists where a script would automatically fill in the "1CPU rating + 100" value.

We no longer could call it Elo, though. How 'bout SzElo? Rhymes with Jell-O and is pronounced almost the same to boot -- mmm, yummy! :D
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Demolito 20180301 released

Post by tpoppins »

I was only half-kidding as well. That magic formula, if it indeed exists, could free up a tremendous amount of testing resources; instead of 500-600 games per engine we could have thousands with much better error margins.

Perhaps if you started a new thread in the Technical subforum and pooled together the efforts of the best authorities on the subject (Kai, Andreas, the Komodo guys, HGM, Peter Osterlund et al.) you could really get it worked out.

For now what I see is an average of 50-70 Elo difference, with extremes as high as 112 (Fritz 16) and as low as 29 (Fruit Reloaded 3.2.1)
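
As a rough illustration of the "much better error margins" point: a sketch assuming the margin shrinks as 1/sqrt(games), with draw rate and opposition unchanged (which pooling 1CPU and 4CPU results would not guarantee). The function name is made up for this example.

```python
import math

def scaled_margin(margin: float, games: int, new_games: int) -> float:
    """Rough error margin after new_games, assuming it scales as 1/sqrt(games)."""
    if games <= 0 or new_games <= 0:
        raise ValueError("game counts must be positive")
    return margin * math.sqrt(games / new_games)

# Starting point: +/-26 Elo at 522 games, as in the 40/4 table quoted earlier in the thread.
for n in (522, 1000, 2000, 4000):
    print(f"{n} games: about +/-{scaled_margin(26.0, 522, n):.1f} Elo")
```

Under that assumption, quadrupling the number of games roughly halves the margin.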
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Demolito 20180301 released

Post by lucasart »

tpoppins wrote:I was only half-kidding as well. That magic formula, if it indeed exists, could free up a tremendous amount of testing resources; instead of 500-600 games per engine we could have thousands with much better error margins.

Perhaps if you started a new thread in the Technical subforum and pooled together the efforts of the best authorities on the subject (Kai, Andreas, the Komodo guys, HGM, Peter Osterlund et al.) you could really get it worked out.

For now what I see is an average of 50-70 Elo difference, with extremes as high as 112 (Fritz 16) and as low as 29 (Fruit Reloaded 3.2.1)
Same with long time controls. You can simply do your tests at some faster TC, like 2'+2" per game or so, then just rescale to the long time control. The only thing increasing the time control does is shrink the Elo scale by increasing the draw rate. That would definitely free up resources.
tpoppins
Posts: 919
Joined: Tue Nov 24, 2015 9:11 pm
Location: upstate

Re: Demolito 20180301 released

Post by tpoppins »

I like your sense of humor, Lucas. You definitely have the feeling for the subtle.
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Demolito 20180301 released

Post by carldaman »

tpoppins wrote:
lucasart wrote:
tpoppins wrote:
lucasart wrote:I've noticed that Demolito always under performs in CCRL 40/40 compared to what my local (ultra bullet) testing indicates. I think the reason is the hash pressure: 256mb hash is very low for such a long tc.
I agree. Perhaps we could review the single-core 40/40 testing procedure and raise the hash size to at least 512 MB. I would recommend 1024 MB, but then 4CPU testing would need to be upped to 4096 MB, which might be a problem for testers running systems with 8GB RAM or less.
512 would be a good improvement already.
1024 should be ok with 8GB RAM (that leaves 4GB = half the ram for the OS and other processes).
Yes, this would work for 1CPU 40/40 testing, even allowing two concurrent games on such a system.
However, CCRL rules stipulate 4x hash for 4CPU tests, so even a single game would consume 8GB of RAM, thus ruling out any 4CPU testing on boxes with <=8GB of RAM.
lucasart wrote:Regarding 4 CPU testing, the main problem of CCRL 40/40 is that none of the testers do any 4 CPU vs. 1 CPU. So you just have a 1 CPU list and a 4 CPU list, with no link between the two.
Actually, there is such a link: Strelka 5.5 64-bit. That's our anchor that ties the single-core and the 4CPU lists together, in blitz and in 40/40. Of its 200+ opponents on that 40/40 page, more than 50 are 4CPU entries. I've also been including an occasional non-SMP engine or two other than Strelka in various 4CPU gauntlets lately, as Gabor points out; initially this was prompted by purely pragmatic reasons (not having to track down and install SMP engines in a specific rating range), but later it occurred to me that such "cross-pollination" may help both lists (1CPU and 4CPU) by balancing them out.
lucasart wrote:You guys need to run some 4 CPU vs. 1 CPU. At least a few pivot points to tie the two lists together at the various levels. For example, you could do matches like:
* Stockfish 9 - 1 CPU vs. Shredder 13 - 4 CPU
* Shredder 13 - 1 CPU vs. Nirvana 2.4 - 4 CPU
* Nirvana 2.4 - 1 CPU vs. Arasan 20.3 - 4 CPU
* and a few more down the list in this vein
An excellent idea! So far what I described doing above has been rather random; your proposal puts it on a systematic basis and should provide some interesting individual results, possibly leading to some surprises. Not to mention the beneficial effect on both lists in the long run -- but that will take months to produce and may not be easy to measure.

Thank you for the suggestions, Lucas.
You could also have Stockfish 1 CPU vs Houdini or Komodo 4 CPU, so that the top 1 core engine can find some stronger opponents, and so on.