Demolito 20180301 released
Moderators: hgm, Rebel, chrisw
-
- Posts: 1142
- Joined: Thu Dec 28, 2017 4:06 pm
- Location: Argentina
Re: Demolito 20180301 released
Cool, now I can finally replace Discocheck 5.2.1, as this version should be much stronger. Let's see if this version can hold its place in the super league of my tournament (Discocheck was promoted twice, but got demoted immediately both times; too weak).
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Demolito 20180301 released
Graham Banks wrote:
Thanks Lucas. Just in time for my next tournament.

Thanks. I've noticed that Demolito always underperforms in CCRL 40/40 compared to what my local (ultra-bullet) testing indicates. I think the reason is hash pressure: 256 MB of hash is very low for such a long TC. This version finally improves the hash replacement scheme, and should perform better under hash pressure. Let's wait and see if this gets confirmed.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 919
- Joined: Tue Nov 24, 2015 9:11 pm
- Location: upstate
Re: Demolito 20180301 released
lucasart wrote:
I've noticed that Demolito always underperforms in CCRL 40/40 compared to what my local (ultra-bullet) testing indicates. I think the reason is hash pressure: 256 MB of hash is very low for such a long TC.

I agree. Perhaps we could review the single-core 40/40 testing procedure and raise the hash size to at least 512 MB. I would recommend 1024 MB, but then 4CPU testing would need to be upped to 4096 MB, which might be a problem for testers running systems with 8 GB of RAM or less.
Another reason for Demolito's results could be that I did both 4CPU gauntlets for the 2017-08-26 version and in retrospect they were rather top-heavy vs. Demolito, for the simple reason that I didn't have that many engines under 3000 Elo installed at the time, viz:
CCRL 40/4 Rating List - Custom engine selection
Engine                           Elo   +    -    Score  AvOp    Games
Demolito 2017-08-26 64-bit 4CPU  3002  +26  -26  34.1%  +111.9  522

CCRL 40/40 Rating List - Custom engine selection
Engine                           Elo   +    -    Score  AvOp    Games
Demolito 2017-08-26 64-bit 4CPU  2975  +21  -22  40.6%  +61.0   709
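As a rough sanity check on how top-heavy those fields were, the Score column can be converted into an implied Elo gap using the standard logistic Elo formula. This is only a sketch: the function name is made up for illustration, and reading AvOp as "average opponent rating relative to Demolito" is my assumption, not something stated in the tables.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model.

    Negative means the engine scored like a player rated below its opposition.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score fractions copied from the two gauntlet tables above.
for label, score in [("40/4", 0.341), ("40/40", 0.406)]:
    print(f"{label}: {elo_diff_from_score(score):+.1f} Elo relative to the field")
```

The two gaps come out at roughly -114 and -66 Elo, the same ballpark as the AvOp figures of +111.9 and +61.0.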
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Demolito 20180301 released
tpoppins wrote:
I agree. Perhaps we could review the single-core 40/40 testing procedure and raise the hash size to at least 512 MB. I would recommend 1024 MB, but then 4CPU testing would need to be upped to 4096 MB, which might be a problem for testers running systems with 8GB RAM or less.

512 would be a good improvement already.
1024 should be OK with 8 GB of RAM (that leaves 4 GB, half the RAM, for the OS and other processes).
Regarding 4 CPU testing, the main problem of CCRL 40/40 is that none of the testers do any 4 CPU vs. 1 CPU. So you just have a 1 CPU list and a 4 CPU list, with no link between the two. So it's impossible to compare 4 CPU and 1 CPU Elo on this list, as the data points used by Ordo for inferring this are extremely few, and also very old.
You guys need to run some 4 CPU vs. 1 CPU. At least a few pivot points to tie the two lists together at the various levels. For example, you could do matches like:
* Stockfish 9 - 1 CPU vs. Shredder 13 - 4 CPU
* Shredder 13 - 1 CPU vs. Nirvana 2.4 - 4 CPU
* Nirvana 2.4 - 1 CPU vs. Arasan 20.3 - 4 CPU
* and a few more down the list in this vein
-
- Posts: 919
- Joined: Tue Nov 24, 2015 9:11 pm
- Location: upstate
Re: Demolito 20180301 released
lucasart wrote:
512 would be a good improvement already.
1024 should be ok with 8GB RAM (that leaves 4GB = half the ram for the OS and other processes).

Yes, this would work for 1CPU 40/40 testing, even allowing two concurrent games on such a system.
However, CCRL rules stipulate 4x hash for 4CPU tests, so even a single game would consume 8 GB of RAM, thus ruling out any 4CPU testing on boxes with <=8 GB of RAM.

lucasart wrote:
Regarding 4 CPU testing, the main problem of CCRL 40/40 is that none of the testers do any 4 CPU vs. 1 CPU. So you just have a 1 CPU list and a 4 CPU list, with no link between the two.

Actually, there is such a link: Strelka 5.5 64-bit. That's our anchor that ties the single-core and the 4CPU lists together, in blitz and in 40/40. Of the 200+ of its opponents on that 40/40 page, more than 50 are 4CPU entries. I've also been including an occasional non-SMP engine or two other than Strelka in various 4CPU gauntlets lately, as Gabor points out; initially this was prompted by purely pragmatic reasons (not having to track down and install SMP engines in a specific rating range); later it occurred to me that such "cross-pollination" may help both lists (1CPU and 4CPU) by balancing them out.

lucasart wrote:
You guys need to run some 4 CPU vs. 1 CPU. At least a few pivot points to tie the two lists together at the various levels. For example, you could do matches like:
* Stockfish 9 - 1 CPU vs. Shredder 13 - 4 CPU
* Shredder 13 - 1 CPU vs. Nirvana 2.4 - 4 CPU
* Nirvana 2.4 - 1 CPU vs. Arasan 20.3 - 4 CPU
* and a few more down the list in this vein

An excellent idea! So far what I described doing above has been rather random; your proposal puts it on a systematic basis and should provide for some interesting individual results, possibly leading to some surprises. Not to mention the beneficial effect on both lists in the long run -- but that will take months to produce and may not be easy to measure.
Thank you for the suggestions, Lucas.
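For anyone wanting to check the hash arithmetic in this exchange, here is a minimal sketch. It assumes both engines in a game get the same hash, that hash scales linearly with thread count (the "4x hash for 4CPU" rule mentioned above), and that hash tables are the only memory being counted; the function name and parameters are made up for illustration.

```python
def ram_needed_mb(hash_1cpu_mb: int, threads: int, concurrent_games: int = 1) -> int:
    """Total hash memory in MB for engine-vs-engine games.

    Assumes two engines per game, each given hash_1cpu_mb * threads of hash.
    """
    engines_per_game = 2
    hash_per_engine = hash_1cpu_mb * threads
    return hash_per_engine * engines_per_game * concurrent_games


# Compare two concurrent 1CPU games against a single 4CPU game.
for hash_mb in (256, 512, 1024):
    one_cpu = ram_needed_mb(hash_mb, threads=1, concurrent_games=2)
    four_cpu = ram_needed_mb(hash_mb, threads=4)
    print(f"{hash_mb} MB base: 2x 1CPU games = {one_cpu} MB, 1x 4CPU game = {four_cpu} MB")
```

With a 1024 MB base, two concurrent 1CPU games need 4096 MB of hash, while a single 4CPU game needs 8192 MB, which is the whole of an 8 GB box.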
-
- Posts: 919
- Joined: Tue Nov 24, 2015 9:11 pm
- Location: upstate
Re: Demolito 20180301 released
SzG wrote:
PS. Anyway, a 4CPU list can be obtained by adding 100 Elo to each of the engines on the 1CPU list. :wink:

Simply brilliant, Gabor! We should just dispense with 4CPU lists altogether and add an extra column to the 1CPU lists where a script would automatically fill in the "1CPU rating + 100" value.
We could no longer call it Elo, though. How 'bout SzElo? Rhymes with Jell-O and is pronounced almost the same to boot -- mmm, yummy! :D
-
- Posts: 919
- Joined: Tue Nov 24, 2015 9:11 pm
- Location: upstate
Re: Demolito 20180301 released
I was only half-kidding as well. That magic formula, if it indeed exists, could free up a tremendous amount of testing resources; instead of 500-600 games per engine we could have thousands with much better error margins.
Perhaps if you started a new thread in the Technical subforum and pooled together the efforts of the best authorities on the subject (Kai, Andreas, the Komodo guys, HGM, Peter Osterlund et al) you could really get it worked out.
For now what I see is an average of 50-70 Elo difference, with extremes as high as 112 (Fritz 16) and as low as 29 (Fruit Reloaded 3.2.1)
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Demolito 20180301 released
tpoppins wrote:
I was only half-kidding as well. That magic formula, if it indeed exists, could free up a tremendous amount of testing resources; instead of 500-600 games per engine we could have thousands with much better error margins.
Perhaps if you started a new thread in the Technical subforum and pool together the efforts of the best authorities on the subject (Kai, Andreas, the Komodo guys, HGM, Peter Osterlund et al) you could really get it worked out.
For now what I see is an average of 50-70 Elo difference, with extremes as high as 112 (Fritz 16) and as low as 29 (Fruit Reloaded 3.2.1)

Same with long time controls. You can simply do your tests in some faster TC, like 2'+2" per game or so. Then just rescale to long time control. The only thing increasing the time control does is to shrink the Elo scale by increasing the draw rate. That would definitely free up resources.
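To illustrate the claim that a higher draw rate shrinks the Elo scale, here is a toy calculation. It is a sketch under a strong simplifying assumption: the split of decisive games between the two engines stays fixed while only the draw rate changes, which real engines need not obey. The function names and the 75% win share are made up for illustration.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo gap implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_with_draws(win_share: float, draw_rate: float) -> float:
    """Expected score when draw_rate of all games are drawn and the stronger
    side wins win_share of the remaining decisive games."""
    return (1.0 - draw_rate) * win_share + 0.5 * draw_rate


# Same 75% share of decisive games, increasing draw rate.
for draw_rate in (0.2, 0.4, 0.6, 0.8):
    s = score_with_draws(0.75, draw_rate)
    print(f"draws {draw_rate:.0%}: score {s:.3f}, Elo gap {elo_from_score(s):+.1f}")
```

The implied Elo gap falls steadily as the draw rate rises, even though the stronger side's share of decisive games never changes. Whether a single rescaling factor fits real rating lists is a separate question that this toy model does not answer.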
-
- Posts: 919
- Joined: Tue Nov 24, 2015 9:11 pm
- Location: upstate
Re: Demolito 20180301 released
I like your sense of humor, Lucas. You definitely have the feeling for the subtle.
-
- Posts: 2283
- Joined: Sat Jun 02, 2012 2:13 am
Re: Demolito 20180301 released
lucasart wrote:
You guys need to run some 4 CPU vs. 1 CPU. At least a few pivot points to tie the two lists together at the various levels. For example, you could do matches like:
* Stockfish 9 - 1 CPU vs. Shredder 13 - 4 CPU
* Shredder 13 - 1 CPU vs. Nirvana 2.4 - 4 CPU
* Nirvana 2.4 - 1 CPU vs. Arasan 20.3 - 4 CPU
* and a few more down the list in this vein

tpoppins wrote:
An excellent idea! So far what I described doing above has been rather random; your proposal puts it on a systematic basis and should provide for some interesting individual results, possibly leading to some surprises. Not to mention the beneficial effect on both lists in the long run -- but that will take months to produce and may not be easy to measure.
Thank you for the suggestions, Lucas.

You could also have Stockfish 1 CPU vs Houdini or Komodo 4 CPU, so that the top 1 core engine can find some stronger opponents, and so on.