Page 1 of 1
Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the fly
Posted: Sat Mar 20, 2010 9:27 pm
by michiguel
A new release v0.2 is available at
http://sites.google.com/site/gaviotache ... e/releases
or
http://github.com/michiguel/Gaviota-Tablebases
I do not think that the interface will change anymore after this release. So the next steps will be optimization, code cleanup, and wait if there is any bug. After that, I will hopefully rename it v1.0
Main changes in v0.2
I included functions that only probes Win-Draw-Loss info from the TBs. These functions can be used when the distance to mate is not needed, which the most common case during search. I implemented a second cache that stores the WDL info. Basically, these are bit-bases built on the fly and overall the performance should be better than plain TBs. This scheme saves a LOT of space and multiplies the efficiency of the cache 8x.
In a previous intermediate release (v0.1.6.1), I provided just the interface but now the whole thing is fully functional. These WDL functions come in two flavors, soft and hard, like the regular functions (soft probes only the cache, and never the HD). If you already implemented this interface, you do not have to change anything, and you will notice a boost in performance by upgrading the APIS.
The only change is that tbcache_init() has a new parameter (0-128) that tells how much the memory cache should be split between WDL and DTM info. Before, you had tbcache_init(cache_size), now (from tbprobe.c) you have:
Code: Select all
/*
| initialize tb cache. 96 is the fraction, over 128, that will be
| dedicated to wdl information. In other words, 3/4 of the cache
| will be dedicated to win-draw-loss info, and 1/4 dedicated to
| distance to mate information.
*/
tbcache_init(cache_size, 96);
it is suggested that most of the cache should be dedicated to WDL info. So, 96 is a good number but maybe and even higher could be good too. If you select 128, you basically do not use cache for DTM information. If you select 0, you eliminate the WDL cache and use only DTM cache. The total memory use is always cache_size.
Other changes:
From 0.1.6.1 to 0.2
- - Change char **paths to const char **paths
- All warnings in the intel and Microsoft compilers have been silenced
- malloc()'s have been casted to avoid error messages if the APIS are compiled in C++
- Modify verbose output of tb_init
- Set new STATS
From 0.1.2 to 0.1.6.1
- - I included tbcache_flush(). Sometimes you want to erase the tbcache to have reproducibility (epd tests etc).
- I cleaned up and remove a lot of commented code that was not needed.
Some new were included, until I decide they will not be needed anymore.
These are the type of stats that can be obtained
Code: Select all
WDL CACHE STATS
easy hits: 35760
hard probes: 6903
soft probes: 79900
size: 50331648
occupancy: 6.2 %
DTM CACHE STATS
easy hits: 48425
hard probes: 8040
soft probes: 200502
size: 16777216
occupancy: 100.0 %
GENERAL GTB PROBE STATS
Total hits: 86239
memory hits: 84185
drive hits: 2054
drive miss: 0
bytes read: 7481741
files opened: 36
mem. efficiency: 97.6 %
http://github.com/michiguel/Gaviota-Tablebases
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sat Mar 20, 2010 9:57 pm
by Aaron Becker
This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sat Mar 20, 2010 11:16 pm
by Edmund
First of all, great work!
Concerning DTM vs WDL Tablebases; I think now that both is available only the latter makes sense during search and DTM is only used at the root level. So I wonder whether Probe soft and cache for DTM is actually required.
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sun Mar 21, 2010 5:35 am
by michiguel
Edmund wrote:First of all, great work!
Concerning DTM vs WDL Tablebases; I think now that both is available only the latter makes sense during search and DTM is only used at the root level. So I wonder whether Probe soft and cache for DTM is actually required.
Yes, it is required to find mates during search, when the position at the root is not in any TB. Most of the cache should be dedicated to store WDL information, but if no DTM information is stored, the engine won't see the checkmates until it is in root. It may be enough to win, but it may not be the fastest win and the play could be unnecessary clumsy. At least a small amount of memory should be dedicated to DTM info, whereas the bulk of information could be stored as WDL. The best ratio remains to be tested.
In this position,
[D]8/3r4/7P/kp4K1/8/1P6/P7/8 b - -
which has been posted here before
Gaviota has 64 MiB of cache (3/4 WDL, 1/4 DTM)
Code: Select all
setboard 8/3r4/7P/kp4K1/8/1P6/P7/8 b - -
d
+-----------------+
| . . . . . . . . | [Black]
| . . . r . . . . |
| . . . . . . . P |
| k p . . . . K . | Castling:
| . . . . . . . . | ep: -
| . P . . . . . . |
| P . . . . . . . |
| . . . . . . . . |
+-----------------+
sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time: 4999999.750000
28 1: 0.0 +4.07 Kb4
267 2: 0.0 +3.96 Kb4 2.Kf5
2133 3: 0.0 +3.88 b4 2.Kf6 Kb5
3287 4 0.0 +3.84 b4 2.Kf5 Rf7+ 3.Ke4 Kb5
7421 4: 0.0 +3.84 b4 2.Kf5 Rf7+ 3.Ke4 Kb5
16040 5 0.1 +3.91 b4 2.Kf6 Kb5 3.Ke5 Kc6
30124 5: 0.1 +3.91 b4 2.Kf6 Kb5 3.Ke5 Kc6
37187 6 0.1 +4.06 b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
65026 6: 0.2 +4.06 b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
67092 7 0.2 :-) b4
83151 7 0.2 :-) b4
91916 7 0.2 +5.92 b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
164137 7: 0.4 +5.92 b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
217951 8 0.5 +6.09 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
372851 8: 0.8 +6.09 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
469137 9 1.0 +6.16 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
Kc5
698458 9: 1.5 +6.16 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
Kc5
899176 10 1.9 :-) b4
1144023 10 2.4 :-) b4
1165094 10 2.4 +8.57 b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
Rh2 6.a3 Rh3+ 7.Kd4 bxa3
2169364 10: 4.5 +8.57 b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
Rh2 6.a3 Rh3+ 7.Kd4 bxa3
2175514 11 4.5 :-) b4
2181218 11 4.5 :-) b4
3410505 11 6.8 +Mat_26 b4 2.Kf5 Rd2 3.h7 Rh2 4.Kg6 Kb5 5.Kg7
Rxh7+ 6.Kxh7 <=TABLE
5026765 11: 10.1 +Mat_26 b4 2.Kf5 Rd2 3.h7 Rh2 4.Kg6 Kb5 5.Kg7
Rxh7+ 6.Kxh7 <=TABLE
8094759 12 16.6 +Mat_24 b4 2.Kg6 Kb5 3.h7 Rxh7 4.Kxh7 <=TABLE
12171851 12: 25.2 +Mat_24 b4 2.Kg6 Kb5 3.h7 Rxh7 4.Kxh7 <=TABLE
WDL CACHE STATS
easy hits: 34722
hard probes: 6903
soft probes: 78064
size: 50331648
occupancy: 6.0 %
DTM CACHE STATS
easy hits: 43380
hard probes: 7543
soft probes: 172851
size: 16777216
occupancy: 100.0 %
GENERAL GTB PROBE STATS
total hits: 79941
memory hits: 78102
drive hits: 1839
drive misses: 0
bytes read: 6658430
files opened: 33
mem. efficiency: 97.7 %
but if all the cache is dedicated to WDL, except one block of 32k (minimum, it cannot be zero by design) to DTM, we have the following. The search is identical until the engine finds a mate. But, the shortest mate is not found and the search is delayed.
Code: Select all
setboard 8/3r4/7P/kp4K1/8/1P6/P7/8 b - -
d
+-----------------+
| . . . . . . . . | [Black]
| . . . r . . . . |
| . . . . . . . P |
| k p . . . . K . | Castling:
| . . . . . . . . | ep: -
| . P . . . . . . |
| P . . . . . . . |
| . . . . . . . . |
+-----------------+
sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time: 4999999.750000
28 1: 0.0 +4.07 Kb4
267 2: 0.0 +3.96 Kb4 2.Kf5
2133 3: 0.0 +3.88 b4 2.Kf6 Kb5
3287 4 0.0 +3.84 b4 2.Kf5 Rf7+ 3.Ke4 Kb5
7421 4: 0.0 +3.84 b4 2.Kf5 Rf7+ 3.Ke4 Kb5
16040 5 0.1 +3.91 b4 2.Kf6 Kb5 3.Ke5 Kc6
30124 5: 0.1 +3.91 b4 2.Kf6 Kb5 3.Ke5 Kc6
37187 6 0.1 +4.06 b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
65026 6: 0.2 +4.06 b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
67092 7 0.2 :-) b4
83151 7 0.2 :-) b4
91916 7 0.2 +5.92 b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
164137 7: 0.4 +5.92 b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
217951 8 0.5 +6.09 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
372851 8: 0.9 +6.09 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
469137 9 1.0 +6.16 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
Kc5
698458 9: 1.5 +6.16 b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
Kc5
899176 10 1.9 :-) b4
1144023 10 2.4 :-) b4
1165094 10 2.4 +8.57 b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
Rh2 6.a3 Rh3+ 7.Kd4 bxa3
2169364 10: 4.5 +8.57 b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
Rh2 6.a3 Rh3+ 7.Kd4 bxa3
2175514 11 4.5 :-) b4
2181218 11 4.5 :-) b4
3723335 11 8.2 +Mat_28 b4 2.Kf5 Kb5 3.Ke4 Rh7 4.Kd3 Rxh6 5.Kc2
Rh2+ 6.Kb1 Rxa2 <=TABLE
5817571 11: 14.3 +Mat_28 b4 2.Kf5 Kb5 3.Ke4 Rh7 4.Kd3 Rxh6 5.Kc2
Rh2+ 6.Kb1 Rxa2 <=TABLE
7409930 12 19.3 +Mat_27 b4 2.Kf5 Rd2 3.h7 Rh2 4.Kf4 Rxh7 5.Kg3
Kb5 6.Kg2 Rh2+ 7.Kxh2 <=TABLE
12536417 12: 33.2 +Mat_27 b4 2.Kf5 Rd2 3.h7 Rh2 4.Kf4 Rxh7 5.Kg3
Kb5 6.Kg2 Rh2+ 7.Kxh2 <=TABLE
WDL CACHE STATS
easy hits: 36658
hard probes: 7085
soft probes: 84141
size: 67108864
occupancy: 3.9 %
DTM CACHE STATS
easy hits: 10471
hard probes: 16921
soft probes: 319385
size: 32768
occupancy: 100.0 %
GENERAL GTB PROBE STATS
total hits: 59309
memory hits: 47129
drive hits: 12180
drive misses: 0
bytes read: 42331143
files opened: 36
mem. efficiency: 79.5 %
In positions where there is not mate, storing in cache the DTM information may not be necessary, of course.
Miguel
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sun Mar 21, 2010 5:41 am
by michiguel
I fixed a small bug i found running a stress test with the debug version.
The current version is 0.2.0.2
Miguel
michiguel wrote:A new release v0.2 is available at
http://sites.google.com/site/gaviotache ... e/releases
or
http://github.com/michiguel/Gaviota-Tablebases
I do not think that the interface will change anymore after this release. So the next steps will be optimization, code cleanup, and wait if there is any bug. After that, I will hopefully rename it v1.0
Main changes in v0.2
I included functions that only probes Win-Draw-Loss info from the TBs. These functions can be used when the distance to mate is not needed, which the most common case during search. I implemented a second cache that stores the WDL info. Basically, these are bit-bases built on the fly and overall the performance should be better than plain TBs. This scheme saves a LOT of space and multiplies the efficiency of the cache 8x.
In a previous intermediate release (v0.1.6.1), I provided just the interface but now the whole thing is fully functional. These WDL functions come in two flavors, soft and hard, like the regular functions (soft probes only the cache, and never the HD). If you already implemented this interface, you do not have to change anything, and you will notice a boost in performance by upgrading the APIS.
The only change is that tbcache_init() has a new parameter (0-128) that tells how much the memory cache should be split between WDL and DTM info. Before, you had tbcache_init(cache_size), now (from tbprobe.c) you have:
Code: Select all
/*
| initialize tb cache. 96 is the fraction, over 128, that will be
| dedicated to wdl information. In other words, 3/4 of the cache
| will be dedicated to win-draw-loss info, and 1/4 dedicated to
| distance to mate information.
*/
tbcache_init(cache_size, 96);
it is suggested that most of the cache should be dedicated to WDL info. So, 96 is a good number but maybe and even higher could be good too. If you select 128, you basically do not use cache for DTM information. If you select 0, you eliminate the WDL cache and use only DTM cache. The total memory use is always cache_size.
Other changes:
From 0.1.6.1 to 0.2
- - Change char **paths to const char **paths
- All warnings in the intel and Microsoft compilers have been silenced
- malloc()'s have been casted to avoid error messages if the APIS are compiled in C++
- Modify verbose output of tb_init
- Set new STATS
From 0.1.2 to 0.1.6.1
- - I included tbcache_flush(). Sometimes you want to erase the tbcache to have reproducibility (epd tests etc).
- I cleaned up and remove a lot of commented code that was not needed.
Some new were included, until I decide they will not be needed anymore.
These are the type of stats that can be obtained
Code: Select all
WDL CACHE STATS
easy hits: 35760
hard probes: 6903
soft probes: 79900
size: 50331648
occupancy: 6.2 %
DTM CACHE STATS
easy hits: 48425
hard probes: 8040
soft probes: 200502
size: 16777216
occupancy: 100.0 %
GENERAL GTB PROBE STATS
Total hits: 86239
memory hits: 84185
drive hits: 2054
drive miss: 0
bytes read: 7481741
files opened: 36
mem. efficiency: 97.6 %
http://github.com/michiguel/Gaviota-Tablebases
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sun Mar 21, 2010 5:58 am
by michiguel
Aaron Becker wrote:This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.
I am really happy that you are using your creativity in this topic. One of the reasons why I started to code the TBs was that I truly thought there were ideas not being explored at all. We had an interesting discussion in December that was really inspiring.
The concept that you are trying now with different threads is really an interesting avenue. Regarding that, you may want to try to do it with the uncompressed TBs to see what happens. If you have a thread that the only thing it does is collecting info from the HD, it may not consume any CPU power because it will be reading the HD all the time (you know all this much better than me). But, it you use a compressed version, the thread that is collecting data will use some CPU when it decompresses. it may be faster but is loses parallel efficiency. At least, you could try scheme 2, which decompresses at blazing speed. I originally had some thoughts along those lines, but all that has to be tested. I could be completely wrong too.
Miguel
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sun Mar 21, 2010 6:47 am
by Aaron Becker
michiguel wrote:Aaron Becker wrote:This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.
I am really happy that you are using your creativity in this topic. One of the reasons why I started to code the TBs was that I truly thought there were ideas not being explored at all. We had an interesting discussion in December that was really inspiring.
The concept that you are trying now with different threads is really an interesting avenue. Regarding that, you may want to try to do it with the uncompressed TBs to see what happens. If you have a thread that the only thing it does is collecting info from the HD, it may not consume any CPU power because it will be reading the HD all the time (you know all this much better than me). But, it you use a compressed version, the thread that is collecting data will use some CPU when it decompresses. it may be faster but is loses parallel efficiency. At least, you could try scheme 2, which decompresses at blazing speed. I originally had some thoughts along those lines, but all that has to be tested. I could be completely wrong too.
Miguel
That's a great point. I'll add compression scheme to my list of things to explore. So far I've only tried cp4. Out of curiosity, what rule are you using to determine when to do soft vs. hard probes vs. not probing at all, and when to do DTM vs. WDL probes? Do you have recommendations on this topic? So far my gtb testing has mostly been on correctness rather than performance, so I'll definitely be thinking about how to quantify and improve the effect of gtbs on engine strength for the next release of daydreamer.
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Sun Mar 21, 2010 5:23 pm
by michiguel
Aaron Becker wrote:michiguel wrote:Aaron Becker wrote:This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.
I am really happy that you are using your creativity in this topic. One of the reasons why I started to code the TBs was that I truly thought there were ideas not being explored at all. We had an interesting discussion in December that was really inspiring.
The concept that you are trying now with different threads is really an interesting avenue. Regarding that, you may want to try to do it with the uncompressed TBs to see what happens. If you have a thread that the only thing it does is collecting info from the HD, it may not consume any CPU power because it will be reading the HD all the time (you know all this much better than me). But, it you use a compressed version, the thread that is collecting data will use some CPU when it decompresses. it may be faster but is loses parallel efficiency. At least, you could try scheme 2, which decompresses at blazing speed. I originally had some thoughts along those lines, but all that has to be tested. I could be completely wrong too.
Miguel
That's a great point. I'll add compression scheme to my list of things to explore. So far I've only tried cp4. Out of curiosity, what rule are you using to determine when to do soft vs. hard probes vs. not probing at all, and when to do DTM vs. WDL probes? Do you have recommendations on this topic? So far my gtb testing has mostly been on correctness rather than performance, so I'll definitely be thinking about how to quantify and improve the effect of gtbs on engine strength for the next release of daydreamer.
My testing has not been thorough. So far I invested more time with features and tools. My decisions have been erring on "safe side" and have been as simple as possible. I figured that if the slow down in nps is less than 20% the net effect should be positive. Time to reach a certain depth is more important and it is never bigger. But, I have not tested this with many positions. My very preliminary conclusion with Gaviota is that it is ok to probe hard when remaining depth is > 2 and I probe soft everywhere but qsearch. I always probe hard in PV (intuition tell me that it is too important not too do it).
Regarding WDL vs. DTM the scheme is very straightforward. If both alpha and beta are between -MATE200 and +MATE200 I probe WDL because it is guaranteed the value I get back will be good to return immediately. Anything else, I probe in full. Of course, this could be more elaborated but I think it covers almost all important cases.
I am using 3/4 of the cache for WDL, but I am sure that something higher will be optimum. But I know that 3/4 is certainly better than 0. In the future, this should be made dynamic. The relative size of each cache should change according to how much each of them are probed. When we start probing the DTM cache, it means we are reaching a decision in the game and the WDL cache will be less and less useful.
One word of caution: This distinction between WDL and DTM may impact the way we determine our aspiration windows. For instance, rather than expanding our window to (-INF, INF) after certain fail lows or highs, we should expand it first to (-MATE200, +MATE200) not to have the need to probe in full. It may not be required and we would waste it.
Miguel
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Mon Mar 22, 2010 8:23 pm
by jwes
If you move the WDL scores toward 0 as you move up the tree like you do for mate scores, you could only probe dtm at the root, and you would at least find the shortest path to an EGTB hit.
Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the
Posted: Mon Mar 22, 2010 8:28 pm
by jwes
michiguel wrote:I included functions that only probes Win-Draw-Loss info from the TBs. These functions can be used when the distance to mate is not needed, which the most common case during search. I implemented a second cache that stores the WDL info. Basically, these are bit-bases built on the fly and overall the performance should be better than plain TBs. This scheme saves a LOT of space and multiplies the efficiency of the cache 8x.
8x? Do you use 16 bits per entry for DTM cache or 1 bit for WDL ? If you use 2 bits for WDL, you could save even more space by putting 5 entries per byte. Encoding is done with multiplies, and you could decode with a table.