TalkChess.com

Posted: **Sat Mar 20, 2010 9:27 pm**

A new release v0.2 is available at
http://sites.google.com/site/gaviotache ... e/releases
or
http://github.com/michiguel/Gaviota-Tablebases

I do not think that the interface will change anymore after this release. So the next steps will be optimization, code cleanup, and wait if there is any bug. After that, I will hopefully rename it v1.0

Main changes in v0.2

I included functions that only probes Win-Draw-Loss info from the TBs. These functions can be used when the distance to mate is not needed, which the most common case during search. I implemented a second cache that stores the WDL info. Basically, these are bit-bases built on the fly and overall the performance should be better than plain TBs. This scheme saves a LOT of space and multiplies the efficiency of the cache 8x.

In a previous intermediate release (v0.1.6.1), I provided just the interface but now the whole thing is fully functional. These WDL functions come in two flavors, soft and hard, like the regular functions (soft probes only the cache, and never the HD). If you already implemented this interface, you do not have to change anything, and you will notice a boost in performance by upgrading the APIS.
The only change is that tbcache_init() has a new parameter (0-128) that tells how much the memory cache should be split between WDL and DTM info. Before, you had tbcache_init(cache_size), now (from tbprobe.c) you have:

Code: Select all

	/* 	
	|	initialize tb cache. 96 is the fraction, over 128, that will be 
	|	dedicated to wdl information. In other words, 3/4 of the cache
	|	will be dedicated to win-draw-loss info, and 1/4 dedicated to
	|	distance to mate information. 
	*/
	tbcache_init&#40;cache_size, 96&#41;;

it is suggested that most of the cache should be dedicated to WDL info. So, 96 is a good number but maybe and even higher could be good too. If you select 128, you basically do not use cache for DTM information. If you select 0, you eliminate the WDL cache and use only DTM cache. The total memory use is always cache_size.

Other changes:

From 0.1.6.1 to 0.2

- Change char **paths to const char **paths
- All warnings in the intel and Microsoft compilers have been silenced
- malloc()'s have been casted to avoid error messages if the APIS are compiled in C++
- Modify verbose output of tb_init
- Set new STATS

From 0.1.2 to 0.1.6.1

- I included tbcache_flush(). Sometimes you want to erase the tbcache to have reproducibility (epd tests etc).
- I cleaned up and remove a lot of commented code that was not needed.
Some new were included, until I decide they will not be needed anymore.

These are the type of stats that can be obtained

Code: Select all

WDL CACHE STATS
  easy hits&#58;             35760
  hard probes&#58;            6903
  soft probes&#58;           79900
  size&#58;               50331648
  occupancy&#58;               6.2 %

DTM CACHE STATS
  easy hits&#58;             48425
  hard probes&#58;            8040
  soft probes&#58;          200502
  size&#58;               16777216
  occupancy&#58;             100.0 %

GENERAL GTB PROBE STATS
  Total hits&#58;            86239
  memory hits&#58;           84185
  drive hits&#58;             2054
  drive miss&#58;                0
  bytes read&#58;          7481741
  files opened&#58;             36
  mem. efficiency&#58;        97.6 %

http://github.com/michiguel/Gaviota-Tablebases

Posted: **Sat Mar 20, 2010 9:57 pm**

This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.

Posted: **Sat Mar 20, 2010 11:16 pm**

First of all, great work!

Concerning DTM vs WDL Tablebases; I think now that both is available only the latter makes sense during search and DTM is only used at the root level. So I wonder whether Probe soft and cache for DTM is actually required.

Posted: **Sun Mar 21, 2010 5:35 am**

Edmund wrote:First of all, great work!

Concerning DTM vs WDL Tablebases; I think now that both is available only the latter makes sense during search and DTM is only used at the root level. So I wonder whether Probe soft and cache for DTM is actually required.

Yes, it is required to find mates during search, when the position at the root is not in any TB. Most of the cache should be dedicated to store WDL information, but if no DTM information is stored, the engine won't see the checkmates until it is in root. It may be enough to win, but it may not be the fastest win and the play could be unnecessary clumsy. At least a small amount of memory should be dedicated to DTM info, whereas the bulk of information could be stored as WDL. The best ratio remains to be tested.

In this position,

[D]8/3r4/7P/kp4K1/8/1P6/P7/8 b - -

which has been posted here before
Gaviota has 64 MiB of cache (3/4 WDL, 1/4 DTM)

Code: Select all

setboard 8/3r4/7P/kp4K1/8/1P6/P7/8 b - -
d
+-----------------+
| . . . . . . . . | &#91;Black&#93;
| . . . r . . . . |
| . . . . . . . P |
| k p . . . . K . |    Castling&#58; 
| . . . . . . . . |    ep&#58; -
| . P . . . . . . |
| P . . . . . . . |
| . . . . . . . . |
+-----------------+

sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time&#58; 4999999.750000
        28   1&#58;      0.0    +4.07  Kb4
       267   2&#58;      0.0    +3.96  Kb4 2.Kf5
      2133   3&#58;      0.0    +3.88  b4 2.Kf6 Kb5
      3287   4       0.0    +3.84  b4 2.Kf5 Rf7+ 3.Ke4 Kb5
      7421   4&#58;      0.0    +3.84  b4 2.Kf5 Rf7+ 3.Ke4 Kb5
     16040   5       0.1    +3.91  b4 2.Kf6 Kb5 3.Ke5 Kc6
     30124   5&#58;      0.1    +3.91  b4 2.Kf6 Kb5 3.Ke5 Kc6
     37187   6       0.1    +4.06  b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
     65026   6&#58;      0.2    +4.06  b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
     67092   7       0.2      &#58;-)  b4
     83151   7       0.2      &#58;-)  b4
     91916   7       0.2    +5.92  b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
    164137   7&#58;      0.4    +5.92  b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
    217951   8       0.5    +6.09  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
    372851   8&#58;      0.8    +6.09  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
    469137   9       1.0    +6.16  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
                                   Kc5
    698458   9&#58;      1.5    +6.16  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
                                   Kc5
    899176  10       1.9      &#58;-)  b4
   1144023  10       2.4      &#58;-)  b4
   1165094  10       2.4    +8.57  b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
                                   Rh2 6.a3 Rh3+ 7.Kd4 bxa3
   2169364  10&#58;      4.5    +8.57  b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
                                   Rh2 6.a3 Rh3+ 7.Kd4 bxa3
   2175514  11       4.5      &#58;-)  b4
   2181218  11       4.5      &#58;-)  b4
   3410505  11       6.8  +Mat_26  b4 2.Kf5 Rd2 3.h7 Rh2 4.Kg6 Kb5 5.Kg7
                                   Rxh7+ 6.Kxh7 <=TABLE
   5026765  11&#58;     10.1  +Mat_26  b4 2.Kf5 Rd2 3.h7 Rh2 4.Kg6 Kb5 5.Kg7
                                   Rxh7+ 6.Kxh7 <=TABLE
   8094759  12      16.6  +Mat_24  b4 2.Kg6 Kb5 3.h7 Rxh7 4.Kxh7 <=TABLE
  12171851  12&#58;     25.2  +Mat_24  b4 2.Kg6 Kb5 3.h7 Rxh7 4.Kxh7 <=TABLE

WDL CACHE STATS
  easy hits&#58;             34722
  hard probes&#58;            6903
  soft probes&#58;           78064
  size&#58;               50331648
  occupancy&#58;               6.0 %

DTM CACHE STATS
  easy hits&#58;             43380
  hard probes&#58;            7543
  soft probes&#58;          172851
  size&#58;               16777216
  occupancy&#58;             100.0 %

GENERAL GTB PROBE STATS
  total hits&#58;            79941
  memory hits&#58;           78102
  drive hits&#58;             1839
  drive misses&#58;              0
  bytes read&#58;          6658430
  files opened&#58;             33
  mem. efficiency&#58;        97.7 %

but if all the cache is dedicated to WDL, except one block of 32k (minimum, it cannot be zero by design) to DTM, we have the following. The search is identical until the engine finds a mate. But, the shortest mate is not found and the search is delayed.

Code: Select all

setboard 8/3r4/7P/kp4K1/8/1P6/P7/8 b - -
d
+-----------------+
| . . . . . . . . | &#91;Black&#93;
| . . . r . . . . |
| . . . . . . . P |
| k p . . . . K . |    Castling&#58; 
| . . . . . . . . |    ep&#58; -
| . P . . . . . . |
| P . . . . . . . |
| . . . . . . . . |
+-----------------+

sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time&#58; 4999999.750000
        28   1&#58;      0.0    +4.07  Kb4
       267   2&#58;      0.0    +3.96  Kb4 2.Kf5
      2133   3&#58;      0.0    +3.88  b4 2.Kf6 Kb5
      3287   4       0.0    +3.84  b4 2.Kf5 Rf7+ 3.Ke4 Kb5
      7421   4&#58;      0.0    +3.84  b4 2.Kf5 Rf7+ 3.Ke4 Kb5
     16040   5       0.1    +3.91  b4 2.Kf6 Kb5 3.Ke5 Kc6
     30124   5&#58;      0.1    +3.91  b4 2.Kf6 Kb5 3.Ke5 Kc6
     37187   6       0.1    +4.06  b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
     65026   6&#58;      0.2    +4.06  b4 2.Kf6 Kb5 3.Kg6 Rd6+ 4.Kg7 Rd2
     67092   7       0.2      &#58;-)  b4
     83151   7       0.2      &#58;-)  b4
     91916   7       0.2    +5.92  b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
    164137   7&#58;      0.4    +5.92  b4 2.Kf6 Kb6 3.Ke5 Rh7 4.a4 bxa3
    217951   8       0.5    +6.09  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
    372851   8&#58;      0.9    +6.09  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kd5 Rxh6 5.Kd4
    469137   9       1.0    +6.16  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
                                   Kc5
    698458   9&#58;      1.5    +6.16  b4 2.Kf6 Kb5 3.Ke5 Rh7 4.Kf4 Rxh6 5.Kg3
                                   Kc5
    899176  10       1.9      &#58;-)  b4
   1144023  10       2.4      &#58;-)  b4
   1165094  10       2.4    +8.57  b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
                                   Rh2 6.a3 Rh3+ 7.Kd4 bxa3
   2169364  10&#58;      4.5    +8.57  b4 2.Kf4 Kb5 3.Ke3 Rh7 4.Kd2 Rxh6 5.Ke3
                                   Rh2 6.a3 Rh3+ 7.Kd4 bxa3
   2175514  11       4.5      &#58;-)  b4
   2181218  11       4.5      &#58;-)  b4
   3723335  11       8.2  +Mat_28  b4 2.Kf5 Kb5 3.Ke4 Rh7 4.Kd3 Rxh6 5.Kc2
                                   Rh2+ 6.Kb1 Rxa2 <=TABLE
   5817571  11&#58;     14.3  +Mat_28  b4 2.Kf5 Kb5 3.Ke4 Rh7 4.Kd3 Rxh6 5.Kc2
                                   Rh2+ 6.Kb1 Rxa2 <=TABLE
   7409930  12      19.3  +Mat_27  b4 2.Kf5 Rd2 3.h7 Rh2 4.Kf4 Rxh7 5.Kg3
                                   Kb5 6.Kg2 Rh2+ 7.Kxh2 <=TABLE
  12536417  12&#58;     33.2  +Mat_27  b4 2.Kf5 Rd2 3.h7 Rh2 4.Kf4 Rxh7 5.Kg3
                                   Kb5 6.Kg2 Rh2+ 7.Kxh2 <=TABLE

WDL CACHE STATS
  easy hits&#58;             36658
  hard probes&#58;            7085
  soft probes&#58;           84141
  size&#58;               67108864
  occupancy&#58;               3.9 %

DTM CACHE STATS
  easy hits&#58;             10471
  hard probes&#58;           16921
  soft probes&#58;          319385
  size&#58;                  32768
  occupancy&#58;             100.0 %

GENERAL GTB PROBE STATS
  total hits&#58;            59309
  memory hits&#58;           47129
  drive hits&#58;            12180
  drive misses&#58;              0
  bytes read&#58;         42331143
  files opened&#58;             36
  mem. efficiency&#58;        79.5 %

In positions where there is not mate, storing in cache the DTM information may not be necessary, of course.

Miguel

Posted: **Sun Mar 21, 2010 5:41 am**

I fixed a small bug i found running a stress test with the debug version.
The current version is 0.2.0.2

Miguel

michiguel wrote:A new release v0.2 is available at
http://sites.google.com/site/gaviotache ... e/releases
or
http://github.com/michiguel/Gaviota-Tablebases

I do not think that the interface will change anymore after this release. So the next steps will be optimization, code cleanup, and wait if there is any bug. After that, I will hopefully rename it v1.0

Main changes in v0.2

I included functions that only probes Win-Draw-Loss info from the TBs. These functions can be used when the distance to mate is not needed, which the most common case during search. I implemented a second cache that stores the WDL info. Basically, these are bit-bases built on the fly and overall the performance should be better than plain TBs. This scheme saves a LOT of space and multiplies the efficiency of the cache 8x.

In a previous intermediate release (v0.1.6.1), I provided just the interface but now the whole thing is fully functional. These WDL functions come in two flavors, soft and hard, like the regular functions (soft probes only the cache, and never the HD). If you already implemented this interface, you do not have to change anything, and you will notice a boost in performance by upgrading the APIS.
The only change is that tbcache_init() has a new parameter (0-128) that tells how much the memory cache should be split between WDL and DTM info. Before, you had tbcache_init(cache_size), now (from tbprobe.c) you have:
Code: Select all
	/* 	
	|	initialize tb cache. 96 is the fraction, over 128, that will be 
	|	dedicated to wdl information. In other words, 3/4 of the cache
	|	will be dedicated to win-draw-loss info, and 1/4 dedicated to
	|	distance to mate information. 
	*/
	tbcache_init&#40;cache_size, 96&#41;; 
it is suggested that most of the cache should be dedicated to WDL info. So, 96 is a good number but maybe and even higher could be good too. If you select 128, you basically do not use cache for DTM information. If you select 0, you eliminate the WDL cache and use only DTM cache. The total memory use is always cache_size.

Other changes:

From 0.1.6.1 to 0.2
- Change char **paths to const char **paths
- All warnings in the intel and Microsoft compilers have been silenced
- malloc()'s have been casted to avoid error messages if the APIS are compiled in C++
- Modify verbose output of tb_init
- Set new STATS
From 0.1.2 to 0.1.6.1
- I included tbcache_flush(). Sometimes you want to erase the tbcache to have reproducibility (epd tests etc).
- I cleaned up and remove a lot of commented code that was not needed.
Some new were included, until I decide they will not be needed anymore.
These are the type of stats that can be obtained
Code: Select all
WDL CACHE STATS
  easy hits&#58;             35760
  hard probes&#58;            6903
  soft probes&#58;           79900
  size&#58;               50331648
  occupancy&#58;               6.2 %

DTM CACHE STATS
  easy hits&#58;             48425
  hard probes&#58;            8040
  soft probes&#58;          200502
  size&#58;               16777216
  occupancy&#58;             100.0 %

GENERAL GTB PROBE STATS
  Total hits&#58;            86239
  memory hits&#58;           84185
  drive hits&#58;             2054
  drive miss&#58;                0
  bytes read&#58;          7481741
  files opened&#58;             36
  mem. efficiency&#58;        97.6 %
http://github.com/michiguel/Gaviota-Tablebases

Posted: **Sun Mar 21, 2010 5:58 am**

Aaron Becker wrote:This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.

I am really happy that you are using your creativity in this topic. One of the reasons why I started to code the TBs was that I truly thought there were ideas not being explored at all. We had an interesting discussion in December that was really inspiring.

The concept that you are trying now with different threads is really an interesting avenue. Regarding that, you may want to try to do it with the uncompressed TBs to see what happens. If you have a thread that the only thing it does is collecting info from the HD, it may not consume any CPU power because it will be reading the HD all the time (you know all this much better than me). But, it you use a compressed version, the thread that is collecting data will use some CPU when it decompresses. it may be faster but is loses parallel efficiency. At least, you could try scheme 2, which decompresses at blazing speed. I originally had some thoughts along those lines, but all that has to be tested. I could be completely wrong too.

Miguel

Posted: **Sun Mar 21, 2010 6:47 am**

michiguel wrote:
Aaron Becker wrote:This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.
I am really happy that you are using your creativity in this topic. One of the reasons why I started to code the TBs was that I truly thought there were ideas not being explored at all. We had an interesting discussion in December that was really inspiring.

The concept that you are trying now with different threads is really an interesting avenue. Regarding that, you may want to try to do it with the uncompressed TBs to see what happens. If you have a thread that the only thing it does is collecting info from the HD, it may not consume any CPU power because it will be reading the HD all the time (you know all this much better than me). But, it you use a compressed version, the thread that is collecting data will use some CPU when it decompresses. it may be faster but is loses parallel efficiency. At least, you could try scheme 2, which decompresses at blazing speed. I originally had some thoughts along those lines, but all that has to be tested. I could be completely wrong too.

Miguel

That's a great point. I'll add compression scheme to my list of things to explore. So far I've only tried cp4. Out of curiosity, what rule are you using to determine when to do soft vs. hard probes vs. not probing at all, and when to do DTM vs. WDL probes? Do you have recommendations on this topic? So far my gtb testing has mostly been on correctness rather than performance, so I'll definitely be thinking about how to quantify and improve the effect of gtbs on engine strength for the next release of daydreamer.

Posted: **Sun Mar 21, 2010 5:23 pm**

Aaron Becker wrote:
michiguel wrote:
Aaron Becker wrote:This is an excellent feature. I plan to test a lot of different TB strategies in the near future, and this allows some very interesting new possibilities. Thanks for making all your hard work in this area freely available.
I am really happy that you are using your creativity in this topic. One of the reasons why I started to code the TBs was that I truly thought there were ideas not being explored at all. We had an interesting discussion in December that was really inspiring.

The concept that you are trying now with different threads is really an interesting avenue. Regarding that, you may want to try to do it with the uncompressed TBs to see what happens. If you have a thread that the only thing it does is collecting info from the HD, it may not consume any CPU power because it will be reading the HD all the time (you know all this much better than me). But, it you use a compressed version, the thread that is collecting data will use some CPU when it decompresses. it may be faster but is loses parallel efficiency. At least, you could try scheme 2, which decompresses at blazing speed. I originally had some thoughts along those lines, but all that has to be tested. I could be completely wrong too.

Miguel
That's a great point. I'll add compression scheme to my list of things to explore. So far I've only tried cp4. Out of curiosity, what rule are you using to determine when to do soft vs. hard probes vs. not probing at all, and when to do DTM vs. WDL probes? Do you have recommendations on this topic? So far my gtb testing has mostly been on correctness rather than performance, so I'll definitely be thinking about how to quantify and improve the effect of gtbs on engine strength for the next release of daydreamer.

My testing has not been thorough. So far I invested more time with features and tools. My decisions have been erring on "safe side" and have been as simple as possible. I figured that if the slow down in nps is less than 20% the net effect should be positive. Time to reach a certain depth is more important and it is never bigger. But, I have not tested this with many positions. My very preliminary conclusion with Gaviota is that it is ok to probe hard when remaining depth is > 2 and I probe soft everywhere but qsearch. I always probe hard in PV (intuition tell me that it is too important not too do it).

Regarding WDL vs. DTM the scheme is very straightforward. If both alpha and beta are between -MATE200 and +MATE200 I probe WDL because it is guaranteed the value I get back will be good to return immediately. Anything else, I probe in full. Of course, this could be more elaborated but I think it covers almost all important cases.

I am using 3/4 of the cache for WDL, but I am sure that something higher will be optimum. But I know that 3/4 is certainly better than 0. In the future, this should be made dynamic. The relative size of each cache should change according to how much each of them are probed. When we start probing the DTM cache, it means we are reaching a decision in the game and the WDL cache will be less and less useful.

One word of caution: This distinction between WDL and DTM may impact the way we determine our aspiration windows. For instance, rather than expanding our window to (-INF, INF) after certain fail lows or highs, we should expand it first to (-MATE200, +MATE200) not to have the need to probe in full. It may not be required and we would waste it.

Miguel

Posted: **Mon Mar 22, 2010 8:23 pm**

If you move the WDL scores toward 0 as you move up the tree like you do for mate scores, you could only probe dtm at the root, and you would at least find the shortest path to an EGTB hit.

Posted: **Mon Mar 22, 2010 8:28 pm**

michiguel wrote:I included functions that only probes Win-Draw-Loss info from the TBs. These functions can be used when the distance to mate is not needed, which the most common case during search. I implemented a second cache that stores the WDL info. Basically, these are bit-bases built on the fly and overall the performance should be better than plain TBs. This scheme saves a LOT of space and multiplies the efficiency of the cache 8x.

8x? Do you use 16 bits per entry for DTM cache or 1 bit for WDL ? If you use 2 bits for WDL, you could save even more space by putting 5 entries per byte. Encoding is done with multiplies, and you could decode with a table.

TalkChess.com

Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the fly

Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the fly

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the

Re: Gaviota TBs Probing Code (v0.2) UPDATE, Bitbases on the