Gaviota EGTBs, interface proposal for programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Gaviota EGTBs, interface proposal for programmers

Post by Gian-Carlo Pascutto »

Aleks Peshkov wrote:I think it is a wrong idea in principle. Good interface should hide implementation but not be based on volatile early implementation decisions.
Nobody is saying that should be done. What certainly should not be done though, is to use a format that is slow and irrelevant for BOTH sides of the API.
How much progress of chess programming would be without UCI/XBoard protocols? Imagine if we have C-like interface to relational databases instead of SQL? How WWW would evolved if we had binary HTTP-protocol?
I've seen enough systems that were unusably slow because someone thought XML was the solution to everything.
Can you compare computational overhead of handling 100 byte string comparing to database disk access?
Compared to a cache hit, it will be significant.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Gaviota EGTBs, interface proposal for programmers

Post by mcostalba »

Aleks Peshkov wrote: Can you compare computational overhead of handling 100 byte string comparing to database disk access?
I am not thinking at disk access at all, I am not thinking at 6 pieces TB.

Probably I am wrong, I am at all no egbt expert, but I think egbt _could_ work only if data is in RAM.

Anyhow I will not say anything anymore, Miguel for sure will choose the best interface because he is the one that is currently throwing in more thinking time at this stuff.



BTW accessing disk is sooo slooowww (we are talking or orders of magnitude slower) that the only possibility to make an hard disk come to play in a chess engine during its search is through some kind of async access, where engine starts the query and then goes on with the search as usal: waiting for the response from the disk it means to KILL the engine. At that point you can easily write your engine in Python or Ruby without any noticeable performance loss.
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: Gaviota EGTBs, interface proposal for programmers

Post by zamar »

BTW accessing disk is sooo slooowww (we are talking or orders of magnitude slower) that the only possibility to make an hard disk come to play in a chess engine during its search is through some kind of async access, where engine starts the query and then goes on with the search as usal: waiting for the response from the disk it means to KILL the engine.
That's why the probe from hard disk is usually done only when remaining depth >= SOME_VERY_HIGH_LIMIT. But this of course limits the usefulness of the EGTB...
Joona Kiiski
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Gaviota EGTBs, interface proposal for programmers

Post by mcostalba »

zamar wrote:
BTW accessing disk is sooo slooowww (we are talking or orders of magnitude slower) that the only possibility to make an hard disk come to play in a chess engine during its search is through some kind of async access, where engine starts the query and then goes on with the search as usal: waiting for the response from the disk it means to KILL the engine.
That's why the probe from hard disk is usually done only when remaining depth >= SOME_VERY_HIGH_LIMIT. But this of course limits the usefulness of the EGTB...
I think the only realistically way to use hard disk is through an async call.

As example suppose move ordering gives the following sequence to try:

e4, Nf3, Rb5, Rb6, Rb7,

At vey high depth you could think to start an async query on Rb7 and then start search e4, Nf3, and so on..

If you are at very high depth then there is a possibility that the query return something before you reach Rb7.
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: Gaviota EGTBs, interface proposal for programmers

Post by Aaron Becker »

That's a good point. I think it would be helpful to have a non-blocking api to support asynchronously fetching positions that aren't in the cache. So in addition to the blocking tb_probe function, you would also have non-blocking tb_fetch and tb_test functions. tb_fetch reads the block for the given position into the cache if it isn't present, and tb_test tells you whether or not the given position is in the cache. Both of them would always return immediately, so that the engine can continue searching while the disk is accessed.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Gaviota EGTBs, interface proposal for programmers

Post by mcostalba »

Aaron Becker wrote:That's a good point. I think it would be helpful to have a non-blocking api to support asynchronously fetching positions that aren't in the cache. So in addition to the blocking tb_probe function, you would also have non-blocking tb_fetch and tb_test functions. tb_fetch reads the block for the given position into the cache if it isn't present, and tb_test tells you whether or not the given position is in the cache. Both of them would always return immediately, so that the engine can continue searching while the disk is accessed.
From the point of view of the engine the simplest interface is

Code: Select all

tb_probe_nb()
where 'nb' is non-blocking. What the function does is to lookup in the cache, if value is presents it returns the value, otherwise it performs the following steps:

1) Starts an async I/O access to fetch the value from disk
2) Returns VALUE_NOT_DEFINED


When the I/O operation is completed then the background thread updates the cache. Engine is NOT informed back that the value is arrived (also because probably it is not useful anymore). But in case engine requests again the same position the value will be in cache and this time it will be returned.

This is wholly trasparent to the engine that has simply to ignore VALUE_NOT_DEFINED answers.

So a possible design could be:

1) Library init function allocates a big cache and a set of worker threads used for I/O and normally sleeping

2) When engine probes for a position if the position is in cache then value is returned, otherwise the first free sleeping thread is spawned to probe the position from disk and VALUE_NOT_DEFINED is immediately returned so that engine can continue searching as usual.

3) The woken up thread starts an I/O call and blocks on that. When the calls return the thread updates the cache and returns in the free pool.


This is simple and it seems quite reliable. Of course it is only an idea....never said it should work :-)
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by michiguel »

mcostalba wrote:
Aaron Becker wrote:That's a good point. I think it would be helpful to have a non-blocking api to support asynchronously fetching positions that aren't in the cache. So in addition to the blocking tb_probe function, you would also have non-blocking tb_fetch and tb_test functions. tb_fetch reads the block for the given position into the cache if it isn't present, and tb_test tells you whether or not the given position is in the cache. Both of them would always return immediately, so that the engine can continue searching while the disk is accessed.
From the point of view of the engine the simplest interface is

Code: Select all

tb_probe_nb()
where 'nb' is non-blocking. What the function does is to lookup in the cache, if value is presents it returns the value, otherwise it performs the following steps:

1) Starts an async I/O access to fetch the value from disk
2) Returns VALUE_NOT_DEFINED


When the I/O operation is completed then the background thread updates the cache. Engine is NOT informed back that the value is arrived (also because probably it is not useful anymore). But in case engine requests again the same position the value will be in cache and this time it will be returned.

This is wholly trasparent to the engine that has simply to ignore VALUE_NOT_DEFINED answers.

So a possible design could be:

1) Library init function allocates a big cache and a set of worker threads used for I/O and normally sleeping

2) When engine probes for a position if the position is in cache then value is returned, otherwise the first free sleeping thread is spawned to probe the position from disk and VALUE_NOT_DEFINED is immediately returned so that engine can continue searching as usual.

3) The woken up thread starts an I/O call and blocks on that. When the calls return the thread updates the cache and returns in the free pool.


This is simple and it seems quite reliable. Of course it is only an idea....never said it should work :-)
It works, but before proposing this, I was hoping to settle on the bare minimum for the API.

Because of this idea is that I started to write my own TBs so I can implement it. However, I found that something simpler could be done. This is what I have already in Gaviota and it works very well (not in the released version yet)

tb_probe_hard ()
{
/* it is the normal blocking probe. it probes the cache, if the info is not there, it loads the cache with the info needed and also returns the info */
}

tb_probe_soft()
{
/* probes the cache, if it is not there, return tb_UNKNOWN */
}

So, when the remaining depth is < 2 I only do tb_probe_soft(). At any other point, I do tb_probe_hard().

I see no decrease in nps when using remaining depth = 2, which indicates that probing is not that bad because we do not probe so often, many positions are in the hashtable, and many are already in cache. If I probe in the last two plies "softly", those probes are for free and the number of probes may increase in an order of magnitude.

That is why I do not believe the statements "TBs do not help" is valid without trying to optimize any of this.

[EDIT] I just realized I am probing in this example compressed TBs from a USB stick!! This is not even close to having a decent transfer speed! My HD is much faster than that (~5x). In a solid state disk this will be *much* better.

Miguel
PS:

[D]1r6/8/k7/p7/2P5/1P4R1/8/K7 w - - 0 1

See the overall efficiency of the cache: 98.3% when I tried soft probes in the last two plies and hard everywhere else. The nps went only from 488k to 368k.

With TBs

Code: Select all

setboard 1r6/8/k7/p7/2P5/1P4R1/8/K7 w - - 0 1
d
+-----------------+
| . r . . . . . . |
| . . . . . . . . |
| k . . . . . . . |
| p . . . . . . . |    Castling&#58; 
| . . P . . . . . |    ep&#58; -
| . P . . . . R . |
| . . . . . . . . |
| K . . . . . . . | &#91;White&#93;
+-----------------+

sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time&#58; 4999999.750000
         3   1       0.0    -0.23  1.b4 Rxb4
         6   1       0.0    +1.78  1.c5
        78   2       0.0    +1.58  1.c5 Rc8
       241   2       0.0    +1.70  1.Rg6+ Ka7 2.Rg7+ Ka6
       455   3       0.0    +1.80  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Kb2
      1324   4       0.0    +1.76  1.Rg6+ Ka7 2.Kb2 Re8 3.c5 Re2+ 4.Ka3
      5533   5       0.1    +1.74  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Re8 4.Ka3
     18121   6       0.1    +1.76  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Re8 4.Rg6+
                                   Kb7 5.c5 Re2+ 6.Ka3
     25577   6&#58;      0.2    +1.76  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Re8 4.Rg6+
                                   Kb7 5.c5 Re2+ 6.Ka3
     68584   7       0.3    +1.76  1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
                                   Kb7 5.c5 Re2+ 6.Ka3
     76258   7&#58;      0.4    +1.76  1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
                                   Kb7 5.c5 Re2+ 6.Ka3
    142938   8       0.6    +1.67  1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
                                   Kb7 5.c5 Re3
    209007   8&#58;      0.8    +1.67  1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
                                   Kb7 5.c5 Re3
    398673   9       1.4      &#58;-(  1.Rg6+
    506663   9       1.7      &#58;-(  
    615599   9       2.1    +1.11  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   8.Kb2
    678878   9&#58;      2.2    +1.11  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   8.Kb2
    770921  10       2.4      &#58;-(  1.Rg6+
    893741  10       2.8      &#58;-(  
   1212856  10       3.6    +0.00  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   <=TABLE
   1616666  10&#58;      4.9    +0.00  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   <=TABLE
   1966156  11       5.8    +0.00  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   <=TABLE
   2306347  11&#58;      6.7    +0.00  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   <=TABLE
   2868525  12       8.0    +0.00  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   <=TABLE
   3523400  12&#58;      9.7    +0.00  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
                                   Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
                                   <=TABLE

Ply&#58; 12
Rg3-g6  Ka6-a7  Rg6-g7  Ka7-a6  Rg7-g3  a5-a4  b3xa4  Rb8-b4  
Rg3-g6  Ka6-a7  Rg6-g7  Ka7-b6  Rg7-g4  Rb4xa4  <-tableb  
Score&#58; 0.00 &#40;0&#41;  Evals&#58; 1791507   Time&#58; 9.7s   nps&#58; 363574  Q/all&#58; 0.26
-------------------------------------------------------------------
               nodes        cutoffs         missed       tree_exp
-------------------------------------------------------------------
path         2594430         364699          56160         1.15
quies         928970         381457          41605         1.11
all          3523400         746156          97765         1.13
-------------------------------------------------------------------
hashtable=  attempts&#58; 2588749   hits&#58; 66.5%   perfect&#58; 40.5%

Side-to-wait attack calls&#58; 812 calls/node&#58; 0.023%
Lazy evals = low&#58;0  cutoff&#58;0  normal&#58;0
-------------------------------------------------------------------
                            hits         missed     efficiency
-------------------------------------------------------------------
 BB cached in eval        867077         923511      0.484
  Checks Cheap/Exp       3815954          41792      0.989
InChecks Cheap/Exp       2174783        1655645      0.568
Attacks generation       4070384        2834057      0.590
       Attacks SEE       1430827        2544062      0.360
         Pawn hash       1699600          61870      0.965
     Material hash       1775093            147      1.000
 Make normal moves       3815937          41792      0.989
-------------------------------------------------------------------

GTB CACHE STATS
  probes&#58;               144794
  hard&#58;                   5406
  soft&#58;                 139388
  hits&#58;                  64661
  misses&#58;                 1115
  softmisses&#58;            79018
  efficiency&#58;             98.3%
  average searches       786.4
  occupancy&#58;              54.4%
without TBs

Code: Select all

setboard 1r6/8/k7/p7/2P5/1P4R1/8/K7 w - - 0 1
d
+-----------------+
| . r . . . . . . |
| . . . . . . . . |
| k . . . . . . . |
| p . . . . . . . |    Castling&#58; 
| . . P . . . . . |    ep&#58; -
| . P . . . . R . |
| . . . . . . . . |
| K . . . . . . . | &#91;White&#93;
+-----------------+

sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time&#58; 4999999.750000
         3   1       0.0    -0.23  1.b4 Rxb4
         6   1       0.0    +1.78  1.c5
        78   2       0.0    +1.58  1.c5 Rc8
       255   2       0.0    +1.61  1.Rg6+ Rb6 2.Rg3
       780   3       0.0    +1.58  1.Rg6+ Rb6 2.Rxb6+ Kxb6 3.Kb2 Kc6 4.Ka3
                                   Kc5
      1044   3       0.0    +1.60  1.c5 Rc8 2.Rg5
      1315   3       0.0    +1.64  1.Re3 Rb6 2.Kb2
      1736   3       0.0    +1.65  1.Kb2 Re8 2.Rg5 Re2+ 3.Kc3
      3048   4       0.0    +1.67  1.Kb2 a4 2.Rg6+ Ka5 3.Rg5+ Ka6
      4711   5       0.0      &#58;-(  1.Kb2
      7295   5       0.0    +1.62  1.Rg6+ Rb6 2.Rxb6+ Kxb6 3.Kb2 Kc6 4.Ka3
                                   Kc5 5.Ka4 Kb6
     10061   5       0.0    +1.65  1.Re3 Rb7 2.Kb2 Rf7 3.Re5 Rf2+ 4.Kc3
     15608   5       0.0    +1.67  1.Ka2 Re8 2.Rg6+ Kb7 3.c5 Re3
     22873   6       0.1    +1.60  1.Ka2 Rb6 2.c5 Rf6 3.Rg7 Rf3
     59813   6       0.2    +1.74  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Rb6 4.Ka3
                                   Re6
     63958   6&#58;      0.2    +1.74  1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Rb6 4.Ka3
                                   Re6
    119215   7       0.3    +1.71  1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rf2+ 4.Kc3
                                   Rf3+ 5.Kc4 Kc7
    131659   7&#58;      0.3    +1.71  1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rf2+ 4.Kc3
                                   Rf3+ 5.Kc4 Kc7
    204264   8       0.5    +1.67  1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rc8 4.Rg5 Rf8
                                   5.Ka3
    235364   8&#58;      0.6    +1.67  1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rc8 4.Rg5 Rf8
                                   5.Ka3
    406326   9       0.9    +1.67  1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Rf8 4.Rg6+
                                   Kb7 5.c5 Rf2+ 6.Ka3 Rc2
    551470   9&#58;      1.2    +1.67  1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Rf8 4.Rg6+
                                   Kb7 5.c5 Rf2+ 6.Ka3 Rc2
    876242  10       1.9    +1.74  1.Rg6+ Kb7 2.Kb2 Rf8 3.Ka3 Rf2 4.Ka4
                                   Rf5 5.Rd6 Rf3 6.c5
    976569  10&#58;      2.1    +1.74  1.Rg6+ Kb7 2.Kb2 Rf8 3.Ka3 Rf2 4.Ka4
                                   Rf5 5.Rd6 Rf3 6.c5
   2042324  11       4.3    +1.71  1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rf2 4.Rd6
                                   Kb7 5.Ka4 Kc7 6.c5 Ra2+ 7.Kb5 Rc2
   2445864  11&#58;      5.1    +1.71  1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rf2 4.Rd6
                                   Kb7 5.Ka4 Kc7 6.c5 Ra2+ 7.Kb5 Rc2
   3196317  12       6.6    +2.02  1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rb8 4.Rh6
                                   Rb4 5.Rc6 Rb8 6.Rg6 Rb7 7.c5
   3359336  12&#58;      6.9    +2.02  1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rb8 4.Rh6
                                   Rb4 5.Rc6 Rb8 6.Rg6 Rb7 7.c5

Ply&#58; 12
Rg3-g6  Ka6-a7  Ka1-a2  Rb8-f8  Ka2-a3  Rf8-b8  Rg6-h6  Rb8-b4  
Rh6-c6  Rb4-b8  Rc6-g6  Rb8-b7  c4-c5  
Score&#58; 2.02 &#40;494&#41;  Evals&#58; 1816384   Time&#58; 6.9s   nps&#58; 488418  Q/all&#58; 0.27
-------------------------------------------------------------------
               nodes        cutoffs         missed       tree_exp
-------------------------------------------------------------------
path         2463778         367517          31177         1.08
quies         895558         368421          41128         1.11
all          3359336         735938          72305         1.10
-------------------------------------------------------------------
hashtable=  attempts&#58; 2447684   hits&#58; 69.5%   perfect&#58; 35.8%

Side-to-wait attack calls&#58; 4564 calls/node&#58; 0.136%
Lazy evals = low&#58;0  cutoff&#58;0  normal&#58;0
-------------------------------------------------------------------
                            hits         missed     efficiency
-------------------------------------------------------------------
 BB cached in eval        944829         871038      0.520
  Checks Cheap/Exp       3790960           6097      0.998
InChecks Cheap/Exp       1894482        1865331      0.504
Attacks generation       3719890        2497476      0.598
       Attacks SEE       1353678        2108212      0.391
         Pawn hash       1752660          30307      0.983
     Material hash       1799548             98      1.000
 Make normal moves       3790943           6097      0.998
-------------------------------------------------------------------

GTB CACHE STATS
  probes&#58;                    0
  hard&#58;                      0
  soft&#58;                      0
  hits&#58;                      0
  misses&#58;                    0
  softmisses&#58;                0
  efficiency&#58;              0.0%
  average searches         0.0
  occupancy&#58;               0.0%

Profile = fragment_failhigh&#40;)
Total    time&#58;   6.88 s
External time&#58;   0.70 s
Internal time&#58;   6.17 s
Ratio Internal/Total&#58; 89.79 
move g3g6

Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: Gaviota EGTBs, interface proposal for programmers

Post by Aaron Becker »

That seems like a reasonable solution to me. As long as there's a non-blocking probe call that returns a failure value and fetches the position into cache in the background (ie, probe_soft), efficient implementations should be possible.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by michiguel »

Aaron Becker wrote:That seems like a reasonable solution to me. As long as there's a non-blocking probe call that returns a failure value and fetches the position into cache in the background (ie, probe_soft), efficient implementations should be possible.
probe_soft() works well with compressed TBs, but a non blocking probe that delegates the problem to another thread (let's calling probe_lazy) works ONLY with uncompressed TBs. The thread that is probing the HD will stay sleeping most of the time while downloading data. That is fine because it won't interfere with the other threads that are busy "searching". However, if the TBs are compressed, the thread that is probing the HD, will have to decompress the data, using valuable CPU time, interfering with the searching threads. Compression saves downloading time, but uses CPU time (which is almost as bad as blocking).

The ideal situation is to have two sets of TBs in two disks. One compressed, one uncompressed. When I am close to the root I probe_hard to the compressed TBs (in those cases I want to have an answer and wait because it is too important, I do not want an non blocking version). When I am close to the leaves, I probe_lazy() to the uncompressed TB. In quiescence and at the leaves, I probe_soft().

Miguel
PS: On top of all that, I should have a layer of bitbases, but let's leave that for later...
IQ
Posts: 162
Joined: Thu Dec 17, 2009 10:46 am

Re: Gaviota EGTBs, interface proposal for programmers

Post by IQ »

Hi,

i wonder how effective this scheme really is. I would reckon that if hash table probes are also stored in the hash table, the number of "probe_softs" that are actually a tb cache hit to be minute. Do you have any numbers of the probe_soft hit rate?

The other scheme with the async i/o is also flawed as it would have to be queued. It would pile up a huge amount of probes in one part of the tree.

Therefore i would propose a prioritzed async queue system - the prioritization can easily done via search depth. This should ensure that the important tb probes actually get done in time to be meaningful for search.

What do you guys think?

-IQ
michiguel wrote: Because of this idea is that I started to write my own TBs so I can implement it. However, I found that something simpler could be done. This is what I have already in Gaviota and it works very well (not in the released version yet)

tb_probe_hard ()
{
/* it is the normal blocking probe. it probes the cache, if the info is not there, it loads the cache with the info needed and also returns the info */
}

tb_probe_soft()
{
/* probes the cache, if it is not there, return tb_UNKNOWN */
}

So, when the remaining depth is < 2 I only do tb_probe_soft(). At any other point, I do tb_probe_hard().

I see no decrease in nps when using remaining depth = 2, which indicates that probing is not that bad because we do not probe so often, many positions are in the hashtable, and many are already in cache. If I probe in the last two plies "softly", those probes are for free and the number of probes may increase in an order of magnitude.