mcostalba wrote:Aaron Becker wrote:That's a good point. I think it would be helpful to have a non-blocking api to support asynchronously fetching positions that aren't in the cache. So in addition to the blocking tb_probe function, you would also have non-blocking tb_fetch and tb_test functions. tb_fetch reads the block for the given position into the cache if it isn't present, and tb_test tells you whether or not the given position is in the cache. Both of them would always return immediately, so that the engine can continue searching while the disk is accessed.
From the point of view of the engine the simplest interface is
where 'nb' is non-blocking. What the function does is to lookup in the cache, if value is presents it returns the value, otherwise it performs the following steps:
1) Starts an async I/O access to fetch the value from disk
2) Returns VALUE_NOT_DEFINED
When the I/O operation is completed then the background thread updates the cache. Engine is NOT informed back that the value is arrived (also because probably it is not useful anymore). But in case engine requests again the same position the value will be in cache and this time it will be returned.
This is wholly trasparent to the engine that has simply to ignore VALUE_NOT_DEFINED answers.
So a possible design could be:
1) Library init function allocates a big cache and a set of worker threads used for I/O and normally sleeping
2) When engine probes for a position if the position is in cache then value is returned, otherwise the first free sleeping thread is spawned to probe the position from disk and VALUE_NOT_DEFINED is immediately returned so that engine can continue searching as usual.
3) The woken up thread starts an I/O call and blocks on that. When the calls return the thread updates the cache and returns in the free pool.
This is simple and it seems quite reliable. Of course it is only an idea....never said it should work
It works, but before proposing this, I was hoping to settle on the bare minimum for the API.
Because of this idea is that I started to write my own TBs so I can implement it. However, I found that something simpler could be done. This is what I have already in Gaviota and it works very well (not in the released version yet)
tb_probe_hard ()
{
/* it is the normal blocking probe. it probes the cache, if the info is not there, it loads the cache with the info needed and also returns the info */
}
tb_probe_soft()
{
/* probes the cache, if it is not there, return tb_UNKNOWN */
}
So, when the remaining depth is < 2 I only do tb_probe_soft(). At any other point, I do tb_probe_hard().
I see no decrease in nps when using remaining depth = 2, which indicates that probing is not that bad because we do not probe so often, many positions are in the hashtable, and many are already in cache. If I probe in the last two plies "softly", those probes are for free and the number of probes may increase in an order of magnitude.
That is why I do not believe the statements "TBs do not help" is valid without trying to optimize any of this.
[EDIT] I just realized I am probing in this example compressed TBs from a USB stick!! This is not even close to having a decent transfer speed! My HD is much faster than that (~5x). In a solid state disk this will be *much* better.
Miguel
PS:
[D]1r6/8/k7/p7/2P5/1P4R1/8/K7 w - - 0 1
See the overall efficiency of the cache: 98.3% when I tried soft probes in the last two plies and hard everywhere else. The nps went only from 488k to 368k.
With TBs
Code: Select all
setboard 1r6/8/k7/p7/2P5/1P4R1/8/K7 w - - 0 1
d
+-----------------+
| . r . . . . . . |
| . . . . . . . . |
| k . . . . . . . |
| p . . . . . . . | Castling:
| . . P . . . . . | ep: -
| . P . . . . R . |
| . . . . . . . . |
| K . . . . . . . | [White]
+-----------------+
sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time: 4999999.750000
3 1 0.0 -0.23 1.b4 Rxb4
6 1 0.0 +1.78 1.c5
78 2 0.0 +1.58 1.c5 Rc8
241 2 0.0 +1.70 1.Rg6+ Ka7 2.Rg7+ Ka6
455 3 0.0 +1.80 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Kb2
1324 4 0.0 +1.76 1.Rg6+ Ka7 2.Kb2 Re8 3.c5 Re2+ 4.Ka3
5533 5 0.1 +1.74 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Re8 4.Ka3
18121 6 0.1 +1.76 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Re8 4.Rg6+
Kb7 5.c5 Re2+ 6.Ka3
25577 6: 0.2 +1.76 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Re8 4.Rg6+
Kb7 5.c5 Re2+ 6.Ka3
68584 7 0.3 +1.76 1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
Kb7 5.c5 Re2+ 6.Ka3
76258 7: 0.4 +1.76 1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
Kb7 5.c5 Re2+ 6.Ka3
142938 8 0.6 +1.67 1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
Kb7 5.c5 Re3
209007 8: 0.8 +1.67 1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Re8 4.Rg6+
Kb7 5.c5 Re3
398673 9 1.4 :-( 1.Rg6+
506663 9 1.7 :-(
615599 9 2.1 +1.11 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
8.Kb2
678878 9: 2.2 +1.11 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
8.Kb2
770921 10 2.4 :-( 1.Rg6+
893741 10 2.8 :-(
1212856 10 3.6 +0.00 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
<=TABLE
1616666 10: 4.9 +0.00 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
<=TABLE
1966156 11 5.8 +0.00 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
<=TABLE
2306347 11: 6.7 +0.00 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
<=TABLE
2868525 12 8.0 +0.00 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
<=TABLE
3523400 12: 9.7 +0.00 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Rg3 a4 4.bxa4
Rb4 5.Rg6+ Ka7 6.Rg7+ Kb6 7.Rg4 Rxa4+
<=TABLE
Ply: 12
Rg3-g6 Ka6-a7 Rg6-g7 Ka7-a6 Rg7-g3 a5-a4 b3xa4 Rb8-b4
Rg3-g6 Ka6-a7 Rg6-g7 Ka7-b6 Rg7-g4 Rb4xa4 <-tableb
Score: 0.00 (0) Evals: 1791507 Time: 9.7s nps: 363574 Q/all: 0.26
-------------------------------------------------------------------
nodes cutoffs missed tree_exp
-------------------------------------------------------------------
path 2594430 364699 56160 1.15
quies 928970 381457 41605 1.11
all 3523400 746156 97765 1.13
-------------------------------------------------------------------
hashtable= attempts: 2588749 hits: 66.5% perfect: 40.5%
Side-to-wait attack calls: 812 calls/node: 0.023%
Lazy evals = low:0 cutoff:0 normal:0
-------------------------------------------------------------------
hits missed efficiency
-------------------------------------------------------------------
BB cached in eval 867077 923511 0.484
Checks Cheap/Exp 3815954 41792 0.989
InChecks Cheap/Exp 2174783 1655645 0.568
Attacks generation 4070384 2834057 0.590
Attacks SEE 1430827 2544062 0.360
Pawn hash 1699600 61870 0.965
Material hash 1775093 147 1.000
Make normal moves 3815937 41792 0.989
-------------------------------------------------------------------
GTB CACHE STATS
probes: 144794
hard: 5406
soft: 139388
hits: 64661
misses: 1115
softmisses: 79018
efficiency: 98.3%
average searches 786.4
occupancy: 54.4%
without TBs
Code: Select all
setboard 1r6/8/k7/p7/2P5/1P4R1/8/K7 w - - 0 1
d
+-----------------+
| . r . . . . . . |
| . . . . . . . . |
| k . . . . . . . |
| p . . . . . . . | Castling:
| . . P . . . . . | ep: -
| . P . . . . R . |
| . . . . . . . . |
| K . . . . . . . | [White]
+-----------------+
sd 12
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 12, panic time: 4999999.750000
3 1 0.0 -0.23 1.b4 Rxb4
6 1 0.0 +1.78 1.c5
78 2 0.0 +1.58 1.c5 Rc8
255 2 0.0 +1.61 1.Rg6+ Rb6 2.Rg3
780 3 0.0 +1.58 1.Rg6+ Rb6 2.Rxb6+ Kxb6 3.Kb2 Kc6 4.Ka3
Kc5
1044 3 0.0 +1.60 1.c5 Rc8 2.Rg5
1315 3 0.0 +1.64 1.Re3 Rb6 2.Kb2
1736 3 0.0 +1.65 1.Kb2 Re8 2.Rg5 Re2+ 3.Kc3
3048 4 0.0 +1.67 1.Kb2 a4 2.Rg6+ Ka5 3.Rg5+ Ka6
4711 5 0.0 :-( 1.Kb2
7295 5 0.0 +1.62 1.Rg6+ Rb6 2.Rxb6+ Kxb6 3.Kb2 Kc6 4.Ka3
Kc5 5.Ka4 Kb6
10061 5 0.0 +1.65 1.Re3 Rb7 2.Kb2 Rf7 3.Re5 Rf2+ 4.Kc3
15608 5 0.0 +1.67 1.Ka2 Re8 2.Rg6+ Kb7 3.c5 Re3
22873 6 0.1 +1.60 1.Ka2 Rb6 2.c5 Rf6 3.Rg7 Rf3
59813 6 0.2 +1.74 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Rb6 4.Ka3
Re6
63958 6: 0.2 +1.74 1.Rg6+ Ka7 2.Rg7+ Ka6 3.Ka2 Rb6 4.Ka3
Re6
119215 7 0.3 +1.71 1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rf2+ 4.Kc3
Rf3+ 5.Kc4 Kc7
131659 7: 0.3 +1.71 1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rf2+ 4.Kc3
Rf3+ 5.Kc4 Kc7
204264 8 0.5 +1.67 1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rc8 4.Rg5 Rf8
5.Ka3
235364 8: 0.6 +1.67 1.Rg6+ Kb7 2.Kb2 Rf8 3.c5 Rc8 4.Rg5 Rf8
5.Ka3
406326 9 0.9 +1.67 1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Rf8 4.Rg6+
Kb7 5.c5 Rf2+ 6.Ka3 Rc2
551470 9: 1.2 +1.67 1.Rg6+ Kb7 2.Rg5 Ka6 3.Ka2 Rf8 4.Rg6+
Kb7 5.c5 Rf2+ 6.Ka3 Rc2
876242 10 1.9 +1.74 1.Rg6+ Kb7 2.Kb2 Rf8 3.Ka3 Rf2 4.Ka4
Rf5 5.Rd6 Rf3 6.c5
976569 10: 2.1 +1.74 1.Rg6+ Kb7 2.Kb2 Rf8 3.Ka3 Rf2 4.Ka4
Rf5 5.Rd6 Rf3 6.c5
2042324 11 4.3 +1.71 1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rf2 4.Rd6
Kb7 5.Ka4 Kc7 6.c5 Ra2+ 7.Kb5 Rc2
2445864 11: 5.1 +1.71 1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rf2 4.Rd6
Kb7 5.Ka4 Kc7 6.c5 Ra2+ 7.Kb5 Rc2
3196317 12 6.6 +2.02 1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rb8 4.Rh6
Rb4 5.Rc6 Rb8 6.Rg6 Rb7 7.c5
3359336 12: 6.9 +2.02 1.Rg6+ Ka7 2.Ka2 Rf8 3.Ka3 Rb8 4.Rh6
Rb4 5.Rc6 Rb8 6.Rg6 Rb7 7.c5
Ply: 12
Rg3-g6 Ka6-a7 Ka1-a2 Rb8-f8 Ka2-a3 Rf8-b8 Rg6-h6 Rb8-b4
Rh6-c6 Rb4-b8 Rc6-g6 Rb8-b7 c4-c5
Score: 2.02 (494) Evals: 1816384 Time: 6.9s nps: 488418 Q/all: 0.27
-------------------------------------------------------------------
nodes cutoffs missed tree_exp
-------------------------------------------------------------------
path 2463778 367517 31177 1.08
quies 895558 368421 41128 1.11
all 3359336 735938 72305 1.10
-------------------------------------------------------------------
hashtable= attempts: 2447684 hits: 69.5% perfect: 35.8%
Side-to-wait attack calls: 4564 calls/node: 0.136%
Lazy evals = low:0 cutoff:0 normal:0
-------------------------------------------------------------------
hits missed efficiency
-------------------------------------------------------------------
BB cached in eval 944829 871038 0.520
Checks Cheap/Exp 3790960 6097 0.998
InChecks Cheap/Exp 1894482 1865331 0.504
Attacks generation 3719890 2497476 0.598
Attacks SEE 1353678 2108212 0.391
Pawn hash 1752660 30307 0.983
Material hash 1799548 98 1.000
Make normal moves 3790943 6097 0.998
-------------------------------------------------------------------
GTB CACHE STATS
probes: 0
hard: 0
soft: 0
hits: 0
misses: 0
softmisses: 0
efficiency: 0.0%
average searches 0.0
occupancy: 0.0%
Profile = fragment_failhigh()
Total time: 6.88 s
External time: 0.70 s
Internal time: 6.17 s
Ratio Internal/Total: 89.79
move g3g6