Diminishing returns of increasing search depth

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

Diminishing returns of increasing search depth

Post by jarkkop »

Toga II 1.4beta5c played against itself. The search depth difference was fixed so that one side searched one ply less than the other.

Results
games in one test: 102
Error bounds: 95% confidence, about ±55 Elo

Code: Select all

ply	ELO difference
2/1	416
3/2	326
4/3	224
5/4	152
6/5	224
7/6	168
8/7	132
9/8	168
10/9	116
11/10	135
12/11	135
13/12	116
14/13	105


Fitting a rational function to the data points gives the following prediction.
How good is the prediction below?

Code: Select all

ply	ELO difference
1	401
2	314
3	259
4	220
5	191
6	169
7	152
8	137
9	125
10	116
11	107
12	100
13	93
14	88
15	83
16	78
17	74
18	71
19	68
20	65
21	62
22	59
23	57
24	55
25	53
26	51
27	49
28	48
29	46
30	45
31	43
32	42
33	41
34	40
35	39
36	38
37	37
38	36
39	35
40	34
41	33
42	33
43	32
44	31
45	31
46	30
47	29
48	29
49	28
50	28
51	27
52	27
53	26
54	26
55	25
56	25
57	25
58	24
59	24
60	23
61	23
62	23
63	22
64	22
65	22
66	21
67	21
68	21
69	20
70	20
71	20
72	20
73	19
74	19
75	19
76	19
77	18
78	18
79	18
80	18
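A minimal sketch of this kind of fit, assuming for illustration a simple one-pole form gain(d) = a/(d + b) (not necessarily the rational function actually used for the table above). This form linearises, so plain least squares on the reciprocals is enough:

```python
# Fit gain(d) = a / (d + b) to the measured per-ply Elo gains above.
# Linearise: 1/gain = d/a + b/a, then ordinary least squares on
# (d, 1/gain). The one-pole form is a simplifying assumption here.

depths = list(range(1, 14))  # gain measured going from depth d to d+1
gains = [416, 326, 224, 152, 224, 168, 132, 168, 116, 135, 135, 116, 105]

n = len(depths)
ys = [1.0 / g for g in gains]
xbar = sum(depths) / n
ybar = sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(depths, ys))
         / sum((x - xbar) ** 2 for x in depths))
intercept = ybar - slope * xbar

a = 1.0 / slope      # from 1/gain = d/a + b/a
b = intercept * a

def predicted_gain(d):
    """Predicted Elo gain of searching one ply deeper at depth d."""
    return a / (d + b)

for d in (5, 14, 25, 80):
    print(d, round(predicted_gain(d), 1))
```

Even this crude form reproduces the general shape of the prediction table: a few hundred Elo per ply at low depths, a few tens of Elo per ply near depth 80.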

Larry
Posts: 840
Joined: Sun Mar 12, 2006 3:59 am
Location: Sydney

Re: Diminishing returns of increasing search depth

Post by Larry »

Hi, and thanks for the very interesting work you have done.
We will never know how accurate the prediction is, as computer
power will never be good enough, IMO.
But at the lower search depths, the rating difference is staggering.
I sometimes wish we could take the programs out of the very early
dedicated chess computers and put them on disks, to run at 2 GHz.
all the best
Larry
Growth is the problem; not the solution
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Diminishing returns of increasing search depth

Post by Laskos »

jarkkop wrote:Toga II 1.4beta5c played against itself. Search depth difference was fixed so that other searched one ply less.

[quoted results and prediction tables snipped]

I don't know what rational function you used; I used least squares with the Padé approximant (a*x^2+b*x+c)/(d*x^3+e*x^2+f*x+g).

My results (80 values too) are closer to the actual data than yours (correlation 0.990988, what is yours?), but very similar at the end of the tail. I think we can learn much from those data, but you need more than 102 games; try to do 1000, and the two sigmas will be less than 20 Elo points.

Code: Select all


1   415.16, 
2   327.642, 
3   222.584, 
4   155.311, 
5   155.504, 
6   153.791, 
7   147.005, 
8   138.66,
9   130.283, 
10  122.41, 
11  115.196, 
12  108.648, 
13  102.722, 
14  97.3569, 
15  92.4898,
16  88.0625, 
17  84.0232, 
18  80.3263, 
19  76.9323, 
20  73.807, 
21  70.9207, 
22  68.2479,
23  65.7662, 
24  63.4563, 
25  61.3012, 
26  59.2863, 
27  57.3983, 
28  55.6258, 
29  53.9587,
30  52.3879, 
31  50.9054, 
32  49.504, 
33  48.1772, 
34  46.9194, 
35  45.7253, 
36  44.5901,
37  43.5098, 
38  42.4803, 
39  41.4983, 
40  40.5605, 
41  39.664, 
42  38.8061, 
43  37.9845,
44  37.1969, 
45  36.4411, 
46  35.7154, 
47  35.018, 
48  34.3472, 
49  33.7016, 
50  33.0798,
51  32.4805, 
52  31.9024, 
53  31.3445, 
54  30.8058, 
55  30.2853, 
56  29.782, 
57  29.2951,
58  28.8239, 
59  28.3676, 
60  27.9255, 
61  27.497, 
62  27.0814, 
63  26.6781, 
64  26.2867,
65  25.9065, 
66  25.5372, 
67  25.1783, 
68  24.8293, 
69  24.4899, 
70  24.1596, 
71  23.838, 
72  23.525,
73  23.22, 
74  22.9228, 
75  22.6331, 
76  22.3507, 
77  22.0752, 
78  21.8064, 
79  21.5441, 
80  21.288.
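The 1000-games suggestion follows from the usual square-root scaling of match-score error bars. A minimal sketch, assuming the ±55 Elo bound quoted for 102 games and pure 1/sqrt(n) scaling:

```python
import math

# Assuming the 95% (two-sigma) error bar shrinks as 1/sqrt(games),
# rescale the +/-55 Elo bound reported for 102 games to other sizes.
def error_bound(games, ref_games=102, ref_bound=55.0):
    return ref_bound * math.sqrt(ref_games / games)

print(round(error_bound(1000), 1))  # -> 17.6, comfortably under 20 Elo
```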

Kai
Eelco de Groot
Posts: 4563
Joined: Sun Mar 12, 2006 2:40 am

Re: Diminishing returns of increasing search depth

Post by Eelco de Groot »

Laskos wrote:
jarkkop wrote:Toga II 1.4beta5c played against itself. Search depth difference was fixed so that other searched one ply less.

[quoted post with tables snipped]

This is of course very nice, a correlation of 0.990988, but why do you think you can extrapolate these data almost ad infinitum? I would not trust any figure beyond 16 plies when the experimental data run only to 14 plies. It is not a linear scale; the x-axis is more of a logarithmic scale. What kind of statistics do you need for that? I do not know. Please, I'm no statistician, but I don't believe those tail-end approximations are a hair better than reading tea leaves :? No matter how many thousands of games poor Jarkko has to play, printing this kind of list is, in my eyes, an abuse of mathematics. Sorry Kai!

Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Diminishing returns of increasing search depth

Post by Laskos »

Eelco de Groot wrote:

This is of course very nice, a correlation of 0.990988 but why do you think you can extrapolate these data almost ad infinitum? [...]

Eelco
Eelco, I think like you do here. I tried a logarithmic fit; it doesn't work, it even gives me negative results (that is, a decrease in Elo from the addition of a ply, which is absurd). I also do not believe that results beyond 20-25 plies are of much interest, but the fact that at 80 plies we agree within 3 Elo points is somehow surprising. Maybe it can be trusted up to 25 plies? Then the value of doubling the speed could be calculated more precisely (as 1 ply usually means about a 2.3-2.7x increase in time). The results are not totally worthless, I would say (though they are almost worthless beyond 25 plies). Padé approximants sometimes give unexpectedly good results. Maybe in 10 years we will see :).
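The ply-to-doubling conversion works like this: going one ply deeper multiplies the search time by the effective branching factor, so the Elo value of doubling the time is the per-ply gain scaled by log(2)/log(ebf). A minimal sketch, assuming an effective branching factor of 2.5 (the midpoint of the 2.3-2.7 range above) and taking sample per-ply gains from the fitted list earlier in the thread:

```python
import math

# One ply deeper multiplies search time by the effective branching
# factor (assumed ~2.5 here), so the Elo value of doubling the time
# is the per-ply gain scaled by log(2)/log(ebf).
def elo_per_doubling(elo_per_ply, ebf=2.5):
    return elo_per_ply * math.log(2) / math.log(ebf)

print(round(elo_per_doubling(92.5), 1))  # fitted per-ply gain near ply 15
print(round(elo_per_doubling(61.3), 1))  # fitted per-ply gain near ply 25
```

These come out near 70 and 46 Elo per doubling, matching the figures quoted later in the thread.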

Kai
jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

Re: Diminishing returns of increasing search depth

Post by jarkkop »

Maybe Bob can help us with Crafty to get a solution?
His testing cluster could give us thousands of games and greater search depths in the test results.
It would be interesting to compare my data to data that Crafty produces.

- Jarkko
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Diminishing returns of increasing search depth

Post by Laskos »

jarkkop wrote:Maybe Bob can help us with Crafty to get a solution?
His testing cluster could give us thousands of games and more ply depth in test results.
It would be interesting to compare my data to data that Crafty produces.

- Jarkko
I think you can do it, up to 1000 games, and it is worth doing. Below 10-12 plies the games should be very fast; do it the same way, up to ply 14 if you can. The fitted results are not so speculative: for example, from the fitted results, at ply 15 a doubling of time is worth around 70 Elo points (a value usually given), while at 25 plies a doubling is only worth around 45 points. Not a bad prediction; we will see in several years whether it was correct. Besides that, I am wondering why our tails are so close: which rational function did you use for the fit?

Kai
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Diminishing returns of increasing search depth

Post by Laskos »

Eelco de Groot wrote:
It is not a linear scale, the x-axis is more a logarithmic scale. What kind of statistics do you need for that? I do not know.

Eelco
A small correction: plies are logarithmic in time, and Elo is also logarithmic, so Elo versus plies is linear (assuming a nearly constant 70 points per two-fold increase in time). It is not actually constant, and Padé approximants do a very good job when a nearly asymptotically linear approximation is needed. Jarkko is doing a very good test; I hope he will reach 12 plies (at least) with 1000 games, if Bob doesn't come along to give us 14-ply results. Then I could extrapolate finely. Trust me up to ply 25 or even further. And we will see in several years whether we were wrong or right. The first result is as follows: 70 Elo points for doubling the time as of now (fast games, 15 plies), 45 Elo points for doubling the time in 6-8 years (long games, 25 plies, on much improved hardware). Would you bet a bottle of Louis Roederer Cristal champagne on the first of August 2017?

Kai
Eelco de Groot
Posts: 4563
Joined: Sun Mar 12, 2006 2:40 am

Re: Diminishing returns of increasing search depth

Post by Eelco de Groot »

Laskos wrote:
Eelco de Groot wrote:
It is not a linear scale, the x-axis is more a logarithmic scale. What kind of statistics do you need for that? I do not know.

Eelco
A small correction, plies are logarithmic with time, Elo is also logarithmic, so Elo-plies is linear (assumed constant nearly 70 for two-fold increase in time). [...]

Kai
If the bet is about a Toga that I can build myself in 2017, I would be willing to engage in any kind of bet about ply levels correlated to the playing strength of my own Fruit derivative :)

The program would probably still be called Blueberry in 2017, if I were still capable of making chess programs then. Maybe some kind of fine wine grape from France would be especially appropriate for the occasion, but not Toga, I'm afraid; it would certainly have to be some kind of fruit!

I am not much of a champagne connoisseur though. A friend of mine treated me yesterday to a good glass of white from a very small house, a fine appellation contrôlée 2007 wine; unfortunately I can't quite remember the name of the house now. It was pretty strong; I had just one glass and it was affecting my coordination when taking a walk outside later. I guess I'm not used to it, and it was 13.5% alcohol. Mais la belle France! Go Lance!

I take your point about the double logarithmic axes, Kai. But I still think there are factors that probably do not scale logarithmically, at least not yet. If, for example, the hash table size does not scale with the size of the search tree, for purely practical reasons, what effect does that have on the quality of the analysis? It may even be slightly positive to have smaller tables, if you get fewer errors from hash collisions etc., although it does not help in keeping the search times acceptable.

I am also not sure what positional errors in the search tree do at greater depth. You assume downward propagation is limited, so searching deeper is always better, but is that still true when there are diminishing returns in general? Is it possible to have upward propagation of search errors, evaluation errors, bugs etc. that somehow increases with search depth as the tree grows larger? I am totally speculating about this, but I think you can't exclude the possibility :? I am just advocating whatever "other side" of a possible debate there might be, although I am not sure what position the "other side" is taking yet :)

I would say, though, that in my opinion any extrapolation will likely asymptotically approach zero, i.e. zero extra measurable Elo returns at great ply depth, given the Toga 1.4 beta 5c used for Jarkko's experiment, or the same experiment on 2017 hardware but with the same program version; so that largely determines the general shape of your curve towards the tail, I think.

I do not know whether the asymptotic approach towards zero is an underlying assumption of the extrapolation method you both chose. I have not looked up the extrapolation method you used, Kai, but it was mainly the presumed 10^-4 Elo accuracy, four places behind the comma, of the extrapolation that I was questioning, firstly...

Regards, Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Eelco de Groot
Posts: 4563
Joined: Sun Mar 12, 2006 2:40 am

Re: Diminishing returns of increasing search depth

Post by Eelco de Groot »

Eelco de Groot wrote:
I would say, though, that in my opinion any extrapolation will likely asymptotically approach zero, i.e. zero extra measurable Elo returns at great ply depth, given the Toga 1.4 beta 5c used for Jarkko's experiment, or the same experiment on 2017 hardware but with the same program version; so that largely determines the general shape of your curve towards the tail, I think.
I had no more time to edit, but the word "extra" is maybe not the right word, since I really expect zero returns for Toga II 1.4beta5c beyond a certain point. Suppose then, for the experiment with Toga II 1.4beta5c running on 2017 hardware, that you took your preferred method of extrapolation but added a data point somewhere between 30 and 50 plies: say that at 40 plies Toga defeats 39-ply Toga by no more than 51%-49%, or approximately 7 Elo. This would be closer to my 2017 vision (or at least give an idea of a possible error bar at 20-25 plies deep, which you maybe would like to know). I'm assuming some least-squares analysis is done on the data in the Padé approximation, since you mention least squares, so the curve does not actually have to run through this point, but the curve would be lower in general than with the expected 18 Elo at 80 plies. And I'm staying on the safe side: the real curve will probably be lower still than this new approximation. The diminishing returns will have a much greater effect than the two given approximations suggest, for many possible reasons; Toga II 1.4beta5c not being tuned for, or even expected to ever work at, such great depths is just one of them.
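The 51%-49% ≈ 7 Elo figure can be checked against the standard logistic Elo model, in which an expected score p corresponds to a rating difference of 400*log10(p/(1-p)). A minimal sketch:

```python
import math

# Standard logistic Elo model: expected score p maps to a rating
# difference of 400*log10(p/(1-p)).
def elo_diff(score):
    return 400 * math.log10(score / (1 - score))

print(round(elo_diff(0.51), 1))  # -> about 7 Elo
```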

Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan