Re: best way to determine elos of a group
Posted: Thu Aug 08, 2019 6:56 pm
Things are going pretty well now - the piece values were learned long ago, and it is actually starting to develop a preference for well-known openings.
It hasn't learned to castle early or much about king safety yet -- training is still at <250k games, so there is a lot to go.
I have one question: when should I drop the learning rate? It is still using LR=0.15, and as you can see in the plot the loss is zigzagging a lot, though it has been like that from the beginning anyway. The zigzagging is most likely because I am training a net every 512 games.
I think there is a school of thought that keeping the learning rate as high as possible and dropping it in a stepwise fashion gives better generalization than using a gradually decaying schedule such as exponential decay. This all sounds like a black art to me.
I have not done a hyper-parameter study to determine the right learning rate, but often the right one is the one that gives a smooth reduction in the loss.
Mine is clearly zigzagging a lot, although it has been from the very beginning, so I am not sure when to drop it and by how much (maybe by 10x like A0?).
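To make the two options concrete, here is a minimal sketch of both schedules. The milestone step counts and decay rate are made-up placeholders, not values from any engine; the stepwise version just multiplies the base rate by 0.1 each time training passes a milestone (the A0-style 10x drop), while the exponential version shrinks it a little every step.

```python
# Hypothetical learning-rate schedules; milestones and decay are illustrative only.

def step_lr(base_lr, step, milestones, factor=0.1):
    """Keep the LR as high as possible, then drop it by `factor`
    each time `step` passes one of the `milestones` (stepwise schedule)."""
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * factor ** drops

def exp_lr(base_lr, step, decay=0.99999):
    """Gradually decaying alternative: multiply by `decay` every step."""
    return base_lr * decay ** step

# Starting from LR=0.15 with hypothetical drops at 200k and 400k steps:
print(step_lr(0.15, 100_000, [200_000, 400_000]))  # still 0.15
print(step_lr(0.15, 250_000, [200_000, 400_000]))  # dropped 10x to 0.015
```

The stepwise schedule holds the loss at a high, noisy plateau between drops, which is why the usual advice is to drop only once the loss has clearly flattened at the current rate.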
Daniel
Code:
# rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
# [st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
63 8 111 2603 e2-e3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3
64 8 223 6024 c2-c4 Nb8-c6 Nb1-c3 e7-e5 e2-e4 Ng8-f6 Ng1-f3 Bf8-e7 Bf1-e2
65 6 334 10070 d2-d4 d7-d5 f2-f3 Nb8-c6 Nb1-c3 Ng8-f6 e2-e4 d5xe4 f3xe4 Qd8-d7
66 6 446 14134 d2-d4 Ng8-f6 Ng1-f3 d7-d5 Nb1-c3 Nb8-c6 h2-h4 h7-h5 Nf3-g5 Qd8-d7 f2-f4
67 4 558 18339 Nb1-c3 e7-e5 e2-e4 Nb8-c6 Ng1-f3 Ng8-f6 Bf1-e2 Bf8-e7 Rh1-f1 Rh8-f8 h2-h3
68 4 670 22748 Ng1-f3 d7-d5 c2-c3 Nb8-c6 d2-d4 Ng8-f6 b2-b4 Nf6-e4 b4-b5 Nc6-a5
69 4 782 27090 d2-d4 d7-d5 g2-g3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 h7-h5 h2-h4 Nf6-g4
70 4 894 31872 Ng1-f3 c7-c5 Nb1-c3 Nb8-c6 e2-e4 Ng8-f6 e4-e5 Nf6-d5 Nc3xd5
71 5 1005 36133 Ng1-f3 c7-c5 Nb1-c3 Nb8-c6 e2-e4 Ng8-f6 e4-e5 Nf6-d5 Nc3xd5
72 5 1065 38332 d2-d4 d7-d5 g2-g3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 h7-h5 h2-h4 Nf6-g4 Bf1-h3
# Move Value=(V,P,V+P) Policy Visits PV
#----------------------------------------------------------------------------------
# 1 (0.501,0.500,0.501) 17.15 7185 Nb1-c3 d7-d5 Ng1-f3 Nb8-c6 d2-d4 Ng8-f6 h2-h4 h7-h5 Nf3-g5 Qd8-d7 f2-f4
# 2 (0.507,0.500,0.504) 15.32 10871 d2-d4 d7-d5 g2-g3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 h7-h5 h2-h4 Nf6-g4 Bf1-h3
# 3 (0.500,0.500,0.500) 11.57 9395 Ng1-f3 d7-d5 c2-c3 Nb8-c6 d2-d4 Ng8-f6 b2-b4 Nf6-e4 b4-b5 Nc6-a5
# 4 (0.497,0.500,0.497) 6.20 1600 e2-e4 Nb8-c6 Nb1-c3 e7-e5 Ng1-f3 Ng8-f6 Bf1-e2 Bf8-e7 Rh1-f1 Rh8-f8 h2-h3
# 5 (0.498,0.500,0.498) 3.64 754 Ng1-h3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 f2-f4 Nf6-e4 Nc3xe4 d5xe4
# 6 (0.486,0.500,0.486) 3.60 428 Nb1-a3 Nb8-c6 Ng1-f3 d7-d5 d2-d4 Ng8-f6 Qd1-d2 Nf6-e4 Qd2-e3
# 7 (0.499,0.500,0.499) 3.37 732 b2-b4 d7-d5 d2-d4 Nb8-c6 b4-b5 Nc6-a5 Ng1-f3 Ng8-f6 Nf3-e5 Qd8-d6 Nb1-c3
# 8 (0.500,0.500,0.500) 3.29 740 c2-c4 Nb8-c6 Nb1-c3 e7-e5 e2-e4 Ng8-f6 Ng1-f3 Bf8-e7 Bf1-e2 Rh8-f8 Rh1-f1
# 9 (0.491,0.500,0.491) 3.21 450 d2-d3 d7-d5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 h2-h4 d5-d4
# 10 (0.497,0.500,0.497) 3.21 803 c2-c3 e7-e5 e2-e4 Nb8-c6 Ng1-f3 Ng8-f6 Bf1-e2 Bf8-e7 d2-d4
# 11 (0.494,0.500,0.494) 3.16 512 f2-f4 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nf6-e4
# 12 (0.491,0.500,0.491) 3.16 500 e2-e3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nf6-e4 Bf1-d3
# 13 (0.495,0.500,0.495) 3.08 536 g2-g3 Nb8-c6 Nb1-c3 Ng8-f6 d2-d4 d7-d5 Ng1-f3 h7-h5 h2-h4 Nf6-g4
# 14 (0.486,0.500,0.486) 3.03 361 a2-a4 Nb8-c6 Nb1-c3 Ng8-f6 Ng1-f3 d7-d5 d2-d4 h7-h5
# 15 (0.492,0.500,0.492) 2.98 681 h2-h3 Nb8-c6 Nb1-c3 Ng8-f6 d2-d4 d7-d5 Ng1-f3 h7-h5 Qd1-d2 Qd8-d7
# 16 (0.499,0.500,0.499) 2.94 638 f2-f3 Nb8-c6 e2-e4 e7-e5 Nb1-c3 Ng8-f6 Qd1-e2 Bf8-e7 Qe2-d3 Nc6-b4
# 17 (0.496,0.500,0.496) 2.92 523 g2-g4 Nb8-c6 Nb1-c3 e7-e5 e2-e4 Ng8-f6 g4-g5 Nf6-h5 Ng1-f3
# 18 (0.501,0.500,0.500) 2.88 695 b2-b3 Nb8-c6 Ng1-f3 d7-d5 d2-d4 Ng8-f6 Nb1-c3 h7-h5 Qd1-d2
# 19 (0.499,0.500,0.499) 2.70 569 a2-a3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 h7-h5 h2-h4
# 20 (0.488,0.500,0.488) 2.61 358 h2-h4 Nb8-c6 Nb1-c3 Ng8-f6 d2-d4 d7-d5 Ng1-f3 h7-h5 Nf3-g5
# nodes = 257747 <0% qnodes> time = 10703ms nps = 24081 eps = 0 nneps = 3598
# Tree: nodes = 51647 depth = 12 pps = 3596 visits = 38332
# qsearch_calls = 0 search_calls = 0
move d2d4
Bye Bye