Code:
# rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
# [st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
63 8 111 2603 e2-e3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3
64 8 223 6024 c2-c4 Nb8-c6 Nb1-c3 e7-e5 e2-e4 Ng8-f6 Ng1-f3 Bf8-e7 Bf1-e2
65 6 334 10070 d2-d4 d7-d5 f2-f3 Nb8-c6 Nb1-c3 Ng8-f6 e2-e4 d5xe4 f3xe4 Qd8-d7
66 6 446 14134 d2-d4 Ng8-f6 Ng1-f3 d7-d5 Nb1-c3 Nb8-c6 h2-h4 h7-h5 Nf3-g5 Qd8-d7 f2-f4
67 4 558 18339 Nb1-c3 e7-e5 e2-e4 Nb8-c6 Ng1-f3 Ng8-f6 Bf1-e2 Bf8-e7 Rh1-f1 Rh8-f8 h2-h3
68 4 670 22748 Ng1-f3 d7-d5 c2-c3 Nb8-c6 d2-d4 Ng8-f6 b2-b4 Nf6-e4 b4-b5 Nc6-a5
69 4 782 27090 d2-d4 d7-d5 g2-g3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 h7-h5 h2-h4 Nf6-g4
70 4 894 31872 Ng1-f3 c7-c5 Nb1-c3 Nb8-c6 e2-e4 Ng8-f6 e4-e5 Nf6-d5 Nc3xd5
71 5 1005 36133 Ng1-f3 c7-c5 Nb1-c3 Nb8-c6 e2-e4 Ng8-f6 e4-e5 Nf6-d5 Nc3xd5
72 5 1065 38332 d2-d4 d7-d5 g2-g3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 h7-h5 h2-h4 Nf6-g4 Bf1-h3
# Move Value=(V,P,V+P) Policy Visits PV
#----------------------------------------------------------------------------------
# 1 (0.501,0.500,0.501) 17.15 7185 Nb1-c3 d7-d5 Ng1-f3 Nb8-c6 d2-d4 Ng8-f6 h2-h4 h7-h5 Nf3-g5 Qd8-d7 f2-f4
# 2 (0.507,0.500,0.504) 15.32 10871 d2-d4 d7-d5 g2-g3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 h7-h5 h2-h4 Nf6-g4 Bf1-h3
# 3 (0.500,0.500,0.500) 11.57 9395 Ng1-f3 d7-d5 c2-c3 Nb8-c6 d2-d4 Ng8-f6 b2-b4 Nf6-e4 b4-b5 Nc6-a5
# 4 (0.497,0.500,0.497) 6.20 1600 e2-e4 Nb8-c6 Nb1-c3 e7-e5 Ng1-f3 Ng8-f6 Bf1-e2 Bf8-e7 Rh1-f1 Rh8-f8 h2-h3
# 5 (0.498,0.500,0.498) 3.64 754 Ng1-h3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 f2-f4 Nf6-e4 Nc3xe4 d5xe4
# 6 (0.486,0.500,0.486) 3.60 428 Nb1-a3 Nb8-c6 Ng1-f3 d7-d5 d2-d4 Ng8-f6 Qd1-d2 Nf6-e4 Qd2-e3
# 7 (0.499,0.500,0.499) 3.37 732 b2-b4 d7-d5 d2-d4 Nb8-c6 b4-b5 Nc6-a5 Ng1-f3 Ng8-f6 Nf3-e5 Qd8-d6 Nb1-c3
# 8 (0.500,0.500,0.500) 3.29 740 c2-c4 Nb8-c6 Nb1-c3 e7-e5 e2-e4 Ng8-f6 Ng1-f3 Bf8-e7 Bf1-e2 Rh8-f8 Rh1-f1
# 9 (0.491,0.500,0.491) 3.21 450 d2-d3 d7-d5 Nb1-c3 Nb8-c6 Ng1-f3 Ng8-f6 h2-h4 d5-d4
# 10 (0.497,0.500,0.497) 3.21 803 c2-c3 e7-e5 e2-e4 Nb8-c6 Ng1-f3 Ng8-f6 Bf1-e2 Bf8-e7 d2-d4
# 11 (0.494,0.500,0.494) 3.16 512 f2-f4 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nf6-e4
# 12 (0.491,0.500,0.491) 3.16 500 e2-e3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nf6-e4 Bf1-d3
# 13 (0.495,0.500,0.495) 3.08 536 g2-g3 Nb8-c6 Nb1-c3 Ng8-f6 d2-d4 d7-d5 Ng1-f3 h7-h5 h2-h4 Nf6-g4
# 14 (0.486,0.500,0.486) 3.03 361 a2-a4 Nb8-c6 Nb1-c3 Ng8-f6 Ng1-f3 d7-d5 d2-d4 h7-h5
# 15 (0.492,0.500,0.492) 2.98 681 h2-h3 Nb8-c6 Nb1-c3 Ng8-f6 d2-d4 d7-d5 Ng1-f3 h7-h5 Qd1-d2 Qd8-d7
# 16 (0.499,0.500,0.499) 2.94 638 f2-f3 Nb8-c6 e2-e4 e7-e5 Nb1-c3 Ng8-f6 Qd1-e2 Bf8-e7 Qe2-d3 Nc6-b4
# 17 (0.496,0.500,0.496) 2.92 523 g2-g4 Nb8-c6 Nb1-c3 e7-e5 e2-e4 Ng8-f6 g4-g5 Nf6-h5 Ng1-f3
# 18 (0.501,0.500,0.500) 2.88 695 b2-b3 Nb8-c6 Ng1-f3 d7-d5 d2-d4 Ng8-f6 Nb1-c3 h7-h5 Qd1-d2
# 19 (0.499,0.500,0.499) 2.70 569 a2-a3 Nb8-c6 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 h7-h5 h2-h4
# 20 (0.488,0.500,0.488) 2.61 358 h2-h4 Nb8-c6 Nb1-c3 Ng8-f6 d2-d4 d7-d5 Ng1-f3 h7-h5 Nf3-g5
# nodes = 257747 <0% qnodes> time = 10703ms nps = 24081 eps = 0 nneps = 3598
# Tree: nodes = 51647 depth = 12 pps = 3596 visits = 38332
# qsearch_calls = 0 search_calls = 0
move d2d4
Bye Bye
I have one question: when should I drop the learning rate? It is still using an LR of 0.15, and as you can see in the plot it is zigzagging a lot, though it has been like that from the beginning anyway. The zigzagging is most likely because I am training a net every 512 games.
I think there is a school of thought that keeping the learning rate as high as possible and dropping it in a stepwise fashion gives better generalization than using a gradually decaying schedule such as exponential decay. This all sounds like a black art to me.
I have not done a hyper-parameter study to determine the right learning rate, but often the right one is the one that gives a smooth reduction in the loss.
Mine is clearly zigzagging a lot, although it has been since the very beginning, so I am not sure when to drop it or by how much (maybe by 10x like A0?).
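To make the two schedules concrete, here is a minimal sketch of stepwise drops (A0-style, 10x at fixed milestones) versus smooth exponential decay. The milestone steps and decay constants below are illustrative assumptions, not values from any actual training run; only the 0.15 base rate comes from the post above.

```python
def step_decay(step, base_lr=0.15, milestones=(100_000, 300_000), factor=0.1):
    """Hold base_lr until each milestone, then multiply by factor (10x drop)."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= factor
    return lr

def exp_decay(step, base_lr=0.15, decay_rate=0.96, decay_steps=10_000):
    """Smoothly decay base_lr by decay_rate every decay_steps."""
    return base_lr * decay_rate ** (step / decay_steps)

# Step decay keeps the rate high for a long time, then drops sharply;
# exponential decay starts shrinking the rate immediately.
assert step_decay(50_000) == 0.15
assert abs(step_decay(150_000) - 0.015) < 1e-12
assert exp_decay(50_000) < 0.15
```

The argument for the stepwise version is exactly the one mentioned above: the network trains at the highest stable rate for as long as possible, and each 10x drop is applied only once the loss plateaus.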
Daniel