Announcing lczero

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Jan 11, 2018 3:39 pm

Daniel Shawul wrote:I will be very curious to see how the MCTS + NN improve upon its tactical weakness.

I think if Stockfish changes approach, it would only be to replace its evaluation with the NN while keeping the same search.

The problem with this is that you just cannot run NN evaluations fast enough to keep up. So you require the NN "evaluation" to be able to resolve a lot of tactics in the network. And if it can do that, the obvious question becomes: why not let it guide the search as well, as it would be a lot smarter than existing SF heuristics?

By the way, one question to ask yourself is whether you want to keep the "Zero knowledge" approach of AlphaGo, or take shortcuts.

You can take obvious shortcuts by feeding the training self-play games from Stockfish (though you need a PGN parser, which SF doesn't have) to get a head start over random play.

You might want to feed a lot of attack and pawn eval planes, instead of just piece positions. You can even provide the SF eval as the network input, so the network only has to calculate things SF doesn't understand.
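To make the idea of extra input planes concrete, here is a minimal sketch of a board encoded as binary 8x8 planes, with one additional "knowledge" plane for squares attacked by white pawns. The board representation, the function names, and the plane layout are all assumptions made for illustration; this is not the input format lczero or AlphaZero actually uses.

```python
# Hypothetical encoding for illustration only; not lczero's real input format.
PIECES = "PNBRQKpnbrqk"  # white pieces uppercase, black pieces lowercase


def empty_plane():
    """Return an 8x8 plane of zeros, indexed as plane[rank][file]."""
    return [[0] * 8 for _ in range(8)]


def piece_planes(board):
    """board maps (file, rank), each 0..7, to a piece letter.

    Returns 12 binary planes, one per piece type and colour.
    """
    planes = {p: empty_plane() for p in PIECES}
    for (file, rank), piece in board.items():
        planes[piece][rank][file] = 1
    return planes


def white_pawn_attack_plane(board):
    """Extra plane: squares attacked by white pawns (diagonally forward)."""
    plane = empty_plane()
    for (file, rank), piece in board.items():
        if piece == "P" and rank < 7:
            for df in (-1, 1):
                if 0 <= file + df <= 7:
                    plane[rank + 1][file + df] = 1
    return plane


# White pawn on e2, kings on e1 and e8.
board = {(4, 1): "P", (4, 0): "K", (4, 7): "k"}
planes = piece_planes(board)
attacks = white_pawn_attack_plane(board)
```

A "zero knowledge" setup would feed only the piece planes (plus history and side-to-move information); the shortcut described above amounts to stacking planes like `attacks` on top of them.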

For Leela, I followed the AGZ "tabula rasa" approach almost directly. Not because I think it is the best (their own paper contains some indications it may not be), but because I did not want to have endless discussions about what "improvements" should be taken. One cannot run the entire distributed effort for every change to the procedure, like one runs changes on Fishtest.

Daniel Shawul · Post by **Daniel Shawul** » Thu Jan 11, 2018 6:45 pm

Gian-Carlo Pascutto wrote:
Daniel Shawul wrote:I will be very curious to see how the MCTS + NN improve upon its tactical weakness.

I think if Stockfish changes approach, it would only be to replace its evaluation with the NN while keeping the same search.
The problem with this is that you just cannot run NN evaluations fast enough to keep up. So you require the NN "evaluation" to be able to resolve a lot of tactics in the network. And if it can do that, the obvious question becomes: why not let it guide the search as well, as it would be a lot smarter than existing SF heuristics?

This is one of the major questions I have with their approach. MCTS is not very good at tactics, especially in chess, where traps are common. See my post here: http://talkchess.com/forum/viewtopic.ph ... light=mcts
Because of that, we cannot directly translate its success in Go, where traps are very uncommon aside from late endgames, to chess, where they are very common according to the paper I mentioned there.

Then it means that the policy network has to be very good at tactics -- stretching my imagination, I can assume it is able to resolve 5-ply-deep tactics. Even then it won't be enough to compete with the brute-forcers, as they will insistently find ways to make traps happen. I was able to alleviate this problem a bit once I started expanding the tree after every visit, just like A0 -- but it is still very weak in tactics and only slightly better than TSCP. How much improvement do you think the NNs would give if I used them to replace the qsearch, which is currently taking its place?

By the way, one question to ask yourself is whether you want to keep the "Zero knowledge" approach of AlphaGo, or take shortcuts.

You can take obvious shortcuts by feeding the training self-play games from Stockfish (though you need a PGN parser, which SF doesn't have) to get a head start over random play.

You might want to feed a lot of attack and pawn eval planes, instead of just piece positions. You can even provide the SF eval as the network input, so the network only has to calculate things SF doesn't understand.

For Leela, I followed the AGZ "tabula rasa" approach almost directly. Not because I think it is the best (their own paper contains some indications it may not be), but because I did not want to have endless discussions about what "improvements" should be taken. One cannot run the entire distributed effort for every change to the procedure, like one runs changes on Fishtest.

I agree: if the goal is to get to the destination faster, we should be able to use better inputs like Giraffe did. A0 was aimed at making a point with the "tabula rasa" approach and its application to other domains.

Daniel

CheckersGuy · Post by **CheckersGuy** » Thu Jan 11, 2018 7:56 pm

Since A0 visited only 80k nodes per second, I assume that the neural network had a good grasp of tactics. The gains from the AlphaZero approach are arguably smaller in chess than in Go because of the more tactical nature of chess.

supersharp77 · Post by **supersharp77** » Thu Jan 11, 2018 11:28 pm

gladius wrote:https://github.com/glinscott/leela-chess

It's a port of GCP's Leela Zero (https://github.com/gcp/leela-zero) to chess. GCP and the community have really done a wonderful job on the project.

The goal of the project is a distributed training project for the network weights, hopefully building a strong chess AI from scratch. I haven't had time to set up the training server yet. It's getting close though. If anyone wants to work on it as well, please let me know! It's exciting to see a totally different method of search/evaluation be competitive, and we need a public version of this.

Interestingly for chess, it appears that you need a really fast GPU to match the speed of evaluating the NN using the CPU! I get about 1,200 nodes/sec, running on 2 threads using CPU on my macbook pro. The GPU gives only 120 nodes/sec! My desktop with a Titan X gets about 2000 nodes/sec. This is also using a 5x64 net, instead of the AlphaZero 20x256 net, which would probably only be feasible on GPU.

There is incredibly basic UCI support (plays out a fixed 800 nodes) included.

Here is a game generated using the initial (completely random) weights:
[pgn]
1. b4 h5 2. h4 d5 3. Nh3 Nd7 4. Na3 Rh7 5. f4 e5 6. c4 exf4 7. Nf2 Ke7 8. cxd5 g5
9. Qc2 g4 10. Qb1 Kf6 11. Qe4 Qe8 12. e3 Qe6 13. Bc4 Be7 14. Ke2 Rb8 15. Qd4+ Ne5
16. Ne4+ Kg7 17. Rd1 Bxb4 18. Rg1 Bd6 19. Bb3 Qg6 20. Qa4 Qg5 21. Qc4 Bf5 22. Qd4
Be7 23. Qa4 Bf6 24. Qe8 Bd8 25. Qe7 Bxe7 26. d6 Re8 27. dxe7 Bg6 28. hxg5 g3 29. Nf6
Nxe7 30. Ng8 N7c6 31. Ne7 Nd8 32. Nxg6 R8h8 33. Ne7 Rf8 34. Ng6 R8h8 35. Ne7 Rf8
36. Ng6 R8h8
[/pgn]

Great Work....Congrats! AR

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Fri Jan 12, 2018 9:03 am

Daniel Shawul wrote:This is one of the major questions I have with their approach. MCTS is not very good at tactics, especially in chess, where traps are common. See my post here: http://talkchess.com/forum/viewtopic.ph ... light=mcts

I think one no longer has to bother about shallow tactics if the DCNN is guiding search and evaluation. It can handle those very easily.

Then it means that the policy network has to be very good at tactics -- stretching my imagination, I can assume it is able to resolve 5-ply-deep tactics.

It's not so much how many plies, but more how often tactics resolve from certain piece configurations. A DCNN in Go can read out tactics easily. This is why a pure DCNN player without search can still reach 4-5 dan (and a very big one like AGZ apparently reaches pro level).

Even then it won't be enough to compete with the brute-forcers, as they will insistently find ways to make traps happen.

Controversial as it may be, the AZ result vs Stockfish calls this assumption into doubt, no? (Although you can see in their graphs that they had more problems in blitz.)

How much improvement do you think the NNs would give if I used them to replace the qsearch, which is currently taking its place?

I am not sure, but I would not do a regular A/B search with LMR/history/butterfly pruning at all. The DCNN should give much more accurate pruning throughout the tree, and I would want this especially near the root (the opposite of what you propose).

I do not mean that I absolutely don't want the A/B part of the search (no idea what is best there!), but rather that I would use the DCNN probabilities, instead of the search tables, to guide reductions.
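As a rough illustration of letting network probabilities drive reductions, the sketch below maps a move's policy prior to a depth reduction in plies. The function name, the "one ply per factor of four" rule, and the cap of four plies are all invented for this example; nothing here is taken from Stockfish, Leela, or AlphaZero.

```python
import math


def reduction_from_prior(prior, max_reduction=4):
    """Map a policy prior in [0, 1] to a depth reduction in plies.

    Hypothetical rule: a move loses one ply of depth for every factor
    of four by which the network considers it unlikely.
    """
    if prior <= 0.0:
        return max_reduction
    plies = int(-math.log2(prior) / 2)  # log base 4 of (1 / prior), truncated
    return max(0, min(max_reduction, plies))


# Likely moves are searched at full depth, unlikely ones are cut back hard.
for prior in (0.5, 0.17, 0.04, 0.0015):
    print(f"prior {prior:7.4f} -> reduce by {reduction_from_prior(prior)} plies")
```

In a conventional engine this value would stand in for the history- or move-order-based LMR table lookup, while the alpha-beta framework around it stays the same.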

I agree: if the goal is to get to the destination faster, we should be able to use better inputs like Giraffe did. A0 was aimed at making a point with the "tabula rasa" approach and its application to other domains.

It's notable that in Go the best human players (i.e. the source data if one were to do supervised learning) were very far below the endpoint reached, so the starting point was very far from optimal. But in chess Stockfish was very close. So the risk of going in the wrong direction seems slim.

Rebel · Post by **Rebel** » Fri Jan 12, 2018 9:15 am

Gian-Carlo Pascutto wrote:I do not mean that I absolutely don't want the A/B part of the search (no idea what is best there!), but rather that I would use the DCNN probabilities, instead of the search tables, to guide reductions.

That has been my dream for years: a database of patterns that returns the number of reductions or a complete cut-off (a kind of TB hit) based on the win probability of the pattern. Is that what the training sessions are about?

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Fri Jan 12, 2018 4:12 pm

Rebel wrote:
Gian-Carlo Pascutto wrote:I do not mean that I absolutely don't want the A/B part of the search (no idea what is best there!), but rather that I would use the DCNN probabilities, instead of the search tables, to guide reductions.
That has been my dream for years: a database of patterns that returns the number of reductions or a complete cut-off (a kind of TB hit) based on the win probability of the pattern. Is that what the training sessions are about?

They are, with the added bonus that you get the static evaluation of the position in one go. (It turns out that computing that together with the move probabilities gives mutual benefits).
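A toy sketch of "move probabilities and static evaluation in one go": one shared feature vector feeds both a policy head (softmax over moves) and a value head (tanh). The real networks are deep convolutional residual nets; the linear heads, the weights, and every name below are assumptions chosen only to show the two-output shape.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(features, policy_weights, value_weights):
    """Compute both outputs from the same shared features.

    policy_weights: one weight row per candidate move.
    value_weights:  a single weight row for the scalar evaluation.
    """
    logits = [sum(w * f for w, f in zip(row, features)) for row in policy_weights]
    policy = softmax(logits)  # move probabilities, sums to 1
    value = math.tanh(sum(w * f for w, f in zip(value_weights, features)))  # in (-1, 1)
    return policy, value


features = [1.0, 0.5, -0.5]  # stand-in for the shared trunk's output
policy, value = evaluate(
    features,
    policy_weights=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    value_weights=[0.2, 0.1, 0.1],
)
```

Because both heads are trained on the same trunk, whatever the trunk learns for one output is available to the other, which is one way to read the "mutual benefits" mentioned above.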

For example, in Go, for the opening position you would get the following output:


 Q16 ->    1783 (V: 49.77%) (N: 17.43%) PV: Q16 D4 C16 Q4 J17 R14 O16 S16 R17 R11 C6 F3
  Q4 ->    1470 (V: 49.70%) (N: 15.44%) PV: Q4 D16 D3 Q16 C14 F17 C8 K3 O3 G3 F4 F3
 D16 ->    1421 (V: 49.68%) (N: 15.24%) PV: D16 Q4 Q17 D4 R6 O3 R12 K17 H17 N17
  D4 ->    1368 (V: 49.82%) (N: 12.71%) PV: D4 Q16 R4 D16 L3 C6
 Q17 ->    1111 (V: 50.37%) (N:  4.19%) PV: Q17 Q4 C16 D4 R6 O3 R12 E16 E17 F17 D17 G16
 D17 ->    1028 (V: 50.28%) (N:  4.75%) PV: D17 D4 R16 Q4 C6 F3 C11 P16 P17 O17 Q17 O16 R14 K17
 R16 ->    1014 (V: 50.30%) (N:  4.53%) PV: R16 D16 Q3 D4 F17 C14 M17 Q5 R5 R6 R4 Q7 O3
  Q3 ->     969 (V: 50.29%) (N:  4.39%) PV: Q3 D16 C4 R16 P16 P17 O17 Q17 C14 F17 N16 R14
  D3 ->     948 (V: 50.24%) (N:  4.80%) PV: D3 D16 R4 Q16 C14 F17 C8 P4 P3 O3 Q3 O4
  C4 ->     826 (V: 50.19%) (N:  4.59%) PV: C4 Q4 D17 Q16 O3 R6 H3 D15 C15 C14
  R4 ->     803 (V: 50.20%) (N:  4.39%) PV: R4 D4 Q17 D16 F3 C6 M3 Q15 R13 R17
 C16 ->     788 (V: 50.24%) (N:  3.96%) PV: C16 Q16 D3 Q4 O17 R14 H17 D5 C5 C6 C4 D6
 R15 ->      14 (V: 49.70%) (N:  0.15%) PV: R15 D4 D16
  P3 ->      13 (V: 49.54%) (N:  0.16%) PV: P3 Q16 D4 C16
...

The columns are: move, search nodes, evaluation (V), "move probability" (N), and the principal variation (PV).

Note how there are effectively only 12 out of 361 moves searched; the others are barely looked at, or not at all. So it spits out a 12-ply line with a 14k-node search despite the large theoretical branching factor. And the DCNN will still be resolving tactics at the end of it, but we can't see its PV of course.
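The concentration of visits on a handful of moves follows from how the search picks a child: the AlphaGo Zero papers describe a PUCT rule that maximises Q + U, where U is proportional to the network prior and shrinks as a move accumulates visits. The sketch below is a toy model of that rule. The constant `c_puct=1.5`, the identical value for every move, and scoring unvisited moves with Q = 0 are assumptions made for this example, not Leela Zero's actual settings.

```python
import math


def puct_select(children, c_puct=1.5):
    """Return the index of the child maximising Q + U.

    children: list of dicts with prior "P", visit count "N", total value "W".
    """
    total_visits = sum(ch["N"] for ch in children)
    best, best_score = 0, -math.inf
    for i, ch in enumerate(children):
        q = ch["W"] / ch["N"] if ch["N"] else 0.0  # assumption: unvisited => 0
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        if q + u > best_score:
            best, best_score = i, q + u
    return best


def simulate(priors, value=0.5, playouts=1000):
    """Run playouts at a single node where every move evaluates to `value`."""
    children = [{"P": p, "N": 0, "W": 0.0} for p in priors]
    for _ in range(playouts):
        ch = children[puct_select(children)]
        ch["N"] += 1
        ch["W"] += value
    return [ch["N"] for ch in children]


# Priors loosely modelled on the output above: two strong moves,
# one minor move, and one the network considers very unlikely.
visits = simulate([0.17, 0.15, 0.05, 0.0015])
print(visits)
```

Even with identical evaluations, the low-prior move receives almost no visits, which is the effect visible in the R15 and P3 rows.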

There have been publications about similar ideas before, i.e. search with realization probabilities, which IIRC worked in Shogi. But what makes it work so well is that the DCNN is so damn accurate.
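The general idea behind search with realization probabilities can be sketched as follows: instead of a fixed depth, a line is extended only while the product of its move probabilities stays above a threshold, so likely lines go deep and unlikely ones stop early. This is a loose illustration of the concept, not the published algorithm; the constant per-position move probabilities and the threshold are assumptions.

```python
def expand(prob_so_far, move_probs, threshold, depth=0):
    """Return (nodes visited, deepest ply reached).

    Toy model: every position offers the same move_probs, and a line is
    followed only while its cumulative probability stays >= threshold.
    """
    nodes, max_depth = 1, depth
    for p in move_probs:
        line_prob = prob_so_far * p
        if line_prob < threshold:
            continue  # too unlikely to be realized; do not search further
        child_nodes, child_depth = expand(line_prob, move_probs, threshold, depth + 1)
        nodes += child_nodes
        max_depth = max(max_depth, child_depth)
    return nodes, max_depth


# The line made of the most probable moves is searched deepest.
nodes, deepest = expand(1.0, [0.6, 0.3, 0.1], threshold=0.005)

# A line consisting only of 10% moves is abandoned after two plies.
unlikely_nodes, unlikely_depth = expand(1.0, [0.1], threshold=0.005)
```

The more accurate the probabilities, the more safely the threshold can prune, which matches the point above about the DCNN's accuracy being what makes the approach work.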

Imagine your code to reduce moves or not were of GM level by itself. That's how it is in Go. And maybe chess isn't necessarily far off.

gladius · Post by **gladius** » Fri Jan 12, 2018 7:14 pm

Gian-Carlo Pascutto wrote:
Daniel Shawul wrote:I will be very curious to see how the MCTS + NN improve upon its tactical weakness.

I think if Stockfish changes approach, it would only be to replace its evaluation with the NN while keeping the same search.
The problem with this is that you just cannot run NN evaluations fast enough to keep up. So you require the NN "evaluation" to be able to resolve a lot of tactics in the network. And if it can do that, the obvious question becomes: why not let it guide the search as well, as it would be a lot smarter than existing SF heuristics?

By the way, one question to ask yourself is whether you want to keep the "Zero knowledge" approach of AlphaGo, or take shortcuts.

You can take obvious shortcuts by feeding the training self-play games from Stockfish (though you need a PGN parser, which SF doesn't have) to get a head start over random play.

You might want to feed a lot of attack and pawn eval planes, instead of just piece positions. You can even provide the SF eval as the network input, so the network only has to calculate things SF doesn't understand.

For Leela, I followed the AGZ "tabula rasa" approach almost directly. Not because I think it is the best (their own paper contains some indications it may not be), but because I did not want to have endless discussions about what "improvements" should be taken. One cannot run the entire distributed effort for every change to the procedure, like one runs changes on Fishtest.

Yes, it's quite interesting. Sending the SF eval would be trivial, although the eval is really designed to be used within a qsearch.

I agree that validating the framework using supervised learning makes a lot of sense, and that should be the next step. Past that, starting to feed different input planes for chess is quite interesting, as it's not like Go, where those features would be incredibly hard to figure out. Chess is, in some ways, pretty straightforward.

Pretty exciting space to play in. Now I just need a few TPUs...

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 12, 2018 9:20 pm

Thanks, I value your experience with this and am ready to believe that the policy network can resolve tactics well.

I asked a similar question on the computer-go mailing list a while ago and learned that AlphaGo Lee handled ladders not via its neural networks, but simply by hand-coding the pattern, basically reducing the problem to a 1-ply lookahead search even for full-board ladders. The pattern is mentioned in extended table 2 of the first paper. However, no one seemed to know how it was handled in AlphaGo Zero and AlphaZero; the closest thing mentioned in the paper is that ladders were one of the last features to be learned.

If the NN is able to resolve tactics very well, the policy network alone should be a strong player by itself, just like in Go. Even playing at TSCP level would impress me, to be honest. It would be good to have a measure of the policy network's strength alone included in the final A0 paper, just as was done in the AlphaGo papers. However, say the policy network has a limit on the tactics it can resolve -- assume 8-ply tactics -- and we compare it against a brute-force searcher that explores a 20-ply-deep tree with LMR + null move: I would expect the latter to win, but clearly the result of A0 says otherwise. In fact, it seems to be the opposite, given the way it outperformed Stockfish in tactics as well -- though now the distinction between tactics and strategy seems to be blurred.

Daniel

M ANSARI · Post by **M ANSARI** » Sat Jan 13, 2018 9:55 am

gladius wrote:https://github.com/glinscott/leela-chess

It's a port of GCP's Leela Zero (https://github.com/gcp/leela-zero) to chess. GCP and the community have really done a wonderful job on the project.

The goal of the project is a distributed training project for the network weights, hopefully building a strong chess AI from scratch. I haven't had time to set up the training server yet. It's getting close though. If anyone wants to work on it as well, please let me know! It's exciting to see a totally different method of search/evaluation be competitive, and we need a public version of this.

Interestingly for chess, it appears that you need a really fast GPU to match the speed of evaluating the NN using the CPU! I get about 1,200 nodes/sec, running on 2 threads using CPU on my macbook pro. The GPU gives only 120 nodes/sec! My desktop with a Titan X gets about 2000 nodes/sec. This is also using a 5x64 net, instead of the AlphaZero 20x256 net, which would probably only be feasible on GPU.

There is incredibly basic UCI support (plays out a fixed 800 nodes) included.

Here is a game generated using the initial (completely random) weights:
[pgn]
1. b4 h5 2. h4 d5 3. Nh3 Nd7 4. Na3 Rh7 5. f4 e5 6. c4 exf4 7. Nf2 Ke7 8. cxd5 g5
9. Qc2 g4 10. Qb1 Kf6 11. Qe4 Qe8 12. e3 Qe6 13. Bc4 Be7 14. Ke2 Rb8 15. Qd4+ Ne5
16. Ne4+ Kg7 17. Rd1 Bxb4 18. Rg1 Bd6 19. Bb3 Qg6 20. Qa4 Qg5 21. Qc4 Bf5 22. Qd4
Be7 23. Qa4 Bf6 24. Qe8 Bd8 25. Qe7 Bxe7 26. d6 Re8 27. dxe7 Bg6 28. hxg5 g3 29. Nf6
Nxe7 30. Ng8 N7c6 31. Ne7 Nd8 32. Nxg6 R8h8 33. Ne7 Rf8 34. Ng6 R8h8 35. Ne7 Rf8
36. Ng6 R8h8
[/pgn]

You need to look at bitcoin mining and how multiple GPUs are connected to dramatically increase the combined GPU power. A lot of the hardware used for bitcoin mining via GPU can be bought quite cheaply now, as ASIC miners are the only way to mine bitcoin efficiently today. My guess is that some of the specialized motherboards for bitcoin mining might be good hardware for this project. This project just seems to need a good hardware solution in place.
