Monte Carlo or Minimax?

Aldus · Post by **Aldus** » Thu Sep 13, 2018 6:09 pm

So basically the strength of AZ was mostly doe to NN, not to MCST per se.

The bits about linear evaluation are very interesting. Because indeed, linear evaluation is wrong evaluation. A strong human player doesn't linearly evaluate, but correlates everything with everything. Two pieces acting in synergy are stronger than their mere summation, and that also correlates with other aspects of the board.

So ultimately, it's a victory of NN. Because while we have troubles putting on paper every parameter that a grandmaster is using in his head and it's accurate value, a proper designed and sufficiently fast NN could discover them by simply training. And in theory, we could apply the NN knowledge to better evaluate a leaf or to better prune a search node in a minimax program.

hgm · Post by **hgm** » Fri Sep 14, 2018 7:50 pm

Aldus wrote: ↑Thu Sep 13, 2018 12:10 amSpecial hardware that condensed that enormous amount of training in 9 hours, according to Wikipedia.

Even when playing (rather than training), AlphaZero was using 4 TPUs, which equates to a computing power far greater than the best that consumers can buy (in the form of advanced graphics cards) and several orders of magnitude more than what you could do just using the CPU of a PC or laptop.

But the point of interest here, imho, is that AlphaZero reached that level of play by only searching an average of 80k nps.

Indeed, that is very impressive. Problem is that on a typical PC you should be happy to do 80 nps, with the same NN.

What do you mean, the version of Stockfish and Elmo that AZ competed against?

I mean virtually every engine that currently exiss. Like the participants of TCEC.

Aldus · Post by **Aldus** » Sat Sep 15, 2018 12:45 am

hgm wrote: ↑Fri Sep 14, 2018 7:50 pm I mean virtually every engine that currently exiss. Like the participants of TCEC.

Oh, I get it now. But before posting I did a search here for Monte Carlo. I was expecting to see a lot of discussions around the subject and many possible attempts to try MC instead of AB in current engines. Instead, I got very few results. That's why I asked how come everyone is so fond of minimax.

Michael Sherwin · Post by **Michael Sherwin** » Sun Sep 16, 2018 2:01 am

Pure Monte Carlo plays enormous amounts of random games but has no way of intelligently exploring the search space. Pure minimax explores the entire lower part of the search space efficiently but that space is enormous and therefore suffers from the horizon effect. The NN approach of guiding the search to play far less games than MC is superior to MC because the search space is searched more efficiently compared to MC based on the fact that they both use special hardware. There is another way!

Using an alpha/beta searcher that is basically a material value only searcher the number of beta cutoffs (and or mates) in the tree of each root move can be counted for both sides and a ratio can be computed for each root move. Then choose the highest material score using the ratio value as a tiebreak. This will play surprisingly well beyond what anyone would guess. And it is very fast as there is absolutely no evaluation function! That means that if the search is limited to 6 ply (give or take) that very fast games can be played. As each game is played it is grafted into a permanent tree structure. And reinforcement learning values in the tree structure are tweaked after each game played to cause future games to vary.

The simple counting of beta cutoffs combined with reinforcement learning will be enough to cause the search space to be searched very efficiently and intelligently and much much faster than using NN. And will run on normal hardware!

hgm · Post by **hgm** » Sun Sep 16, 2018 11:49 pm

Aldus wrote: ↑Sat Sep 15, 2018 12:45 amOh, I get it now. But before posting I did a search here for Monte Carlo. I was expecting to see a lot of discussions around the subject and many possible attempts to try MC instead of AB in current engines. Instead, I got very few results. That's why I asked how come everyone is so fond of minimax.

Well, this would only apply to people who started writing an engine from scratch in the past year, which there aren't all that much.

And the MC method so far only has been shown to be competitive on a computer with a high-end video card (which not many people have), after 50 years or so of training. (Unless you own 50 such computers, or can draw on a large community of people that donate their computing power to you.) LeelaZero is such a project, but there isn't room for many such project, as one project already consumes all resources of the entire community.

With this in mind it doesn't seem very surprisong that someone who wants to start writing a new Chess engine is deterred from using MC.

Still, I am tempted to try something my hand at the MC-zero approach. But for Peace Chess rather than orthodox Chess. Because in that case we really have no clue how the game should be played at all.

Aldus · Post by **Aldus** » Mon Sep 17, 2018 9:52 pm

Michael Sherwin wrote: ↑Sun Sep 16, 2018 2:01 am Pure Monte Carlo plays enormous amounts of random games but has no way of intelligently exploring the search space. Pure minimax explores the entire lower part of the search space efficiently but that space is enormous and therefore suffers from the horizon effect. The NN approach of guiding the search to play far less games than MC is superior to MC because the search space is searched more efficiently compared to MC based on the fact that they both use special hardware. There is another way!

Using an alpha/beta searcher that is basically a material value only searcher the number of beta cutoffs (and or mates) in the tree of each root move can be counted for both sides and a ratio can be computed for each root move. Then choose the highest material score using the ratio value as a tiebreak. This will play surprisingly well beyond what anyone would guess. And it is very fast as there is absolutely no evaluation function! That means that if the search is limited to 6 ply (give or take) that very fast games can be played. As each game is played it is grafted into a permanent tree structure. And reinforcement learning values in the tree structure are tweaked after each game played to cause future games to vary.

The simple counting of beta cutoffs combined with reinforcement learning will be enough to cause the search space to be searched very efficiently and intelligently and much much faster than using NN. And will run on normal hardware!

All so interesting stuff! Is there a discussion about this idea somewhere? Cause I would like to read it.

As far as I get it, the key point seems to replace all positional factors with "more good moves = better position" logic, right? Apply one of the basic ideas of MC into Minimax space.

Michael Sherwin · Post by **Michael Sherwin** » Mon Sep 17, 2018 10:50 pm

As far as I get it, the key point seems to replace all positional factors with "more good moves = better position" logic, right? Apply one of the basic ideas of MC into Minimax space.

That is essentially correct. My very first working chess code was in x86 assembler and was not fully legal chess. All it was was a material searcher that counted beta cutoffs in the tree above each root move and subtract the opponent's count from the computer's count and used it as a tiebreak for best material score. Then I manually played it against CM5000 and it did really well even winning a couple of games. Although most games ended prematurely because my code did not understand castling or cep or queening.

Then in my engine RomiChess there has been reinforcement learning since January 2006. People were talking about MC algorithms in 2006 and one of the things that I wanted to try was exactly what I posted to your topic above. I made a post or two about the idea back then. Then my mom came down with Alzheimer's and I had to take care of her. Then I got very sick and have never fully recovered. That is why I have not programmed it myself.

Since the mini-max search is only material and does not vary each game of the MC approach there must be a way to cause it to search the tree intelligently. That is where reinforcement learning would come in. It would work just like in RomiChess except it would be learning in real time as opposed to end of game learning like in RomiChess. So all the moves of the winning side get a small bonus and all the moves of the losing side get a small penalty. In drawn games all the moves get a very small penalty. These reinforcement learning scores then get added to each root move and will cause the games to vary.

I hope that this was clear and understandable. I'm struggling with my health a bit too much right now so I'll have to leave it here, for now.

ycv005 · Post by **ycv005** » Sun Oct 07, 2018 5:19 am

Minimax is stable than MCT.

Monte Carlo or Minimax?

Re: Monte Carlo or Minimax?

Re: Monte Carlo or Minimax?

Re: Monte Carlo or Minimax?

Re: Monte Carlo or Minimax? Or this!

Re: Monte Carlo or Minimax?

Re: Monte Carlo or Minimax? Or this!

Re: Monte Carlo or Minimax?

Re: Monte Carlo or Minimax?