Why minimax is fundamentally flawed

kbhearn · Post by **kbhearn** » Mon Nov 10, 2014 6:41 pm

You could reduce leaf node positional evals as you back up as well beyond a certain threshold. There's not really a problem with say 2.00 meaning 2.05 in 5 moves. It might help a little with seeking progress in a timely manner when combined with hash grafting provided the local optimum you're stuck in is not too steep.

syzygy · Post by **syzygy** » Mon Nov 10, 2014 7:24 pm

hgm wrote:
syzygy wrote:Theoretically there may be cases where it goes wrong. In practice this seems to occur so rarely that I am confident that any "solution" (replace the minimax principle by somthing else?) would negatively affect playing strength.
That is a daring claim, for something you have not tested and goes against logic at the same time...

Feel free to come up with an alternative to the "fundamentally flawed" minimax approach that comes close in performance to current engines.

In the meantime, I am interested in examples of top engines falling into this trap.

Evert · Post by **Evert** » Mon Nov 10, 2014 7:42 pm

If I recall correctly, you do need to change the alpha/beta bounds at the top of the search, right (I forgot which way it goes, but I can probably reconstruct that if I think about it for more than 10 seconds)?

Otherwise, this is just the same sort of adjustment you do for a mate score, right?

hgm · Post by **hgm** » Mon Nov 10, 2014 7:45 pm

syzygy wrote:Feel free to come up with an alternative to the "fundamentally flawed" minimax approach that comes close in performance to current engines.

I already did. Years ago, in fact. It is called 'delayed-loss bonus'...

hgm · Post by **hgm** » Mon Nov 10, 2014 8:00 pm

Evert wrote:If I recall correctly, you do need to change the alpha/beta bounds at the top of the search, right (I forgot which way it goes, but I can probably reconstruct that if I think about it for more than 10 seconds)?

Otherwise, this is just the same sort of adjustment you do for a mate score, right?

Indeed, whenever you adjust a score afterwards, you must make sure it cannot enter the search window (being mistaken for an exact score, or a bound of the opposite type as it really was). So you have to imply the inverse operation to the window bounds. If all scores < curEval will be incremented by one point, you will have to decrement all window bounds that were negative by one point. So that a score that will be brought in-window by the increment, will already have been searched within the window initially.

So Fairy-Max has

Code: Select all

Search&#40;alpha, beta&#41;
&#123;
  int curEval = Eval&#40;);
  if&#40;alpha < curEval&#41; alpha--;
  if&#40;beta <= curEval&#41; beta--;

// usual Search code
...

  if&#40;bestScore < curEval&#41; bestScore++;
  return bestScore;
&#125;

This converts the mate score to DTM as a side effect (as mate scores are always larger than curEval and mated scores always smaller). It causes mate-distance pruning as another side effect, as when a mate is already found in another branch, setting alpha and beta, the pre-decrement of beta will push it below -INF in branches that reach beyond the already found mate, so that the initialization bestScore = -INF already causes a beta cutoff.

mvk · Post by **mvk** » Mon Nov 10, 2014 8:46 pm

hgm wrote:
syzygy wrote:Feel free to come up with an alternative to the "fundamentally flawed" minimax approach that comes close in performance to current engines.
I already did. Years ago, in fact. It is called 'delayed-loss bonus'...

Coincidentally, I had that in Rookie v2, 1997-1998 era. I don't have it in the v3 rewrite anymore because it costs several hard to predict branches per node (and it is difficult to understand). In fact, I dropped 'beta' from cut(), all() and qsearch(), and effectively just pass ~alpha to the children.

In v3, in regular positions, I don't observe the horizon problems that it helped solve in v2, I believe because v3 has far better evaluation. The end games with few pieces I intend to solve with DTZ tables, and after that remove the mopup evaluation. For next week I will just dump all of KRK into the opening book, as I don't feel like touching the engine on such short notice... Just in case it has to play out KRK against Spartacus next week...

Regarding resigning not gaining elo: for me it is a matter of courtesy. Online, the opponent is more likely to play more games if you resign on time, and when you can offer and accept draws at reasonable times. More games means more learning. Learning caused Rookie to have a book exit of +2.8 against Spartacus yesterday. With Black it might have been a draw otherwise. At least one player (not me...) in the Leiden tournaments now has a "Joker mode" that speeds up the moves in a won position, because he got upset having to play out a long and obvious win all the way to checkmate

michiguel · Post by **michiguel** » Mon Nov 10, 2014 8:58 pm

hgm wrote:
Evert wrote:If I recall correctly, you do need to change the alpha/beta bounds at the top of the search, right (I forgot which way it goes, but I can probably reconstruct that if I think about it for more than 10 seconds)?

Otherwise, this is just the same sort of adjustment you do for a mate score, right?
Indeed, whenever you adjust a score afterwards, you must make sure it cannot enter the search window (being mistaken for an exact score, or a bound of the opposite type as it really was). So you have to imply the inverse operation to the window bounds. If all scores < curEval will be incremented by one point, you will have to decrement all window bounds that were negative by one point. So that a score that will be brought in-window by the increment, will already have been searched within the window initially.

So Fairy-Max has
Code: Select all
Search&#40;alpha, beta&#41;
&#123;
  int curEval = Eval&#40;);
  if&#40;alpha < curEval&#41; alpha--;
  if&#40;beta <= curEval&#41; beta--;

// usual Search code
...

  if&#40;bestScore < curEval&#41; bestScore++;
  return bestScore;
&#125;
This converts the mate score to DTM as a side effect (as mate scores are always larger than curEval and mated scores always smaller). It causes mate-distance pruning as another side effect, as when a mate is already found in another branch, setting alpha and beta, the pre-decrement of beta will push it below -INF in branches that reach beyond the already found mate, so that the initialization bestScore = -INF already causes a beta cutoff.

We had a discussion a long while ago about this. That is what I do in Gaviota with the checkmates, and that is the only way I have even done it. Then, there is no need to manipulate whatever is in the hash table at all. I was always surprised that not everybody did it that way, but I guess it is a matter of perspective.

Miguel

michiguel · Post by **michiguel** » Mon Nov 10, 2014 9:01 pm

hgm wrote:Actually it is exactly the opposite. This is not really path-dependent info. That is, it does not depend on the path through which you reach the node from the root. What you hash is the path-independent eval score from the leaves, corrected according to the path between the leaves and the current node.

That is an intrinsic property of the position; each time you would encounter that same position, you would sprout the same sub-tree from it, which would apply the same correction to the score that was backed up from the leaf to the current node. There would never be any consistency. The inconsistencies occur when you start including info on the path between the root and the note (like distance to the root).

Micro-Max relies for the scoring of the mate distance entirely on the delayed-loss bonus. (On the path to a mate the score is of course always much higher than the current evaluation, so every move the opponent can delay the mate gives him the bonus, counting the length of the path.) There never is any need to correct what you store or retrieve from the hash table. It is automatically correct, no matter in what level of the tree you would hit on it.

Exactly. (blue color is mine).

Miguel

hgm · Post by **hgm** » Mon Nov 10, 2014 10:01 pm

mvk wrote:Coincidentally, I had that in Rookie v2, 1997-1998 era. I don't have it in the v3 rewrite anymore because it costs several hard to predict branches per node (and it is difficult to understand).

Actually I don't think it costs any branches at all. The way Fairy-Max writes it is different from the code I gave above for educational purposes. What it really does is:

Code: Select all

alpha -= &#40;alpha < curEval&#41;;
beta -= &#40;beta <= curEval&#41;;
...
return bestScore + &#40;bestScore < curEval&#41;;

i386 has instructions to copy a selected condition code into the low bit of AL. So the whole thing just compiles to a compare, a copy-condition-code and add instruction. I noticed that sometimes the compiler is even smarter, and generates a compare and and ADC $0 to add the carry bit that was or was not set by the compare. So it is completely branch free.

Personally I have always found it easier to understand than other methods. Future gains are devaluated, just as in my bank account.

In Usurpator II the delayed-loss bonus was not just a point, but a hefty 12.5% of the score. As mate was worth about 40, that meant it would only postpone a checkmate 1 ply to grab a Rook. Of course I was already very happy if it could see a mate in 3, so the discount was not applied very often along the branch! The problem was of course that when you add an easy 1/8 bonus to the score, you have to pre-correct the search window by multiplying with a non-trivial 8/7. And the rounding involved in my approximations to calculate that fast would of course cause the most horendous oversights, because they did in fact allow upper bounds to be corrected to scores that would be considered exact. I really had to make absolutely sure that 8/7*alpha was always rounded down and 8/7*beta always rounded up. In the end I got so fed up with it that I just multiplied it with 5/4, I think.

Code: Select all

PAT	jsr	EVAL
	jmp     RET
CUTOFF	ldx	MAXVAL
	ldy     MAXVAL+1
RET	stx	DYNVAL
	sty     DYNVAL+1
	bit     DYNVAL+1
	bpl     STS2
	tya
	lsr     a
	ror     DYNVAL
	lsr     a
	ror     DYNVAL
	lsr     a
	ror     DYNVAL
	ora     #$E0
	tay
	txa
	sec
	sbc     DYNVAL
	sta     DYNVAL
	lda     DYNVAL+1
	sty     DYNVAL+1
	sbc     DYNVAL+1
	sta	DYNVAL+1
STS2	rts

Evert · Post by **Evert** » Tue Nov 11, 2014 9:43 am

Thinking about it, I wonder if this is really a problem with minimax. We use minimax to find the best move in a position, and we define the best move to be the one that leads to a position with the best score (for us, obviously).

So far so good, but we haven't defined what we mean by "best score". If we define it to be "highest score at a leaf node", that's typically what everyone uses, but already for mate scores it's clear that this is not the best way to do it. A better definition is "the best move is the move that leads to the best score as quickly as possible", and that is what the delay bonus represents. That's not so much correcting a flaw in minimax as in our definition of "best move" and "best score".

Probably doesn't make much difference in practice though.

Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed

Re: Why minimax is fundamentally flawed