Writing a minimax equivalent engine

Discussion of chess software programming and technical issues.

Moderator: Ras

koedem
Posts: 105
Joined: Fri Mar 18, 2016 10:45 pm

Writing a minimax equivalent engine

Post by koedem »

I've been writing an engine whose search is meant to be equivalent to minimax, mostly out of curiosity about how strong such an engine might still be, but also because the search behaves more "naturally", without search instability, so it is easier to understand.
The search is therefore a plain PV search. The TT only accepts bounds from entries with the same search depth, in order to stay truly equivalent, although entries from different depths may still be used for move ordering. (This is achieved by storing the search depth in the hash key, so a position might have multiple entries; in practice this is surprisingly not that much weaker, although it is of course inferior to a normal TT.)
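As a sketch of how the depth-in-the-key scheme can look (Python, with hypothetical names; this is an illustration, not my actual code):

```python
# Illustrative depth-keyed transposition table (hypothetical names).
class TTEntry:
    def __init__(self, eval_score, is_lower, is_upper, best_move):
        self.eval = eval_score
        self.is_lower = is_lower      # score is a proven lower bound
        self.is_upper = is_upper      # score is a proven upper bound
        self.best_move = best_move

class DepthKeyedTT:
    def __init__(self):
        self.table = {}

    def key(self, zobrist, depth):
        # Mixing the depth into the hash key gives each (position, depth)
        # pair its own slot, so a position can have several entries at once.
        return zobrist ^ (depth * 0x9E3779B97F4A7C15)

    def store(self, zobrist, depth, entry):
        self.table[self.key(zobrist, depth)] = entry

    def probe(self, zobrist, depth):
        # Bounds are only usable on an exact-depth hit, which is what
        # keeps the search minimax equivalent.
        return self.table.get(self.key(zobrist, depth))

    def probe_move(self, zobrist, depths):
        # Entries from other depths may still supply a move for ordering.
        for d in depths:
            entry = self.table.get(self.key(zobrist, d))
            if entry is not None:
                return entry.best_move
        return None
```

A probe at the wrong depth simply misses, while move ordering can still fall back on entries from neighbouring depths.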
There is one slight issue remaining, namely some GHI (graph history interaction) problems that can arise in positions containing move repetitions. (I believe those are the only such issues possible, since I only accept exact-depth hits.) I have some thoughts on that which I may post at a later time.
But for now, I found the following curiosity and wondered whether this also happens in "real" engines and what the standard solution is:

I have (pseudo-)code roughly like this:

Code:

fullWindowSearch(alpha, beta, ...)
	lookup = TT_lookup(pos)
	if (lookup.isLowerBound) {
		...
		if (lookup.eval > alpha) {
			alpha = lookup.eval // we have proven at least this score, so it becomes alpha
		}
	} else if (lookup.isUpperBound) {
		...
		if (lookup.eval < beta) {
			beta = lookup.eval // we have proven at most this score, so it becomes beta
		}
	}
	score = -INFINITY
	for (move in moves) {
		if (nwSearch(pos, ...) > score) { // what about > alpha instead?
			score = fullWindowSearch(...)
		}
	}
	return score
[EDIT: Now that I have written it up like this, I am starting to wonder why this old code is correct at all. It appeared to give the same results as my pre-TT version, so I had always assumed it was correct. But is it actually?
Could one break it like this? According to a TT hit we have at most x, so x becomes beta. One ply deeper, -x is alpha. We try the first move and the opponent refutes it, with their score of x being beta one ply deeper again; however, they would have had better moves available (i.e. the first move is not the move that would achieve that -x). Now the second move is the best move and the one that achieves -x; but since it also returns -x as its score, we cannot tell that it is better than the first move, so e.g. the PV might be incorrect.
If this is indeed a bug, I suppose it has never shown up so far because the search still returns the correct score, so it might not matter much in practice? (I will have to ponder whether that assumption is true.) But it is probably still incorrect, and as I change things it might break something.]

This all works fine, as it should. [Or does it?] However, I was wondering what would happen if I replaced the > score condition with > alpha. In a plain NW search that should not matter in terms of performance; however, once one wants to add aspiration windows or similar, it might. The problem is that with > alpha the code becomes incorrect, as follows:
1. Pos x: Analyze move 1 with full window, get score = -20
2. Pos x: Analyze move 2 with null window, in Pos x + move2 the opponent needs to achieve +20 to fail high. All opponent moves fail low, so we know the upper bound for the opponent is +19. We store +19 as upper bound in the TT, then return.
3. Pos x: Now analyze move 2 with full window, in Pos x + move2 we read the TT entry and see the upper bound of +19, so set beta = +19.
4. Pos x + move2: Analyze move a with full window -> enter recursion with alpha = -19.
5. Pos x + move2 + move a: Analyze move 1 with full window, fail low, e.g. score = -20.
6. Pos x + move2 + move a: Analyze move 2 with null window. Now the condition tells us we have to beat alpha = -19 so we need at least -18. But we might not get that.
So it breaks, because the alpha bound does not correspond to another move held in reserve; it was our own move in this position that produced this alpha value, so we should not need to beat it. Why does the old version work? After my EDIT above I will have to think about that again, since my attempted explanation might or might not still hold.
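To convince myself that the > score version is at least self-consistent, here is a toy sketch (Python, my own names, no TT, explicit game trees as nested lists) of the loop above with exactly that re-search condition; called at the root with a full window it reproduces plain minimax:

```python
INF = 10 ** 9

def minimax(node):
    # Reference implementation: plain minimax in negamax form.
    # A node is either a leaf (int) or a list of child nodes.
    if isinstance(node, int):
        return node
    return max(-minimax(child) for child in node)

def pvs(node, alpha, beta):
    # PV search using the "> score" (fail-soft) re-search condition:
    # the re-search trigger compares against our own best score so far,
    # not against an alpha that might have been narrowed from outside.
    if isinstance(node, int):
        return node
    score = -INF
    for i, child in enumerate(node):
        if i == 0:
            value = -pvs(child, -beta, -alpha)
        else:
            # Null window just above the best score found so far.
            value = -pvs(child, -score - 1, -score)
            if value > score:
                # Null-window search failed high: re-search wide.
                value = -pvs(child, -beta, -score)
        if value > score:
            score = value
        if score >= beta:
            break  # fail high
    return score
```

This is only an illustration of the re-search condition, not of the TT interaction that causes the trouble above.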
(EDIT 2: I suppose I could change the conditions further up to "if (lookup.eval - 1 > alpha) alpha = lookup.eval - 1", which possibly solves both problems? Would I want to do that?)
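For concreteness, one reading of the EDIT-2 change as a sketch (Python, hypothetical names; only the lower-bound branch is softened, the upper-bound branch is left as in the original code):

```python
from dataclasses import dataclass

@dataclass
class Lookup:
    eval: int
    is_lower: bool
    is_upper: bool

def apply_tt_bounds_softened(alpha, beta, lookup):
    # EDIT-2 variant: soften the proven lower bound by one point, so
    # the search must still find the move that actually achieves
    # lookup.eval before alpha can rise that far.
    if lookup.is_lower:
        if lookup.eval - 1 > alpha:
            alpha = lookup.eval - 1
    elif lookup.is_upper:
        if lookup.eval < beta:
            beta = lookup.eval
    return alpha, beta
```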
koedem

Re: Writing a minimax equivalent engine

Post by koedem »

Regarding my last edit (since it seems I cannot actually edit the post): I suppose what I want is to use the exact bounds within the iteration that produced them, but only the softer ones deeper in the recursion, where I do not yet know whether I have found the moves achieving those values. (Once I find such a value, alpha will be updated naturally anyway.) This way I could still, for instance, return upon reaching the looked-up beta, but not incorrectly pass it on into the deeper search, where it could potentially lead to issues.
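As a sketch of that idea (Python, my own names; explicit trees as nested tuples instead of real positions, and a pre-seeded table with an upper-bound entry), "use the bound at this node, but don't narrow the child window" could look like:

```python
INF = 10 ** 9

def minimax(node):
    # Reference: plain minimax in negamax form over nested tuples.
    if isinstance(node, int):
        return node
    return max(-minimax(child) for child in node)

def search(node, alpha, beta, tt):
    # A TT upper bound is used only as a stopping threshold *at this
    # node*; the window handed to the children stays unnarrowed, so a
    # bound we have not yet re-achieved is never passed into the
    # deeper search.
    if isinstance(node, int):
        return node
    tt_beta = beta
    hit = tt.get(node)  # tt maps subtree -> ('upper', value)
    if hit is not None and hit[0] == 'upper':
        tt_beta = min(tt_beta, hit[1])
    score = -INF
    for child in node:
        # Children receive the original alpha/beta, not tt_beta.
        score = max(score, -search(child, -beta, -max(alpha, score), tt))
        if score >= tt_beta:
            break  # the proven upper bound has been reached
    return score
```

With a correct upper-bound entry seeded for a subtree, the result still matches plain minimax, while the narrowed threshold never leaks into the children's windows.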