Alpha-Beta as a rollouts algorithm

Discussion of chess software programming and technical issues.

Moderator: Ras

Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Alpha-Beta as a rollouts algorithm

Post by Michel »

The rollouts are still alpha-beta not scout (PVS) but we don't really need that.
It seems PVS is described in Theorem 3. It is actually very elegant (more elegant than the standard implementation with zero window searches).
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Alpha-Beta as a rollouts algorithm

Post by Michel »

The PVS rollout mechanism always selects the move with the highest stored (mathematical) upper bound.

This is very similar to UCT where a move is chosen with the highest statistical upper bound.

This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Alpha-Beta as a rollouts algorithm

Post by Daniel Shawul »

Michel wrote:The PVS rollout mechanism always selects the move with the highest stored (mathematical) upper bound.

This is very similar to UCT where a move is chosen with the highest statistical upper bound.

This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.
I thought they were going to implement MT-SSS* indirectly with null windows searches just like Aske Plaat did i.e using the MT driver passing (beta-1,beta) bounds at the root shown in Algorithm 2. I guess since PVS ~ MT-SSS*, PVS is sort of implemented indirectly. I quickly tried this and search hangs but is definately due to a bug i have, because move sorting should not have mattered. Will come back later with this.

---
For PVS, I was thinking of a more direct approach where we do the following:

1) Check if a given node has atleast one sibling with closed window (i.e. alpha bound determined), if so use a nullwindow for the rollout of the move.

2) When we backup and find out the score fails high, do research with full alpha-beta window instead of closing the window. This will force more rollouts on this node and at the root till the correct score is found.

This direct approach will also be useful for re-searching LMR reduced moves when they fail high, which i think it would be difficult with the other approach.

---

Theorem 2 always chooses the left-most child and its behviour is static for the duration of a given iteration in an ID framework. My selection policy is a gready one and always chooses the child with the highest score -- that means it is dynamic in a given iteration. I think this is better since it incorporates behaviour of dynamic move sorting heuristics.

If I did a uniform (random) selection of moves alpha-beta goes back toward minmax with a branching factor b^3/4 instead of b^1/2. So I was thinking whatever you do in the selection phase is equivalent to a move sorting heuristics -- and shouldn't lead to a new algorithm like going from alpha-beta to PVS, but maybe it does as you pointed out in Theorem 3. I need to digest this.

Daniel
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Alpha-Beta as a rollouts algorithm

Post by Daniel Shawul »

I quickly tried this and search hangs but is definately due to a bug i have
Fixed the bug and now using beta_c for selection runs fine. However, it seems it doesn't improve things compared to what i had before ( i.e using the actual score of the node). Infact using beta_c seems to be slower than score_c.
Dann Corbit
Posts: 12797
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Alpha-Beta as a rollouts algorithm

Post by Dann Corbit »

Daniel Shawul wrote:
I quickly tried this and search hangs but is definately due to a bug i have
Fixed the bug and now using beta_c for selection runs fine. However, it seems it doesn't improve things compared to what i had before ( i.e using the actual score of the node). Infact using beta_c seems to be slower than score_c.
I guess the really interesting question is, "Why does score_c seem to work better?"
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Alpha-Beta as a rollouts algorithm

Post by Daniel Shawul »

I may have a bug that invalidates any comparison at this point. Actually, I am going back to no pruning, no aspiration version and try to work my way up again.

For those interested, there are more details in the full paper here.

Also if anyone want to experment with scorpio's implemenation and contribute, i could add you to the repo. Let's have fun with it.

Daniel
smatovic
Posts: 3359
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Alpha-Beta as a rollouts algorithm

Post by smatovic »

Daniel Shawul wrote: Thu Jan 25, 2018 11:59 pm ...
Daniel, can you elaborate a bit what you do currently with Scorpio, I read here and there that you implemented all kinds of techniques, MCTS, AB NNUE, even Lc0 CNN batches via parallel AB, what did work best for you? Is it now AB Rollouts with NNUE?Thx.

--
Srdja