It seems PVS is described in Theorem 3. It is actually very elegant (more elegant than the standard implementation with zero window searches).The rollouts are still alpha-beta not scout (PVS) but we don't really need that.
Alpha-Beta as a rollouts algorithm
Moderator: Ras
- 
				Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Alpha-Beta as a rollouts algorithm
Ideas=science. Simplification=engineering. 
Without ideas there is nothing to simplify.
			
						Without ideas there is nothing to simplify.
- 
				Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Alpha-Beta as a rollouts algorithm
The PVS rollout mechanism always selects the move with the highest stored (mathematical) upper bound.
This is very similar to UCT where a move is chosen with the highest statistical upper bound.
This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.
			
			
									
						
							This is very similar to UCT where a move is chosen with the highest statistical upper bound.
This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.
Ideas=science. Simplification=engineering. 
Without ideas there is nothing to simplify.
			
						Without ideas there is nothing to simplify.
- 
				Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Alpha-Beta as a rollouts algorithm
I thought they were going to implement MT-SSS* indirectly with null windows searches just like Aske Plaat did i.e using the MT driver passing (beta-1,beta) bounds at the root shown in Algorithm 2. I guess since PVS ~ MT-SSS*, PVS is sort of implemented indirectly. I quickly tried this and search hangs but is definately due to a bug i have, because move sorting should not have mattered. Will come back later with this.Michel wrote:The PVS rollout mechanism always selects the move with the highest stored (mathematical) upper bound.
This is very similar to UCT where a move is chosen with the highest statistical upper bound.
This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.
---
For PVS, I was thinking of a more direct approach where we do the following:
1) Check if a given node has atleast one sibling with closed window (i.e. alpha bound determined), if so use a nullwindow for the rollout of the move.
2) When we backup and find out the score fails high, do research with full alpha-beta window instead of closing the window. This will force more rollouts on this node and at the root till the correct score is found.
This direct approach will also be useful for re-searching LMR reduced moves when they fail high, which i think it would be difficult with the other approach.
---
Theorem 2 always chooses the left-most child and its behviour is static for the duration of a given iteration in an ID framework. My selection policy is a gready one and always chooses the child with the highest score -- that means it is dynamic in a given iteration. I think this is better since it incorporates behaviour of dynamic move sorting heuristics.
If I did a uniform (random) selection of moves alpha-beta goes back toward minmax with a branching factor b^3/4 instead of b^1/2. So I was thinking whatever you do in the selection phase is equivalent to a move sorting heuristics -- and shouldn't lead to a new algorithm like going from alpha-beta to PVS, but maybe it does as you pointed out in Theorem 3. I need to digest this.
Daniel
- 
				Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Alpha-Beta as a rollouts algorithm
Fixed the bug and now using beta_c for selection runs fine. However, it seems it doesn't improve things compared to what i had before ( i.e using the actual score of the node). Infact using beta_c seems to be slower than score_c.I quickly tried this and search hangs but is definately due to a bug i have
- 
				Dann Corbit
- Posts: 12797
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Alpha-Beta as a rollouts algorithm
I guess the really interesting question is, "Why does score_c seem to work better?"Daniel Shawul wrote:Fixed the bug and now using beta_c for selection runs fine. However, it seems it doesn't improve things compared to what i had before ( i.e using the actual score of the node). Infact using beta_c seems to be slower than score_c.I quickly tried this and search hangs but is definately due to a bug i have
Taking ideas is not a vice, it is a virtue.  We have another word for this.  It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
			
						But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
- 
				Daniel Shawul
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Alpha-Beta as a rollouts algorithm
I may have a bug that invalidates any comparison at this point. Actually, I am going back to no pruning, no aspiration version and try to work my way up again.
For those interested, there are more details in the full paper here.
Also if anyone want to experment with scorpio's implemenation and contribute, i could add you to the repo. Let's have fun with it.
Daniel
			
			
									
						
										
						For those interested, there are more details in the full paper here.
Also if anyone want to experment with scorpio's implemenation and contribute, i could add you to the repo. Let's have fun with it.
Daniel
- 
				smatovic
- Posts: 3359
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Alpha-Beta as a rollouts algorithm
Daniel, can you elaborate a bit what you do currently with Scorpio, I read here and there that you implemented all kinds of techniques, MCTS, AB NNUE, even Lc0 CNN batches via parallel AB, what did work best for you? Is it now AB Rollouts with NNUE?Thx.
--
Srdja