uct algorithm bad at chess?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

OfekShochat
Posts: 50
Joined: Thu Oct 15, 2020 10:19 am
Full name: ghostway

uct algorithm bad at chess?

Post by OfekShochat »

I implemented the UCT algorithm with the ucb1 policy. But it seems that it is awful (choosing terrible moves with high depth).
Is that a bug in my implementation, or what?
thank you
Joerg Oster
Posts: 939
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: uct algorithm bad at chess?

Post by Joerg Oster »

OfekShochat wrote: Wed Oct 21, 2020 9:20 pm I implemented the UCT algorithm with the ucb1 policy. But it seems that it is awful (choosing terrible moves with high depth).
Is that a bug in my implementation, or what?
thank you
Some weeks ago I put a very basic (and very crude and hackish) UCT search into Stockfish,
also with the standard UCB1 policy.

Here is what I get from the start position:

Code: Select all

info string classical evaluation enabled.

 +---+---+---+---+---+---+---+---+
 | r | n | b | q | k | b | n | r | 8
 +---+---+---+---+---+---+---+---+
 | p | p | p | p | p | p | p | p | 7
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 | P | P | P | P | P | P | P | P | 2
 +---+---+---+---+---+---+---+---+
 | R | N | B | Q | K | B | N | R | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Key: 8F8F01D4562F59FB
Checkers: 
info depth 0 seldepth 12 score cp 33 nodes 733806 nps 1571319 time 467 pv d2d4
info depth 0 seldepth 13 score cp 28 nodes 1527574 nps 1463193 time 1044 pv d2d4
info depth 0 seldepth 17 score cp 27 nodes 2331249 nps 1454303 time 1603 pv d2d4
info depth 0 seldepth 17 score cp 32 nodes 3167371 nps 1359386 time 2330 pv d2d4
info depth 0 seldepth 17 score cp 30 nodes 4016911 nps 1321352 time 3040 pv d2d4
info depth 0 seldepth 23 score cp 28 nodes 4987661 nps 1251294 time 3986 pv d2d4
info depth 0 seldepth 23 score cp 26 nodes 5843996 nps 1208685 time 4835 pv d2d4
info depth 0 seldepth 23 score cp 28 nodes 6672750 nps 1216988 time 5483 pv d2d4
info depth 0 seldepth 23 score cp 26 nodes 7514024 nps 1221395 time 6152 pv d2d4
info depth 0 seldepth 23 score cp 25 nodes 8259959 nps 1240420 time 6659 pv d2d4

Number of legal moves: 20
Root move:   a2a3     Visits:       23     Q:    -52 (cp -25)
Root move:   b2b3     Visits:       27     Q:    -41 (cp -19)
Root move:   c2c3     Visits:       29     Q:    -36 (cp -17)
Root move:   d2d3     Visits:       52     Q:    -15 (cp -7)
Root move:   e2e3     Visits:     1791     Q:     43 (cp 20)
Root move:   f2f3     Visits:       19     Q:    -62 (cp -29)
Root move:   g2g3     Visits:       28     Q:    -41 (cp -19)
Root move:   h2h3     Visits:       23     Q:    -52 (cp -25)
Root move:   a2a4     Visits:       26     Q:    -48 (cp -23)
Root move:   b2b4     Visits:       25     Q:    -47 (cp -22)
Root move:   c2c4     Visits:       30     Q:    -36 (cp -17)
Root move:   d2d4     Visits:   157315     Q:     54 (cp 25)
Root move:   e2e4     Visits:    26515     Q:     52 (cp 25)
Root move:   f2f4     Visits:       22     Q:    -55 (cp -26)
Root move:   g2g4     Visits:       25     Q:    -45 (cp -21)
Root move:   h2h4     Visits:       31     Q:    -37 (cp -17)
Root move:   b1a3     Visits:       29     Q:    -38 (cp -18)
Root move:   b1c3     Visits:      109     Q:      8 (cp 3)
Root move:   g1f3     Visits:    13854     Q:     66 (cp 31)
Root move:   g1h3     Visits:       27     Q:    -41 (cp -19)
bestmove d2d4
The search time is currently very limited and no pv by now.
Jörg Oster
OfekShochat
Posts: 50
Joined: Thu Oct 15, 2020 10:19 am
Full name: ghostway

Re: no pv line

Post by OfekShochat »

You can do the PV line with picking the one with the most visits (or however you choose the best move) and do for the second depth node from the best move in the first depth. The same for all depths.
My nps is in the thousands. Do you think that is that it?
Joerg Oster
Posts: 939
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: no pv line

Post by Joerg Oster »

OfekShochat wrote: Thu Oct 22, 2020 11:47 am You can do the PV line with picking the one with the most visits (or however you choose the best move) and do for the second depth node from the best move in the first depth. The same for all depths
Thank you.
Yes, I pick the move with the most visits as best move.

But like I said this is very basic at the moment.
I plan to work on it in the coming weeks.

Did you try different values for C?
Jörg Oster
OfekShochat
Posts: 50
Joined: Thu Oct 15, 2020 10:19 am
Full name: ghostway

Re: uct algorithm bad at chess?

Post by OfekShochat »

Yeah, what is yours set at? I tried 2, 1.5, 4, 0.5
And for example, at depth 10 its getting a2a3
Joerg Oster
Posts: 939
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: uct algorithm bad at chess?

Post by Joerg Oster »

OfekShochat wrote: Thu Oct 22, 2020 11:59 am Yeah, what is yours set at? I tried 2, 1.5, 4, 0.5
In the example given it was sqrt(2) if I'm not mistaken.
I'm not sure one value will fit all positions, though ... :D
Jörg Oster
OfekShochat
Posts: 50
Joined: Thu Oct 15, 2020 10:19 am
Full name: ghostway

Re: uct algorithm bad at chess?

Post by OfekShochat »

Surely.
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: uct algorithm bad at chess?

Post by Madeleine Birchfield »

Joerg Oster wrote: Thu Oct 22, 2020 11:09 am
OfekShochat wrote: Wed Oct 21, 2020 9:20 pm I implemented the UCT algorithm with the ucb1 policy. But it seems that it is awful (choosing terrible moves with high depth).
Is that a bug in my implementation, or what?
thank you
Some weeks ago I put a very basic (and very crude and hackish) UCT search into Stockfish,
also with the standard UCB1 policy.

Here is what I get from the start position:

Code: Select all

info string classical evaluation enabled.

 +---+---+---+---+---+---+---+---+
 | r | n | b | q | k | b | n | r | 8
 +---+---+---+---+---+---+---+---+
 | p | p | p | p | p | p | p | p | 7
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 | P | P | P | P | P | P | P | P | 2
 +---+---+---+---+---+---+---+---+
 | R | N | B | Q | K | B | N | R | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Key: 8F8F01D4562F59FB
Checkers: 
info depth 0 seldepth 12 score cp 33 nodes 733806 nps 1571319 time 467 pv d2d4
info depth 0 seldepth 13 score cp 28 nodes 1527574 nps 1463193 time 1044 pv d2d4
info depth 0 seldepth 17 score cp 27 nodes 2331249 nps 1454303 time 1603 pv d2d4
info depth 0 seldepth 17 score cp 32 nodes 3167371 nps 1359386 time 2330 pv d2d4
info depth 0 seldepth 17 score cp 30 nodes 4016911 nps 1321352 time 3040 pv d2d4
info depth 0 seldepth 23 score cp 28 nodes 4987661 nps 1251294 time 3986 pv d2d4
info depth 0 seldepth 23 score cp 26 nodes 5843996 nps 1208685 time 4835 pv d2d4
info depth 0 seldepth 23 score cp 28 nodes 6672750 nps 1216988 time 5483 pv d2d4
info depth 0 seldepth 23 score cp 26 nodes 7514024 nps 1221395 time 6152 pv d2d4
info depth 0 seldepth 23 score cp 25 nodes 8259959 nps 1240420 time 6659 pv d2d4

Number of legal moves: 20
Root move:   a2a3     Visits:       23     Q:    -52 (cp -25)
Root move:   b2b3     Visits:       27     Q:    -41 (cp -19)
Root move:   c2c3     Visits:       29     Q:    -36 (cp -17)
Root move:   d2d3     Visits:       52     Q:    -15 (cp -7)
Root move:   e2e3     Visits:     1791     Q:     43 (cp 20)
Root move:   f2f3     Visits:       19     Q:    -62 (cp -29)
Root move:   g2g3     Visits:       28     Q:    -41 (cp -19)
Root move:   h2h3     Visits:       23     Q:    -52 (cp -25)
Root move:   a2a4     Visits:       26     Q:    -48 (cp -23)
Root move:   b2b4     Visits:       25     Q:    -47 (cp -22)
Root move:   c2c4     Visits:       30     Q:    -36 (cp -17)
Root move:   d2d4     Visits:   157315     Q:     54 (cp 25)
Root move:   e2e4     Visits:    26515     Q:     52 (cp 25)
Root move:   f2f4     Visits:       22     Q:    -55 (cp -26)
Root move:   g2g4     Visits:       25     Q:    -45 (cp -21)
Root move:   h2h4     Visits:       31     Q:    -37 (cp -17)
Root move:   b1a3     Visits:       29     Q:    -38 (cp -18)
Root move:   b1c3     Visits:      109     Q:      8 (cp 3)
Root move:   g1f3     Visits:    13854     Q:     66 (cp 31)
Root move:   g1h3     Visits:       27     Q:    -41 (cp -19)
bestmove d2d4
The search time is currently very limited and no pv by now.
Was this before or after Stockfish implemented NNUE?

Edit: never mind, I see the classical evaluation enabled.

I wonder how well Stockfish would do with NNUE enabled.