AlphaZero

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 11:15 am
Full name: Pawel Wojcik

AlphaZero

Post by Fafkorn » Sun Apr 26, 2020 5:14 pm

Hello, I'm trying to develop my own AlphaZero engine. In the original paper the authors use a board representation that includes information about previous moves. My goal is an engine of around 2000 Elo, so I don't have high expectations. Would the standard 773-bit board representation (12x8x8 piece planes + 4 castling bits + 1 side-to-move bit) work? Do I need to pretrain the model? Do I have to use convolutional layers? I sketched what I have in mind below. Thanks in advance.
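
Roughly what I mean by the flat 773-bit input (a quick sketch using the python-chess package; encode_773 is just my own name for it): 12*64 piece bits, 4 castling-rights bits and 1 side-to-move bit.

Code: Select all

import numpy as np
import chess  # pip install python-chess

def encode_773(board: chess.Board) -> np.ndarray:
    """Flat 773-bit input: 12*64 piece bits + 4 castling bits + 1 side-to-move bit."""
    planes = np.zeros((12, 64), dtype=np.float32)
    for sq in chess.SQUARES:
        piece = board.piece_at(sq)
        if piece is not None:
            # piece_type runs 1..6 (pawn..king); black pieces go into planes 6..11
            plane = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
            planes[plane, sq] = 1.0
    castling = np.array([
        board.has_kingside_castling_rights(chess.WHITE),
        board.has_queenside_castling_rights(chess.WHITE),
        board.has_kingside_castling_rights(chess.BLACK),
        board.has_queenside_castling_rights(chess.BLACK),
    ], dtype=np.float32)
    side_to_move = np.array([float(board.turn == chess.WHITE)], dtype=np.float32)
    return np.concatenate([planes.reshape(-1), castling, side_to_move])  # shape (773,)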

brianr
Posts: 443
Joined: Thu Mar 09, 2006 2:01 pm

Re: AlphaZero

Post by brianr » Sun Apr 26, 2020 8:27 pm

Suggest reviewing this first:

https://github.com/Zeta36/chess-alpha-zero

or here for a simpler game like Connect4:

https://github.com/suragnair/alpha-zero-general

Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 11:15 am
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn » Mon Apr 27, 2020 5:26 pm

I'm at the stage where I have implemented MCTS and a randomly initialized NN with a few layers, just to test that everything works.
Zeta36 uses an input of size (18, 8, 8):
18 planes = 12 piece types, 4 castling rights, 1 for the fifty-move rule, 1 for en passant.

Isn't information about the side to move necessary?
The author uses many, many layers; doesn't that hurt performance?
And so many bits are used for castling, en passant and the fifty-move rule only because of normalization, right?
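
For the record, this is how I currently picture those 6 extra planes (a sketch with python-chess; the plane order and the /100 scaling are just my guesses, not necessarily what Zeta36 does). Each scalar is broadcast to a full 8x8 plane so all input channels have the same shape for the convolutional stack.

Code: Select all

import numpy as np
import chess  # pip install python-chess

def extra_planes(board: chess.Board) -> np.ndarray:
    """The 6 non-piece planes of an (18, 8, 8) input."""
    planes = np.zeros((6, 8, 8), dtype=np.float32)
    planes[0, :, :] = float(board.has_kingside_castling_rights(chess.WHITE))
    planes[1, :, :] = float(board.has_queenside_castling_rights(chess.WHITE))
    planes[2, :, :] = float(board.has_kingside_castling_rights(chess.BLACK))
    planes[3, :, :] = float(board.has_queenside_castling_rights(chess.BLACK))
    planes[4, :, :] = board.halfmove_clock / 100.0   # fifty-move counter, scaled to [0, 1]
    if board.ep_square is not None:                  # mark the en-passant target square, if any
        planes[5, board.ep_square // 8, board.ep_square % 8] = 1.0
    return planes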

brianr
Posts: 443
Joined: Thu Mar 09, 2006 2:01 pm

Re: AlphaZero

Post by brianr » Mon Apr 27, 2020 7:07 pm

Sometimes the answer is simply because that's what Alpha Zero did...

Experiment and see what works for you after you have established a working baseline.

Have fun.

Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 11:15 am
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn » Mon Apr 27, 2020 7:57 pm

The pretraining the author did is based on grandmaster games. Was the policy target a one-hot vector for the move chosen by the GM?
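
In other words, something like this (a minimal sketch, assuming a 4096-way from-to move encoding and cross-entropy as the policy loss; policy_target and policy_loss are just names I made up):

Code: Select all

import numpy as np

def policy_target(gm_move_index: int, num_moves: int = 4096) -> np.ndarray:
    # One-hot target: all probability mass on the move the GM actually played.
    target = np.zeros(num_moves, dtype=np.float32)
    target[gm_move_index] = 1.0
    return target

def policy_loss(policy_probs: np.ndarray, target: np.ndarray, eps: float = 1e-9) -> float:
    # Cross-entropy; with a one-hot target this reduces to -log p(GM move).
    return float(-np.sum(target * np.log(policy_probs + eps)))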

supersharp77
Posts: 903
Joined: Sat Jul 05, 2014 5:54 am
Location: Southwest USA

Re: AlphaZero

Post by supersharp77 » Mon Apr 27, 2020 8:51 pm

Fafkorn wrote:
Mon Apr 27, 2020 7:57 pm
The pretraining the author did is based on grandmaster games. Was the policy target a one-hot vector for the move chosen by the GM?
NN Policy Output....

https://www.reddit.com/r/reinforcementl ... tput_in_a/

https://towardsdatascience.com/policy-n ... 2776056ad2

http://web.mst.edu/~gosavia/neural_networks_RL.pdf

https://www.researchgate.net/publicatio ... y_Gradient

https://people.eecs.berkeley.edu/~svlev ... mfcgps.pdf

https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html

All the top GM (and master) games should be integrated into the learning process, and if successful the engine strength should be at a minimum of 2500+ (see the Giraffe chess engine). Research models are a bit complex. Good luck! AR :D :wink:

smcracraft
Posts: 714
Joined: Wed Mar 08, 2006 7:08 pm
Location: Orange County California
Full name: Stuart Cracraft
Contact:

Re: AlphaZero

Post by smcracraft » Tue Apr 28, 2020 12:56 am

Skip it.

Stand up lczero.org and enjoy.

No need to reinvent the wheel.

Leela later will be more Tal-like once they fix it.

Stuart
https://youtu.be/3A1hDPRmoyc

phhnguyen
Posts: 847
Joined: Wed Apr 21, 2010 2:58 am
Location: Australia
Full name: Nguyen Hong Pham
Contact:

Re: AlphaZero

Post by phhnguyen » Tue Apr 28, 2020 2:45 am

smcracraft wrote:
Tue Apr 28, 2020 12:56 am
Skip it.

Stand up lczero.org and enjoy.

No need to reinvent the wheel.
Do you want to divide the computer chess world into only two groups: lc0 clones and traditional alpha-beta engines? :wink: :D
https://banksiagui.com
A freeware chess GUI, based on the open-source Banksia chess tournament manager

Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 11:15 am
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn » Tue Apr 28, 2020 9:42 am

smcracraft wrote:
Tue Apr 28, 2020 12:56 am
Skip it.

Stand up lczero.org and enjoy.

No need to reinvent the wheel.

Leela later will be more Tal-like once they fix it.

Stuart
https://youtu.be/3A1hDPRmoyc
I'm not trying to compete with LeelaChessZero or AlphaZero. I'm developing my own AlphaZero for academic purposes (my thesis). I just want to clear up some doubts.

Fafkorn
Posts: 16
Joined: Tue Apr 14, 2020 11:15 am
Full name: Pawel Wojcik

Re: AlphaZero

Post by Fafkorn » Tue Apr 28, 2020 4:42 pm

I have another question related to this topic. When I get my policy from the neural network (in my case 4096 numbers), let's say only 3 moves are legal, each with policy value 0.1. This throws off the proportion between the position reward and the visit-based term in my U(s, a). Do I have to normalize the policy values, and if so, how?
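
This is what I was thinking of doing (a sketch, assuming a 64x64 from-to encoding; legal_policy and puct_u are my own names): mask the raw policy to the legal moves and renormalize so the priors sum to 1 before they enter U(s, a). Is that the right approach?

Code: Select all

import numpy as np

def legal_policy(raw_policy, legal_indices):
    # Keep only the entries for legal moves and renormalize them to sum to 1.
    priors = np.asarray([raw_policy[i] for i in legal_indices], dtype=np.float64)
    total = priors.sum()
    if total > 0:
        priors /= total
    else:
        # Network put (almost) no mass on legal moves: fall back to uniform priors.
        priors = np.full(len(legal_indices), 1.0 / len(legal_indices))
    return dict(zip(legal_indices, priors))

def puct_u(prior, parent_visits, child_visits, c_puct=1.5):
    # U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    return c_puct * prior * np.sqrt(parent_visits) / (1 + child_visits)

With the example above, the three priors of 0.1 would become 1/3 each, so the exploration term keeps the same scale no matter how much probability mass the network put on illegal moves.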
