Lc0 Blog Post - Transformer Progress (from CNN to Transformer)

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

smatovic
Posts: 3189
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Lc0 Blog Post - Transformer Progress (from CNN to Transformer)

Post by smatovic »

For those who missed it, Lc0 has been using a Transformer-based neural network architecture since 2022:

Transformer Progress, Lc0 Blog, 2024-02-28
https://lczero.org/blog/2024/02/transformer-progress/

The CPW article on the CNN architecture is completely outdated, maybe team Lc0 finds some time to give it an update?

From the blog post:
Our strongest transformer model, BT4, is nearly 300 elo stronger in terms of raw policy than our strongest convolution-based model, T78, with fewer parameters and less computation. We’ve tested dozens of modifications to get our transformer architecture to where it is today.
This module, which we call “smolgen”, allows the model to play as if it were an additional 50% larger with only a 10% reduction in throughput.
Here is a short summary of our timeline of progress. BT1 was our first transformer model, performing roughly on par with T78, our strongest convolution-based model. BT2 improved on BT1 by adding smolgen and increasing head count. BT3 further improved on BT2 by increasing head count again and adding the new embedding layer. BT4 built on BT3 by doubling model size to push our architecture to the limit.
The future of Leela is bright. Early experiments with relative positional encodings show that our architecture still has plenty of room for improvement. Also, we’re finally having success with INT8 quantization, which could speed up our models by 50% without quality degradation.
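To illustrate the idea for readers unfamiliar with the architecture: below is a minimal numpy sketch of a single attention head over the 64 board squares, with a smolgen-style additive bias on the attention logits. Per the blog, smolgen generates attention biases from a compressed representation of the position; all shapes, names, and the mean-pooling compression here are my own assumptions for illustration, not Lc0's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
SQUARES, D_MODEL, D_HEAD, D_SMOL = 64, 256, 32, 16  # illustrative sizes only

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def smolgen_bias(x, w_compress, w_expand):
    # Compress the whole position into a small vector, then expand it into a
    # full 64x64 additive attention bias (hypothetical shapes, not Lc0's).
    z = (x @ w_compress).mean(axis=0)               # (D_SMOL,)
    return (z @ w_expand).reshape(SQUARES, SQUARES)

def attention_with_bias(x, wq, wk, wv, bias):
    q, k, v = x @ wq, x @ wk, x @ wv                # (64, D_HEAD) each
    logits = q @ k.T / np.sqrt(D_HEAD) + bias       # (64, 64) square-to-square
    return softmax(logits) @ v                      # (64, D_HEAD)

x = rng.normal(size=(SQUARES, D_MODEL))             # one embedding per square
wq, wk, wv = (0.02 * rng.normal(size=(D_MODEL, D_HEAD)) for _ in range(3))
w_compress = 0.02 * rng.normal(size=(D_MODEL, D_SMOL))
w_expand = 0.02 * rng.normal(size=(D_SMOL, SQUARES * SQUARES))

bias = smolgen_bias(x, w_compress, w_expand)
out = attention_with_bias(x, wq, wk, wv, bias)
print(out.shape)  # (64, 32)
```

The point of the bias: plain self-attention scores depend only on pairwise query/key dot products, while a position-dependent additive bias lets the whole board state reshape the attention pattern cheaply, which is presumably why it buys effective capacity at little throughput cost.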
Interesting, first CNNs, then Transformers, who knows what's next.

Further papers:

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
https://arxiv.org/abs/2406.00877

Attention Is All You Need
https://arxiv.org/abs/1706.03762

--
Srdja
Jouni
Posts: 3611
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Lc0 Blog Post - Transformer Progress (from CNN to Transformer)

Post by Jouni »

+270 Elo from T78 to BT4 :D :? . And position 5 in CCC. So CPU engines have obviously gained more.
Jouni
smatovic
Posts: 3189
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Lc0 Blog Post - Transformer Progress (from CNN to Transformer)

Post by smatovic »

Jouni wrote: Tue Oct 29, 2024 6:02 pm +270 Elo from T78 to BT4 :D :? . And position 5 in CCC. So CPU engines have obviously gained more.
The Elo gain mentioned is for the raw policy only (move selectivity), without search.
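To spell out the distinction: "raw policy" strength is measured by playing the network's highest-prior move directly, with no search on top. A toy sketch (move names and prior values are made up for illustration, this is not Lc0 code):

```python
def raw_policy_move(priors):
    """priors: dict mapping a move (UCI string) to its network prior probability.
    Raw-policy play simply picks the argmax, with no search at all."""
    return max(priors, key=priors.get)

# Hypothetical priors for a starting position:
priors = {"e2e4": 0.41, "d2d4": 0.35, "g1f3": 0.24}
print(raw_policy_move(priors))  # e2e4
```

Engine-level Elo (as seen in CCC) additionally depends on the value head, the search, and hardware, so a +270 raw-policy gain does not translate one-to-one into match results.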

--
Srdja
smatovic
Posts: 3189
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Lc0 Blog Post - Transformer Progress (from CNN to Transformer)

Post by smatovic »

Just for the files, there is a paper on Lc0 with Transformers:
chaeronanaut wrote: Sat Feb 22, 2025 12:35 pm Additionally, it's worth reading the Lc0 paper: https://arxiv.org/abs/2409.12272
Mastering Chess with a Transformer Model
https://arxiv.org/abs/2409.12272

--
Srdja