Transformer Progress, Lc0 Blog, 2024-02-28
https://lczero.org/blog/2024/02/transformer-progress/
The CPW article regarding the CNN architecture is completely outdated; maybe the Lc0 team will find some time to give it an update?
From the blog post:
Our strongest transformer model, BT4, is nearly 300 elo stronger in terms of raw policy than our strongest convolution-based model, T78, with fewer parameters and less computation. We’ve tested dozens of modifications to get our transformer architecture to where it is today.
This module, which we call “smolgen”, allows the model to play as if it were an additional 50% larger with only a 10% reduction in throughput.
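Side note: the post doesn't spell out how smolgen works mechanically, but going by its description (a small module that generates extra attention logits from a compressed summary of the whole board), a minimal PyTorch sketch might look like the following. All names, layer sizes, and shapes here are my own illustrative assumptions, not Lc0's actual code:

    # Hypothetical sketch of a smolgen-style module: derive a per-head
    # additive bias for the attention logits from a compressed,
    # whole-board summary. Sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SmolgenSketch(nn.Module):
        def __init__(self, embed_dim=256, heads=8, per_sq=32, hidden=256, squares=64):
            super().__init__()
            self.heads = heads
            self.squares = squares
            # Compress each square's embedding to a small vector.
            self.compress = nn.Linear(embed_dim, per_sq)
            # Mix the flattened board summary into a global hidden state.
            self.dense1 = nn.Linear(per_sq * squares, hidden)
            # Produce one 64x64 logit bias per attention head.
            self.dense2 = nn.Linear(hidden, heads * squares * squares)

        def forward(self, x):
            # x: (batch, 64, embed_dim) -- one token per board square.
            b = x.shape[0]
            summary = self.compress(x).reshape(b, -1)   # (batch, 64*per_sq)
            hidden = torch.relu(self.dense1(summary))   # (batch, hidden)
            bias = self.dense2(hidden)                  # (batch, heads*64*64)
            return bias.reshape(b, self.heads, self.squares, self.squares)

    # The bias would be added to the scaled dot-product logits before softmax:
    #   attn = softmax(Q @ K^T / sqrt(d_k) + smolgen_bias)

The appeal of the idea is that the bias is a function of the global position rather than of individual query/key pairs, which is presumably how it buys the "50% larger" effect for only a 10% throughput cost.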
Here is a short summary of our timeline of progress. BT1 was our first transformer model, performing roughly on par with T78, our strongest convolution-based model. BT2 improved on BT1 by adding smolgen and increasing head count. BT3 further improved on BT2 by increasing head count again and adding the new embedding layer. BT4 built on BT3 by doubling model size to push our architecture to the limit.
The future of Leela is bright. Early experiments with relative positional encodings show that our architecture still has plenty of room for improvement. Also, we’re finally having success with INT8 quantization, which could speed up our models by 50% without quality degradation.

Interesting: first CNNs, then transformers; who knows what's next?
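The post doesn't detail the relative positional encodings either. One common form of the technique (in the spirit of Shaw et al., 2018, and not necessarily what Lc0 uses) is a learned per-head bias table indexed by the rank/file offset between squares, added to the attention logits; a minimal sketch under those assumptions:

    # Learned relative-position bias over the 8x8 board: one table entry
    # per (rank offset, file offset) pair, shared across all square pairs
    # with the same offset. Illustration of the general technique only.
    import torch
    import torch.nn as nn

    class RelativeBias(nn.Module):
        def __init__(self, heads=8, board=8):
            super().__init__()
            n = 2 * board - 1  # rank/file offsets range over -7..+7
            self.table = nn.Parameter(torch.zeros(heads, n, n))
            coords = torch.tensor([(r, f) for r in range(board) for f in range(board)])
            # rel[i, j] = (rank offset, file offset) from square i to square j,
            # shifted by board-1 so indices are non-negative.
            rel = coords[None, :, :] - coords[:, None, :] + (board - 1)
            self.register_buffer("rel", rel)

        def forward(self):
            # Returns a (heads, 64, 64) bias to add to Q @ K^T / sqrt(d_k)
            # before the softmax.
            return self.table[:, self.rel[..., 0], self.rel[..., 1]]

As for INT8: if you just want to play with it, PyTorch's torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8) quantizes the linear layers of a model; the 50% speedup the blog quotes presumably refers to Lc0's own inference backends rather than anything like this.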
Further papers:
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
https://arxiv.org/abs/2406.00877
Attention Is All You Need
https://arxiv.org/abs/1706.03762
--
Srdja