Further testing Leela-0.32

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Rebel
Posts: 7414
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Further testing Leela-0.32

Post by Rebel »

Since the surprising victory of Leela-0.32 with the BT3 network over Stockfish 17.1 (52.6%), Leela should also play against the rest of the top-12 engines; otherwise it would be unfair to Stockfish. That means playing the needed 9 x 3000 = 27,000 games. Considering each 3000-game match takes about 2-2½ days, this will take 3-4 weeks to finish.
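The schedule estimate is plain arithmetic. A minimal sketch, using only the figures from the post (9 matches, 3000 games each, 2 to 2½ days per match), gives the raw match time; the 3-4 week figure leaves some slack on top of that.

```python
# Figures taken from the post: 9 remaining opponents, 3000 games per match,
# and roughly 2 to 2.5 days of wall-clock time per 3000-game match.
MATCHES = 9
GAMES_PER_MATCH = 3000
DAYS_PER_MATCH = (2.0, 2.5)


def total_games(matches: int, games_per_match: int) -> int:
    """Total number of games still to be played."""
    return matches * games_per_match


def total_weeks(matches: int, days_per_match: float) -> float:
    """Raw match time in weeks, ignoring any gaps between matches."""
    return matches * days_per_match / 7


if __name__ == "__main__":
    print("games:", total_games(MATCHES, GAMES_PER_MATCH))  # 27000
    low, high = (total_weeks(MATCHES, d) for d in DAYS_PER_MATCH)
    print(f"weeks: {low:.1f} to {high:.1f}")
```

Pure match time comes to roughly 2.6 to 3.2 weeks, so the stated 3-4 weeks allows for restarts and idle time between matches.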

First victim ;) is Reckless-0.8 (third in the list). Progress can be followed here; currently 1005 / 3000 games have been played.


Results from file leela-BT3-reckless.pgn:

No. Name            Win Draw Loss Unf.  Score Games       %
-----------------------------------------------------------
  1 Leela-0.32-BT3 +273 =676  -56   *0  611.0  1005   60.8%
  2 Reckless-0.8    +56 =676 -273   *0  394.0  1005   39.2%

Total Games:    1005
White Wins:      291 (29.0%)
Black Wins:       38 (3.8%)
Draws:           676 (67.3%)
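The percentages in the table follow the usual convention of 1 point per win and ½ per draw. A minimal Python sketch that reproduces the Leela line and the colour breakdown from the raw counts:

```python
def match_score(wins: int, draws: int, losses: int):
    """Return (points, percentage) with a win = 1, a draw = 0.5, a loss = 0."""
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return points, 100.0 * points / games


def share(count: int, games: int) -> float:
    """Percentage of all games, as in the White/Black/Draws lines."""
    return 100.0 * count / games


if __name__ == "__main__":
    # Leela-0.32-BT3 vs Reckless-0.8 after 1005 games (from the table above).
    points, pct = match_score(273, 676, 56)
    print(f"Leela: {points:.1f} points, {pct:.1f}%")  # Leela: 611.0 points, 60.8%
    for label, n in (("White wins", 291), ("Black wins", 38), ("Draws", 676)):
        print(f"{label}: {n} ({share(n, 1005):.1f}%)")
```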
90% of coding is debugging, the other 10% is writing bugs.
Jouni
Posts: 3723
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: Further testing Leela-0.32

Post by Jouni »

You have no info about conditions. 1 core for engines? Leela has zero chance vs SF at 16 cores.
Jouni
Rebel

Re: Further testing Leela-0.32

Post by Rebel »

Jouni wrote: Sat Nov 15, 2025 8:55 pm You have no info about conditions. 1 core for engines? Leela has zero chance vs SF at 16 cores.
:?:

The STC is about one core, likewise for Leela.

I don't have 16 GPUs, just one.

It seems to me that Leela is neglected in testing (Stefan Pohl being the exception) because, depending on your hardware, it takes much longer to test; in my case, 16 times longer. Ever seen Leela tested with more than 1000 games? I will treat Leela the same as the other engines, meaning 30,000 games.
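The factor of 16 is a concurrency effect. A minimal sketch, assuming CPU-only matches run 16 games at once on 16 cores while the single GPU limits Leela matches to one game at a time; the 65-second game length is a hypothetical value chosen so that a serial 3000-game match takes about 2.25 days, in line with the post.

```python
def wall_clock_hours(games: int, seconds_per_game: float, concurrency: int) -> float:
    """Wall-clock hours to play `games` games, `concurrency` at a time."""
    rounds = -(-games // concurrency)  # ceiling division: batches of parallel games
    return rounds * seconds_per_game / 3600


if __name__ == "__main__":
    # Hypothetical per-game duration (assumption, not a measured value).
    SECONDS_PER_GAME = 65.0
    leela = wall_clock_hours(3000, SECONDS_PER_GAME, concurrency=1)  # one GPU
    cpu = wall_clock_hours(3000, SECONDS_PER_GAME, concurrency=16)   # 16 cores
    print(f"Leela match: {leela / 24:.2f} days, CPU match: {cpu:.1f} hours")
    print(f"ratio: {leela / cpu:.1f}x")
```

The batch model is idealized (real games vary in length, so concurrent slots refill unevenly), but the ratio stays close to the concurrency level.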
Jouni

Re: Further testing Leela-0.32

Post by Jouni »

RTX 4080 Super has 10240 GPU cores :lol:
Rebel

Re: Further testing Leela-0.32

Post by Rebel »

Jouni wrote: Sun Nov 16, 2025 11:08 am RTX 4080 super has 10240 GPU cores :lol: .
:D

Google: tcec-chess.com hardware requirements

The TCEC wiki states that the latest hardware includes dual AMD EPYC 9175F CPUs, 8x NVIDIA RTX 5090 GPUs, and 768GiB of RAM. However, TCEC uses different hardware for different leagues and seasons, with past hardware including older server-grade CPUs like Intel Xeons and GPUs like the RTX 2080 Ti. The hardware requirements depend on the specific league, as CPU-only divisions will use high-core-count CPUs, while GPU divisions rely on powerful graphics cards.
Current hardware (as of late 2025)

CPUs: 2x AMD EPYC 9175F (32 cores/64 threads each)
GPUs: 8x NVIDIA GeForce RTX 5090
RAM: 768GiB DDR5 (24x 32GB)
Storage: 1TiB NVMe SSD + 14TiB NVMe Syzygy3-7 + 14TiB HDD for Syzygy7 DTZ
Operating System: Ubuntu 22.04

Past hardware examples

Season 28: Dual 64-core Intel Xeon E5-2699 v4 CPUs and 4x Nvidia GeForce RTX 3090 GPUs were used.
Season 26: Superfinal was played on a 52-core Xeon 6230R system.
Season 18: 4x Intel Xeon E5-4669 v4 CPUs (88 physical cores/176 threads total) with 128GB RAM were used for the CPU-only league.
Season 17: Included 4x NVIDIA RTX 2080 Ti GPUs and 2x Intel Xeon E5-2630 v4 CPUs.


Who has the better hardware, CPU engines or Leela?

Good luck with that :!:
Rebel

Re: Further testing Leela-0.32

Post by Rebel »


No. Name            Win  Draw Loss  Score Games    %
------------------------------------------------------
  1 Leela-0.32-BT3 +736 =2071 -193 1771.5  3000  59.0%
  2 Reckless-0.8   +193 =2071 -736 1228.5  3000  41.0%

Total Games:    3000
White Wins:      822 (27.4%)
Black Wins:      107 (3.6%)
Draws:          2071 (69.0%)
Interesting: halfway through, Leela had 62%. Nice comeback by Reckless.
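To judge how precise a 3000-game result is, a minimal sketch, assuming independent games and a normal approximation (a simplification that ignores paired openings and colour effects), computes an approximate 95% interval for the final score from the W/D/L counts.

```python
import math


def score_confidence_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Normal-approximation confidence interval for the mean score per game.

    Each game scores 1, 0.5 or 0, so the per-game variance is computed
    from the observed win/draw/loss frequencies (a trinomial model).
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    mean_sq = (wins * 1.0 + draws * 0.25) / n  # E[x^2] with x in {1, 0.5, 0}
    variance = mean_sq - mean ** 2
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Final Leela-0.32-BT3 vs Reckless-0.8 result: +736 =2071 -193.
    low, high = score_confidence_interval(736, 2071, 193)
    print(f"95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

For this match the interval is roughly 58.1% to 60.0%; with only half the games and the same W/D/L proportions it would be about √2 times wider.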

Next, Leela-0.32-BT3 - PlentyChess-7.0, 3000 games.
Modern Times
Posts: 3767
Joined: Thu Jun 07, 2012 11:02 pm

Re: Further testing Leela-0.32

Post by Modern Times »

Rebel wrote: Sun Nov 16, 2025 9:55 am
It seems to me Leela testing (with the exception of Stefan Pohl) is a neglected engine because it, depending on your hardware, and in my case, it takes 16 times longer to test it.
Exactly right. No concurrency with Leela.
Rebel

Re: Further testing Leela-0.32

Post by Rebel »

Results from file leela-BT3-PlentyChess.pgn:


No. Name             Win  Draw Loss  Score Games    %
-------------------------------------------------------
  1 Leela-0.32-BT3  +908 =1922 -170 1869.0  3000  62.3%
  2 PlentyChess-7.0 +170 =1922 -908 1131.0  3000  37.7%

Total Games:    3000
White Wins:      935 (31.2%)
Black Wins:      143 (4.8%)
Draws:          1922 (64.1%)
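A match percentage can be turned into an Elo difference with the standard logistic formula. A minimal sketch; note that a rating tool fits all games jointly, so the head-to-head gaps in the list below need not match these single-match numbers exactly.

```python
import math


def elo_difference(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Final match scores of Leela-0.32-BT3 from the two 3000-game tables.
    matches = {
        "Reckless-0.8": 1771.5 / 3000,
        "PlentyChess-7.0": 1869.0 / 3000,
    }
    for opponent, score in matches.items():
        print(f"Leela vs {opponent}: {100 * score:.1f}% -> {elo_difference(score):+.0f} Elo")
```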
Provisional STC rating list:


   # PLAYER                   :  RATING  ERROR   POINTS  PLAYED     W      D     L  D(%)
   1 Leela-0.32-BT3           :  3816.3    4.8   5223.5    9000  2164   6119   717    68
   2 Stockfish-TI             :  3794.1    4.3  18987.5   30000  9635  18705  1660    62
   3 Stockfish-17.1           :  3786.7    2.7  20089.5   33000  9628  20923  2449    63
   4 Reckless-0.8             :  3742.7    3.7  19339.5   36000  7292  24095  4613    67
   5 PlentyChess-7.0          :  3737.8    4.0  19089.5   36000  6935  24309  4756    68
   6 Obsidian-16              :  3731.8    4.6  17616.0   33000  6499  22234  4267    67
   7 Stockfish-15             :  3718.8    3.7  17001.0   33000  6020  21962  5018    67
   8 Alexandria-8.1.2         :  3703.3    4.9  16264.0   33000  5364  21800  5836    66
   9 Caissa-1.23              :  3671.3    5.0  14751.5   33000  3959  21585  7456    65
  10 Viridithas-18.0.0        :  3665.0    1.8  12837.5   30000  3294  19087  7619    64
  11 Clover-9.1               :  3661.5    4.5  14289.0   33000  3375  21828  7797    66
  12 Berserk-13               :  3650.4    5.4  13772.5   33000  3455  20635  8910    63
  13 Horsie-1.1               :  3638.0    9.0   1294.0    2200   616   1356   228    62
  14 Integral-v7              :  3636.4    3.7  13028.0   32200  3084  19888  9228    62
  15 RubiChess-20240817       :  3612.2    9.0   1213.0    2200   535   1356   309    62
  16 Lizard-11.2              :  3590.3    7.6   1143.5    2200   441   1405   354    64
  17 Starzix-6.1              :  3588.9    7.2   1139.0    2200   464   1350   386    61
  18 Fritz-20                 :  3580.8    8.0   1113.0    2200   413   1400   387    64
  19 Chess-System-Tal-2.00    :  3577.8    5.4   1103.5    2200   403   1401   396    64
  20 Halogen-15.17            :  3568.8    4.4   1075.0    2200   407   1336   457    61
  21 Revenge-4.0              :  3560.4    8.1   1048.0    2200   354   1388   458    63
  22 Seer-2.8.0               :  3529.7    7.0    951.0    2200   284   1334   582    61
  23 Titan-1.1                :  3524.2    9.7    933.5    2200   289   1289   622    59
  24 Booot-7.4                :  3512.6    5.5    897.5    2200   253   1289   658    59
Next, Leela - Obsidian-16, 3000 games.
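The provisional list also hints at what to expect from the next match. A minimal sketch, using the plain logistic Elo model and ignoring the ±4.8 and ±4.6 error bars, turns the rating gap into an expected score for Leela; this is a rough pointer, not a prediction made by the list itself.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings taken from the provisional STC list above.
    leela, obsidian = 3816.3, 3731.8
    print(f"Leela vs Obsidian-16: {100 * expected_score(leela, obsidian):.1f}%")
```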