One million games by LeelaOddsBots

lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

On Dec. 6, the five LeelaOddsBots passed the one-million mark in total games played on Lichess: 698k by LeelaQueenOdds, 105k by LeelaPieceOdds (two or more pieces removed), 95k by LeelaRookOdds, 89k by LeelaKnightOdds, and 14k by LeelaQueenForKnight. Most were played this year. Coincidentally, this morning LeelaQueenOdds was updated to use a new net based on the larger BT4 net instead of the smaller T3 net. Testing against bots shows a gain of about 50 Elo. Early results against humans seem to confirm a similar (or larger) gain. It will take some time to determine whether this net is also better for other odds.
Komodo rules!
jkominek
Posts: 106
Joined: Tue Sep 04, 2018 5:33 am
Full name: John Kominek

Post by jkominek »

Can you share any of the behind-the-scenes details of the OddsBots initiative? For example who is the team doing the nets retraining, how are they approaching the problem, e.g. how much data is used during adaptation and how is it generated? What is the testing gauntlet prior to release?

Topping off, what roles do you play in the project? Public outreach is one, obviously. (Like with Komodo.)
Post by lkaufman »

jkominek wrote: Sun Dec 07, 2025 10:43 pm Can you share any of the behind-the-scenes details of the OddsBots initiative? For example who is the team doing the nets retraining, how are they approaching the problem, e.g. how much data is used during adaptation and how is it generated? What is the testing gauntlet prior to release?

Topping off, what roles do you play in the project? Public outreach is one, obviously. (Like with Komodo.)
Most of the people on the project are known only by their handles on either Lichess or the Leela Discord, and I don't know which of them would like their actual names made public. The project was started by "Naphthalin" on the Leela team, who suggested that using Contempt would make Leela play much better when giving odds; when I signed on and offered to help in various ways, it became a reality. For the first year it was just this, Leela with Contempt, but then people started training nets for specific odds, which turned out to be much stronger than Leela with Contempt alone.

First "Marcus87" trained nets for knight odds (later making a combined knight odds/rook odds net), which is still in use in the bots to this day, a year later! Then "Noah" made a very good queen odds net, which we used until it was supplanted in August by a stronger net from "naca fox" ("ira_jed" on Lichess), who also made the net currently used for the two knights handicap. Meanwhile "amjsh" invented "search contempt", which was implemented in late Feb. of this year and provided a major boost in strength. Later "Lucario6607" improved this further with variable search contempt, and quite recently "Borg" implemented my suggestion of variable max nodes. Also, "Menkib" implemented Syzygy tablebases for us. Others, including "johnsp" and "Tiiber", have been involved recently, especially with the "LeelaPieceOddsFRC" bot created and run by "Naphthalin".

My role in the project consists of:
1. Writing and updating the opening book for the odds.
2. Parameter tuning for each new net and for each handicap.
3. Running the bots on my home computers (which requires restarts after every change, power outage, or Windows update...).
4. Running many of the games on which the nets were trained.
5. Testing: I have to determine which new nets or search changes are actually worth using in the bots. This is primarily done by playing against "human-like" nets of suitable strength that were themselves trained on human games.

Regarding the actual training, I make suggestions and contribute to the cost of renting machines when needed (often we can train on our own hardware, depending on net size etc.), but I don't know enough to train nets myself. Game generation for testing is typically between some normal Leela net (sometimes with Contempt) and some human-like net or nets. Anywhere from 100k to 400k or 500k games have been used for various nets. Starting from an existing strong net is what keeps the number of games needed manageable; starting from scratch would take millions of games, though it might eventually produce even stronger nets.
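
To make the game-generation setup more concrete: every odds game starts from the normal starting position with one or more of the odds-giver's pieces removed. Below is a minimal Python sketch of building such a start position as a FEN string. The helper name `remove_piece` and the squares cleared are assumptions for illustration only; the thread does not say which knight or rook the bots remove, and this is not the project's actual tooling.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Castling right that is lost when the rook on this home square is removed.
ROOK_CASTLING = {"a1": "Q", "h1": "K", "a8": "q", "h8": "k"}


def remove_piece(fen: str, square: str) -> str:
    """Return a copy of `fen` with the piece on `square` (e.g. 'd1') removed."""
    fields = fen.split()
    ranks = fields[0].split("/")            # rank 8 comes first, rank 1 last
    file_idx = "abcdefgh".index(square[0])
    rank_idx = 8 - int(square[1])

    # Expand digit runs so that every square is exactly one character.
    cells = []
    for ch in ranks[rank_idx]:
        cells.extend("." * int(ch) if ch.isdigit() else ch)
    if cells[file_idx] == ".":
        raise ValueError(f"no piece on {square}")
    removed = cells[file_idx]
    cells[file_idx] = "."

    # Re-compress runs of empty squares back into digits.
    out, run = [], 0
    for c in cells:
        if c == ".":
            run += 1
            continue
        if run:
            out.append(str(run))
            run = 0
        out.append(c)
    if run:
        out.append(str(run))
    ranks[rank_idx] = "".join(out)
    fields[0] = "/".join(ranks)

    # Removing a rook from its home square also removes that castling right.
    if removed in "Rr" and square in ROOK_CASTLING:
        rights = fields[2].replace(ROOK_CASTLING[square], "")
        fields[2] = rights or "-"
    return " ".join(fields)


# Hypothetical start positions with White giving the odds.
queen_odds = remove_piece(START_FEN, "d1")
two_knights_odds = remove_piece(remove_piece(START_FEN, "b1"), "g1")
rook_odds = remove_piece(START_FEN, "a1")
```

Dropping the castling right together with the rook keeps the FEN legal for engines that validate it.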
lucario6607
Posts: 36
Joined: Sun May 19, 2024 5:44 am
Full name: Kolby Mcgowan

Post by lucario6607 »

jkominek wrote: Sun Dec 07, 2025 10:43 pm Can you share any of the behind-the-scenes details of the OddsBots initiative? For example who is the team doing the nets retraining, how are they approaching the problem, e.g. how much data is used during adaptation and how is it generated? What is the testing gauntlet prior to release?

Topping off, what roles do you play in the project? Public outreach is one, obviously. (Like with Komodo.)
I have generated games for queen odds, NN odds, and rook odds. It used to be 100k games, but now we do upwards of 400k. In game generation, the odds net (or any Leela net) plays down a piece against a human-like net, and the two should never switch colors. This lets the net learn odds play for one color and human play for the other, so that in the search it can look for moves that humans are likely to miss. Search contempt freezes the policy at odd-depth nodes, so the search assumes the human will always be that weak. Recently this was changed so that the node budget is split between PUCT, which does a regular search to prevent most of the blunders caused by the low search contempt limit, and Thompson sampling, which continues to sample from the frozen policy. As the search continues, the ratio of Thompson sampling to PUCT increases. The search eventually refutes itself and stops trying for tricks, which is something we want to improve. One idea is the A-MCTS used against KataGo, where the human net would perform its own search at each of its nodes in the search tree, so it wouldn't have access to all the information from the search up to that point. I welcome Peter to correct me on how Leela's search works, if he would be so kind.
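
The budget split described above can be pictured with a toy Python sketch. This is not Leela's code: the function names, the linear schedule, and its `start`/`end` values are all assumptions, and the Thompson sampling step is simplified to a plain draw from the frozen policy prior. It only shows the shape of the idea: at nodes where the human is to move, some visits follow a PUCT-style rule and a growing share are sampled from the fixed policy.

```python
import math
import random


def puct_select(priors, visits, values, c_puct=1.5):
    """PUCT-style choice: maximise Q + c * P * sqrt(N + 1) / (1 + n)."""
    total = sum(visits)
    scores = [
        q + c_puct * p * math.sqrt(total + 1) / (1 + n)
        for p, n, q in zip(priors, visits, values)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def frozen_policy_sample(priors, rng):
    """Draw a move in proportion to the fixed (frozen) policy prior."""
    return rng.choices(range(len(priors)), weights=priors, k=1)[0]


def sampling_share(nodes_done, node_budget, start=0.2, end=0.8):
    """Hypothetical linear schedule: share of visits that use sampling."""
    t = min(1.0, nodes_done / node_budget)
    return (1.0 - t) * start + t * end


def select_opponent_move(priors, visits, values, nodes_done, node_budget, rng):
    """At a node where the human is to move, mix the two selection rules."""
    if rng.random() < sampling_share(nodes_done, node_budget):
        return frozen_policy_sample(priors, rng)
    return puct_select(priors, visits, values)
```

Because the sampled visits never update their distribution from search results, the modelled opponent keeps "missing" whatever the human-like policy rates as unlikely, while the PUCT visits still expose outright blunders.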
Post by jkominek »

Thank you both for the explanations. This is perhaps the most information that has leaked outside of Discord.

Training Leela to play like humans, as a training stepping-stone, is an interesting problem in itself. For that aspect did you take an approach similar to the U of T McIlroy-Young paper?
Post by lkaufman »

jkominek wrote: Mon Dec 08, 2025 12:27 pm Thank you both for the explanations. This is perhaps the most information that has leaked outside of Discord.

Training Leela to play like humans, as a training stepping-stone, is an interesting problem in itself. For that aspect did you take an approach similar to the U of T McIlroy-Young paper?
I wasn't involved in that aspect. We initially used nets by others, such as "Maia 1900", "Maia 2200", "Elite 1", and "Elite 2"; then some of our group made their own such nets. You can look up the above for details.
M ANSARI
Posts: 3734
Joined: Thu Mar 16, 2006 7:10 pm

Post by M ANSARI »

Thank you for all your efforts in this. I personally have not noticed that the bot is 50 Elo stronger, and I still feel that sometimes it plays much weaker than usual. Maybe it loses strength when there are many players simultaneously. Hard to believe that an AI engine down a queen can be such a challenge ... but it is. I just checked my games and I have about 1400 games with an approximately 20% win rate. In my defense, many of those games were from when I was trying faster time controls like 1+2 or even 2+3. I have now given up on those time controls, as I feel that at my age I just suck at fast time controls, and the bot has an uncanny ability to throw in a bunch of moves that need accurate calculation really fast when time is down to seconds. I am especially surprised how easily I can fall for stalemate tricks in totally winning endgames when I let my guard down ... I am happy to say that in this regard I have improved dramatically and am much more aware of stalemate tricks.

I will say that I feel this bot certainly improved my chess knowledge and chess play. I tend to appreciate much more how material is not as important as active pieces. I mean you hear about it and read about it all the time ... but here you really get to see how powerful that is. Also that a pawn storm can be incredibly dangerous even when the other side doesn't have a queen! You need to sacrifice a few defensive moves to defuse this. It also teaches you that sometimes you need to give up material to simplify the game into an easily won endgame. So really kudos to all the people in this project, you really have contributed dramatically to chess!!!
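
For a rough sense of why a 50 Elo gain can be hard to notice in one's own results, the standard Elo expected-score formula can be applied to the figures above. This is only a back-of-the-envelope sketch: it treats the roughly 20% win rate as an expected score (ignoring draws), and the function names are illustrative.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score for a player rated `rating_diff` Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_from_score(score: float) -> float:
    """Inverse of expected_score: Elo difference implied by a score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Elo gap implied by scoring 20% against the bot (negative: bot is stronger).
before = rating_diff_from_score(0.20)
# Expected score if the bot then becomes 50 Elo stronger.
after = expected_score(before - 50.0)
print(f"{before:.0f} Elo gap, then {after:.1%} expected score")
```

Under these assumptions a 20% score corresponds to a gap of about 241 Elo, and a further 50 Elo for the bot lowers the expected score to roughly 16%, a shift of about four percentage points that is easy to miss without a large number of games.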
Post by jkominek »

lkaufman wrote: Sun Dec 07, 2025 11:34 pm
Most of the people on the project are only known by their handles on either LiChess or LeelaDiscord, I don't know which of them would like their actual names made public. It got started by "Naphthalin" on the Leela team, who suggested that using Contempt would make Leela play much better when giving odds, and when I signed on and offered to help in various ways it became a reality. For the first year it was just this, Leela with Contempt, but then people started training nets for specific odds which turned out to be much stronger than just Leela with Contempt. First "Marcus87" trained nets for knight odds (later making a combined knight odds/rook odds net), which is still in use in the bots to this day, a year later! Then "Noah" made a very good queen odds net which we used until it was supplanted by a stronger net by "naca fox" ("ira_jed" on lichess) in August, who also made the net currently used for two knights handicap. Meanwhile "amjsh" invented "search contempt" which was implemented in late Feb. of this year, which provided a major boost in strength. Later "Lucario6607" improved this further with variable search contempt, and quite recently "Borg" implemented my suggestion of variable max nodes. Also "Menkib" implemented Syzygy tablebases for us. Others, including "johnsp" and "Tiiber" have been involved recently, especially with the "LeelaPieceOddsFRC bot created and run by "Naphthalin". My role in the project consists of 1. Writing and updating the opening book for the odds 2. Parameter tuning for each new net and for each handicap 3. Running the bots on my home computers (which requires restarts after every change, or power outage, or windows update...). 4. I ran many of the games needed on which to train the nets. 5. Testing - I have to determine which new nets or search changes actually are worth using in the bots. This is primarily done by playing against "human-like" nets of suitable strength that were themselves trained on human games. 
Regarding the actual training, I make suggestions and contribute to the cost of renting machines for this when needed (often we can train with our own hardware, depending on net size etc.), but I don't know enough to train nets myself. Game generation for testing is typically between some normal Leela net (sometimes with Contempt) and some human-like net or nets. Anywhere from 100k to 400 or 500k games have been used for various nets. Starting with an existing strong net is what keeps the needed number of games to manageable numbers; starting from scratch would take millions of games, though it might produce even stronger nets eventually.
To keep track of credits I found it helpful to write down a crib sheet of contributors. Anyone missing?
  o Knight Odds: Marcus87
  o Knight+Rook Odds: Marcus87
  o Two Knights Odds: naca fox
  o Queen Odds: Noah -> naca fox ("ira_jed" on lichess)
  o FRC Piece Odds: Naphthalin, johnsp, Tiiber
  o Simple contempt: Naphthalin
  o Variable Search contempt: amjsh, Lucario6607, Borg
  o Syzygy Implementation: Menkib
  o Game generation: lkaufman, lucario6607
What I had not appreciated is that the Odds Bots are hosted by developer volunteers. Fitting I guess for the community-supported nature of Lichess.
Post by lkaufman »

jkominek wrote: Wed Dec 10, 2025 10:27 am
To keep track of credits I found it helpful to write down a crib sheet of contributors. Anyone missing?
  o Knight Odds: Marcus87
  o Knight+Rook Odds: Marcus87
  o Two Knights Odds: naca fox
  o Queen Odds: Noah -> naca fox ("ira_jed" on lichess)
  o FRC Piece Odds: Naphthalin, johnsp, Tiiber
  o Simple contempt: Naphthalin
  o Variable Search contempt: amjsh, Lucario6607, Borg
  o Syzygy Implementation: Menkib
  o Game generation: lkaufman, lucario6607
What I had not appreciated is that the Odds Bots are hosted by developer volunteers. Fitting I guess for the community-supported nature of Lichess.
All those who train nets also do game generation; Lucario did a lot of it. Noah does the Leaderboard website. "Knight+Rook Odds" should just read "Rook Odds", as you have Knight Odds listed separately (the same net was trained on both). Naca Fox is now starting to do parameter tuning, which until now only I did. Borg has taken on various tasks related to the project. Johnsp and Tiiber have trained nets for us, but so far they haven't beaten what we now use. Menkib is working on alleviating the memory limitations we currently have with the bots. Markus87 created several of the opponent bots we use.
Post by lkaufman »

M ANSARI wrote: Wed Dec 10, 2025 8:41 am Thank you for all your efforts in this. I personally have not noticed that the bot is 50 Elo stronger, and I still feel that sometimes it plays much weaker than usual. Maybe it loses strength when there are many players simultaneously. Hard to believe that an AI engine down a queen can be such a challenge ... but it is. I just checked my games and I have about 1400 games with an approximately 20% win rate. In my defense, many of those games were from when I was trying faster time controls like 1+2 or even 2+3. I have now given up on those time controls, as I feel that at my age I just suck at fast time controls, and the bot has an uncanny ability to throw in a bunch of moves that need accurate calculation really fast when time is down to seconds. I am especially surprised how easily I can fall for stalemate tricks in totally winning endgames when I let my guard down ... I am happy to say that in this regard I have improved dramatically and am much more aware of stalemate tricks.

I will say that I feel this bot certainly improved my chess knowledge and chess play. I tend to appreciate much more how material is not as important as active pieces. I mean you hear about it and read about it all the time ... but here you really get to see how powerful that is. Also that a pawn storm can be incredibly dangerous even when the other side doesn't have a queen! You need to sacrifice a few defensive moves to defuse this. It also teaches you that sometimes you need to give up material to simplify the game into an easily won endgame. So really kudos to all the people in this project, you really have contributed dramatically to chess!!!
My own experience is similar to yours. At age 78 I'm quite weak in bullet chess and really need Rapid time controls to play close to my standard rating (relative to others, of course). I notice the same things about stalemates, pawn storms, and a general willingness to sacrifice material for vague compensation. With queen odds I generally lose at blitz and generally win at Rapid or slow blitz levels, though I haven't yet played against the new BT4 net. I also believe I am learning a lot from playing the bots, though I may not play enough classical chess from here on to prove it. I think we have also shown pretty accurately the relative values of the pieces IN THE INITIAL POSITION: taking the f-pawn as 1, N = 3, B = 3.5, R = 4, Q = 8 seems pretty accurate.
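
As a small worked example, the values quoted above can be turned into a nominal material deficit for each handicap. The dictionary and function names are illustrative, and reading LeelaQueenForKnight as "the bot gives a queen and the opponent gives a knight" is an assumption based on the bot's name. Simple addition also ignores any interaction between the removed pieces.

```python
# Piece values in the initial position as quoted above ("P" is the f-pawn = 1).
PIECE_VALUE = {"P": 1.0, "N": 3.0, "B": 3.5, "R": 4.0, "Q": 8.0}


def handicap_deficit(removed: str, opponent_removed: str = "") -> float:
    """Net material given up by the odds-giver, in f-pawn units."""
    given = sum(PIECE_VALUE[p] for p in removed)
    returned = sum(PIECE_VALUE[p] for p in opponent_removed)
    return given - returned


deficits = {
    "knight odds": handicap_deficit("N"),
    "rook odds": handicap_deficit("R"),
    "queen for knight": handicap_deficit("Q", "N"),
    "two knights odds": handicap_deficit("NN"),
    "queen odds": handicap_deficit("Q"),
}
for name, value in deficits.items():
    print(f"{name}: {value:g}")
```

On these numbers the handicaps run from 3 (knight odds) through 4, 5, and 6 up to 8 (queen odds).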