Strategies for Testing with UHO Openings and Bullet Time Controls

supernova · Post by **supernova** » Thu Apr 24, 2025 5:17 pm

When testing computer chess engines under specific conditions like UHO (Unbalanced Human Openings) and bullet time controls (1-minute games), several critical issues arise. These challenges primarily revolve around reliability, statistical validity, and balancing quantity versus quality of games. Below is a detailed analysis of these issues and potential strategies to address them.

1. Issues with UHO Openings

UHO openings, by their nature, are highly irregular and often lead to positions that are far removed from standard chess theory. While this can be an interesting way to test engines' adaptability and creativity, it introduces several challenges:

Bias in Opening Selection: The choice of UHO openings can heavily influence the results. Some engines may be better optimized for handling chaotic or unconventional positions, while others may struggle. This creates a bias that skews the rating list, as the results may not reflect the engines' overall strength in more balanced or standard positions.

Reduced Relevance to Practical Play: Testing engines exclusively with UHO openings may not provide meaningful insights into their performance in real-world scenarios. This limits the applicability of the rating list for users who want engines for practical purposes, such as standard chess analysis or preparation.

Overfitting to Specific Openings: Engines might "learn" to perform well in specific UHO positions if the same set of openings is repeatedly used. This overfitting undermines the generalizability of the results.
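One common way to check for this kind of overfitting is to keep part of the opening set aside and never use it during tuning. The following is only a minimal sketch of that idea: the function name, the 20% holdout fraction, and the placeholder opening IDs are assumptions for illustration, not part of any existing testing framework.

```python
import random

def split_openings(openings, holdout_fraction=0.2, seed=0):
    """Shuffle opening lines reproducibly and split them into (tuning, holdout)."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(openings)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable between runs
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical placeholder IDs standing in for the lines of an opening book.
book = [f"opening_{i:03d}" for i in range(100)]
tuning, holdout = split_openings(book)
print(len(tuning), len(holdout))  # 80 20
```

If results on the holdout set match results on the tuning set within the error margins, that suggests the engine's gains are not specific to the openings it was tuned on.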

2. Challenges of Bullet Time Controls (1-Minute Games)

Bullet chess introduces its own set of problems when used as a testing environment for engines:

Emphasis on Speed Over Quality: In bullet games, engines are forced to prioritize speed over deep calculation. This can lead to suboptimal moves and a focus on tactics rather than strategy. As a result, the rating list may favor engines with faster evaluation functions rather than those with superior overall strength.

Increased Randomness: The shorter time control increases the likelihood of blunders, even for engines. This randomness can distort the results, making it harder to determine which engine is genuinely stronger.

Limited Depth of Analysis: Bullet games do not allow engines to reach their full potential in terms of depth of calculation. This means the results may not accurately reflect their capabilities in longer time controls, where deeper analysis is possible.

3. Quantity vs. Quality of Games

To create a reliable rating list, a balance must be struck between the number of games played and the quality of those games. This balance is particularly challenging in the context of UHO openings and bullet time controls:

Quantity of Games: A large number of games is necessary to reduce statistical noise and account for the inherent randomness of bullet chess. However, playing a high volume of games can be computationally expensive and time-consuming.

Quality of Games: The quality of games is often compromised in bullet chess due to the time constraints. Additionally, the use of UHO openings can lead to positions that are less instructive or meaningful for evaluating engine strength.

Statistical Reliability: A high quantity of low-quality games may still fail to produce reliable results. Conversely, a smaller number of high-quality games may not provide enough data to draw statistically significant conclusions.
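To make the trade-off between game count and noise concrete, here is a minimal sketch that turns a win/draw/loss result into an Elo difference with an approximate 95% interval. It assumes the usual logistic Elo model, treats games as independent (which ignores the correlation between paired games from the same opening), and uses a normal approximation. The match result is made up purely for illustration.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) for a match result; z=1.96 gives roughly 95%."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score over the three possible outcomes.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    # Note: fails if score +/- z*stderr leaves (0, 1), e.g. for very lopsided results.
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))

# Made-up result, purely for illustration.
elo, lo, hi = elo_with_margin(wins=300, draws=500, losses=200)
print(f"{elo:+.1f} Elo, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

Because the standard error shrinks with the square root of the number of games, quadrupling the game count only halves the width of the interval.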

Strategies to Address These Issues

To mitigate the challenges outlined above, the following strategies can be employed:

1. Diversify Opening Selection

- Use a wide variety of UHO openings to reduce bias and prevent engines from overfitting to specific positions.
- Consider including a mix of standard and unorthodox openings to provide a more balanced test environment.

2. Adjust Time Controls

- While bullet games are fast and exciting, incorporating slightly longer time controls (e.g., 2+1 or 3+2) can improve the quality of play without sacrificing too much speed.
- Alternatively, use bullet games for initial testing and longer time controls for tie-breaks or final evaluations.

3. Use Statistical Techniques

- Employ techniques like Elo inflation adjustment or Bayesian rating systems to account for the increased randomness in bullet games.
- Run multiple matches between the same engines with different UHO openings to ensure the results are not skewed by specific positions.

4. Focus on Engine Behavior

- Analyze not just the win/loss outcomes but also the quality of moves played by the engines. Metrics like average centipawn loss or blunder rates can provide additional insights into engine performance.

5. Balance Quantity and Quality

- Instead of playing an excessive number of bullet games, focus on a moderate number of games with slightly longer time controls and diverse openings. This approach strikes a balance between statistical reliability and meaningful evaluation.
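The move-quality metrics mentioned under "Focus on Engine Behavior" can be sketched as follows. This assumes each move has already been scored by some reference analysis as a pair of centipawn evaluations (best available move, move actually played), both from the mover's point of view. The 200-centipawn blunder threshold and the sample values are hypothetical.

```python
def move_quality(evals, blunder_threshold_cp=200):
    """Return (average centipawn loss, blunder rate).

    evals: list of (best_cp, played_cp) pairs from the mover's point of view.
    """
    # Loss is how much worse the played move is than the best one, never negative.
    losses = [max(0, best - played) for best, played in evals]
    acpl = sum(losses) / len(losses)
    blunder_rate = sum(loss >= blunder_threshold_cp for loss in losses) / len(losses)
    return acpl, blunder_rate

# Hypothetical evaluations for four moves.
sample = [(30, 30), (45, 20), (10, -250), (120, 115)]
acpl, rate = move_quality(sample)
print(acpl, rate)  # 72.5 0.25
```

Both numbers depend heavily on the reference engine and the depth used to produce the evaluations, so they are only comparable between runs that share the same analysis settings.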

Modern Times · Post by **Modern Times** » Fri Apr 25, 2025 12:28 am

I don't agree with your observations on UHO openings. For computer chess I think they are absolutely the best way to differentiate and rate engines for ratings lists and suitability for analysis. SPCC EAS is also an innovative new initiative. https://www.sp-cc.de/eas-ratinglist.htm

Different variants like Chess960 and Chess324 are also very valuable in my view.

Bullet time controls 1-minute games - agreed, to me they have no relevance. Longer time controls are far more meaningful, but you must have sufficient volume so that the statistical margins of error are low enough.

Also important is the need for a structured and methodical approach for opponent selection, number of games per pairing etc. in order to avoid ratings distortion.
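As a rough guide to what "sufficient volume" means, the sketch below estimates how many games are needed to reach a target 95% margin of error. It assumes two roughly equal engines (score near 50%), independent games, the logistic Elo model, and a draw ratio that you supply; the 50% draw ratio in the example is a hypothetical value, not a measurement.

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
ELO_PER_SCORE_AT_50 = 1600.0 / math.log(10)

def games_needed(target_margin_elo, draw_ratio, z=1.96):
    """Approximate number of games for a +/- target_margin_elo interval (~95%)."""
    # For equal engines, wins and losses each occur with probability (1 - draw_ratio) / 2,
    # which gives a per-game standard deviation of sqrt(1 - draw_ratio) / 2.
    per_game_sd = math.sqrt(1.0 - draw_ratio) / 2.0
    margin_score = target_margin_elo / ELO_PER_SCORE_AT_50
    return math.ceil((z * per_game_sd / margin_score) ** 2)

for margin in (10, 5, 2):
    print(f"+/-{margin} Elo: about {games_needed(margin, draw_ratio=0.5)} games")
```

The required game count grows with the inverse square of the margin, so halving the margin of error costs about four times as many games.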

AndrewGrant · Post by **AndrewGrant** » Fri Apr 25, 2025 12:48 am

Respectfully, this just reads like a ChatGPT summary. I'm not sure what you've personally added to the conversation here.

I think the only part of interest is whether or not engines are actually being optimized for specific openings. And I think they certainly are. Think training vs validation data. But I would bet good money that the results generalize to all openings.

Worth noting that a 60+1 time control on modern hardware is unbelievably high quality chess. Take into account that a modern SF would be 10x to 20x slower on a Pentium III for example.
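The arithmetic behind that comparison can be sketched like this, reading 60+1 as 60 seconds plus a 1-second increment. It assumes search speed scales linearly with the hardware and that playing strength depends only on the resulting node budget, which is a simplification.

```python
def equivalent_time_control(base_s, inc_s, slowdown):
    """Scale base time and increment so a slower machine gets a similar node budget."""
    return base_s * slowdown, inc_s * slowdown

for slowdown in (10, 20):
    base, inc = equivalent_time_control(60, 1, slowdown)
    print(f"{slowdown}x slower hardware: roughly {base}+{inc}")
```

Under these assumptions, 60+1 today corresponds to something between 600+10 and 1200+20 on the slower machine.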

Modern Times · Post by **Modern Times** » Fri Apr 25, 2025 1:09 am

AndrewGrant wrote: ↑Fri Apr 25, 2025 12:48 am Worth noting that a 60+1 time control on modern hardware is unbelievably high quality chess. Take into account that a modern SF would be 10x to 20x slower on a Pentium III for example.

Very good point, with advances in computer hardware yesterday's long time controls are today's bullet in terms of quality. And today's long time control will be bullet in future.

pohl4711 · Post by **pohl4711** » Fri Apr 25, 2025 6:37 am

AndrewGrant wrote: ↑Fri Apr 25, 2025 12:48 am
Worth noting that a 60+1 time control on modern hardware is unbelievably high quality chess.

And, additionally, modern hardware (and how it is used) can differ greatly:
The CCC tournaments on Chess.com run on a brutally fast machine: the engines have 250 threads for their calculations (!!!), and one engine gets to use all of those threads alone. So a 1min+1sec bullet game there is on a completely different level than (for example) my rating-list test runs. I do these test runs on "modern hardware" too, but my machine is much slower (16c/32t) and the engines run in single-thread mode when tested. So in my testing the level of chess is much lower, even though I run 3min+1sec instead of 1min+1sec bullet...

pohl4711 · Post by **pohl4711** » Fri Apr 25, 2025 7:29 am

supernova wrote: ↑Thu Apr 24, 2025 5:17 pm Testing engines exclusively with UHO openings may not provide meaningful insights into their performance in real-world scenarios.

The problem here is that in computer chess, UHO openings have been the real-world scenario for several years:
- UHO openings have been used in the development of all top engines (look at Fishtest or OpenBench...) since the end of 2021 (the Stockfish framework has used UHO openings since August 2021; OpenBench followed a little later)
- UHO openings are used in the two tournaments that matter in computer chess (and are recognized in the chess world): TCEC and the engine tournaments on Chess.com

So it is the non-UHO openings that fail to provide meaningful insights into engine performance. UHO does.
That's the reason why rating lists that still use balanced openings no longer make sense these days: balanced openings are simply no longer the "real-world scenario" in computer chess. UHO took over completely years ago. So why test engines in a completely different way (balanced openings) from the way the engines are built, and from the way they play in the big tournaments (UHO openings)? Obviously a bad idea, IMHO, and it just ignores the reality in computer chess.
Besides, the statistics implode because of the draw ratio of 90% or more when two strong engines are paired for a rating-list test run using balanced openings...

And for analysis of human games or preparation, today's engines are so much stronger than any human that it is completely irrelevant whether the engine was developed and tested using UHO openings (which is what happens today) or not (before 2021/22 or so). Anybody who is afraid that modern engines will not deliver proper results because UHO was used in their development (which I think is complete nonsense) can easily use an older engine from 2021 or so. Stockfish 14 was the last official Stockfish developed using balanced openings. So, use Stockfish 14. It is 100 Elo below Stockfish 17.1, but that is completely meaningless, because Stockfish 14 already reaches 3750 CElo in my testing. That should be strong enough for any human purpose until the end of time, shouldn't it?
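For a sense of scale, a 100 Elo gap can be converted to an expected score with the standard logistic Elo formula. This is only a sketch under that model; whether the CElo scale used above maps onto it exactly is an assumption.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(f"{expected_score(100):.3f}")  # 0.640
```

So in a head-to-head match the engine rated 100 Elo higher would be expected to score about 64% of the points.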

Graham Banks · Post by **Graham Banks** » Fri Apr 25, 2025 8:08 am

I think that most of your UHO openings are too extreme.

I'd be interested to see whether viewership numbers for TCEC broadcasts have decreased since they started using UHO openings.

Ciekce · Post by **Ciekce** » Fri Apr 25, 2025 8:43 am

supernova wrote: ↑Thu Apr 24, 2025 5:17 pm <a thousand words of LLM slop>

Why is this allowed?

chrisw · Post by **chrisw** » Fri Apr 25, 2025 2:41 pm

Ciekce wrote: ↑Fri Apr 25, 2025 8:43 am
supernova wrote: ↑Thu Apr 24, 2025 5:17 pm <a thousand words of LLM slop>
Why is this allowed?

The test is whether or not the OP can follow through with some replies. Nothing so far. Maybe we should create a dump forum “Probable ChatAI content”

supernova · Post by **supernova** » Fri Apr 25, 2025 3:42 pm

AndrewGrant wrote: ↑Fri Apr 25, 2025 12:48 am Respectfully, this just reads like a ChatGPT summary. I'm not sure what you've personally added to the conversation here.

I think the only part of interest is whether or not engines are actually being optimized for specific openings. And I think they certainly are. Think training vs validation data. But I would bet good money that the results generalize to all openings.

Worth noting that a 60+1 time control on modern hardware is unbelievably high quality chess. Take into account that a modern SF would be 10x to 20x slower on a Pentium III for example.

Andrew, I understand your concerns about my previous post, but I wanted to bring back an old subject that has already been discussed. You're right that engines are being optimized for specific openings, and that's a valid point. However, I'm not convinced that the results necessarily generalize to all openings.

If an engine is trained on a specific set of openings, it's likely to perform better in those openings, but that doesn't mean it'll do as well in others. UHO, in my opinion, has its limitations. A 1-minute game is a good way to test an engine's speed, but it's not necessarily a good way to test its strategic understanding. Openings with a high bias, like +1.50 pawns (150 centipawns), can be problematic as they may not accurately reflect the engine's capabilities.

I think Chess.com's goal is to promote their engine, Torch, through this tournament, which is a high-traffic platform. This doesn't necessarily mean the entire exercise is without value, but it does raise questions about the motivations behind it. TCEC's approach seems more balanced, and their results are worth considering alongside Chess.com's.

I used Grammarly to help with grammar, as I'm not a native English speaker. While it has AI capabilities, I'm still the one driving the conversation.

Regarding the 60+1 time control, I disagree that it's always "high-quality chess." While it's faster than over-the-board play, engines can still make mistakes, even with ample time. In tactical positions, sometimes they need to see a few moves ahead to resolve the situation.
