I'm disappointed with Stockfish dev.

Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: I'm disappointed with Stockfish dev.

Post by Sopel »

Uri Blass wrote: Sat Mar 11, 2023 9:32 am I prefer to get less Elo and better understanding, because better understanding may help with better decisions about what to test later.
How helpful would it be to have +0.7 Elo ±0.5 for every patch?
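As a rough illustration of what ±0.5 Elo per patch would cost (a toy calculation only, assuming independent games, a simple win/draw/loss model with a guessed draw ratio, and the normal approximation; not fishtest's methodology):

Code:

import math

# Toy estimate: games needed so the 95% error bar on a patch's measured Elo
# is +/- target_elo_error. Assumes independent games, symmetric wins/losses,
# a fixed guessed draw ratio, and the normal approximation -- illustrative only.
def games_for_elo_error(target_elo_error, confidence_z=1.96, draw_ratio=0.7):
    score_var = (1.0 - draw_ratio) / 4.0          # per-game variance of the score
    dscore_per_elo = math.log(10) / 1600.0        # d(expected score)/d(Elo) near equality
    score_se = (target_elo_error / confidence_z) * dscore_per_elo
    return score_var / score_se ** 2

for draws in (0.5, 0.7, 0.9):
    n = games_for_elo_error(0.5, draw_ratio=draws)
    print(f"draw ratio {draws:.0%}: ~{n:,.0f} games for +/-0.5 Elo at 95%")

Under these toy assumptions that ranges from roughly 185,000 games per patch at a 90% draw ratio to over 900,000 at 50%, which is why error bars that tight on every patch are not a realistic target.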
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have a negative reputation, since everyone already knows you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
syzygy
Posts: 5672
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

chrisw wrote: Sat Mar 11, 2023 12:35 am
syzygy wrote: Fri Mar 10, 2023 11:13 pm
CornfedForever wrote: Fri Mar 10, 2023 11:02 pm
syzygy wrote: Fri Mar 10, 2023 10:19 pm
CornfedForever wrote: Wed Mar 08, 2023 10:06 pm
syzygy wrote: Wed Mar 08, 2023 9:34 pm
CornfedForever wrote: Tue Mar 07, 2023 4:02 am
Eduard wrote: Mon Mar 06, 2023 12:25 pm

A total of 73 parameters were changed here. Known parameters that are constantly changing. Let's see how long it takes before one of these parameters is changed again; it won't take too long. :)

And they wonder why I question how they can know which changes actually resulted in a positive change and which in a negative one. :roll:
No, they see Dunning-Kruger at work.
Enough with what is essentially name-calling rather than an argument. I'm talking about the data and not knowing with any real certainty how you get to a +Elo or a -Elo (outside of tolerance), because so much is tested together. I mean... if every patch were a positive, SF would be increasing in strength every week. It is not.
Who says SF is not increasing in strength?

SF has increased hundreds of Elo because its development process works.
Of course ultimately there is a ceiling to what can be achieved.
Once again...intentionally misinterpreting my words...but we have come to expect that from you.

No one is saying it is 'not increasing in strength'. But there is a series of patches released each and every week... if every one of them were a positive, it would be increasing each week, and clearly it is not. Sometimes it is one step forward, two steps back... not a straight linear progression.

Maybe some remedial English for you is in order?
Ok, so you mean all is going perfectly fine with SF development. Then we can close the thread.

forum3/viewtopic.php?p=943314#p943314
syzygy wrote:The SF development process does not require 100% certainty that a patch gains Elo. It is a game of statistics.
You can NEVER be 100% sure that a patch that seems to gain 1 Elo really does not lose Elo.
You can be 99.99% sure if you want, but it would be a waste of resources.
Chess engine development is a game of statistics.
Chess engine development is a game of statistics.
Chess engine development is a game of statistics.
That's a circular argument. If you only have a hammer then everything is hammering.
What argument?
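The quoted "game of statistics" point can be made concrete with a toy simulation. Everything below is invented for illustration (the patch-quality distribution, test length and acceptance threshold are made up, and gains are treated as additive, which is itself a simplification); it is not fishtest's actual process:

Code:

import math
import random

# Toy model: each candidate patch has an unknown true Elo value; it is accepted
# when a noisy measurement from a fixed number of games clears a threshold.
def simulate(num_patches=2000, games_per_test=60000, draw_ratio=0.7, threshold_elo=0.5):
    # Standard error (in Elo) of the measured strength difference after one test.
    score_sd_per_game = ((1 - draw_ratio) / 4) ** 0.5
    elo_per_score = 1600 / math.log(10)              # inverse slope of score vs Elo near equality
    measurement_sd = elo_per_score * score_sd_per_game / games_per_test ** 0.5
    total_gain, accepted, regressions = 0.0, 0, 0
    for _ in range(num_patches):
        true_elo = random.gauss(-0.2, 1.5)           # invented distribution of patch quality
        measured = random.gauss(true_elo, measurement_sd)
        if measured > threshold_elo:                 # simplistic fixed-length acceptance rule
            accepted += 1
            total_gain += true_elo
            regressions += true_elo < 0
    return accepted, regressions, total_gain

accepted, regressions, gain = simulate()
print(f"accepted {accepted} patches, {regressions} true regressions among them, "
      f"net gain of about {gain:.0f} Elo (treating gains as additive)")

A fraction of the accepted patches are genuine regressions, yet the accepted set as a whole gains Elo; that is the sense in which 100% certainty per patch is not needed.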
CornfedForever
Posts: 646
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Re: I'm disappointed with Stockfish dev.

Post by CornfedForever »

syzygy wrote: Sat Mar 11, 2023 8:55 pm
What argument?
If I may... just google 'circular argument' or 'circular reasoning'... it's a logical fallacy... and you will see what he means.
syzygy
Posts: 5672
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

CornfedForever wrote: Sat Mar 11, 2023 9:01 pm
syzygy wrote: Sat Mar 11, 2023 8:55 pm
What argument?
If I may... just google 'circular argument' or 'circular reasoning'... it's a logical fallacy... and you will see what he means.
Oh boy. You must be a fun person.

What argument did I make? Before an argument can be circular, there has to be an argument.
chrisw
Posts: 4617
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: I'm disappointed with Stockfish dev.

Post by chrisw »

syzygy wrote: Sat Mar 11, 2023 8:55 pm
chrisw wrote: Sat Mar 11, 2023 12:35 am
syzygy wrote: Fri Mar 10, 2023 11:13 pm
CornfedForever wrote: Fri Mar 10, 2023 11:02 pm
syzygy wrote: Fri Mar 10, 2023 10:19 pm
CornfedForever wrote: Wed Mar 08, 2023 10:06 pm
syzygy wrote: Wed Mar 08, 2023 9:34 pm
CornfedForever wrote: Tue Mar 07, 2023 4:02 am
Eduard wrote: Mon Mar 06, 2023 12:25 pm

A total of 73 parameters were changed here. Known parameters that are constantly changing. Let's see how long it takes before one of these parameters is changed again; it won't take too long. :)

And they wonder why I question how they can know which changes actually resulted in a positive change and which in a negative one. :roll:
No, they see Dunning-Kruger at work.
Enough with what is essentially name-calling rather than an argument. I'm talking about the data and not knowing with any real certainty how you get to a +Elo or a -Elo (outside of tolerance), because so much is tested together. I mean... if every patch were a positive, SF would be increasing in strength every week. It is not.
Who says SF is not increasing in strength?

SF has increased hundreds of Elo because its development process works.
Of course ultimately there is a ceiling to what can be achieved.
Once again...intentionally misinterpreting my words...but we have come to expect that from you.

No one is saying it is 'not increasing in strength'. But there is a series of patches released each and every week... if every one of them were a positive, it would be increasing each week, and clearly it is not. Sometimes it is one step forward, two steps back... not a straight linear progression.

Maybe some remedial English for you is in order?
Ok, so you mean all is going perfectly fine with SF development. Then we can close the thread.

forum3/viewtopic.php?p=943314#p943314
syzygy wrote:The SF development process does not require 100% certainty that a patch gains Elo. It is a game of statistics.
You can NEVER be 100% sure that a patch that seems to gain 1 Elo really does not lose Elo.
You can be 99.99% sure if you want, but it would be a waste of resources.
Chess engine development is a game of statistics.
Chess engine development is a game of statistics.
Chess engine development is a game of statistics.
That's a circular argument. If you only have a hammer then everything is hammering.
What argument?
It's been reduced to a statistical game by reducing the game itself to 1, 0 and -1. If, however, you were to regard the chess engine/game/author/development thing as having more features than 1, 0, -1, it would not merely be a "game of statistics" but would have other properties too. It must have those other properties, otherwise you wouldn't be doing it and general interest would sink to zero. One assumes.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: I'm disappointed with Stockfish dev.

Post by connor_mcmonigle »

It's immediately telling that the only individuals criticizing the Stockfish project's testing methodology are those who are not remotely experienced when it comes to actual chess engine development. SPRT is both statistically principled and empirically demonstrated to be effective across a wide variety of disciplines. It is true that results at STC (short time control) aren't guaranteed to translate perfectly to results at LTC (long time control), but Stockfish's incredible progress in terms of Elo at a wide range of time controls over the last several years is a testament to the fact that the Stockfish project's testing methodology is effective.

This is a classic case of Dunning-Kruger at work. To have any ground to stand on in criticizing Stockfish's testing methodology, you have to both propose and demonstrate an effective alternative. Go develop your own engine from scratch, or start from a much weaker engine. Implement your own alternative testing methodology and see what kind of progress you make. You'll quickly find that practical considerations prevent VLTC (very long time control) SPRT testing. You'll quickly find that just using test positions (as was suggested earlier in this thread for some reason) as a proxy for engine strength will get you nowhere. Or you could try the approach that Eduard here has seemingly taken: make a few random changes, watch the engine play a handful of games on some random server and call it good enough. To no one's surprise, this also won't get you anywhere.

Starting with Stockfish as a base for experimenting with alternative testing methodologies is incredibly daft, as Stockfish is so strong that random garbage changes usually won't significantly harm its LTC performance. If you've weakened Stockfish by tens of Elo in STC testing and your changes seem roughly neutral in limited VLTC testing, that doesn't mean your changes are brilliant and will continue to scale better at increasingly longer time controls. Rather, it just means chess is pretty close to a draw for an engine as strong as Stockfish and, with sufficient time, even garbage patches won't significantly harm its performance.
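For readers unfamiliar with SPRT, here is a minimal sketch of the idea over win/draw/loss game results. This is a simplified trinomial model with an Elo-independent draw rate and illustrative bounds chosen here for the example; fishtest's actual implementation uses a pentanomial model over game pairs and its own parameters:

Code:

import math
import random

def expected_score(elo):
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def outcome_probs(elo, draw_prob=0.7):
    # Win/draw/loss probabilities, assuming the draw rate does not depend on Elo.
    s = expected_score(elo)
    return {"win": s - draw_prob / 2, "draw": draw_prob, "loss": 1.0 - s - draw_prob / 2}

def sprt(results, elo0=0.0, elo1=2.0, alpha=0.05, beta=0.05):
    """Sequentially accept H1 (patch gains >= elo1), accept H0, or keep testing."""
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1, commit the patch
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0, reject the patch
    p0, p1 = outcome_probs(elo0), outcome_probs(elo1)
    llr = 0.0
    for played, result in enumerate(results, start=1):
        llr += math.log(p1[result] / p0[result])
        if llr >= upper:
            return "accept", played
        if llr <= lower:
            return "reject", played
    return "inconclusive", len(results)

# Usage: simulate games from a patch truly worth about +3 Elo and run the test.
truth = outcome_probs(3.0)
sim_results = random.choices(list(truth), weights=list(truth.values()), k=200_000)
print(sprt(sim_results))   # typically accepts well before the 200k simulated games run out

The appeal of the sequential test is exactly the resource point quoted earlier in the thread: it stops as soon as the evidence is strong enough in either direction, instead of committing to a huge fixed number of games for every patch.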
chrisw
Posts: 4617
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: I'm disappointed with Stockfish dev.

Post by chrisw »

connor_mcmonigle wrote: Sun Mar 12, 2023 12:24 am It's immediately telling that the only individuals criticizing the Stockfish project's testing methodology are those who are not remotely experienced when it comes to actual chess engine development. SPRT is both statistically principled and empirically demonstrated to be effective across a wide variety of disciplines. It is true that results at STC (short time control) aren't guaranteed to translate perfectly to results at LTC (long time control), but Stockfish's incredible progress in terms of Elo at a wide range of time controls over the last several years is a testament to the fact that the Stockfish project's testing methodology is effective.

This is a classic case of Dunning-Kruger at work. To have any ground to stand on in criticizing Stockfish's testing methodology, you have to both propose and demonstrate an effective alternative. Go develop your own engine from scratch, or start from a much weaker engine. Implement your own alternative testing methodology and see what kind of progress you make. You'll quickly find that practical considerations prevent VLTC (very long time control) SPRT testing. You'll quickly find that just using test positions (as was suggested earlier in this thread for some reason) as a proxy for engine strength will get you nowhere. Or you could try the approach that Eduard here has seemingly taken: make a few random changes, watch the engine play a handful of games on some random server and call it good enough. To no one's surprise, this also won't get you anywhere.

Starting with Stockfish as a base for experimenting with alternative testing methodologies is incredibly daft, as Stockfish is so strong that random garbage changes usually won't significantly harm its LTC performance. If you've weakened Stockfish by tens of Elo in STC testing and your changes seem roughly neutral in limited VLTC testing, that doesn't mean your changes are brilliant and will continue to scale better at increasingly longer time controls. Rather, it just means chess is pretty close to a draw for an engine as strong as Stockfish and, with sufficient time, even garbage patches won't significantly harm its performance.
This is true if there is only one hill to climb.
DrEinstein
Posts: 75
Joined: Wed Sep 15, 2021 8:50 pm
Full name: Albert Einstein

Re: I'm disappointed with Stockfish dev.

Post by DrEinstein »

So we are all patiently waiting for the next big jump to the bigger hill. I believe, or want to believe, that Stockfish is not yet standing on top of the highest mountain.
syzygy
Posts: 5672
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

chrisw wrote: Sun Mar 12, 2023 12:03 am
syzygy wrote: Sat Mar 11, 2023 8:55 pm
chrisw wrote: Sat Mar 11, 2023 12:35 am
syzygy wrote: Fri Mar 10, 2023 11:13 pm Ok, so you mean all is going perfectly fine with SF development. Then we can close the thread.

forum3/viewtopic.php?p=943314#p943314
syzygy wrote:The SF development process does not require 100% certainty that a patch gains Elo. It is a game of statistics.
You can NEVER be 100% sure that a patch that seems to gain 1 Elo really does not lose Elo.
You can be 99.99% sure if you want, but it would be a waste of resources.
Chess engine development is a game of statistics.
Chess engine development is a game of statistics.
Chess engine development is a game of statistics.
That's a circular argument. If you only have a hammer then everything is hammering.
What argument?
It's been reduced to a statistical game by reducing the game itself to 1, 0 and -1. If, however, you were to regard the chess engine/game/author/development thing as having more features than 1, 0, -1, it would not merely be a "game of statistics" but would have other properties too. It must have those other properties, otherwise you wouldn't be doing it and general interest would sink to zero. One assumes.
I am not making an argument but stating a fact.

Sure, there is engine functionality not related to chess-playing strength, and I am not talking about that kind of functionality.
CornfedForever
Posts: 646
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Re: I'm disappointed with Stockfish dev.

Post by CornfedForever »

DrEinstein wrote: Sun Mar 12, 2023 11:12 am So we are all patiently waiting for the next big jump to the bigger hill. I believe, or want to believe, that Stockfish is not yet standing on top of the highest mountain.
I wonder how one might define the next "big jump". All the 'big jumps' have likely come and gone, as engine strength is closer to topping out. What is left are likely 'little jumps'. The issue I (and, I think, others, though I do not speak for them) see is that those are harder to find... and, under the traditional testing framework, harder these days to actually know which tweaks are responsible for those really very little jumps, if only because they fall closer to the 'margin of error'. You get a '+' and presume you 'have it', when it is part of multiple 'patches' working together... then later we find something in those tweaks/patches being disregarded or at least changed. And some people... do not seem to want to admit to seeing this two-steps-forward, one-step-back / one-step-forward, two-steps-back thing happening. But it is a viable 'blind approach' that can work over time.

I like to think I know a little about quantum physics. There, reality is just so 'odd' that no one currently fully understands it... you just "follow the math" into the darkness. Chess, though, is a different animal: the number of legal positions is 'only' on the order of 10^40-something, you play it on only 64 squares, and knights do not move like bishops... etc.

Sure, you can see VERY slow, incremental progress with the path being taken (and steps backward...). However, being at a bit of a loss for exactly which tweak 'works' means it resembles 'wishcraft' more than science: throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what stuck or why). It's almost like blindly taking herbs to combat Covid-19 until your testing eventually finds a statistical 'hit' that seems to indicate 'something' in one of those herbs resulted in a tiny number of people not dying who otherwise might have... versus identifying 'what' specific thing in a given herb is actually responsible and using that... or looking at things differently, finding a spike protein and using it to alert the body's immune system to respond to something that looks like it... or viral vector technologies for other diseases, etc. Wishcraft vs. science. Both can work... but with one you tend to know 'why' it is working... which in theory should mean fewer steps back.