I have a question for my critics:
I built the Leptir Analyzer engine for deep analysis, and I think the engine is good for that purpose. You can even have fun while playing (I would never use the engine myself for blitz games with only 2,700 kN/s; I'm surprised myself). What have I done wrong?
I'm disappointed with Stockfish dev.
Poster profile: N.N., Germany; 1439 posts; joined Sat Oct 27, 2018 12:58 am
Poster profile: 143 posts; joined Wed Feb 28, 2018 2:50 pm
Re: I'm disappointed with Stockfish dev.
> Eduard wrote (Wed Mar 08, 2023 7:58 pm):
> > DrEinstein wrote (Wed Mar 08, 2023 1:48 pm):
> > > Eduard wrote (Wed Mar 08, 2023 1:01 pm):
> > > I didn't know that he tested my engine, or why in this way and not otherwise.
> > > Here is his homepage:
> > > https://ipmanchess.yolasite.com/
> > > The search means how quickly the engine reaches a certain search depth.
> > > By the way, the day before yesterday, Leptir Analyzer took first place on points in the blitz tournament on PlayChess, with only 4 cores and about 3,000 kN/s. 17 rounds without a loss, one win.
> >
> > It's obvious that Leptir has a wide search and is thus good for analysis, where time doesn't play a major role, and for solving test suites. The longer time to depth, however, should probably result in worse game play. The above two tournaments have error bars that are too large, so they don't tell us much!
> > Again, why don't you run an STC tournament with 15 cores and stop it when the LOS is very close to 1? I would really like to know how much Elo Leptir is losing in a head-to-head match against Stockfish dev. If it's 5 to 10 Elo, who cares?
> > I'm afraid that no one will ever do this test. Both sides seem to be afraid that a statistically correct result will not meet their expectations. And I mean the Elo difference; that LOS = 1 in favour of SF dev should be clear to everyone, I hope.
>
> You live in a world where only statistics count. Can you still enjoy individual nice games and nice analyses?
> The Ipman test is enough for me. It shows that the engine can play well even with short time controls.
> In my EN test 2022, it is the best engine.
> It's good on the server too, almost unbelievable! The day before yesterday a shared first place and today the sole winner (Engine Piranha was eaten!).

Piranha was not eaten. It was simply a book fault from White!
The hardware of the SUPERCOMPUTER (from a game against me):
EMAN 8.70 CLUSTER 64-bit AVX2 1,016,613kN/s 12 x AMD EPCY 7B12+7662 64-Core Processor 2250MHz, (768 cores, 1536 threads)
Leptir Analyzer hardware (from a game against me; he is a friend of mine and plays with my books):
{Leptir Analyzer-avx2 (4 cores): 32.3 plies; 2.723kN/s Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz 2594MHz, (4 cores, 8 threads)
The best 10: [image not preserved]
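The quoted suggestion to settle the question with a head-to-head STC match, watching the error bars and the LOS, rests on a few standard formulas. A minimal Python sketch, assuming a plain win/draw/loss count from one match: the Elo difference comes from the logistic score formula, the error bar from the per-game variance of the score (normal approximation), and the LOS from wins and losses only. The function names and the example counts are hypothetical, not results from any real Leptir or Stockfish match.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, error_margin, los) for one head-to-head match.

    elo          -- Elo difference implied by the match score
    error_margin -- half-width of an approximate 95% interval (z = 1.96)
    los          -- likelihood of superiority, computed from wins and losses only
    Assumes the match is large enough that score +/- z * std_error stays inside (0, 1)
    and that there is at least one decisive game.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    std_error = math.sqrt(variance / n)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    elo = elo_from_score(score)
    # Draws carry no information about who is stronger, so LOS uses decisive games only.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, (high - low) / 2.0, los


if __name__ == "__main__":
    elo, margin, los = match_summary(300, 500, 200)
    print(f"Elo {elo:+.1f} +/- {margin:.1f}, LOS {los:.4f}")
```

For the made-up counts above (300 wins, 500 draws, 200 losses), this gives roughly +35 Elo with an error bar of about 15 and an LOS above 0.999. Because the error bar shrinks roughly as 1/sqrt(N), pinning down a 5 to 10 Elo gap with confidence typically takes thousands of games.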
Poster profile: Colin Jenkins, Gower, Wales; 540 posts; joined Tue Feb 04, 2014 12:25 pm
Re: I'm disappointed with Stockfish dev.
> Eduard wrote (Thu Mar 09, 2023 7:24 am):
> I have a question for my critics:
> I built the Leptir Analyzer engine for deep analysis, and I think the engine is good for that purpose. You can even have fun while playing (I would never use the engine myself for blitz games with only 2,700 kN/s; I'm surprised myself). What have I done wrong?

Based on that statement alone, nothing, obviously; fill your boots and have fun as long as you respect software licences.
I suspect the push-back has been because you were "disappointed" in the performance of something in context X when in fact it was developed for context Y. It's like saying "I'm disappointed in the performance of this wood drill when I use it to drill holes in concrete".
Poster profile: 143 posts; joined Wed Feb 28, 2018 2:50 pm
Re: I'm disappointed with Stockfish dev.
> op12no2 wrote (Thu Mar 09, 2023 12:38 pm):
> > Eduard wrote (Thu Mar 09, 2023 7:24 am):
> > I have a question for my critics:
> > I built the Leptir Analyzer engine for deep analysis, and I think the engine is good for that purpose. You can even have fun while playing (I would never use the engine myself for blitz games with only 2,700 kN/s; I'm surprised myself). What have I done wrong?
>
> Based on that statement alone, nothing, obviously; fill your boots and have fun as long as you respect software licences.
> I suspect the push-back has been because you were "disappointed" in the performance of something in context X when in fact it was developed for context Y. It's like saying "I'm disappointed in the performance of this wood drill when I use it to drill holes in concrete".

Eduard is always fishing for compliments. Every 3 days he comes along with a new super engine. COMPLETE BULLSHIT!!
If his so-called improvements helped Stockfish, the guys from the SF team would implement them.
Poster profile: Tel-Aviv, Israel; 10770 posts; joined Thu Mar 09, 2006 12:37 am
Re: I'm disappointed with Stockfish dev.
> CornfedForever wrote (Wed Mar 08, 2023 10:06 pm):
> > syzygy wrote (Wed Mar 08, 2023 9:34 pm):
> > > CornfedForever wrote (Tue Mar 07, 2023 4:02 am):
> > > And they wonder why I question how they can know which changes actually resulted in a positive change and which resulted in a negative change.
> >
> > No, they see Dunning-Kruger at work.
>
> Enough with what is essentially name-calling rather than an argument. I'm talking about the data and not knowing with any real certainty how you get to a +Elo or a -Elo (outside of tolerance), because so much is tested together. I mean... if every patch was a positive, SF would be increasing in strength every week. It is not.

I believe that they are best only because they have a huge hardware advantage relative to their opponents.
I believe that they do not use their hardware efficiently.
I would like to see some engine that people test on the principle that every patch they accept is tested in a fixed number of games against previous versions, in order to get an unbiased estimate of the improvement.
Basically, I would like to see unbiased Elo estimates of version X+1 vs X based on 100,000 games, the same for version X+2 vs X+1, and for version X+2 vs X. Maybe we would find cases where Elo is not transitive; in that case it may be interesting to repeat the test to show that it is not luck but the nature of the changes.
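The fixed-game scheme proposed above (X+1 vs X, X+2 vs X+1, X+2 vs X) can be checked for transitivity with simple arithmetic: under the logistic Elo model, the direct X+2 vs X result should equal the sum of the two single-step results, up to sampling noise. A minimal sketch, assuming the three matches are independent and using a normal (delta-method) approximation for the standard errors; the helper names and the match counts are hypothetical.

```python
import math


def elo_and_se(wins: int, draws: int, losses: int):
    """Elo difference of one match and its standard error (delta method)."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    variance = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se_score = math.sqrt(variance / n)
    elo = -400.0 * math.log10(1.0 / s - 1.0)
    # Propagate the score error through d(elo)/d(score) = 400 / (ln 10 * s * (1 - s)).
    se_elo = se_score * 400.0 / (math.log(10.0) * s * (1.0 - s))
    return elo, se_elo


def transitivity_z(x1_vs_x, x2_vs_x1, x2_vs_x):
    """z-score of the gap between the direct X+2 vs X result and the chained sum.

    Each argument is a (wins, draws, losses) tuple from the newer version's side.
    Assumes the three matches are independent of each other.
    """
    d1, s1 = elo_and_se(*x1_vs_x)
    d2, s2 = elo_and_se(*x2_vs_x1)
    d3, s3 = elo_and_se(*x2_vs_x)
    gap = d3 - (d1 + d2)
    return gap / math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2)


if __name__ == "__main__":
    # Hypothetical 1,000-game matches: two +35 Elo steps and a consistent direct result.
    print(transitivity_z((300, 500, 200), (300, 500, 200), (349, 500, 151)))
```

A |z| well above 2 or 3 would hint at intransitivity (or a flawed test), and, as suggested above, repeating the matches is the way to rule out luck. With 100,000 games per match instead of 1,000, each standard error shrinks by a factor of about 10.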
Poster profile: Tomasz Sobczyk; 391 posts; joined Tue Oct 08, 2019 11:39 pm
Re: I'm disappointed with Stockfish dev.
I think I have an engine for you then! It's called Stockfish. They even do something better than a fixed number of games: they use SPRT, which is more statistically sound. But if you still insist on fixed-game matches, they conveniently run regular regression tests!
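For comparison with fixed-length matches, here is the basic idea of an SPRT. Fishtest uses a more elaborate variant (it works with game pairs and draws rather than decisive games only), so this is not Stockfish's implementation. It is a textbook Wald SPRT that, as a simplifying assumption, looks only at decisive games and treats each one as a win with probability 1/(1 + 10^(-elo/400)). The hypothesis bounds elo0 = 0 and elo1 = 5 and the error rates alpha = beta = 0.05 are illustrative choices.

```python
import math
import random


def expected_score(elo: float) -> float:
    """Logistic expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(results, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Wald SPRT over a stream of decisive game results (1 = win, 0 = loss).

    Returns ("H1", games) if the patch is accepted as being at least elo1,
    ("H0", games) if it is rejected, or ("open", games) if the stream ran out.
    """
    p0, p1 = expected_score(elo0), expected_score(elo1)
    lower = math.log(beta / (1.0 - alpha))   # reject H1 below this log-likelihood ratio
    upper = math.log((1.0 - beta) / alpha)   # accept H1 above this log-likelihood ratio
    llr = 0.0
    games = 0
    for r in results:
        games += 1
        if r == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "H1", games
        if llr <= lower:
            return "H0", games
    return "open", games


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical patch whose true strength is +10 Elo on decisive games.
    stream = (1 if rng.random() < expected_score(10.0) else 0 for _ in range(1_000_000))
    print(sprt(stream))
```

The test stops as soon as the accumulated evidence crosses either boundary, which is why it usually needs fewer games than a fixed-length match of comparable reliability.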
Poster profile: N.N., Germany; 1439 posts; joined Sat Oct 27, 2018 12:58 am
Re: I'm disappointed with Stockfish dev.
> op12no2 wrote (Thu Mar 09, 2023 12:38 pm):
> > Eduard wrote (Thu Mar 09, 2023 7:24 am):
> > I have a question for my critics:
> > I built the Leptir Analyzer engine for deep analysis, and I think the engine is good for that purpose. You can even have fun while playing (I would never use the engine myself for blitz games with only 2,700 kN/s; I'm surprised myself). What have I done wrong?
>
> Based on that statement alone, nothing, obviously; fill your boots and have fun as long as you respect software licences.
> I suspect the push-back has been because you were "disappointed" in the performance of something in context X when in fact it was developed for context Y. It's like saying "I'm disappointed in the performance of this wood drill when I use it to drill holes in concrete".

Thanks! So we agree with each other. What I do, I always do with the greatest fun. Some people say that clones are not necessary because we have Stockfish. Unfortunately, I see it differently. Why shouldn't I try to improve things? Sure, if it's all about Elo, you should use Stockfish. However, I would like to point out that a reduction of 5 to 10 Elo does not matter in practice. I play on the servers, and we play with books that include all of chess theory. A game usually starts in the middlegame, and the positions probably don't have much to do with the Fishtest environment. Also, all positions played at TCEC are too bad for our purposes. We don't have that many advantages. Little things often make the difference on the server.
It looks like this. The following position arose in one of my server games, and Stockfish lost it with Black.

r3r2k/p1pq3p/R3bppN/2P5/1p1pP1P1/3P4/2P2P1P/3Q2RK b - - 0 30
Yes, it was just a blitz game. Nevertheless, I presented my new test suite ENET-2023 today.
The very first engine to go through this test was Stockfish dev 080323, so I took a close look at its analyses. In the position on the board, Stockfish dev 080323 still wanted the immediately losing move 30...Qb5?? even after 60 s of thinking. After all, I had 20 threads and about 18,000 kN/s. I think that's too much. I don't want to play with such an engine. I prefer to do without 5 Elo and fix the search in my engine. This is my opinion.
Poster profile: location 223; 137 posts; joined Sat Dec 04, 2010 5:31 pm
Re: I'm disappointed with Stockfish dev.
Would you mind posting the complete game?
Poster profile: N.N., Germany; 1439 posts; joined Sat Oct 27, 2018 12:58 am
Re: I'm disappointed with Stockfish dev.
I don't want to give names. The notation should suffice.
[Event "Rated game, 5 min"]
[Site "Engine Room"]
[Date "2022.11.19"]
[Round "?"]
[White "ENG"]
[Black "ENG"]
[Result "1-0"]
[ECO "C65"]
[WhiteElo "2950"]
[BlackElo "2949"]
[PlyCount "69"]
[EventDate "2022.01.01"]
[SourceTitle "playchess.com"]
1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. d3 Bc5 5. Bxc6 dxc6 6. O-O Nd7 7. Nbd2 O-O 8. Nc4 Re8 9. Bd2 f6 10. Nh4 g6 11. Kh1 Nb6 12. Na5 Bf8 13. a4 Nd7 14. g3 Nc5 15. Nc4 Ne6 16. Ng2 b5 17. Na5 Bd7 18. Ne3 Bg7 19. Bc3 Nd4 20. axb5 cxb5 21. Nb7 Qc8 22. Nc5 Bh3 23. Rg1 Bf8 24. b4 Bxc5 25. bxc5 Be6 26. g4 Qd7 27. Ra6 b4 28. Bxd4 exd4 29. Nf5 Kh8 30. Nh6 Qb5 31. Qf3 Kg7 32. e5 Qxa6 33. Qxf6+ Kxh6 34. g5+ Kh5 35. Qf3+ 1-0
On my Ryzen 2700, after 3 minutes:
r3r2k/p1pq3p/R3bppN/2P5/1p1pP1P1/3P4/2P2P1P/3Q2RK b - - 0 1
Analysis by Stockfish dev-20230308-39da50ed:
30...Qb5 31.Qf3 Kg7 32.Rxe6 Rxe6 33.Nf5+ Kf7 34.Qh3 Kg8 35.Nxd4 Qd7 36.Nxe6 Qxe6 37.Qe3 a5 38.Qd4 Rb8 39.Qd5 Qxd5 40.exd5 a4 41.Kg2 Rd8 42.Rb1 Rxd5 43.Rxb4 Rxc5 44.Rxa4 Rxc2 45.Kf3 Rc5 46.Ke3 h5 47.h3 Kf7 48.Kd2 Ke6 49.f4 hxg4 50.hxg4 Rc6 51.Ra8 f5 52.g5 Kd5 53.Rd8+ Rd6 54.Rh8 Kd4 55.Re8 Ra6 56.Rd8+ Rd6 57.Re8
+/= (0.36 --) Depth: 50/71 00:02:53 1252MN, tb=88738
White is slightly better
Stockfish dev is blind. I don't like such engines, sorry. Better to have 5 Elo less in bullet.
After Qb5??
Analysis by Stockfish dev-20230308-39da50ed:
31.Qf3 Qd7 32.Qxf6+ Qg7 33.g5 Bc8 34.Rc6 Qxf6 35.gxf6 Re6 36.f7 Kg7 37.Nf5+ Kf8 38.Rxc7 Rf6 39.Nd6 Bh3 40.e5 Rxf2 41.Ne4 Rxf7 42.Rxf7+ Kxf7 43.Ng5+ Ke7 44.Nxh3 a5 45.Ra1 Ke6 46.Re1 Rc8 47.Ng5+ Ke7 48.Nf3 Rxc5 49.Nxd4 Ke8 50.e6 Rd5 51.Re4 Rc5 52.e7 Rc8 53.Re5 a4 54.Ra5 a3 55.Ra7 b3 56.Nxb3 Rxc2
+- (3.37) Depth: 26/61 00:00:05 46402kN, tb=2523
White has a decisive advantage
It is easy to make a draw.
After Qe7:
Analysis by Stockfish dev-20230308-39da50ed:
31.Qd2 Bc8 32.Ra5 Rb8 33.Kg2 Rf8 34.Rb1 a6 35.h4 f5 36.Qg5 Qe6 37.gxf5 gxf5 38.exf5 Qd5+ 39.Kh2 Qe5+ 40.Qg3 Qf6 41.Ng4 Qxf5 42.f3 Bd7 43.Qe5+ Qxe5+ 44.Nxe5 Bb5 45.Re1 Rf5 46.Re4 Rd8 47.c6 Rd5 48.f4 Bxd3 49.Rxd5 Bxe4 50.Rxd4 Bxc2 51.Rxb4 Rf8 52.Rb7
= (0.25 --) Depth: 37/50 00:00:57 450MN, tb=15812
White has an edge
Poster profile: Tel-Aviv, Israel; 10770 posts; joined Thu Mar 09, 2006 12:37 am
Re: I'm disappointed with Stockfish dev.
> Sopel wrote (Thu Mar 09, 2023 1:51 pm):
> I think I have an engine for you then! It's called Stockfish. They even do something better than a fixed number of games: they use SPRT, which is more statistically sound. But if you still insist on fixed-game matches, they conveniently run regular regression tests!

SPRT does not give an unbiased estimate of the rating change.
You only know that the change is probably an improvement, but you have no idea whether it is a 1 Elo or a 5 Elo improvement.
I write "probably an improvement" because there is a small probability that a change with no real improvement passes. If you test many 0 Elo patches, that small probability adds up: statistics tell me that some of them will pass the SPRT tests.
Regular regression tests do not compare version X against X+1 and X against X+2. They are run only after several patches have passed, so they are basically something like X against X+20 (where X+20 means 20 patches after X).
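The selection effect described above can be illustrated with a small simulation: feed many patches whose true strength is exactly 0 Elo through a simplified Wald SPRT (decisive games only; this is not Fishtest's implementation) and look at the ones that pass. The bounds elo0 = 0 and elo1 = 20 are hypothetical and were chosen only so the simulation finishes quickly; the seed and the patch count are arbitrary.

```python
import math
import random


def expected_score(elo: float) -> float:
    """Logistic expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def run_sprt(rng, true_elo, elo0, elo1, alpha=0.05, beta=0.05, max_games=200_000):
    """Simulate decisive games until a Wald SPRT stops.

    Returns (accepted, wins, losses); accepted is True when the test accepts H1.
    A test still undecided after max_games is counted as not accepted.
    """
    p_true = expected_score(true_elo)
    p0, p1 = expected_score(elo0), expected_score(elo1)
    win_step = math.log(p1 / p0)
    loss_step = math.log((1.0 - p1) / (1.0 - p0))
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr, wins, losses = 0.0, 0, 0
    while wins + losses < max_games:
        if rng.random() < p_true:
            wins, llr = wins + 1, llr + win_step
        else:
            losses, llr = losses + 1, llr + loss_step
        if llr >= upper:
            return True, wins, losses
        if llr <= lower:
            return False, wins, losses
    return False, wins, losses


def naive_elo(wins: int, losses: int) -> float:
    """Elo estimate read off the decisive games of one finished test."""
    if losses == 0:
        return math.inf
    return 400.0 * math.log10(wins / losses)


rng = random.Random(2023)
n_patches = 400
# Naive Elo estimates of the zero-Elo patches that the test accepted.
passed = []
for _ in range(n_patches):
    accepted, won, lost = run_sprt(rng, true_elo=0.0, elo0=0.0, elo1=20.0)
    if accepted:
        passed.append(naive_elo(won, lost))

print(f"{len(passed)} of {n_patches} zero-Elo patches passed")
if passed:
    print(f"mean naive estimate of the passed ones: {sum(passed) / len(passed):+.1f} Elo")
```

Typically around alpha = 5% of the zero-Elo patches, or somewhat fewer, are accepted, and every accepted one shows a positive Elo estimate: with these bounds a loss moves the log-likelihood ratio down more than a win moves it up, so the test can only stop on the accepting side after more wins than losses. An Elo figure read off the same games that decided the test is therefore biased upward, while an independent fixed-length rerun of the accepted patch is not.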