Some thoughts on QS

Uri Blass
Posts: 10298
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some thoughts on QS

Post by Uri Blass »

Don wrote:
Uri Blass wrote: Don, I believe you that Komodo scales better relative to Critter and Houdini when you go from bullet to blitz, and needs a smaller time advantage relative to them to score 50%.

It does not prove that it continues to scale better when you go from blitz to longer time controls.
This is an argument that is infinitely extendable and thus cannot be debated by reasonable people. If you refuse to use inference, then you have an infinite number of time controls that you have to prove.

In fact, the rating lists don't mean a thing. The program that is number 200 on the list may actually be the strongest program at 5 minutes + 4 seconds - nobody ever checked that exact time control, so how do you know for sure?

Don
The difference is that we have the CCRL 40/40 rating list to compare with faster time controls, and so far the results give no supporting evidence that Komodo scales better at time controls longer than CEGT 40/20.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Some thoughts on QS

Post by Don »

Uri Blass wrote:
Don wrote:
Uri Blass wrote: Don, I believe you that Komodo scales better relative to Critter and Houdini when you go from bullet to blitz, and needs a smaller time advantage relative to them to score 50%.

It does not prove that it continues to scale better when you go from blitz to longer time controls.
This is an argument that is infinitely extendable and thus cannot be debated by reasonable people. If you refuse to use inference, then you have an infinite number of time controls that you have to prove.

In fact, the rating lists don't mean a thing. The program that is number 200 on the list may actually be the strongest program at 5 minutes + 4 seconds - nobody ever checked that exact time control, so how do you know for sure?

Don
The difference is that we have the CCRL 40/40 rating list to compare with faster time controls, and so far the results give no supporting evidence that Komodo scales better at time controls longer than CEGT 40/20.
I gave the evidence, and showed how it gained a lot of ELO but you did what you always do and made up some reason that you thought in reality Houdini was really the one increasing with time.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Some thoughts on QS

Post by syzygy »

Don wrote:I gave the evidence, and showed how it gained a lot of ELO but you did what you always do and made up some reason that you thought in reality Houdini was really the one increasing with time.
As far as I can see, Uri does not "make up a reason", but bases himself on rating lists. Now those don't have to speak the truth, but Uri does not claim to know the truth. (I did not look at these lists, and I have no opinion on who is "right" here. edit: hmm, results seem to be coming from different rating lists. Whether those are comparable I am not going to check.)
Uri Blass
Posts: 10298
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some thoughts on QS

Post by Uri Blass »

syzygy wrote:
Don wrote:I gave the evidence, and showed how it gained a lot of ELO but you did what you always do and made up some reason that you thought in reality Houdini was really the one increasing with time.
As far as I can see, Uri does not "make up a reason", but bases himself on rating lists. Now those don't have to speak the truth, but Uri does not claim to know the truth. (I did not look at these lists, and I have no opinion on who is "right" here. edit: hmm, results seem to be coming from different rating lists. Whether those are comparable I am not going to check.)
Note that I talked about Komodo 5 and not about Komodo 3.
For Komodo 3, all the evidence that I saw suggested that it scaled better.

So far, the CCRL results are relatively better for Komodo 5 now that there are more games, but I was thinking not only of the CCRL list but also of the 120+3 games, in which Komodo 5 was not successful.

After 614 games, the CCRL rating list has:

Engine            Rating     +     −   Score  Avg. opp.   Draws   Games
Komodo 5 64-bit     3256   +25   −25   62.5%     −103.8   51.6%     614
Komodo 4 64-bit     3246   +13   −13   57.9%      −68.4   48.1%    2425
Komodo 3 64-bit     3243   +17   −17   61.3%      −98.1   42.5%    1435

Earlier I had:

Engine            Rating     +     −   Score  Avg. opp.   Draws   Games
Komodo 5 64-bit     3120   +24   −24   61.5%      −73.5   53.4%     500
Komodo 4 64-bit     3117   +12   −12   57.9%      −52.7   48.1%    2425
Komodo 3 64-bit     3114   +15   −15   61.3%      −75.1   42.5%    1435

Note that I do not understand the improvement of more than 100 Elo at 40/40. Suddenly the top programs have better ratings at 40/40 than at 40/4, and the differences between programs in the 40/40 list have become bigger (weak programs did not gain rating points in the 40/40 list, but all the top programs did).
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Some thoughts on QS

Post by Sven »

Uri Blass wrote: Note that I do not understand the improvement of more than 100 Elo at 40/40. Suddenly the top programs have better ratings at 40/40 than at 40/4, and the differences between programs in the 40/40 list have become bigger (weak programs did not gain rating points in the 40/40 list, but all the top programs did).
Obviously some change has happened in the CCRL rating lists very recently. I think it is related to using different BayesElo parameters. The change affects the whole list. If you look to the bottom of the rating list you will see some engines rated around 1600 (e.g. my old engine "Surprise" and Julien's "Prédateur") which previously had around ELO 1950-2000 prior to the "offset -100" change some weeks ago, and afterwards around ELO 1850-1900. Also my newer engine "KnockOut" which had been around 2250 and then 2150 is now below 2000.

So the overall scaling is now different.

Another point regarding your post is that you must never compare ratings from two different rating lists, even if both are from CCRL; i.e., any comparison of 40/40 ratings with 40/4 ratings is meaningless. You can compare ratings within each of these lists, and you can compare the relative rankings between the two lists, but the absolute rating numbers are always bound to exactly one list, since each list represents its own pool of games, and in the case of 40/40 vs. 40/4 the game pools are even fully disjoint.

Also absolute rating differences should not be compared between 40/40 and 40/4 CCRL lists since the scaling might differ somehow, and with the new BayesElo parameters I think the scaling even depends on properties of the corresponding set of games itself.

Sven
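To illustrate why absolute numbers are tied to one list, here is a minimal sketch. It assumes the standard logistic Elo formula with a 400-point scale, which is not BayesElo's exact model (BayesElo also models draws); `expected_score` is a name made up for this example. Only the rating difference enters the prediction, so an offset applied to a whole list changes nothing, while a change in scaling changes every difference.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Komodo 5 vs. Komodo 4, using the two sets of numbers quoted earlier in the thread.
new_list = expected_score(3256, 3246)   # 10 Elo apart in the newer list
old_list = expected_score(3120, 3117)   # 3 Elo apart in the earlier list

# Shifting both ratings by the same offset leaves the prediction unchanged.
shifted = expected_score(3256 - 100, 3246 - 100)

print(f"new list: {new_list:.4f}, old list: {old_list:.4f}, shifted: {shifted:.4f}")
```

The absolute values 3256 and 3120 never appear in the result, only the gaps of 10 and 3 points do, and those gaps are only meaningful inside the list that produced them.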
Uri Blass
Posts: 10298
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some thoughts on QS

Post by Uri Blass »

Sven Schüle wrote:
Uri Blass wrote: Note that I do not understand the improvement of more than 100 Elo at 40/40. Suddenly the top programs have better ratings at 40/40 than at 40/4, and the differences between programs in the 40/40 list have become bigger (weak programs did not gain rating points in the 40/40 list, but all the top programs did).
Obviously some change has happened in the CCRL rating lists very recently. I think it is related to using different BayesElo parameters. The change affects the whole list. If you look to the bottom of the rating list you will see some engines rated around 1600 (e.g. my old engine "Surprise" and Julien's "Prédateur") which previously had around ELO 1950-2000 prior to the "offset -100" change some weeks ago, and afterwards around ELO 1850-1900. Also my newer engine "KnockOut" which had been around 2250 and then 2150 is now below 2000.

So the overall scaling is now different.

Another point regarding your post is that you must never compare ratings from two different rating lists, even if both are from CCRL; i.e., any comparison of 40/40 ratings with 40/4 ratings is meaningless. You can compare ratings within each of these lists, and you can compare the relative rankings between the two lists, but the absolute rating numbers are always bound to exactly one list, since each list represents its own pool of games, and in the case of 40/40 vs. 40/4 the game pools are even fully disjoint.

Also absolute rating differences should not be compared between 40/40 and 40/4 CCRL lists since the scaling might differ somehow, and with the new BayesElo parameters I think the scaling even depends on properties of the corresponding set of games itself.

Sven
Ranking also has a statistical error, and it is hard to find two programs where A is significantly better than B at 40/40 while B is significantly better than A at 40/4.

Not impossible, but hard.

One example that I remember is Movei and Colossus 2008b.
40/40 has the following:
Movei 00.8.438 (10 10 10)   2668   +13   −13
Colossus 2008b              2634   +15   −15

40/4 has the following:

Colossus 2008b              2638    +9    −9   45.6%
Movei 00.8.438 (10 10 10)   2623    +9    −8

People may argue that 2638−9 < 2623+9, so it is not a very good example.
I think that, based on these results, it is possible to have 95% confidence that Colossus is better than Movei at blitz, because the possible error of the difference is not 18 Elo but something like 13 Elo. However, I do not think 95% confidence is enough here: when I search for cases where program A is better than B at 40/4 and the opposite holds at 40/40, in a small fraction of the cases I will find what I am looking for not because it is true but because of statistical error.
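The "something like 13 elo" figure can be reproduced with a short sketch. It assumes that the ± values in the list are 95% half-widths and that the two engines' errors are independent, which is only an approximation inside a single rating list; `difference_interval` is a name made up for this example.

```python
import math

def difference_interval(rating_a, err_a, rating_b, err_b):
    """Return (difference, combined half-width) for two ratings.

    Independent errors add in quadrature, so the combined half-width is
    smaller than the plain sum err_a + err_b.
    """
    diff = rating_a - rating_b
    err = math.sqrt(err_a ** 2 + err_b ** 2)
    return diff, err

# 40/4 numbers from the post: Colossus 2008b 2638 +/-9, Movei 2623 +9/-8
# (9 is used for both sides of Movei's interval to keep the sketch simple).
diff, err = difference_interval(2638, 9, 2623, 9)
print(f"difference {diff} Elo, combined error about {err:.1f} Elo")
```

Under these assumptions the 15-point gap is larger than the combined error, which is why the two intervals can overlap while the difference is still significant at the 95% level.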

Note that I believe it is possible to show that Movei ranks better at long time controls than at blitz even after allowing for statistical error, but I see no simple way to calculate the statistical error of a ranking.

I would like to have not one number for the ranking in the 40/4 rating list but something like
1) Houdini 2.0c 64-bit 3258-3294
2-3) Komodo 5 64-bit 3193-3240
2-4) Strelka 5.1 64-bit 3186-3226
2-5) Critter1.4 64 bits 3182-3218
4-8) Stockfish 2.2.2 64-bit 3155-3191
5-8) Rybka 4.1 64-bit 3154-3180

The idea is that maybe Komodo's ranking is 2-3 with 95% confidence (this is only speculation, and I do not see a simple way to calculate a ranking with 95% confidence).

I can look at Komodo's lower bound of 3193 and claim that it may be weaker than Strelka or Critter but not weaker than any other program. In theory, though, this may be unfair to Komodo: even if I cannot be sure with 95% confidence that Komodo is stronger than Strelka, and likewise for Critter, maybe I can be sure with 95% confidence that it is stronger than at least one of them, so I can be sure with 95% confidence that it is in place 2-3 (I do not say that this is the case; I only explain the idea).
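One possible way to get such rank ranges is simulation. The sketch below is an assumption-laden illustration, not a method used by CCRL or BayesElo: it treats each range in the list above as the 95% interval of an independent normal distribution, draws many hypothetical "true" rating sets, and records where each engine lands. `rank_intervals` and `bounds` are names made up for this example, and because only the six listed engines are simulated, the ranges for the lower engines are cut off at place 6.

```python
import random

# Rating ranges copied from the list above (illustrative numbers from the post).
bounds = {
    "Houdini 2.0c": (3258, 3294),
    "Komodo 5": (3193, 3240),
    "Strelka 5.1": (3186, 3226),
    "Critter 1.4": (3182, 3218),
    "Stockfish 2.2.2": (3155, 3191),
    "Rybka 4.1": (3154, 3180),
}

def rank_intervals(bounds, samples=20000, seed=1):
    """Return {engine: (best_place, worst_place)} covering 95% of simulated ranks."""
    rng = random.Random(seed)
    names = list(bounds)
    # Assumption: each (low, high) pair is a 95% interval, i.e. mean +/- 1.96 sigma.
    params = {n: ((lo + hi) / 2.0, (hi - lo) / (2.0 * 1.96))
              for n, (lo, hi) in bounds.items()}
    seen = {n: [] for n in names}
    for _ in range(samples):
        draw = {n: rng.gauss(*params[n]) for n in names}
        ordered = sorted(names, key=draw.get, reverse=True)
        for place, n in enumerate(ordered, start=1):
            seen[n].append(place)
    result = {}
    for n in names:
        places = sorted(seen[n])
        # Cut 2.5% of the simulated places off each end.
        result[n] = (places[int(0.025 * samples)],
                     places[int(0.975 * samples) - 1])
    return result

for name, (best, worst) in rank_intervals(bounds).items():
    print(f"{best}-{worst}) {name}")
```

This answers the "stronger than at least one of them" question directly, because each simulated draw ranks all engines jointly instead of comparing them pair by pair. The independence assumption is the weak point: ratings estimated from one pool of games are correlated.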
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Some thoughts on QS

Post by diep »

Rebel wrote:
diep wrote:I invented reductions in 1999.
I used them since 1996 with Rebel 8. You stole them from me Vince?

:mrgreen:

And BTW, I did not invent reductions, the one who put me on track was Erik van Riet Paap.
To quote GCP : "inventing reductions is trivial"

They didn't work well for me, Ed. They won 2 ply for me in 1999, just like today, but gave no Elo because they weakened Diep tactically; back in 1999, anything that weakened an engine a lot tactically was on the border of suicide (though not as bad as in 1997).

It was Bruce (Moreland) who had by far the tactically strongest engine at the 1999 world championship, by the way.

As I could test some positions on it here at home on his world championship box, I can be your witness to that :)
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Some thoughts on QS

Post by diep »

Don wrote:
Rebel wrote:
diep wrote:I invented reductions in 1999.
I used them since 1996 with Rebel 8. You stole them from me Vince?

:mrgreen:

And BTW, I did not invent reductions, the one who put me on track was Erik van Riet Paap.
I experimented with reductions without knowing anything about them from anyone else. But I was too stupid to stick with it even though I was getting some tantalizing big speedups. This was in the late 80's - and I have no idea whether anyone else had started using them by then but I would be very surprised if not.
Well, I'm not convinced they would have gained anything in the 80s.

We can prove they weaken you tactically, and they require overhead: basically you need a well-functioning transposition table for reductions.

From what I remember, Ed was using them with a reduction factor of 2, which is double what I was doing; that should have been more effective back then than today.

Please note there are a lot of alternatives in how you reduce, and in whether you see some methods as a form of reductions or simply as an alternative to alpha-beta.

If we use the same conventions as the patent industry, then basically we already have 100 alternatives to alpha-beta :)

Even PVS isn't, strictly speaking, alpha-beta in that case...

Some methods from back then that worked magnificently at the end of the 80s might simply have major problems if you revived them.

Most of those systems remain unpublished, however.

What all those systems from back then share is that they prune a lot more than LMR does at shallow depths; most of the time, though, they do NOT cooperate well with a hash table.

Around summer 2004 I invented a new system. It's pretty complex; maybe I'll make the effort one of these days to get it to work. So far I haven't used it. Despite my sharing it with some other programmers, you never get feedback from those guys. That's the sick reality of computer chess.

To quote Stefan MK from the 2004 world championship: "it's so much work to get a new deep searching system to work, and if it works well, then they steal it within a few days debugging Shredder".

Yet with today's hardware a lot more is possible than in the 80s and 90s, as you simply didn't have the system time to use complex algorithms that require overhead.

I tend to believe, though, that it's some of the more complex systems that will do best, objectively seen; but at the same time I realize it's very difficult to ever see someone make the effort for that.

What we see now is basically all cheap tricks getting tested.
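For readers who have not seen reductions in code, here is a minimal sketch of late move reductions on top of fail-soft alpha-beta. It is a generic textbook-style illustration, not the scheme used in Diep, Rebel, or Komodo. The game is a made-up toy: `children` and `evaluate` are hypothetical stand-ins for a move generator and a static evaluation, and `reduction` and `full_moves` are illustrative parameters. There is no transposition table, move ordering, or quiescence search.

```python
INF = 10 ** 9

def children(state):
    """Hypothetical toy move generator: every position has three successors."""
    return [state * 3 + k for k in (1, 2, 3)]

def evaluate(state):
    """Hypothetical static evaluation, from the side to move's point of view."""
    return (state * 2654435761) % 201 - 100

def negamax(state, depth):
    """Plain full-width search, used only as a reference."""
    if depth <= 0:
        return evaluate(state)
    return max(-negamax(c, depth - 1) for c in children(state))

def search(state, depth, alpha, beta, reduction=1, full_moves=1):
    """Fail-soft alpha-beta in which late moves are first searched shallower."""
    if depth <= 0:
        return evaluate(state)
    best = -INF
    for i, child in enumerate(children(state)):
        if reduction > 0 and i >= full_moves and depth >= 3:
            # Late move: reduced depth and a null window around alpha.
            score = -search(child, depth - 1 - reduction,
                            -alpha - 1, -alpha, reduction, full_moves)
            if score > alpha:
                # The reduced search looked promising, so verify at full depth.
                score = -search(child, depth - 1, -beta, -alpha,
                                reduction, full_moves)
        else:
            score = -search(child, depth - 1, -beta, -alpha,
                            reduction, full_moves)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best

print("full width:", negamax(1, 5))
print("with LMR  :", search(1, 5, -INF, INF, reduction=1))
```

With `reduction=0` the routine is ordinary alpha-beta and returns the same value as `negamax`. With a positive reduction it may return a different value, because a late move that fails low at reduced depth is never examined at full depth; that is the tactical weakening discussed in this thread.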
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Some thoughts on QS

Post by diep »

Don wrote:
diep wrote: Additionally, it was you who already posted in the 90s an indication that at bigger search depths the Elo gain from basically anything is smaller.

Claiming the opposite now contradicts that claim, and is rather naive.
I think you misunderstood something about what I said either then or now. I believe that with depth any superiority is reduced generally - the ELO gap closes between a weak and strong program in general, but a terribly written unscalable program may actually lose ground with depth. You can easily write a program that does not scale well and loses ground to other programs with depth. That is not an absolute that it can never happen.

The programs of 30 years ago - play them against Komodo and handicap Komodo to be equal in strength - then keep doubling the time control for each program and you will see Komodo's ELO increase relative to them with each doubling.
We're not talking about the past here, however. You claimed that LMR is worth 150 Elo points at a small search depth, which is not realistic, and THEREFORE you claimed it would give even more Elo at slower time controls than the superbullet you tested.

I really have to see that first, and I note that even in this reply you admit, to summarize your statement here: 'in general with depth superiority gets reduced'.

So that contradicts your claim that with increased depth it wins more than 150 Elo.

I seriously doubt that claim.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Some thoughts on QS

Post by Don »

diep wrote:
Don wrote:
diep wrote: Additionally, it was you who already posted in the 90s an indication that at bigger search depths the Elo gain from basically anything is smaller.

Claiming the opposite now contradicts that claim, and is rather naive.
I think you misunderstood something about what I said either then or now. I believe that with depth any superiority is reduced generally - the ELO gap closes between a weak and strong program in general, but a terribly written unscalable program may actually lose ground with depth. You can easily write a program that does not scale well and loses ground to other programs with depth. That is not an absolute that it can never happen.

The programs of 30 years ago - play them against Komodo and handicap Komodo to be equal in strength - then keep doubling the time control for each program and you will see Komodo's ELO increase relative to them with each doubling.
We're not talking about the past here, however. You claimed that LMR is worth 150 Elo points at a small search depth, which is not realistic, and THEREFORE you claimed it would give even more Elo at slower time controls than the superbullet you tested.

I really have to see that first, and I note that even in this reply you admit, to summarize your statement here: 'in general with depth superiority gets reduced'.

So that contradicts your claim that with increased depth it wins more than 150 Elo.

I seriously doubt that claim.
The level I tested at was quite fast so I admit that I do not know how it would work out at long time controls. With increasing time it's true that general superiority is reduced due to more draws and the fact that you get closer to perfect play. So I cannot say for sure that you are wrong.

But this is an experiment ANYONE can do with Komodo - Komodo has an option to turn off LMR. So perhaps someone would be willing to play Komodo LMR against Komodo no LMR at something like 1+1 Fischer, which is substantially longer than I ran but fast enough to hope to get a few hundred games in a couple of days or so.

Don
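For anyone running such a match, a short sketch of how the result could be turned into an Elo difference with an error bar. It assumes the standard logistic Elo formula and a normal approximation for the mean score; `elo_from_score` and `match_elo` are names made up for this example, and the game counts are invented, not results from Komodo.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score (clamped away from 0 and 1)."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, seen from the first engine's side."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Variance of the per-game result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))

# Hypothetical result of 300 games, LMR on vs. LMR off (numbers are made up).
elo, low, high = match_elo(wins=150, draws=100, losses=50)
print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```

A few hundred games leave a wide interval, so a claim such as "more than 150 Elo" can only be confirmed or rejected if the measured difference sits well away from that threshold.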