IPON ratings calculation

Adam Hair · Post by **Adam Hair** » Fri Dec 30, 2011 4:06 am

IWB wrote:Hi Larry,

lkaufman wrote:... We did not design our time control for ponder on games, and I'm thinking this was a big mistake. This probably hurts our results in your testing and IPON. Maybe we can correct this. ...
For a very simple reason I am a bit surprised by this statement. Try to think the other way around! Basically, engines are used for only two things:

1. Analysis (mainly)
2. To play against (OTB or on a server)

In case 2 the question is: who is playing ponder OFF? The only people doing this are a few rating lists (for historic reasons, and now they don't want to throw away the games). Everyone else (!) is always playing PON (and is losing, so a good method to limit playing strength is important as well)! So good Ponder ON time management is much more important than the Ponder off thing!
I consider Ponder off as completely artificial and useless, sorry. You are right about the number of games, but that argument comes from times when there were only single CPUs. Nowadays it is possible to play a sufficient number of games with ponder on. (I admit that engine development with very short time controls is more practicable with POFF, but that has nothing to do with real game play. The other devices where ponder off might be used are smartphones or other mobile devices, to save energy, but there, against humans, the timing of those ponder off games is less important ...)

Again, any real chess game played by humans, in a computer WC or on a server, is Ponder ON. I consider this real chess and ponder OFF some kind of subgroup for special purposes.

Regards and a few more "happy holidays"
Ingo

EDIT: If you start to make developments to please the rating list and not the users this will backfire! Someone will come up with a new, better method of testing (it already happened and will happen again imho) and then you have to adapt again, and again ...

If your total focus is testing the top engines and you are fortunate enough to have multiple computers available for testing, then why not use ponder on? It is easy enough when you only have to stay on top of a couple of dozen engines. If you can afford to do it, then do it.

However, when you try to maintain multiple lists containing 200 to 300 engines (and adding more all of the time), ponder off makes a lot of sense. In addition, when you compare the results of ponder off testing with ponder on testing, it is hard to discern much difference. Given the differences in focus between IPON and CEGT/CCRL and the lack of truly demonstrative proof that ponder off is less accurate in practice than ponder on, I find the statement "I consider Ponder off as completely artificial and useless, sorry." to be off the mark.

Frank Quisinsky · Post by **Frank Quisinsky** » Fri Dec 30, 2011 4:33 am

Hi Adam,

I have been running engine-vs-engine games for many years. I use double the time of CEGT or CCRL, all with ponder = on. Only with the first WinBoard-compatible engines did I play with ponder = off, on an AMD K6.

At first I used a Dual Pentium III 733 MHz, later a Dual Xeon 2.8 GHz. In the ATL-League (also a rating list, from the time when CCRL / CEGT started) I tried games with ponder = off on an AMD 3800+. I had no fun with such "half" games, because if you like to watch the games live, ponder = on games give you a completely different level, a normal level.

Ponder = off games have nothing to do with realistic chess. Nobody switches his brain off while the opponent calculates his move. I think this has nothing to do with chess and should get another name, perhaps PoChessOff. That is, it is another variant of chess, like FRC / Chess960.

In the early days of WinBoard and the compatibility fight, engines took ponder hits into account in their time management. Today I believe most programmers don't do that.

Many engines have problems with time losses or GUI crashes with ponder = on.

You will have a lot more problems if you play with ponder = on, and more again with resign = off.

Examples: Ktulu, Booot, Bison
It is hard work to produce ponder = on games with this group of strong engines. Too many GUI crashes!

You are right if you say ...
OK, but the results from ponder = off games are the same as from ponder = on games. Clearly, most engines have the same Elo statistics with half the time.

Comparing IPON and SWCR, SWCR has double the time. You will find differences only for Junior, perhaps Zappa.

In the very beginning only a handful of testers could buy dual-processor machines (very expensive). Most played with ponder = off. Today most play with ponder = off because more games can be produced. I can understand it, but I will never do that again, because ponder = on games are much more interesting to watch; this is real chess! Ponder off is nothing, not interesting material for analysis. Interesting only for statistics.

Again, try a match of 10 games with ponder = on if you have time to watch the games live. You will see what I mean: completely different chess, much more interesting.

Best
Frank

Graham Banks · Post by **Graham Banks** » Fri Dec 30, 2011 4:42 am

Each to his or her own, Frank.

Adam does make a good point about the range of engines tested by CEGT and CCRL though.
The ponder on rating lists only seem to test a limited range of engines. However, there are some enthusiasts and engine authors who are interested in the weaker engines too.
Not a criticism, just an observation.

Cheers,
Graham.

Frank Quisinsky · Post by **Frank Quisinsky** » Fri Dec 30, 2011 5:06 am

Hi Graham,

that is a good point, yes!
On 4 quad-core machines I can keep perhaps 40 different engines in their current versions, not more. I would need 8 quad-core systems to test all new engines under the conditions I use for SWCR. This is bad, yes, because I can't test all the interesting amateur engines at the weaker level.

But why should I give ponder = off my attention if I have a lot more fun watching ponder = on games?

No computer chess fan will play on a chess server with ponder = off. No chess player will use ponder = off for a match against an engine. But most people who use ponder = off to produce more games search and search for reasons to justify it to themselves.

Again, for statistics it is not very important whether you use ponder = off or ponder = on. But I will never understand why people do such things, because the fun factor is only 50%.

Believe me, if you were not working for a rating list with ponder = off and did not need many games, you would never play with ponder = off on modern hardware. Nobody does such things. It is only for producing many results quickly. Perhaps the people who do that share their results with others in chess forums, I don't know.

The biggest problem computer chess people have is ...

PATIENCE
Patience isn't a strength of males.

A good result needs time and good conditions.

Most can't build patience!

Engine x is available and 5 seconds later the first people need a rating.

Crazy!!

For this group of people we have IPON.

Best
Frank

Adam Hair · Post by **Adam Hair** » Fri Dec 30, 2011 5:14 am

Hi Frank,

I agree wholeheartedly with you. Ponder off is for statistics. Ponder on is for the games, as well as for competition. I know that your focus is on the games, especially how well an engine does in each phase of its games. Your rating list is not your sole concern. However, this does mean you focus on the top engines. That's great. Most people are interested in those engines. But there are hundreds of other engines, such as AnMon, that would not be tested nowadays if everyone focused on the top engines.

Anyway, to each his own, as Graham said.

May the new year be good to you,

Adam

lkaufman · Post by **lkaufman** » Fri Dec 30, 2011 5:25 am

Frank Quisinsky wrote:Hi Larry,

I watch a lot of Komodo games live, and in each game I look at the clock before the second or third time control starts (I use 40 moves in 10 minutes, repeating, with ponder = on).

I think in Komodo's case it is 95% perfect. Current Komodo versions lose no games on time, and when 35-38 or 75-78 moves have been played Komodo has enough time on the clock.

From my point of view there is nothing to do here in Komodo. Komodo also plays the first moves after the opening neither too fast nor too slow. Very fine work again, judging by time controls and pondering in Komodo games.

BTW:
Hash tables are the topic.
Komodo is very strong in endgames.

Hash tables are more important for endgames than for middlegames. So it makes sense to do the following:

If 32 pieces on the board = x MB for hash, for example 128 MB
If 16 pieces on the board = x MB for hash, for example 256 MB
If 08 pieces on the board = x MB for hash, for example 512 MB

This could give Komodo another small jump of perhaps 2-3 Elo.

Easy to set as a UCI option:
Variable Hash-Tables: yes / no

If yes ...
Hash from beginning = x MB
Hash with 16 pieces = x MB
Hash with 08 pieces = x MB

No program has it; I can't understand why.

Perhaps the UCI programmers would have to add such an option to the UCI protocol.
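
The piece-count schedule above can be sketched in a few lines of Python. This is only an illustration of the idea, not something any engine is known to implement: the function name `hash_size_mb`, the `base_mb` parameter, and the reading of "16 pieces" and "08 pieces" as "16 or fewer" and "8 or fewer" are all assumptions.

```python
def hash_size_mb(piece_count: int, base_mb: int = 128) -> int:
    """Pick a hash size (in MB) from the number of pieces left on the board.

    Hypothetical schedule following the example figures in the post:
    more than 16 pieces -> base, 16 or fewer -> 2x base, 8 or fewer -> 4x base.
    """
    if not 2 <= piece_count <= 32:
        raise ValueError("piece_count must be between 2 and 32")
    if piece_count <= 8:
        return base_mb * 4
    if piece_count <= 16:
        return base_mb * 2
    return base_mb


if __name__ == "__main__":
    for pieces in (32, 16, 8):
        print(f"{pieces:2d} pieces -> {hash_size_mb(pieces)} MB")
```

Whether a larger table late in the game is actually worth 2-3 Elo would have to be measured; the sketch only shows how cheap the bookkeeping is.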

2.
The high average move count in Komodo games comes from the fact that Komodo doesn't support endgame databases. My wish is support for the Gaviota endgame databases.

At the moment two SWCR games are still running with KR vs. KR. It makes no sense, but I am playing without resign.

An interesting stat for you:
In 212 of 160,000 SWCR games the engines showed an advantage of +6. In most cases these were bishop endgames with the wrong pawn (not winnable). Without resign, the game ended correctly by the 50-move rule. With resign, the game would have ended 1:0 or 0:1.

3.
Open positions are very dangerous for chess engines.
It makes sense to count the possible moves in each position.

If 50 moves are possible, for example after move 24, an engine should get a time bonus and use a bit more time to calculate its answer. If 15 moves are possible, for example after move 50, the engine should play faster. This could again be 2-3 Elo, and I think much more in blitz games.

In my analyses I found that engines play too fast in open positions. For a program like Komodo, with all its positional strengths, such an option could be worth 2-3 Elo, though I am not sure. A program should never play immediately after a ponder hit if a lot of moves are possible. It is better to give the engine more time for such complicated positions.
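
As a rough sketch of this mobility idea, the time budget for a move could be scaled by the number of legal moves. Everything specific here is an assumption made for illustration: the name `scaled_move_time`, the reference count of 30 legal moves, and the clamp to between 0.5x and 1.5x of the base budget. It is not taken from Komodo or any other engine.

```python
def scaled_move_time(base_seconds: float, legal_moves: int,
                     reference_moves: int = 30,
                     min_factor: float = 0.5,
                     max_factor: float = 1.5) -> float:
    """Scale a base time budget by how many legal moves the position offers.

    More legal moves than the reference count gives a bonus, fewer gives a
    discount; the factor is clamped so one position cannot drain the clock.
    """
    if base_seconds < 0 or legal_moves < 0:
        raise ValueError("base_seconds and legal_moves must be non-negative")
    factor = legal_moves / reference_moves
    factor = max(min_factor, min(max_factor, factor))
    return base_seconds * factor


if __name__ == "__main__":
    # The two cases from the post: 50 legal moves vs. 15 legal moves.
    print(scaled_move_time(10.0, 50))
    print(scaled_move_time(10.0, 15))
```

The same factor could also be used to veto an instant reply after a ponder hit when the move count is high, which is the second half of the suggestion.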

I am not a programmer, only a tester ...
Perhaps your team is interested in thinking about these ideas.

Best
Frank

Regarding hash tables, I thought that all the testers specify a hash table size. If a program has variable hash tables, that would make a problem for testing against programs that do not have it. How would it be set?
It's surprising to me that our time use is good in ponder on games. I think our values are reasonably good for no-ponder games, and since we did not make different values for ponder on, I would expect that we would play too fast early on, keeping too much time for the endgame. But you say this is not the case.
Regarding using more time in middlegame positions than endgame positions, we already use more time when we have more time, so this automatically makes the engine speed up in the endgame. But perhaps it could be improved.
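
The effect described here, spending more when more time remains, falls out of any scheme that budgets a fixed fraction of the remaining clock per move. The sketch below assumes a single sudden-death clock with no increment and no refill at move 40, and a hypothetical `horizon` of 30 expected remaining moves; it is not Komodo's actual formula.

```python
from typing import List


def allocate_times(clock_seconds: float, moves: int, horizon: int = 30) -> List[float]:
    """Return per-move budgets when each move gets remaining_time / horizon.

    Because the budget is a fixed fraction of what is left, it shrinks
    automatically as the clock runs down.
    """
    if clock_seconds < 0 or moves < 0 or horizon < 1:
        raise ValueError("invalid arguments")
    budgets = []
    remaining = clock_seconds
    for _ in range(moves):
        budget = remaining / horizon
        budgets.append(budget)
        remaining -= budget
    return budgets


if __name__ == "__main__":
    for move_no, seconds in enumerate(allocate_times(600.0, 5), start=1):
        print(f"move {move_no}: {seconds:.2f} s")
```

With ponder on, part of the opponent's thinking time is effectively added to the search, which is one reason values tuned without pondering might not carry over unchanged.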
Thanks for your suggestions. Maybe one or two will prove to be practical.

Best regards
Larry

Frank Quisinsky · Post by **Frank Quisinsky** » Fri Dec 30, 2011 5:30 am

Adam,

I don't like what I have to say now ...
I like the amateurs at AnMon level more.
This brings me into conflict again and again.

I have more fun with my Oldie-Mix tourney than with testing Komodo and Critter. Komodo and Critter are too strong, and often I think I understand nothing when one combination chases the next.

In all the time I have been interested in computer chess I have liked tactically strong engines, such as WChess, Kallisto, later Phalanx, the first Gandalf versions, AnMon and so on, and I like the weaker amateurs much more than the best available engines.

What you say is right!
And for this reason CEGT and CCRL are much more important than SWCR or IPON.

2 1/2 years ago I started working with Grandmaster Jörg Hickl for his magazine SCHACHWELT. I was working on interviews and information about computer chess after a longer break of three years. For this work I needed my own material, and for Jörg Hickl the best chess programs are what matter. So I started SWCR (Schachwelt Computer Ratings). Months later the magazine became available online only and I lost interest in it, because my interest is to give information about computer chess to club players who like to hold a magazine in their hands. There are enough online magazines.

After my work for Jörg I thought ... you have the hardware, you have spent so much of your own money on the hardware and so much time. It makes no sense to close SWCR. So my systems keep playing matches and tourneys again and again.

OK, I have fun watching the LIVE games at home, but now after two years I have really seen enough.

I think CCRL / CEGT are very important, and both rating lists should try to produce many games with many engines. This is the goal of this work. From my point of view my rating list is very good, OK, but the others are more important than SWCR. I am more interested in informing others, and I like to look at the games or to create statistics; I like Excel.

The rating list and my results were never my main point, because too many important amateurs do not play in SWCR.

With Dirty, Deuterium or Philou, for example, I have a lot of fun here.

Best
Frank

On the other hand ...
I am not a fan of ponder = off games after so many years of engine-vs-engine with ponder = on.

lkaufman · Post by **lkaufman** » Fri Dec 30, 2011 5:33 am

I believe that testing with ponder on won't give much different results than testing with ponder off, and I would rather see the time limit doubled instead. For purposes of determining which engine analyzes better, ponder on is pointless, while for playing humans any rating list is pointless, as all top engines clobber all humans. But it doesn't matter which is "best", we have CCRL and CEGT testing one way, and IPON and SWCR the other way, so we need to have good time management at both.

LucenaTheLucid · Post by **LucenaTheLucid** » Fri Dec 30, 2011 5:35 am

From watching Komodo play with ponder=off on my computer it seems to me like Komodo uses very little of its time and has too much left at the end.

As far as changing hash table sizes mid-game I don't know if the UCI protocol would allow it because I *think* the GUI sends the command for hash size but I could be wrong.
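
For reference, the UCI protocol does have the GUI set the hash size, through the standard `Hash` option (value in MB), and `setoption` is meant to be sent only while the engine is waiting, not while it is searching. A GUI that wanted to resize mid-game would therefore have to do it between searches. The helper below is a hypothetical sketch that only builds the command strings; what an engine does with its existing table on resize (keep it or clear it) is engine-specific and not assumed here.

```python
from typing import List


def hash_resize_commands(size_mb: int) -> List[str]:
    """Build the UCI commands a GUI could send to change the hash size.

    `setoption name Hash value <MB>` is the standard UCI way to set the
    table size; `isready` then waits until the engine has applied it.
    """
    if size_mb < 1:
        raise ValueError("size_mb must be at least 1")
    return [f"setoption name Hash value {size_mb}", "isready"]


if __name__ == "__main__":
    for line in hash_resize_commands(256):
        print(line)
```

With ponder on, the engine is also searching during the opponent's turn, so the window in which such a command could be sent is smaller still.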

Graham Banks · Post by **Graham Banks** » Fri Dec 30, 2011 5:37 am

No rating list is more important than any other. They all have value in contributing to the overall picture.

That is why I won't criticise what others do. Each enthusiast must do what provides the most satisfaction or fun for him or her.
