Colossus 2007a - early impression

Uri Blass
Posts: 10783
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Colossus 2007a - early impression

Post by Uri Blass »

tiger wrote:
Uri Blass wrote:
Kirill Kryukov wrote: I did a quick test of Colossus 2007a under CCRL 40/4 conditions. The rating is 2686 Elo points after 224 games. This makes it the #16 engine in the CCRL 40/4 Free Single-CPU list (my version, which includes only stable public releases with default settings).

It's a bit early to draw conclusions; the ranking may change after more games (which are running right now). Still, I hope this improvement holds. :-)

Those few games were enough to get 97.3% LOS (Likelihood of Superiority) over the previous version, 2006f, which is rated 2644 (a 42-point difference).

All results of Colossus 2007a to date.

Comparison of 3 Colossus versions we tested

What makes me happier personally is that the new version does not crash when accessing tablebases on my Vista machine, as 2006f did. :-)

Best,
Kirill
Note that Movei personality 10 10 10, which is free and can be used by everyone, is not in the list (only in the complete list), in spite of the fact that the tests suggest it is better than the default (there are not enough games to know, but other tests that I did also support it).

Movei 0.08.403(10 10 10) 2646 +27 −27 46.6% +22.0 25.2% 476
50.9%
Movei 0.08.403 2635 +20 −20 52.7% −22.6 30.1% 860

Uri

Uri, I seem to remember that this "10 10 10" stuff is somewhat related to "progress". Can you explain the concept? I have been playing with what I believe is a similar concept and I wanted to know about yours.


// Christophe
The concept is that Movei evaluates the path and not only the leaf position (progress 0 0 0 means no path-dependent evaluation).

Unfortunately it does not seem to help much, and my guess is that it gives me only about 30 Elo of improvement even with the best parameters.

I am sure it is possible to improve it with code changes (a different path evaluation), and I have not tested many different ideas for path-dependent evaluation, but I think it is better to try to use the hash table more efficiently (for pruning), because I can probably gain more from effective use of the hash table, even if I need to give up progress.

I already tried once to stop using progress, in the earlier version 383, but unfortunately I failed to use the hash table for pruning after that version (I only tried using the hash table for pruning in qsearch to save qsearch nodes; it did not save nodes, and instead of investigating the problem I preferred to try other ideas).


progress 10 10 10 means that the path evaluation can differ by 0.1 pawns, 0.1+0.1 pawns, or 0.1+0.1+0.1 pawns from the static evaluation, which depends only on the leaf.

Uri
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Colossus 2007a - early impression

Post by tiger »

So how do you evaluate the path?

My idea was that if it is possible to reach a position by 2 different paths, it is best to follow the "safest" path. The idea is that you may discover later that the position you wanted to reach is not good because after reaching it something bad happens and the horizon effect has hidden it. In this case, you will have to change your mind. But you will have to find a new variation in the middle of one of the paths leading to the position you wanted to reach initially, because you realize the position is bad only after making a few moves, using one path or another.

My assumption is that it is easier to find a new safe position to reach from a path consisting of "safe" positions than from a path consisting of "unsafe" positions.

For example, assume that in some position it is possible to exchange several pieces, the exchange leading to a better position for you, and there are two ways of doing it:
- the first way ("path") is to sacrifice 3 pieces, then win them back with a deep combination
- the second path is to exchange them one after the other (you capture, your opponent is forced to recapture, and so on...)
In this position, a path-independent evaluation will choose one path AT RANDOM!
A path-dependent evaluation will deliberately choose the second way, because at no point during the variation is the program behind in material. So if it turns out after starting the exchanges that the sequence cannot go on because of some unseen threat (it was too deep to be seen before the exchanges started), then at least the program can look for an alternative from a position where it is not behind in material.
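
A minimal sketch of the tie-break in this example, in Python. It is only an illustration under stated assumptions, not code from Chess Tiger or Movei: the dict layout, the name `safest_path`, and the material numbers are all hypothetical, and "safety" is taken to mean the worst material balance seen anywhere along the path.

```python
def safest_path(paths):
    """Pick the path with the best leaf score, breaking ties by safety.

    Each path is a dict (hypothetical layout):
      "leaf_score": evaluation of the final position, in centipawns
      "material":   material balance after every ply of the path, in pawns
    Safety is taken to be the worst material balance seen along the path.
    """
    return max(paths, key=lambda p: (p["leaf_score"], min(p["material"])))


# Two ways to reach the same position, so the leaf score is identical.
# The numbers are made up for illustration.
sacrifice = {"name": "sacrifice", "leaf_score": 100,
             "material": [-3, -3, -6, -6, -9, -4, 1]}
exchange = {"name": "exchange", "leaf_score": 100,
            "material": [3, 0, 3, 0, 3, 0, 1]}

best = safest_path([sacrifice, exchange])
print(best["name"])  # prints: exchange
```

With equal leaf scores the comparison falls through to the second element of the key, so the exchange line wins because it never goes below a balance of 0. A path with a strictly better leaf score would still be chosen regardless of safety in this sketch.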

By extension, the same idea applies not only when the same position can be reached by two different paths, but also when two positions that have the same evaluation can be reached by two different paths.

And it can go as far as including the path into the evaluation, so a program could choose to reach a position with a lower evaluation just because the path leading to it is safer.

In current chess programs, only one thing holds us back from path-dependent evaluation: the recognition of transpositions through the hash table. A path-dependent evaluation gives a different definition of a transposition, and this definition is not compatible with how we use the hash table.

However, this does not mean that a path-dependent evaluation is fundamentally incompatible with transposition detection through a hash table.

The most obvious way to avoid the problem is to stop using the hash table for transposition detection. Maybe the loss of transposition detection would be more than compensated by the gains of path-dependent evaluation. As I understand it, you do not use transposition detection in Movei at this time anyway.

But I think it is possible to design a transposition detection system that works well together with path-dependent evaluation. For example, if the path-dependent part of the evaluation is the same for two paths leading to the same position, then transposition detection can be used as usual (the search simply returns the exact score stored in the hash table for the position, if it has one).

Also, if the path-dependent part of the evaluation is constrained between known bounds (for example [-0.50;+0.50]), then the scores stored in the hash table can still be used as bounds for beta cutoffs.
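
The bounded case can be sketched roughly as follows, in Python. This is an assumption-laden illustration, not an existing engine's code: `HashEntry`, `probe_cutoff`, and `margin` are hypothetical names. `margin` stands for the largest amount by which the score of one position can differ between two paths. If the stored score already includes the path term of the path that produced it and that term lies in [-0.50;+0.50], two paths can differ by up to twice the bound, so the example uses 100 centipawns.

```python
from dataclasses import dataclass

EXACT, LOWER, UPPER = "exact", "lower", "upper"


@dataclass
class HashEntry:
    score: int   # centipawns, as stored by an earlier search of this position
    bound: str   # EXACT, LOWER (score is a lower bound) or UPPER
    depth: int   # remaining depth the stored score was searched to


def probe_cutoff(entry, depth, alpha, beta, margin):
    """Return a score that justifies a cutoff, or None if the search must go on.

    The stored score is widened by `margin` before it is compared with the
    window, so it stays a valid bound even if the current path differs from
    the path that produced the entry.
    """
    if entry is None or entry.depth < depth:
        return None
    # Real score on this path is at least score - margin.
    if entry.bound in (EXACT, LOWER) and entry.score - margin >= beta:
        return entry.score - margin
    # Real score on this path is at most score + margin.
    if entry.bound in (EXACT, UPPER) and entry.score + margin <= alpha:
        return entry.score + margin
    return None


entry = HashEntry(score=300, bound=LOWER, depth=8)
print(probe_cutoff(entry, depth=6, alpha=0, beta=150, margin=100))  # prints: 200
print(probe_cutoff(entry, depth=6, alpha=0, beta=250, margin=100))  # prints: None
```

The price of the margin is visible in the second call: a stored lower bound of 300 would normally cut off against beta = 250, but once it is weakened to 200 it no longer does, so some transposition cutoffs are lost in exchange for correctness.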

What I could do is run test matches using a version of Chess Tiger that does not use the hash table for transposition detection (it would use it only for move ordering) and see how much Elo is lost from not detecting transpositions. That would give a lower bound on how much a path-dependent evaluation must gain in order to overcome the loss of transposition detection.


// Christophe
Uri Blass

Re: Colossus 2007a - early impression

Post by Uri Blass »

I will not give exact details of my implementation, but in my case the idea is simply to compare the evaluation of the position with the evaluations of earlier positions in the same path, and if I see progress in the numbers I give a bonus.

The idea is that you can improve good positions but you cannot improve bad positions, so an improvement along the path suggests that the position is good.

Note that I evaluate every node because I also use the evaluation numbers to decide whether to prune.
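
Since the exact details are not given, the following Python sketch is only one possible reading of the idea and not Movei's actual code. The function name `progress_bonus`, the rule "one increment for each of the last three earlier positions that the leaf improves upon", and the assumption that all evaluations are in centipawns from the same side's point of view are all hypothetical.

```python
def progress_bonus(path_evals, steps=(10, 10, 10)):
    """Return a bonus in centipawns for a leaf reached along a path.

    path_evals: static evaluations of the positions on the path, oldest
                first and leaf last, all from the same side's point of view.
    steps:      hypothetical reading of "progress 10 10 10": one increment
                for each of the most recent earlier positions (at most
                three) whose evaluation is lower than the leaf's.
    """
    leaf = path_evals[-1]
    earlier = path_evals[:-1][-len(steps):]  # at most len(steps) positions
    bonus = 0
    for step, past in zip(steps, reversed(earlier)):
        if leaf > past:
            bonus += step
    return bonus


print(progress_bonus([0, 10, 20, 30]))             # prints: 30
print(progress_bonus([30, 20, 10, 0]))             # prints: 0
print(progress_bonus([0, 10, 20, 30], (0, 0, 0)))  # prints: 0
```

Under this reading the bonus can only be 0, 10, 20, or 30 centipawns with the 10 10 10 setting, and it is always 0 with 0 0 0.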

Uri
Uri Blass

Re: Colossus 2007a - early impression

Post by Uri Blass »

I can add that my path-dependent evaluation is different from your path-dependent evaluation idea.

If I sacrifice material and get it back, I will prefer that line to one where I do not sacrifice material, because I get an improvement at the end of the line.

I think this is logical because it is possible that in practice I win more material and not searching deeply enough hides it.

Uri
Anabolic Karpov

Re: Colossus 2007a - early impression

Post by Anabolic Karpov »

Referring back to the original post, which seemed to transform very quickly into a Movei post!?

Colossus 2007a seems to be a good tactical engine that should be useful for position analysis. It is one of the few engines to score 100% on the Richter.epd test set, and it scored 22/27 on BS2830.epd, the same as Deep Fritz10! Nice one Martin!!