I wonder if people did the following test for evaluation

Moderator: Ras

Uri Blass
Posts: 10787
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

1) Delete productive knowledge from your evaluation that is relevant for all stages of the game (for example, the evaluation of passed pawns).

2) Call the version without the knowledge version A and the version with the knowledge version B.

3) Play a match between A and B, giving A enough of a time advantage to score 49-51% against B in a match of 1000 games.

4) Repeat the match from the same positions at a longer time control, multiplying the times of both A and B by 10.

I expect B to win at the longer time control, but I wonder if there is any evidence that this really happens.
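
The four steps above can be sketched as a small calibration loop. This is only a sketch under assumptions: `play_match` is a hypothetical callable (not part of any real tool) that plays a match and returns version A's score as a fraction, and A's score is assumed to rise with its time multiplier, so a bisection can find the handicap.

```python
from typing import Callable, Tuple

# Hypothetical signature: play_match(time_a, time_b, games) -> A's score in [0, 1].
MatchFn = Callable[[float, float, int], float]


def calibrate_handicap(play_match: MatchFn, base_time: float,
                       games: int = 1000, lo: float = 1.0, hi: float = 16.0,
                       max_iter: int = 20) -> float:
    """Bisect on A's time multiplier until A scores 49-51% against B.

    Assumes A's score grows with its time multiplier and that the
    answer lies between lo and hi (both are assumptions of this sketch).
    """
    ratio = (lo + hi) / 2.0
    for _ in range(max_iter):
        ratio = (lo + hi) / 2.0
        score = play_match(base_time * ratio, base_time, games)
        if 0.49 <= score <= 0.51:
            break
        if score < 0.49:
            lo = ratio   # A still loses: give it more time
        else:
            hi = ratio   # A wins: give it less time
    return ratio


def run_experiment(play_match: MatchFn, base_time: float,
                   games: int = 1000, scale: float = 10.0) -> Tuple[float, float]:
    """Steps 3 and 4: calibrate, then replay with both clocks multiplied by scale."""
    ratio = calibrate_handicap(play_match, base_time, games)
    long_score = play_match(base_time * ratio * scale, base_time * scale, games)
    return ratio, long_score
```

A `long_score` clearly below 0.49 would support the expectation that B gains more from the extra time. Note that with 1000 games the standard error of the score is at most about 1.6 percentage points, so a single match landing in the 49-51% window is only a rough calibration.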

Uri
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I wonder if people did the following test for evaluation

Post by Michael Sherwin »

I doubt that most of us have the time needed for such a test. Maybe Bob can try it if you can tell him why it would be worthwhile to do. What important question are you trying to have answered or is this just for curiosity?
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Uri Blass
Posts: 10787
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: I wonder if people did the following test for evaluation

Post by Uri Blass »

Michael Sherwin wrote:I doubt that most of us have the time needed for such a test. Maybe Bob can try it if you can tell him why it would be worthwhile to do. What important question are you trying to have answered or is this just for curiosity?
The question is if it is correct that general evaluation knowledge helps more at long time control.

I believe that it helps more but it is only an opinion with no evidence.

Note that when I say general evaluation knowledge, I mean evaluation knowledge that is relevant not only for specific endgames but for most of the game.

It can be knowledge about pawn structure, knowledge about mobility, or knowledge about correct values for the pieces.

I remember that Dann Corbit believed that lower material values are better at long time control.

He claimed that material values of 0 may be best at very long time control.

My opinion is also that slightly smaller material values may be better at long time control but I think that significantly lower material values are worse at longer time control.

It may be interesting to test to find out.
Movei is not a good candidate here because I make pruning decisions based on the evaluation, so simply dividing the material values by 2 may change the search, and even multiplying the whole evaluation by 2 is going to give different results.

In order to check the theory we need to use a simpler program in which multiplying the evaluation by 2 changes nothing.
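
To illustrate the point about scaling, here is a minimal sketch. The game tree, the numbers, and the fixed-margin pruning rule are all made up for illustration and are not Movei's code. Plain minimax picks the same move when every leaf value is doubled, while a rule that compares the evaluation against a fixed margin can change its decision.

```python
def minimax(node, maximizing=True):
    """Plain minimax over a nested-list tree; leaves are integer evaluations."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(tree):
    """Index of the root move with the highest minimax value."""
    scores = [minimax(child, maximizing=False) for child in tree]
    return scores.index(max(scores))


def scale(node, factor):
    """Multiply every leaf evaluation in the tree by factor."""
    if isinstance(node, int):
        return node * factor
    return [scale(child, factor) for child in node]


def futility_prune(static_eval, alpha, margin=100):
    """Toy fixed-margin rule: skip the node if eval + margin cannot reach alpha."""
    return static_eval + margin <= alpha


# Toy tree in centipawns (hypothetical numbers): root -> 3 moves -> 2 replies each.
tree = [[30, -20], [10, 15], [-50, 80]]

# Without pruning, doubling every evaluation leaves the chosen move unchanged.
same_choice = best_move(tree) == best_move(scale(tree, 2))

# Same position and bound, before and after doubling the evaluation.
pruned_before = futility_prune(static_eval=-40, alpha=50)    # -40 + 100 = 60 > 50: kept
pruned_after = futility_prune(static_eval=-80, alpha=100)    # -80 + 100 = 20 <= 100: pruned
```

The margin stays at 100 while the gap between evaluation and bound doubles, so the doubled version prunes a node that the original version searches.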

Uri
Bill Rogers
Posts: 3562
Joined: Thu Mar 09, 2006 3:54 am
Location: San Jose, California

Re: I wonder if people did the following test for evaluation

Post by Bill Rogers »

I think the point of confusion is what is really gained at long time controls.
Let me put it this way: which would be better, a 4-ply search or a 6-ply search? What if a 6-ply search discovered a good path to mate that a 4-ply search could not find? To me this is almost the same as the quiescence search. I doubt that anyone could or would dispute the advantage of using it.
Just my opinion.
Bill
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I wonder if people did the following test for evaluation

Post by Michael Sherwin »

It may be a bell curve that Dann is noticing only half of. I would think that a 40-ply search would be hurt by smaller piece values (the same as higher positional values, is it not?), which would land the program in a lost position if the larger positional factor was not justified. So at short time limits, due to the poor ability to see tactics, large positional factors are not all that useful, and at 40-ply search times they are not that useful either. That is my guess.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: I wonder if people did the following test for evaluation

Post by Dann Corbit »

In order for the material values to equal zero, you would have to see clear to the end of the game (so 12000 plies, give or take).

For instance, when you can see clear to the end of the game (1, 1/2, or 0), the material is "immaterial".

I think that the reduction in material will be gradual.

The reason I think that material values are less important with deep search is that the program can see the damage inflicted by the heavy pieces. But it will require extreme depths to get a very clear picture.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I wonder if people did the following test for evaluation

Post by Michael Sherwin »

One ply before a result confirmed by the rules, material is still very valuable, as the result cannot yet be seen.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: I wonder if people did the following test for evaluation

Post by Pradu »

Uri wrote:The question is if it is correct that general evaluation knowledge helps more at long time control.
A cheap eval might give your engine twice the nps it would have with an expensive eval, and that perhaps leads to a consistent extra ply. I think the important thing is to know how much the Elo gain from this extra ply diminishes at higher search depths. We also have to think about the granularity of the eval. Generally, the coarser your eval, the more cutoffs you will have, and this might change the branching factor of your search. Your search advantage will increase with a lower branching factor at longer time controls, so a coarser eval might even be better than a very fine eval at longer time controls. Anyway, this is just my intuition; I could be wrong, but I think a cheap (only the most important factors) and reasonably coarse eval will do better at extremely long time controls.
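
As a rough sketch of the two effects mentioned here: the extra depth bought by a speedup depends on the effective branching factor, and a coarser eval can be modelled by rounding scores to a grain size. The function names and the `grain` parameter are hypothetical, and the depth formula assumes the node count grows as b ** depth, which is only an approximation for real engines.

```python
import math


def extra_plies(speedup: float, branching_factor: float) -> float:
    """Extra depth from a node-rate speedup, assuming nodes ~ b ** depth."""
    return math.log(speedup) / math.log(branching_factor)


def quantize(score: int, grain: int) -> int:
    """Round a centipawn score toward zero to a multiple of grain.

    Rounding toward zero keeps quantize(-s) == -quantize(s), which a
    negamax-style search relies on.
    """
    sign = -1 if score < 0 else 1
    return sign * (abs(score) // grain) * grain
```

Under these assumptions, doubling the nps gives a full extra ply only when the effective branching factor is 2; at a branching factor of 4 it gives half a ply.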
Uri Blass
Posts: 10787
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: I wonder if people did the following test for evaluation

Post by Uri Blass »

Pradu wrote:
Uri wrote:The question is if it is correct that general evaluation knowledge helps more at long time control.
A cheap eval might give your engine twice the nps it would have with an expensive eval, and that perhaps leads to a consistent extra ply. I think the important thing is to know how much the Elo gain from this extra ply diminishes at higher search depths. We also have to think about the granularity of the eval. Generally, the coarser your eval, the more cutoffs you will have, and this might change the branching factor of your search. Your search advantage will increase with a lower branching factor at longer time controls, so a coarser eval might even be better than a very fine eval at longer time controls. Anyway, this is just my intuition; I could be wrong, but I think a cheap (only the most important factors) and reasonably coarse eval will do better at extremely long time controls.

I am talking only about knowledge that helps at fast time controls.
Quote from Vasik:
"chess knowledge wins chess games. If it doesn't, it isn't knowledge."

Are you suggesting that knowledge that helps at fast time controls can be counterproductive at long time controls?

My intuition is that it is not the case.
I also do not think that your search advantage is going to increase with inferior evaluation.

Nominal depth may be misleading: if you use null move pruning, an inferior evaluation may often fail to notice that a move is a threat, so the search will not consider it.

Uri
Alessandro Scotti

Re: I wonder if people did the following test for evaluation

Post by Alessandro Scotti »

I tested something like this in an older version of Hamsters, playing 200 or 300 blitz games for each match. Results were 65% for the version with passed pawns and 55% for the version with king attack (this result doesn't look very good). I have no data at longer time controls.
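
For reference, match scores like these can be converted to an Elo difference with the usual logistic formula, and a rough error bar can be put on the score. This is only a sketch: the function names are made up, and the error bar treats every game as a win or a loss, so it somewhat overstates the uncertainty when there are draws.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model), 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_std_error(score: float, games: int) -> float:
    """Upper bound on the standard error of the score (ignores draws)."""
    return math.sqrt(score * (1.0 - score) / games)
```

With this formula, `elo_diff(0.65)` is a little over 100 Elo and `elo_diff(0.55)` is about 35 Elo. Over 200 games, `score_std_error(0.55, 200)` is about 0.035, so a 55% score is less than two standard errors away from 50%.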