Hi,
Some years ago, someone posted in CCC a list of heuristics with an estimated implementation cost and an estimated benefit for each. The list was ordered so that things like alpha-beta and null move were at the top, because they have the best benefit/investment ratio.
I can't find this post. Does anyone else remember it and could help me find it?
Return on investment of different techniques and heuristics
Moderator: Ras
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
- Full name: Mathieu Pagé
Re: Return on investment of different techniques and heuristics
Never mind, I found it here: http://www.stmintz.com/ccc/index.php?id=374313
Mathieu Pagé
mathieu@mathieupage.com
-
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: Return on investment of different techniques and heuristics
mathmoi wrote: Never mind, I found it here: http://www.stmintz.com/ccc/index.php?id=374313
An interesting, useful, and highly controversial table. I am not sure what the final verdict was. Here is one table I dug out:
Here's the revised table based on recent discussion:
EFFORT RETURN RETURN/EFFORT Feature
1 10 10 capture ordering
3 4 1.33 null move
4 4 1 null move with verification
4 4 1 search pv first
2 1 .5 fractional extensions
4 6 1.5 static exchange evaluator
4 5 1.25 transposition table
4 5 1.25 transposition table with 2-tier repl.
3 3 1 history heuristic, killers, other ordering
2 1 .5 aspiration
4 5 1.25 book learning
3 3 1 position learning
2 2 1 iterative deepening
3 1 .33 internal iterative deepening
2 2 1 pawn hashing w/ complex pawn evaluation
3 3 1 king+pawn hashing w/ king safety evaluation
3 2 .66 capture extension
1 5 5 check extension
1 1 1 pawn to 6th/7th extension
2 3 1.5 futility in qsearch
3 3 1 futility on frontier
3 3 1 futility on pre-frontier
3 2 .66 razoring
5 3 .6 mate-at-a-glance
-Sam
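For the top line of the table: capture ordering is usually implemented with MVV-LVA (most valuable victim, least valuable attacker). Here is a minimal C++ sketch, assuming a hypothetical Move struct that exposes the victim and attacker piece types; real engines pack moves differently.

Code:
#include <algorithm>
#include <vector>

// Hypothetical move representation; real engines pack this into an int.
// Piece types: 0 = none, 1 = pawn, ..., 5 = queen, 6 = king.
struct Move {
    int victim;   // type of the piece being captured (0 for a quiet move)
    int attacker; // type of the piece making the capture
};

// MVV-LVA: most valuable victim first, least valuable attacker as the
// tie-break. Multiplying the victim by 10 keeps it dominant.
int mvvLvaScore(const Move& m) {
    return m.victim * 10 - m.attacker;
}

// Sort captures so that, e.g., PxQ is tried before QxP.
void orderCaptures(std::vector<Move>& captures) {
    std::sort(captures.begin(), captures.end(),
              [](const Move& a, const Move& b) {
                  return mvvLvaScore(a) > mvvLvaScore(b);
              });
}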
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Return on investment of different techniques and heuristics
BubbaTough wrote: An interesting, useful, and highly controversial table. I am not sure what the final verdict was.
Same table as the one you quoted, but ordered first by RETURN/EFFORT and second by RETURN, and slightly formatted:
Code:
EFFORT RETURN RETURN/EFFORT Feature
1 10 10.00 capture ordering
1 5 5.00 check extension
4 6 1.50 static exchange evaluator
2 3 1.50 futility in qsearch
3 4 1.33 null move
4 5 1.25 transposition table
4 5 1.25 transposition table with 2-tier repl.
4 5 1.25 book learning
4 4 1.00 null move with verification
4 4 1.00 search pv first
3 3 1.00 history heuristic, killers, other ordering
3 3 1.00 position learning
3 3 1.00 king+pawn hashing w/ king safety evaluation
3 3 1.00 futility on frontier
3 3 1.00 futility on pre-frontier
2 2 1.00 iterative deepening
2 2 1.00 pawn hashing w/ complex pawn evaluation
1 1 1.00 pawn to 6th/7th extension
3 2 0.67 capture extension
3 2 0.67 razoring
5 3 0.60 mate-at-a-glance
2 1 0.50 fractional extensions
2 1 0.50 aspiration
3 1 0.33 internal iterative deepening
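Null move, the cheapest of the big search wins in this table after move ordering, is also small in code. A minimal null-move pruning sketch with the classic R=2 reduction; the search(), makeNullMove(), unmakeNullMove(), and inCheck() helpers are assumed engine plumbing, declared here but defined elsewhere.

Code:
// Assumed engine plumbing; names are illustrative only.
struct Board;
int  search(Board& board, int depth, int alpha, int beta);
void makeNullMove(Board& board);   // pass the turn without moving
void unmakeNullMove(Board& board);
bool inCheck(const Board& board);

const int R = 2; // classic null-move depth reduction

// Null-move pruning: if the side to move can skip a turn and a
// reduced-depth search still reaches beta, prune this node.
int nullMovePrune(Board& board, int depth, int alpha, int beta) {
    if (depth > R && !inCheck(board)) {   // a null move is illegal in check
        makeNullMove(board);
        int score = -search(board, depth - 1 - R, -beta, -beta + 1);
        unmakeNullMove(board);
        if (score >= beta)
            return beta;                  // fail-high: prune this node
    }
    return search(board, depth, alpha, beta); // otherwise search normally
}

The "null move with verification" line in the table roughly corresponds to re-searching at reduced depth, instead of cutting off immediately, when the null-move search fails high.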
-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Return on investment of different techniques and heuristics
A very ugly list.
With some thought I could probably do better, but it would be something like this, if we skip evaluation, all forms of tuning/testing, and repetition detection:
1. quiescence search (not trivial to invent)
2. alpha-beta (note that alpha-beta IS trivial to invent)
3. move ordering
4. hash table (not just as a transposition table, but also for move ordering)
5. null move
6. extensions
Really, things like PVS and all the other stuff are far less relevant. (A minimal sketch of items 1 and 2 follows this post.)
How to do the above reasonably well is pretty well known by now. However, the reality is that evaluation is far more important once you have implemented the above bug-free.
Suppose we injected Diep's latest evaluation into the simple search above, then grabbed Rybka's search and paired it with the best evaluation there was in the 90s, and tested them single core versus single core at the 90s time control of 3 minutes a move. Even on 90s hardware if you want.
You get 100% scores. The evaluation wins everything.
Now you'll argue that you need modern hardware. Fine: 3 minutes a move on a single core, a Core 2 at 2.4 GHz or a K8, whatever, with Diep's evaluation on a simple 90s search. Even the improvements I made to that search in the past years we can wash away. That is roughly a 10-ply search against the 20 ply of today's deep-searching engines.
If you pick the best evaluation function from the 90s and let it play against today's evaluation function from Diep, you totally hammer away any search improvement. So none of those search improvements has any value without an up-to-date evaluation function; without it, it is all 0 Elo.
It really is a 100% score for today's evaluation function against the 90s one. 20 ply or 30 ply for the 90s software, it doesn't matter: you are dead against today's eval.
I noticed this effect very clearly in a big test we did in the year 2000, after I had improved Diep's evaluation to what were then modern standards. Jan Louwman did a lot of the testing for that on his 36 computers. You can definitely say I was quite shocked by the big progress that had been made and by how world-top engines from a few years earlier got totally slaughtered.
Please note that evaluations other than Diep's are more tricky, as they rely big-time upon search for king safety. Most have next to zero king-safety knowledge; Diep obviously has huge king-safety knowledge, so a small search depth does not hurt its king safety, whereas it would matter really big-time for the Fruit 2.1+ generation of engines such as Rybka, Stockfish, etc.
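For items 1 and 2 of the list above, a minimal sketch of alpha-beta (negamax form) dropping into a capture-only quiescence search at the horizon. The move generation and evaluation plumbing is assumed and only declared; mate and stalemate handling is omitted to keep the sketch short.

Code:
#include <vector>

struct Board;                      // the engine's position type (assumed)
struct Move { int from, to; };     // illustrative move encoding

// Assumed engine plumbing, declared here but defined elsewhere:
std::vector<Move> generateMoves(Board& board);    // all moves
std::vector<Move> generateCaptures(Board& board); // captures only
int  evaluate(const Board& board); // static eval, side to move's view
void makeMove(Board& board, const Move& m);
void unmakeMove(Board& board, const Move& m);

// 1. Quiescence search: at the horizon, keep searching captures until
// the position is quiet, so evaluate() never fires mid-exchange.
int qsearch(Board& board, int alpha, int beta) {
    int standPat = evaluate(board);
    if (standPat >= beta) return beta;    // already good enough to cut
    if (standPat > alpha) alpha = standPat;
    for (const Move& m : generateCaptures(board)) {
        makeMove(board, m);
        int score = -qsearch(board, -beta, -alpha);
        unmakeMove(board, m);
        if (score >= beta) return beta;
        if (score > alpha) alpha = score;
    }
    return alpha;
}

// 2. Alpha-beta in negamax form, dropping into qsearch at depth 0.
int alphaBeta(Board& board, int depth, int alpha, int beta) {
    if (depth == 0) return qsearch(board, alpha, beta);
    for (const Move& m : generateMoves(board)) {
        makeMove(board, m);
        int score = -alphaBeta(board, depth - 1, -beta, -alpha);
        unmakeMove(board, m);
        if (score >= beta) return beta;   // beta cutoff
        if (score > alpha) alpha = score;
    }
    return alpha;
}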
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Return on investment of different techniques and heuristics
I do not understand your claim of 100% scores.
Can you give exact conditions that people can test, to see if they get a 100% score?
What book do you use for both programs?
Are you talking about 10 plies against 20 plies?
I may try to test 20 plies of Strelka with a primitive evaluation against 10 plies of Rybka 3 if you are interested (I hope that 20 plies of Strelka with a primitive evaluation does not take too much time, and note that 10 plies of Rybka 3 are practically 12 plies, not 10).
Edit: Note that by primitive evaluation I do not mean Strelka 2's evaluation, but a modified version of Strelka 2 that has only the average of the opening and endgame piece-square tables, with no pawn-structure evaluation, no mobility evaluation, no king-safety evaluation, and no material-imbalance tables. (A sketch of such an evaluation follows below.)
Uri
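As a sketch of what such a primitive evaluation might look like: nothing but the average of opening and endgame piece-square tables, as described above. The board encoding and the (empty) table contents below are placeholders, not Strelka's actual code.

Code:
// Piece-square tables from white's point of view; contents are
// placeholders here. Real tables would also carry material value,
// or material would be added separately.
int pstOpening[6][64];
int pstEndgame[6][64];

// board[sq]: 0 = empty, 1..6 = white P,N,B,R,Q,K, -1..-6 = black.
// Returns centipawns from white's point of view, using only the
// average of the opening and endgame piece-square tables.
int evaluatePstOnly(const int board[64]) {
    int score = 0;
    for (int sq = 0; sq < 64; ++sq) {
        int p = board[sq];
        if (p > 0)
            score += (pstOpening[p - 1][sq] + pstEndgame[p - 1][sq]) / 2;
        else if (p < 0)
            // sq ^ 56 mirrors the square vertically for black
            score -= (pstOpening[-p - 1][sq ^ 56]
                    + pstEndgame[-p - 1][sq ^ 56]) / 2;
    }
    return score;
}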