I think some people equate a "tactical" style with a willingness to take risks or an unwillingness to draw. Can we assume that the program that is better tactically is also the more "aggressive" player? I say no. In fact, I put quotes around "tactical" and "aggressive" because I don't really know how to define them; it's a subjective judgement call.
The experiment is to play a series of round robins between programs that are time-adjusted to play at equal strength. With enough games, and assuming the scores are the same, you should see that some programs have more wins and losses than others, even though they all have the same 50% score. I'm not sure what to call this playing characteristic, but it's one aspect of a program's style: a willingness to play in such a way that you lose more games in order to win more. So let's describe it in terms of "risk averseness": the less willing a program is to do this, the more risk averse it is. One definition I saw applied this to investments:
Definition of 'Risk Averse'
A description of an investor who, when faced with two investments with a similar expected return (but different risks), will prefer the one with the lower risk.
In the match, Houdini has won and lost more games than the other two, which implies that it is the least risk averse. Stockfish is the most risk averse, which means it would prefer to draw. But the difference between Stockfish and Komodo is very small: they are virtually the same in this regard, and there is enough noise in the data that this could change as I run more games. I have always viewed Stockfish as a more aggressive risk taker than Komodo, but that was not based on any serious logic, just my highly subjective impression and my knowledge that it is quite strong tactically. Tactical skill, however, has nothing to do with risk averseness.
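As a rough sanity check on the "noise" remark, here is a minimal sketch that compares draw rates using a plain binomial standard error. The game counts and draw percentages are the ones reported in the tables further down in this post; the helper names (`std_error_pct`, `gap_in_std_errors`) are made up for illustration, and the independence assumption is only approximate because the players in a round robin share games.

```python
import math

# Games per player and draw percentages, copied from the tables in this post.
games = {"hou3": 5679, "kdev-4518.00": 5679, "sf23": 5678}
draw_pct = {"hou3": 33.35, "kdev-4518.00": 36.40, "sf23": 36.51}


def std_error_pct(pct, n):
    """Binomial standard error (in percentage points) of a rate seen over n games."""
    p = pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def gap_in_std_errors(a, b):
    """Difference between two draw rates, in units of its approximate standard error.

    Assumption: the two samples are treated as independent, which is not
    strictly true here because the players meet each other in the same games.
    """
    se = math.hypot(
        std_error_pct(draw_pct[a], games[a]),
        std_error_pct(draw_pct[b], games[b]),
    )
    return abs(draw_pct[a] - draw_pct[b]) / se


komodo_vs_stockfish = gap_in_std_errors("kdev-4518.00", "sf23")
houdini_vs_komodo = gap_in_std_errors("hou3", "kdev-4518.00")

print(f"kdev vs sf23: {komodo_vs_stockfish:.2f} standard errors")
print(f"hou3 vs kdev: {houdini_vs_komodo:.2f} standard errors")
```

Under these assumptions the Komodo/Stockfish gap is a small fraction of one standard error, while the Houdini gap is several standard errors, which is consistent with treating the first as noise and the second as a real difference.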
I'm still running the test, but here are the results so far. It was no small trick getting the programs to be so evenly matched, and it took a few false starts:
Rank  ELO     +/-  Games  Score   Player
----  ------  ---  -----  ------  ------------
   1  3001.1  7.5   5679  50.264  kdev-4518.00
   2  3000.0  7.5   5679  50.026  hou3
   3  2998.5  7.5   5678  49.709  sf23

w/l/d: 3092 2409 3017   35.42 percent draws
Here are the results as percentages of decisive games, wins, losses and draws:
Decisive  Wins   Losses  Draws  Player
--------  -----  ------  -----  ------------
   66.65  33.35   33.30  33.35  hou3
   63.60  32.07   31.54  36.40  kdev-4518.00
   63.49  31.45   32.04  36.51  sf23
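For anyone who wants to reproduce the columns, here is a minimal sketch of the arithmetic, assuming the usual chess scoring (a win counts 1, a draw counts 1/2). The win, loss and draw percentages are copied from the table above; the function names are made up for illustration. Because the inputs are already rounded, the recomputed values can differ from the table in the last digit.

```python
# Win / loss / draw percentages per player, copied from the table above.
results = {
    "hou3": (33.35, 33.30, 33.35),
    "kdev-4518.00": (32.07, 31.54, 36.40),
    "sf23": (31.45, 32.04, 36.51),
}


def decisive_pct(wins, losses, draws):
    """Share of games that ended decisively (a win or a loss)."""
    return wins + losses


def score_pct(wins, losses, draws):
    """Score percentage, assuming a win counts 1 and a draw counts 1/2."""
    return wins + draws / 2.0


summary = {
    name: (decisive_pct(*wld), score_pct(*wld))
    for name, wld in results.items()
}

# Order from least risk averse (most decisive games) to most risk averse.
ranking = sorted(summary, key=lambda name: summary[name][0], reverse=True)

for name in ranking:
    decisive, score = summary[name]
    print(f"{name:<14} decisive {decisive:6.2f}%   score {score:6.2f}%")
```

The point of the comparison is that all three scores sit very close to 50%, so the decisive percentage is what separates the programs.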
I don't make any claims about what it all means. I don't even know for sure how to make a program behave one way or the other, how to create weights that cause the program to lose more games without weakening it. But I would assume that it's about the evaluation weights of dynamic terms more than anything else. Perhaps erring on the side of making the weight too high as opposed to too low? I don't know.
I would also like to explore other "style" measurements. The similarity tester compares two programs for some measure of stylistic similarity, but it does not try to categorize a program's style. This test at least tries to measure one aspect of it.