Just to be crystal clear.... a not insignificant amount of my time is spent watching MF try difficult positions with long search times and a fixed amount of hash. Over and over and over again... for a few years now.
But like I said previously Elo gain in self play can not be transferred to expected results vs Alpha Zero, as its wins had very little to do with SF's search depth (not anything 2 or 3 more ply would fix anyway).
Indeed Elo is not transitive at all, but that doesn't stop us humans from using it that way even when we know it is meaningless.
hgm wrote:1GB hash seems pretty big. Have you actually measured if there is any advantage whatsoever of increasing the hash size?
And how much would AlphaZero gain by using a book, and not having to run at fixed time per move, in your estimate?
1GB hash is far too low. I have immense respect for the deepmind team, but this was a pretty serious error.
At 70 million positions/second, that 1GB of hash is being overwritten about once every second (10 bytes per entry = 700MB overwritten/second). Even worse, it's being contended by all 64 threads on the same 1GB of RAM. To give some context, we usually play our 60-second games with 64MB of hash, with one thread!
In these games, at 1 minute/move, the hash table is all but useless. It should probably have been 32 or 64GB, and the machine certainly had that much RAM.
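A quick sketch of the arithmetic behind that claim, using only the figures quoted above (70 million positions/second, 10 bytes per entry). It assumes every searched position writes one entry, which is an upper bound on the real write rate; the function names are just for illustration.

```python
def bytes_written_per_second(nodes_per_second, bytes_per_entry):
    """Upper bound on hash traffic: assumes every node stores one entry."""
    return nodes_per_second * bytes_per_entry


def seconds_to_overwrite(hash_bytes, nodes_per_second, bytes_per_entry):
    """Time until the search has written as many bytes as the table holds."""
    return hash_bytes / bytes_written_per_second(nodes_per_second, bytes_per_entry)


NPS = 70_000_000      # figure quoted in the post
ENTRY_BYTES = 10      # figure quoted in the post
GB = 10**9            # decimal gigabyte, to match the 700MB/second figure

for size_gb in (1, 32, 64):
    t = seconds_to_overwrite(size_gb * GB, NPS, ENTRY_BYTES)
    print(f"{size_gb:>3} GB hash: written over once in about {t:.1f} s")
```

Under these assumptions 1GB turns over in well under two seconds, while 64GB lasts longer than a single 60-second move.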
Based on the Komodo recommendation of a 40% fill rate per move, I estimate that about 128 GB is right, assuming a reasonably fast 64-core machine.
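One way to arrive at a number of that order (a sketch, not necessarily how the estimate was made): take the 70 million positions/second and 10 bytes per entry quoted earlier in the thread, 60 seconds per move, and size the table so that one move's worth of stores would fill 40% of it. Rounding up to a power of two is my own assumption, since engines commonly take hash sizes that way.

```python
import math


def hash_bytes_for_fill(nodes_per_second, seconds_per_move,
                        bytes_per_entry, target_fill):
    """Table size at which one move's stores reach the target fill fraction.

    Assumes every node stores one entry and ignores duplicates, so this
    is an upper-bound style estimate.
    """
    bytes_written = nodes_per_second * seconds_per_move * bytes_per_entry
    return bytes_written / target_fill


def round_up_to_pow2_gib(size_bytes):
    """Round a byte count up to the next power-of-two number of GiB."""
    gib = size_bytes / 2**30
    return 2 ** math.ceil(math.log2(gib))


needed = hash_bytes_for_fill(70_000_000, 60, 10, 0.40)
print(f"raw estimate: {needed / 1e9:.0f} GB")
print(f"rounded up:   {round_up_to_pow2_gib(needed)} GiB")
```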
So, if someone has a 64 core machine (or even 32 core), we could try this:
a. machine 1 uses 1 GB hash, Stockfish 8, no opening book, no Syzygy.
b. machine 2 uses the latest Stockfish, 64 GB or 128 GB of hash, with Syzygy. And a decent opening book.
It is unclear how they kept the programs from playing the same first few moves in every game. We could use some very small opening book for the run.
Just use a fixed time of 60 seconds per move. Assuming a typical game lasts 60 moves, it would take 2 hours per game, or 200 hours total for 100 games. I am interested in this enough to donate some money to cover machine rental on Amazon EC2 or some other suitable machine.
Mark
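The run-time estimate in the post above can be checked with a few lines. The 100-game match length is an assumption inferred from the 200-hour total; the other inputs are the ones stated (60 moves per game, 60 seconds per move, both sides thinking in turn on one machine pair).

```python
def match_hours(moves_per_game, seconds_per_move, games):
    """Wall-clock hours for a fixed-time-per-move match.

    Each move number consists of one move by each side, so a game of
    N moves takes 2 * N * seconds_per_move seconds.
    """
    seconds_per_game = moves_per_game * 2 * seconds_per_move
    return seconds_per_game * games / 3600


per_game = match_hours(60, 60, 1)
total = match_hours(60, 60, 100)   # 100 games is an assumed match length
print(f"{per_game:.0f} h per game, {total:.0f} h total, {total / 24:.1f} days")
```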
Why would you want someone else to do this and donate? Don't you (and your team) have some hardware to run these tests on?
No, the biggest machine we have is a 24 core and it is several years old. We buy machines in general without huge memory since the time controls we test at do not need 128 GB. Plus the run would take more than a week, and we keep the machines running Komodo games to improve it. My interest is personal and not specifically for Komodo. So I would help fund such a run.
If I get enough time to see how to run it on EC2, then maybe I can do it that way. But I was hoping someone here had a spare machine, and was equally interested in the result.
jhellis3 wrote:Just to be crystal clear.... a not insignificant amount of my time is spent watching MF try difficult positions with long search times and a fixed amount of hash. Over and over and over again... for a few years now.
Well, mere watching doesn't cut it. Show us some quantitative data similar to the graph above, so we can see where the knee occurs.
Note that 100% hashfull should be reported only after the tree already has many more nodes than fit in the hash, as most nodes are duplicates. So multiplying nps by time to get tree nodes will not tell you when the hash fills up.
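A toy model of that point, not a measurement of any real engine: stores go to uniformly random slots of an always-replace table, and a fraction of the nodes (the hypothetical `duplicate_fraction` parameter) are revisits that use no new slot. Even with no duplicates at all, searching as many nodes as there are slots leaves the table only about 63% full (1 - 1/e), because stores collide.

```python
import random


def hashfull_after(nodes, slots, duplicate_fraction, seed=0):
    """Fraction of slots occupied after `nodes` searched positions.

    Toy assumptions: one slot per position, uniform random slot choice,
    and duplicates (transpositions, re-searches) never claim a new slot.
    """
    rng = random.Random(seed)
    occupied = [False] * slots
    for _ in range(nodes):
        if rng.random() < duplicate_fraction:
            continue  # revisit of a position that is already stored
        occupied[rng.randrange(slots)] = True
    return sum(occupied) / slots


SLOTS = 100_000
for dup in (0.0, 0.5, 0.8):   # hypothetical duplicate rates
    fill = hashfull_after(SLOTS, SLOTS, dup)
    print(f"duplicates {dup:.0%}: nodes == slots gives hashfull {fill:.0%}")
```

So the node counter passing the number of hash entries says little about whether the table is actually full; the engine's own hashfull report is the thing to plot.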
Show us some quantitative data similar to the graph above, so we can see where the knee occurs.
SF and cutechess are both open source and free. You can download/compile the software and run the tests yourself (no need to take my word for my test results, especially if you are unwilling to trust me in the first place). One difference in MF is that I use 64-bit keys. Whether that has a significant effect on the outcome I couldn't say...
In my experience, people are very good at finding what they are looking for....
Oh, I trust you all right, but I don't trust any observation just made by 'watching', not even if I had been watching myself. And especially where it is not even clear what exactly you have been watching. (Reported tree nodes > hash entries, hashfull > 99%...)
As the data I collected is pretty generic, and exactly what is expected because of the reason I mentioned, the matter is decided to my satisfaction. No reason to waste my time doing it again with Stockfish.
I would not advise anyone to attach much value to impressions caused by watching.
As the data I collected is pretty generic, and exactly what is expected because of the reason I mentioned, the matter is decided to my satisfaction. No reason to waste my time doing it again with Stockfish.
I would not advise anyone to attach much value to impressions caused by watching.
Like I said, people are very good at finding what they are looking for.
Lyudmil Tsvetkov wrote:the opening book was by far the most important factor in the match
Nonsense. At 1 minute per move, Stockfish has quite some calculation depth on that hardware. If Stockfish willingly heads into positions it cannot handle well, then this is simply an engine shortcoming of Stockfish.
You can not call nonsense something which is 100% true.
Out of the 10 games available to us, Alpha won precisely none in the endgame.
Its only 2 middlegame wins were in the Ruy Lopez after SF made some stupid trades, undervaluing the bishop pair and giving a minor piece for pawns.
All the rest were early opening losses, where Alpha used its book.
Make the conclusions yourself, but if you think Alpha won all the games in the endgame, this exercise will be meaningless.
Lyudmil Tsvetkov wrote:
In what way is it strong?
Now, you have a weightlifter pulling 300 kilos from the ground, SF.
Then, you have Alpha, pulling 50. It is much weaker.
Then you add up 10 Alphas to pull the same weight, and they outperform SF.
In what way is this strong?
I don't understand, what the hell NN means.
You don't say. Gosh golly, I'd never have guessed.
You know, it's really a shame that A0 beat Stockfish. It means that people have inane discussions about what tweaks can give Stockfish a 5 Elo edge. Meanwhile the real point passes over their heads in orbit.
It does not matter if A0 is 100 elo above or below Stockfish. It wouldn't matter if it scored on-par with, say, Hiarcs, or Crafty: it'd still be a massive breakthrough. If you don't understand why that is so, go educate yourself.