Esay: +100 =0 -0.

Michael Sherwin wrote:
So if Romi had trained 100 games against Crafty in every position that appears 10,000 times or more in a human database, how do you think Romi would have done against Crafty in a follow-up match if Crafty used its tournament book?

Rodolfo Leoni wrote:
But against Crafty that specific position was.... the start position!

Michael Sherwin wrote:
Hi Rodolfo! Yes, I remember those experiments. Starting from a new learn file, Romi was able to win 100-game matches against both Rybka and Crafty when starting from a specific position. Thanks for reminding me!

Rodolfo Leoni wrote:
Hi Mike,

Michael Sherwin wrote:
In January of 2006, IIRC (not exactly sure), I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey see, monkey do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I found out only very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program. The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish, I see the same kind of learned moves.
In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, and type merge millionbase.pgn; Romi will then learn from all those games. Most of what has been written about AlphaZero is made-up reporting. That is what reporters do: they take one or two known facts and spin a many-page article that is mostly bunk. The AlphaZero team has released very little actual information. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that, not much is known. But looking at the games against Stockfish, it looks as though AlphaZero either trained against Stockfish before the recorded match or was fed a PGN of Stockfish games. Stockfish does have some randomness in its move choice, so it can't be totally dominated the way Romi dominated Glaurung, which had no randomness. So basically: take an engine about as strong as Stockfish, give it reinforcement learning, and the result is exactly as expected!
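The reward-and-penalty learning described above can be illustrated with a minimal sketch. This is not RomiChess's actual code; every name here (LearnBook, WIN_BONUS, LOSS_PENALTY) and the bonus values are assumptions invented for this example. The idea is only this: after each game, every move on the learner's played line gets a small bonus (win) or penalty (loss), and that accumulated adjustment is added to the normal evaluation in later games, steering the search back toward lines that previously worked.

```python
# Hypothetical sketch of result-driven move learning in the Pavlovian
# reward/penalty spirit described above. Not RomiChess's implementation.

from collections import defaultdict

WIN_BONUS = 3      # assumed reward per move on a winning line
LOSS_PENALTY = -3  # assumed penalty per move on a losing line

class LearnBook:
    """Maps (position_key, move) -> accumulated learning bonus."""

    def __init__(self):
        self.bonus = defaultdict(int)

    def update(self, moves, result):
        """After a game, reward or penalize every move the learner played.

        moves:  list of (position_key, move) pairs from the learner's games.
        result: +1 win, 0 draw, -1 loss, from the learner's perspective.
        """
        delta = WIN_BONUS if result > 0 else (LOSS_PENALTY if result < 0 else 0)
        for pos, move in moves:
            self.bonus[(pos, move)] += delta

    def adjusted_score(self, pos, move, static_eval):
        """The search adds the learned bonus to the normal evaluation."""
        return static_eval + self.bonus[(pos, move)]

book = LearnBook()
# Suppose the learner won a game that began with these two of its moves:
book.update([("startpos", "e2e4"), ("pos_after_e7e5", "g1f3")], result=+1)
# In the next game, e2e4 from the start position now scores a bit higher:
print(book.adjusted_score("startpos", "e2e4", 20))  # 20 + 3 = 23
```

Repeat this over many passes against the same opponent and the bonuses accumulate along winning lines, which is one plausible reading of how a 5% score can climb to 95% over 20 passes.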
It's always a pleasure to see you.
Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, starting with an empty learning file.
We shouldn't forget those were different times for computer chess. On single CPUs (deterministic chess) it's easier to find an opponent's weaknesses. With multicore engines it becomes a bit harder, because engines often change their PVs. So I guess Romi would still win, but it'd suffer some losses.
About AlphaZ, I think that's a hardware revolution, and engine strength (or learning) has nothing to do with the result. It's a different way to build the software, a different pattern of evaluation, and a kind of learning much more similar to KnightCap's than to any other. With one difference: back then, KnightCap's learning could never work.
A match between AlphaZ and Stockfish 9 (when released) would have been far more interesting, if you gave SF9 some learning features. Romi style or Critter style, it doesn't matter. We'd have learning vs. learning in a match between engines of similar level. Or maybe SF9 would have been strong enough to win the match anyway...
We'll never know, because that was mere marketing, so they needed to win... That doesn't mean the product is bad. It's probably great, but if you want to sell a great (and expensive) product you need to do a lot of advertising about its unbelievable performance. So you spend a lot of money because you want to earn a lot more.
Just two, max three cents.