MMTO for evaluation learning
Posted: Sun Jan 25, 2015 4:11 pm
by jdart
This algorithm (MMTO, Minimax Tree Optimization), or a variant of it, is now used by all the top Shogi programs. Its objective function measures agreement between low-depth search results and the moves actually played by strong players in game records.
There is some brief discussion of its application to chess and an experiment using Crafty in this paper:
https://www.jair.org/media/4217/live-4217-7792-jair.pdf
(see p. 555).
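As a rough illustration of that objective: the tuner searches each training position at low depth and is penalized whenever some legal move scores above the move the strong player actually chose. The sketch below is my own paraphrase, not code from the paper; the function names are hypothetical, the logistic smoothing stands in for the paper's step-approximation T(x), and the real MMTO also adds an equality constraint and a regularization term.

```python
import math

def smooth_step(x, k=0.01):
    # Differentiable approximation of a 0/1 step; k controls the slope.
    return 1.0 / (1.0 + math.exp(-k * x))

def agreement_loss(positions, search_score, weights):
    """Sketch of an MMTO-style agreement objective (illustrative only).

    positions:    iterable of (position, played_move, legal_moves)
    search_score: low-depth search result for a move, from the mover's
                  point of view, under evaluation parameters `weights`
    """
    total = 0.0
    for pos, played, legal in positions:
        s_played = search_score(pos, played, weights)
        for move in legal:
            if move == played:
                continue
            # Penalty grows as a non-played move scores above the
            # expert's move; near zero when the expert move is on top.
            total += smooth_step(search_score(pos, move, weights) - s_played)
    return total
```

Minimizing this over the evaluation parameters pushes the shallow search toward ranking the game move first in each position.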
--Jon
Re: MMTO for evaluation learning
Posted: Sun Jan 25, 2015 5:22 pm
by jdart
Here is a less technical intro PPT that explains the error function better:
http://www.logos.ic.i.u-tokyo.ac.jp/~mi ... ummer.pptx
Re: MMTO for evaluation learning
Posted: Sun Jan 25, 2015 10:34 pm
by Ferdy
Thanks Jon, that is interesting. What stands out is that it combines minimizing the error in the score with guiding the evaluation to prefer the move actually chosen as best in each position.
Re: MMTO for evaluation learning
Posted: Mon Jan 26, 2015 3:35 am
by wgarvin
Weird that they mention KnightCap and the temporal-difference learning experiments done by Jonathan Baxter and Andrew Tridgell, but give a date of 2000 for some reason... I think they had published papers about it as early as 1998 (I was in university then and remember reading them).
This one for example:
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.36.7885
Anyways, it looks interesting, thanks for the links!