MMTO for evaluation learning

This algorithm (MMTO), or a variant of it, is now used by all the top Shogi programs. For an objective function, it uses agreement between low-depth search results and the moves actually played by strong players.
There is some brief discussion of its application to chess and an experiment using Crafty in this paper:
https://www.jair.org/media/4217/live-4217-7792-jair.pdf
(see p. 555).
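For anyone skimming, here is a minimal Python sketch of how I read that objective; the sigmoid slope a and the evaluate() callback are placeholders of mine (in MMTO the score comes from a shallow search, not a static evaluation):

Code:

import math

def sigmoid(x, a=0.01):
    # T(x): squashes a score difference into (0, 1); the slope a is a guess
    return 1.0 / (1.0 + math.exp(-a * x))

def move_agreement_loss(records, evaluate, weights):
    # records: (position, played_move, other_legal_moves) triples taken
    # from strong players' game records
    # evaluate(pos, move, weights): low-depth search score after 'move'
    loss = 0.0
    for pos, played, others in records:
        s_played = evaluate(pos, played, weights)
        for move in others:
            # every move that scores above the played move adds ~1 to the loss
            loss += sigmoid(evaluate(pos, move, weights) - s_played)
    return loss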
--Jon
Re: MMTO for evaluation learning
Here is a less technical intro PPT that explains the error function better:
http://www.logos.ic.i.u-tokyo.ac.jp/~mi ... ummer.pptx
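As I understand the JAIR paper, the minimization step itself is deliberately simple: each weight moves one grid step against the sign of an approximated partial derivative. A rough sketch; the grad() callback and step size h are placeholders of mine:

Code:

def mmto_step(weights, grad, h=1):
    # grad(i): approximate partial derivative of the loss w.r.t. weights[i]
    # (hypothetical helper; the paper reuses the shallow-search results)
    for i in range(len(weights)):
        g = grad(i)
        if g > 0:
            weights[i] -= h   # step against the gradient on an integer grid
        elif g < 0:
            weights[i] += h
    return weights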
Re: MMTO for evaluation learning
Thanks Jon, that is interesting. What stands out is that it combines minimizing the error in the score with guiding the evaluation to prefer the move actually chosen in each position.
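Just to picture that combination, something like the following toy mix; this is only my illustration of the idea, not the paper's actual formulation (the mixing weight lam and the target scores are made up):

Code:

def score_error(targets, predictions):
    # mean squared error between target scores and current evaluation scores
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def combined_loss(move_agreement, targets, predictions, lam=0.5):
    # move_agreement: the move-ordering term sketched earlier in the thread
    # lam: made-up weight mixing the two objectives
    return move_agreement + lam * score_error(targets, predictions)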
Re: MMTO for evaluation learning
Weird that they mention KnightCap and the temporal-difference learning experiments that were done (by Jonathan Baxter and Andrew Tridgell) but give a date of 2000 for some reason... I think they had published papers about it as early as 1998 (I was at university then and remember reading them).
This one for example: http://citeseerx.ist.psu.edu/viewdoc/su ... .1.36.7885
Anyways, it looks interesting, thanks for the links!
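For comparison, KnightCap's TDLeaf(lambda) update went in a different direction: nudge the weights along the evaluation gradient at each principal-variation leaf, scaled by discounted future temporal differences. A sketch from memory of the Baxter/Tridgell papers; the variable names and defaults are mine:

Code:

def tdleaf_update(weights, leaf_scores, leaf_grads, alpha=0.01, lam=0.7):
    # leaf_scores[t]: search score of the PV leaf after move t
    # leaf_grads[t][i]: gradient of that score w.r.t. weights[i]
    n = len(leaf_scores)
    for t in range(n - 1):
        # discounted sum of future temporal differences d_j = s[j+1] - s[j]
        delta = sum((lam ** (j - t)) * (leaf_scores[j + 1] - leaf_scores[j])
                    for j in range(t, n - 1))
        for i in range(len(weights)):
            weights[i] += alpha * leaf_grads[t][i] * delta
    return weights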