categorical cross entropy for value

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

chrisw
Posts: 1871
Joined: Tue Apr 03, 2012 2:28 pm

categorical cross entropy for value

Post by chrisw » Mon Feb 18, 2019 12:02 am

categorical cross entropy for value. win/loss/draw

I think it doesn't make sense. These aren't disconnected discrete categories, like cat, dog, horse. They're on a continuum, where win greater than draw greater than loss (this French AZERTY keyboard doesn't have greater-than or less-than symbols), and if win/loss/draw is presented to a network like this when learning, then the network is being denied the relative information.

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: categorical cross entropy for value

Post by AlvaroBegue » Mon Feb 18, 2019 5:51 pm

You don't propose an alternative, so it's hard to discuss pros and cons. However, these matters always come down to an empirical question.

Having a scheme that produces distinct probabilities for the outcomes has the advantage that they can be combined however necessary when we use the network. For instance, if you are at 9-10 at the end of a 20-game match, drawing is as bad for you as losing in the last game, so you can make your value function be P(winning), while under normal circumstances you would use P(winning)+0.5*P(draw).

Perhaps a reasonable alternative to try would be to have two sigmoid outputs, corresponding to P(white win or draw) and P(black win or draw). You can use the sum of their cross-entropy losses. But, again, I would only accept empirical evidence to decide between these types of options.

hgm
Posts: 23510
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: categorical cross entropy for value

Post by hgm » Tue Feb 19, 2019 8:14 am

The value of a draw can be dependent on circumstances outside the game. Such as results of previous games in a match. If you can only be aware of 'expected average result', there is no way to distinguish between risky ways to get a better result and risk-free ways.

Kotlov
Posts: 209
Joined: Fri Jul 10, 2015 7:23 pm
Location: Russia

Re: categorical cross entropy for value

Post by Kotlov » Tue Feb 19, 2019 9:21 am

hgm wrote:
Tue Feb 19, 2019 8:14 am
The value of a draw can be dependent on circumstances outside the game. Such as results of previous games in a match. If you can only be aware of 'expected average result', there is no way to distinguish between risky ways to get a better result and risk-free ways.

I distinguish an equal position from a theoretical draw.

Code: Select all

    SCORE e=M+EVAL_TEMP+(stat_eval_mg*P+stat_eval_eg*(PH_MAX-P))/PH_MAX;

    if(M>=MAT_PAWN)
        e-=R;
    if(M<=-MAT_PAWN)
        e+=R;
    if(e==0)           /* here: reserve score 0 for the theoretical draw */
        return 1;

    return e;
Very useful for determining score types in the TT.
Eugene Kotlov
Hedgehog 2.0 64-bit coming soon...

chrisw
Posts: 1871
Joined: Tue Apr 03, 2012 2:28 pm

Re: categorical cross entropy for value

Post by chrisw » Tue Feb 19, 2019 9:52 am

AlvaroBegue wrote:
Mon Feb 18, 2019 5:51 pm
You don't propose an alternative, so it's hard to discuss pros and cons. However, these matters always come down to an empirical question.
when I hear the word empirical, I release the safety catch on my AK47.

the problem with empirical answers and NN training/results is that, lacking DeepMind hardware, it takes a very, very long time to get an answer to meta questions. Hence, try to philosophise it out first, or enquire about the experience of others.
Having a scheme that produces distinct probabilities for the outcomes has the advantage that they can be combined however necessary when we use the network. For instance, if you are at 9-10 at the end of a 20-game match, drawing is as bad for you as losing in the last game, so you can make your value function be P(winning), while under normal circumstances you would use P(winning)+0.5*P(draw).

Perhaps a reasonable alternative to try would be to have two sigmoid outputs, corresponding to P(white win or draw) and P(black win or draw).
Yup. I think the binary categories ought to work, especially if we are training on game results and don't have the luxury of some sort of continuous evaluation to train on. If the value has two bars sticking out of it that are in mutual zero sum, the network will perceive those less "categorically" than it would perceive three categories. If we categorise cats and dogs, then the network is ultimately generating an idea of a hybrid cat-dog creature, on a continuous scale. But if we categorise cats, dogs and monkeys (win, loss, draw), how does the network make a cat-dog-monkey scale? How does it know that monkey is hybrid cat-dog, as draw is hybrid win-loss?
That was the point I was trying to make. Can a triple categoriser work for a binary-result game? In training, that is. I know we can just make a value out of the three category outputs when predicting, but it's the training that has the problem. I think.

You can use the sum of their cross-entropy losses. But, again, I would only accept empirical evidence to decide between these types of options.

AlvaroBegue
Posts: 919
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: categorical cross entropy for value

Post by AlvaroBegue » Tue Feb 19, 2019 11:57 am

I understand your concern, but I suspect in practice it won't matter. Looking at enough examples, the network can figure out that strong positions for white end up either won by white or drawn the vast majority of the time, even if it doesn't know that loss < draw < win.

Daniel Shawul
Posts: 3724
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: categorical cross entropy for value

Post by Daniel Shawul » Tue Feb 19, 2019 1:53 pm

I am not clear about the question. One can put a value on any discrete outcome; for instance, based on a 'cuteness scale' one could say cat > dog > horse. Why should that information be fed to the network in any way? The NN is just trying to model the frequencies of the discrete outcomes given the inputs. The reason why WDL may be better, besides the point Alvaro raised, is that a "draw model" is taken into account. That is, for example, what differentiates bayeselo from elostat ...

chrisw
Posts: 1871
Joined: Tue Apr 03, 2012 2:28 pm

Re: categorical cross entropy for value

Post by chrisw » Tue Feb 19, 2019 2:26 pm

Hmmm, yes, maybe you guys are right. I’ll put the three options on my testing todo list.
