* vs +

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

* vs +

Post by stegemma »

Has anybody tried the idea of multiplying parameters instead of adding them? The idea is to use each parameter as a probability of winning when some feature occurs. For example, if I find that having a passed pawn leads to victory 110% as often as the baseline, the value of the position can be multiplied by 1.10. If I have a doubled pawn (and maybe some other event occurs) I can multiply the positional value by 0.95, and so on. This might capture a stronger interrelation between parameters than just adding them.

I would like to try this in the next version of satana, using the random move generator to collect statistics... but I'm afraid Dr. Hyatt has surely already tried it back in the '60s, before I was born ;)
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: * vs +

Post by hgm »

Well, multiplying numbers is equivalent to adding their logarithms. So it isn't really different; you just have to apply some logarithmic (i.e. saturating) correction to the evaluation.

Not that it matters very much, though, as any monotonic mapping of the scores will eventually give the same search. It only matters if you compare scores that have a different mapping, or apply a fixed additive delayed-loss bonus to them. The bonus actually makes some silly behavior (like sacrificing a Rook on ply 1 to avoid the loss of a Queen on ply 20) go away.
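To make the equivalence concrete, a toy sketch (the factor values are made up; this is not code from any engine):

Code: Select all

#include <cmath>
#include <cstdio>

// Multiplicative eval: base value scaled by per-feature factors.
double evalMul(double base, const double* f, int n) {
    double v = base;
    for (int i = 0; i < n; ++i) v *= f[i];
    return v;
}

// Same thing in the log domain: add log(factor) instead of multiplying.
// exp() of the result equals evalMul(), and log is monotonic, so position
// comparisons come out the same either way.
double evalLog(double base, const double* f, int n) {
    double v = std::log(base);
    for (int i = 0; i < n; ++i) v += std::log(f[i]);
    return v;
}

int main() {
    double f[] = {1.10, 0.95};   // e.g. passed pawn, doubled pawn
    std::printf("%f %f\n", evalMul(100.0, f, 2), std::exp(evalLog(100.0, f, 2)));
    return 0;
}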
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: * vs +

Post by mvk »

stegemma wrote:Has anybody tried the idea of multiplying parameters instead of adding them?
Multiplication is the same as adding the logarithms, so I think it only changes the scale of representation and not the relative ordering of positions. Unless you want to mix the operators. In a way we are already comparing probabilities, except that we silently map them onto a piece value scale using the (inverse of the) logistic function. In the range where games get won or lost (say, -2p to +2p), this mapping is almost linear. Outside that range it really doesn't matter what you do.
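A toy sketch of that mapping (the slope of one unit per pawn is just an illustrative choice, not anyone's tuned value):

Code: Select all

#include <cmath>
#include <cstdio>

// Map a score in pawns to an expected game result in [0,1].
// The slope constant is illustrative; engines fit their own.
double winProb(double pawns) {
    return 1.0 / (1.0 + std::exp(-pawns));   // logistic function
}

int main() {
    // Around -2..+2 pawns the curve is close to linear; outside it saturates.
    for (double p = -2.0; p <= 2.0; p += 0.5)
        std::printf("%+.1f pawns -> expected score %.3f\n", p, winProb(p));
    return 0;
}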
[Account deleted]
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: * vs +

Post by stegemma »

hgm wrote:Well, multiplying numbers is equivalent to adding their logarithms. So it isn't really different; you just have to apply some logarithmic (i.e. saturating) correction to the evaluation.

Not that it matters very much, though, as any monotonic mapping of the scores will eventually give the same search. It only matters if you compare scores that have a different mapping, or apply a fixed additive delayed-loss bonus to them. The bonus actually makes some silly behavior (like sacrificing a Rook on ply 1 to avoid the loss of a Queen on ply 20) go away.
Ah OK, that's right, I hadn't thought of it from that point of view. Maybe it could be done by starting from the probability of a win, computed at runtime for each parameter in the actual position. It's an experiment I want to do, but only hell knows where it leads... and hell is the house of satana! :)
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: * vs +

Post by mar »

hgm wrote:Well, multiplying numbers is equivalent to adding their logarithms.
Old Yamaha FM chips used this trick: they added logarithms and then transformed the signal back with an exp-table lookup :)
So actually they did the opposite, to avoid multiplication.
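Roughly like this, as a toy sketch (table size and scaling invented, not the actual chip logic):

Code: Select all

#include <cmath>
#include <cstdio>

// "Multiply by adding logs, then look up exp": precompute an exp table so
// the inner loop needs no multiply at all.
static unsigned expTab[256];

void initTab() {
    for (int i = 0; i < 256; ++i)
        expTab[i] = (unsigned)(std::exp(i / 32.0) + 0.5);  // arbitrary scaling
}

// a and b are already stored as scaled logarithms.
unsigned mulViaLogs(int logA, int logB) {
    int idx = logA + logB;
    if (idx > 255) idx = 255;      // saturate instead of overflowing the table
    return expTab[idx];
}

int main() {
    initTab();
    // exp(65/32) is about 7.6, stored rounded in the table.
    std::printf("%u\n", mulViaLogs(40, 25));
    return 0;
}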
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: * vs +

Post by matthewlai »

stegemma wrote:
hgm wrote:Well, multiplying numbers is equivalent to adding their logarithms. So it isn't really different; you just have to apply some logarithmic (i.e. saturating) correction to the evaluation.

Not that it matters very much, though, as any monotonic mapping of the scores will eventually give the same search. It only matters if you compare scores that have a different mapping, or apply a fixed additive delayed-loss bonus to them. The bonus actually makes some silly behavior (like sacrificing a Rook on ply 1 to avoid the loss of a Queen on ply 20) go away.
Ah OK, that's right, I hadn't thought of it from that point of view. Maybe it could be done by starting from the probability of a win, computed at runtime for each parameter in the actual position. It's an experiment I want to do, but only hell knows where it leads... and hell is the house of satana! :)
I am doing exactly that with my engine (Giraffe, based on machine learning).

Probabilities can only be multiplied if they are independent, though, and most won't be.

For example, from an equal position, queen odds would be worth almost infinity (you are almost surely going to win regardless of what the current eval is). But from a KPPPPvK position, it would be worth about 1 (it doesn't change the probability of winning, since the probability of winning was close to 1 to begin with).
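With invented numbers, just to show how the same feature corresponds to a completely different multiplier depending on context:

Code: Select all

#include <cstdio>

int main() {
    // Invented probabilities of winning, before and after adding a queen.
    double lostBefore = 0.02, lostAfter = 0.99;  // bad position + extra queen
    double wonBefore  = 0.98, wonAfter  = 0.99;  // KPPPPvK: already nearly won
    std::printf("bad position: extra queen multiplies P(win) by %.1f\n",
                lostAfter / lostBefore);   // ~50x, grows without bound as P->0
    std::printf("KPPPPvK:      extra queen multiplies P(win) by %.2f\n",
                wonAfter / wonBefore);     // ~1.01, essentially no change
    return 0;
}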
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
User avatar
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: * vs +

Post by Kotlov »

stegemma wrote:Has anybody tried the idea of multiplying parameters instead of adding them? The idea is to use each parameter as a probability of winning when some feature occurs. For example, if I find that having a passed pawn leads to victory 110% as often as the baseline, the value of the position can be multiplied by 1.10. If I have a doubled pawn (and maybe some other event occurs) I can multiply the positional value by 0.95, and so on. This might capture a stronger interrelation between parameters than just adding them.

I would like to try this in the next version of satana, using the random move generator to collect statistics... but I'm afraid Dr. Hyatt has surely already tried it back in the '60s, before I was born ;)
"+" or "*" equal for represent, but "+" much easy calculate for processor.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: * vs +

Post by stegemma »

matthewlai wrote:
stegemma wrote:
hgm wrote:Well, multiplying numbers is equivalent to adding their logarithms. So it isn't really different; you just have to apply some logarithmic (i.e. saturating) correction to the evaluation.

Not that it matters very much, though, as any monotonic mapping of the scores will eventually give the same search. It only matters if you compare scores that have a different mapping, or apply a fixed additive delayed-loss bonus to them. The bonus actually makes some silly behavior (like sacrificing a Rook on ply 1 to avoid the loss of a Queen on ply 20) go away.
Ah OK, that's right, I hadn't thought of it from that point of view. Maybe it could be done by starting from the probability of a win, computed at runtime for each parameter in the actual position. It's an experiment I want to do, but only hell knows where it leads... and hell is the house of satana! :)
I am doing exactly that with my engine (Giraffe, based on machine learning).

Probabilities can only be multiplied if they are independent, though, and most won't be.

For example, from an equal position, queen odds would be worth almost infinity (you are almost surely going to win regardless of what the current eval is). But from a KPPPPvK position, it would be worth about 1 (it doesn't change the probability of winning, since the probability of winning was close to 1 to begin with).
Thanks, I had missed your interesting post; today I read it and played a little match. Your approach is very interesting. I tried neural networks in the past, with no valid results, only something similar to random play.

I think a single neural network is not enough for chess. Maybe many neural networks should be used, one for each aspect of the game. Of course your work is still impressive, and I hope it is only a starting point for you.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: * vs +

Post by matthewlai »

stegemma wrote:
matthewlai wrote:
stegemma wrote:
hgm wrote:Well, multiplying numbers is equivalent to adding their logarithms. So it isn't really different; you just have to apply some logarithmic (i.e. saturating) correction to the evaluation.

Not that it matters very much, though, as any monotonic mapping of the scores will eventually give the same search. It only matters if you compare scores that have a different mapping, or apply a fixed additive delayed-loss bonus to them. The bonus actually makes some silly behavior (like sacrificing a Rook on ply 1 to avoid the loss of a Queen on ply 20) go away.
Ah OK, that's right, I hadn't thought of it from that point of view. Maybe it could be done by starting from the probability of a win, computed at runtime for each parameter in the actual position. It's an experiment I want to do, but only hell knows where it leads... and hell is the house of satana! :)
I am doing exactly that with my engine (Giraffe, based on machine learning).

Probabilities can only be multiplied if they are independent, though, and most won't be.

For example, from an equal position, queen odds would be worth almost infinity (you are almost surely going to win regardless of what the current eval is). But from a KPPPPvK position, it would be worth about 1 (it doesn't change the probability of winning, since the probability of winning was close to 1 to begin with).
Thanks, I had missed your interesting post; today I read it and played a little match. Your approach is very interesting. I tried neural networks in the past, with no valid results, only something similar to random play.

I think a single neural network is not enough for chess. Maybe many neural networks should be used, one for each aspect of the game. Of course your work is still impressive, and I hope it is only a starting point for you.
Yeah it's really all about feature representation. A lot of times the intuitive representation is far from the best. People often choose the most intuitive representation without much thinking, and wonder why neural nets don't work well (and it's really not surprising that they don't work well).

For example, for chess, many people represent the board either as 64 features, each describing a square, or, slightly but not much better, as a bunch of bitboards.

That works very poorly because neural networks require the feature space to be more or less smooth to work well, and that representation is not smooth at all (smooth here means similar inputs should produce similar outputs, most of the time).

A much better way to represent the board is as a list of piece coordinates.

Taking a single knight as an example (trying to have the network learn the piece-square table) -

If we represent it as a 64-dimensional input with only 1 feature set (where the knight is), it will essentially have to learn a lookup table. This is bad because it won't be able to generalize at all (if it learns that E4, E5, D4 are all good places for a knight to be, it won't be able to generalize that knowledge to D5). There's nothing telling the neural net that the output for D4 is probably much closer to E4 than it is to A1.

On the other hand, a knight can also be represented by which file and rank it's on. That's a much better representation because the piece-square table can be reasonably approximated by a few piecewise-linear functions in 2D, so this representation will generalize much better. This is because distance between two points in the 2D input space corresponds to distance in eval.
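To make the contrast concrete, a small sketch of the two input encodings for a single knight (hypothetical helper names, not Giraffe's actual code):

Code: Select all

#include <array>
#include <cstdio>

// Representation 1: 64-dimensional one-hot vector.  From this the net can
// only learn a lookup table; nothing says d4 is "near" e4.
std::array<float, 64> oneHot(int square) {        // square in 0..63
    std::array<float, 64> v{};                    // zero-initialized
    v[square] = 1.0f;
    return v;
}

// Representation 2: (file, rank) coordinates.  Nearby squares get nearby
// inputs, so a smooth function of two variables can generalize across them.
std::array<float, 2> coords(int square) {
    return { float(square % 8), float(square / 8) };
}

int main() {
    std::array<float, 2> c = coords(27);          // d4, with a1 = square 0
    std::printf("d4 -> file %.0f, rank %.0f\n", c[0], c[1]);
    return 0;
}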

That is the mistake I think most people make when trying to use neural nets to play chess. Unfortunately, trivial feature representations aren't going to work in practice with neural nets, except for very trivial toy problems (and chess is certainly not trivial!). Also unfortunately, many machine learning sources don't make that fact abundantly clear.

Using multiple nets is one solution to the issue of requiring different eval for different phases. However, it's got a very high computational cost since a lot of work would have to be done multiple times, so I'm trying to avoid that if possible. If you are interested in multi-net approaches, a very cool method is known as Mixture of Experts (http://www.cs.bham.ac.uk/~jxb/INC/l19.pdf), where n nets are trained to produce eval, and another net is trained to decide how to combine their outputs given the position.

The cool thing about that is the whole system can be trained together. You don't have to decide which net is responsible for what kind of positions. As training goes on, the nets will automatically specialize into niches.
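A bare-bones sketch of the combination step (the experts and gate here are fixed toy functions standing in for trained nets):

Code: Select all

#include <cmath>
#include <cstdio>

// Two placeholder "experts": each maps an input feature to an eval.
double expert0(double x) { return  0.3 * x; }
double expert1(double x) { return -0.1 * x + 0.5; }

// Placeholder gate: softmax over toy logits, so the weights sum to 1.
void gateWeights(double x, double& w0, double& w1) {
    double e0 = std::exp(x), e1 = std::exp(-x);
    w0 = e0 / (e0 + e1);
    w1 = e1 / (e0 + e1);
}

// Mixture output: weighted sum of expert outputs.  When everything is a net,
// gradients flow through both experts and gate, so they specialize jointly.
double mixture(double x) {
    double w0, w1;
    gateWeights(x, w0, w1);
    return w0 * expert0(x) + w1 * expert1(x);
}

int main() {
    std::printf("%f\n", mixture(1.0));
    return 0;
}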
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.