designing neural networks


Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: designing neural networks

Post by Gerd Isenberg »

Oliver wrote: Hi Gerd,

let me first say what I would do. I would simply export my training data into a giant CSV file (or a custom MySQL table) and then use prefabricated tools to solve this problem.
If you like to fiddle with Neural Networks you might consider SNNS
(http://www-ra.informatik.uni-tuebingen.de/SNNS/).
If you are open to other statistical tools, you may choose R
(http://www.r-project.org/).

But if I evaluate you correctly, you are not the person to like that. You like to program everything yourself. :wink: (That's a good thing.)
Hi Oliver,

don't know, I need to go the "hard" way to understand things ;-)
Oliver wrote: Regarding the sample problem you gave, it is not clearly defined!
One must distinguish supervised and unsupervised learning.
See e.g. here:
http://en.wikipedia.org/wiki/Unsupervised_learning

So the question is: do you already have high quality scores or not?
Please answer this first, it is important. If not, you really have a problem. Otherwise, your problem is conceptually trivial.

Oliver
No idea - if unsupervised is sufficient, fine. OTOH I would like to supervise or inspect the "learning" with computer-aided tools.

Don't know how you define high quality scores.

I tried and will try my best. All three mentioned scores - king-safety, king-activity and pawn-scores - may be winning/losing scores and may be somewhat volatile. One tempo or the side to move may be decisive.

Transitions are always critical - often the subject of a tactical (q)search. For instance, a huge "winning" score for an outside passed pawn in a pawn-ending, where the pawn must be captured by the opponent king at the very last moment - drawing that king away from the center and opening a path for the own king to pick up all the opponent's rammed or backward pawns on the other wing after some quiet moves. Ideally, to avoid oscillations, the "winning" score of the outside passed pawn should transpose into a winning king-activity score, even though the formerly "winning" pawn is actually gone.

Assuming the scores are high quality, the issue is to scale those values by gamestate.

Of course the more advanced a passer is, the higher the score.
Occupancy and control of the stop-square or the advanced stop-square is an issue as well, as are supporting pawns, indirect defence/attack (aka the rook from behind), etc.

Scaling passers by gamestate usually reflects that it is harder, or more committal, to defend or occupy a stop-square or an advanced stop-square - which usually makes passers more valuable the fewer pieces are available. The more one already considers the ability to block or control stop-squares independently of the gamephase (which is implicitly harder in late endings), the lower the amplitude of the heuristic sigmoid curve.
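To make that scaling concrete, a toy sketch (names and numbers are mine, purely illustrative - not how any engine actually weights it):

Code: Select all

/* toy sketch: scale a passed-pawn base score up toward the endgame;
   gamephase runs from 0 (endgame) to 128 (middlegame)              */
int scalePasser(int base, int gamephase) {
    return base + base * (128 - gamephase) / 128;   /* base..2*base */
}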

Gerd
Oliver

Re: designing neural networks

Post by Oliver »

Hi Gerd,

thanks for your comprehensive answer. You haven't answered my question yet, but that doesn't matter since it was not exactly bright anyway. One mostly uses "unsupervised learning" when there are no target values. Here you have plenty of them: your program computes them.

So you want to use a so-called "Artificial Neural Network" (ANN for short) to compute game-state evaluations or other helpful values!?
  • First you should describe the problem functionally. What should the ANN compute? What are the input terms? Input terms can be categorized as either continuous or categorical. For a continuous variable x, every real value is allowed, e.g. a king-activity score. A categorical variable takes only values from a finite set, e.g. a boolean value indicating the presence of a passed pawn. Categorical values cannot be directly modelled with ANNs (nor with most other models) and must be emulated - e.g. one-hot encoded, see the sketch after this list - hence the distinction.

    If you have decided on the function itself, you must generate sample values of your function. Write plenty of them out to a text file. Try to cover all cases at least once. With ANNs you can process millions, since the parameters are learned by iteration.

    Next, import your data file into a statistical tool. You can then do all sorts of things with it: create graphs, run tests, preprocess it, create models on the fly and train them.

    Write out the parameters of your created model and implement it in your program.

    Plan to reiterate. Here R comes in handy, since it is completely programmable with a well-founded scripting language.
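A minimal sketch of the emulation mentioned in the first point - the usual "one-hot" encoding (the function name is mine):

Code: Select all

/* emulate a categorical variable with n possible values by n binary
   inputs, exactly one of which is set ("one-hot" encoding)         */
void oneHot(int category, int n, double *inputs) {
    for (int i = 0; i < n; ++i)
        inputs[i] = (i == category) ? 1.0 : 0.0;
}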
Some further hints:

Try to reduce the complexity of your model:

If there is a logical dependence f(x) = y between features x and y, throw y out.

If there is a statistical correspondence between two or more features, that hurts too. Try to transform your input values to eliminate this dependence.

If you know the value of a variable, write it down. You cannot estimate every parameter, at least not reliably. E.g. I think piece values are well known nowadays.

Try to parametrize the input space to get rid of some parameters.
Example: You want to compute, for each square sq = (x,y) with 0 <= x,y <= 7, the merit k(sq) of having a knight on this square. Normally you would have to estimate 64 values.
Now let d = sqrt((sq - (3.5,3.5))^2), the Euclidean distance of sq to the center of the board, and k(sq; a,b) = exp(a*d + b).
This way you only have to estimate two parameters, a and b. Whether it works is another question... This is only an example.
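In C, that parametrization might look like this (just the example above, nothing more):

Code: Select all

#include <math.h>

/* merit of a knight on square (x,y): k = exp(a*d + b), where d is
   the Euclidean distance of the square to the board center        */
double knightMerit(int x, int y, double a, double b) {
    double dx = x - 3.5, dy = y - 3.5;
    return exp(a * sqrt(dx*dx + dy*dy) + b);
}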

It has been proven that an ANN with one hidden layer (but of unbounded size) and an arbitrary sigmoidal activation function can approximate every continuous function. That makes many people expect wonders from ANNs, but these wonders seem to be the exception in practice. Some statisticians teach that a well-chosen simple model can outperform an ANN in many cases. On the other hand, it is said that ANNs "generalize well": they often do not break down miserably when tested with values not covered by the training samples. Compare this with a polynomial, the extreme opposite case - it fails completely when evaluated on values outside its supporting points.

If you have computed some sample values, feel free to mail them to me if you like. Since I am familiar with R, it is not a big deal for me to construct a model out of them.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: designing neural networks

Post by Gerd Isenberg »

Hi Oliver,

sorry for not answering your question - I still don't understand what you mean by high quality scores. Please consider that I am almost an illiterate with ANNs - and may therefore not be familiar with some terms.

Your further hints are very reasonable - there might be more or less hidden dependencies and/or interactions between king-safety, king-activity and pawn-evaluation though. Usually an active king is less safe and vice versa. But the considerations are different. King-safety considers pawn-shield(s), (half-)open rays as attacking trajectories, piece attacks and defences of surrounding squares, escape-square patterns (e.g. base-rank issues), castling rights, opposite castling with pawn-storms and the ability to open files, etc., while king-activity ignores those features and is about center-control, tropism to the "pawn-center of gravity", square-of-the-king issues to catch opponent passers or to support own passers, etc. King-safety is important during the middlegame, becomes less of an issue once the queens are off the board, and disappears totally in late endings. King-activity is the other way around - almost nil in the middlegame, but important in the ending.

Of course, and that is an issue with "classical" eval approaches as well, one has to deal with interactions.
If, in a pawn-ending, an opponent passer is inside or outside the square of the own king - is the own king more or less active, or is the passer weaker or stronger? Or should one consider such interactions as disjoint features (I guess yes)?

It will take some time to read and work through all the mentioned stuff (trying R, for instance) and to become more familiar with it.
My new 64-bit program is starting to play legal moves and I am about to design the evaluation. While I can enumerate zillions of features, the problem is to smoothly put them all together into a final score. My old program's evaluation is a kind of degenerated spaghetti with almost zero maintainability left, and I will do better this time.

Cheers,
Gerd
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: designing neural networks

Post by Gerd Isenberg »

Gerd Isenberg wrote: Thanks for all your replies.

I want to make the whole application issue a bit more concrete. For instance, Glaurung (and Fruit) interpolate explicit middlegame versus endgame scores by a fixed-point number 0...1.0 == 128, representing the gamephase as a normalized sum of the total material of both sides.

Code: Select all

score = (middlegameValue*gamephase + endgameValue*(128 - gamephase)) / 128
I have an "improvement" of this interpolation in mind, eventually using some (simple) neural network approach - since it might be a nice didactial application to start with. To somehow "train" the "smoothness" of middlegame-endgame transitions. Dan's mentioned radial basis function networks seems to point in this direction, I guess. To make the function a more or less sigmoid function with scalable "bending" between the extremes straight and rectanglar curve - plus a scalable middle-endgame - turning point M.

Code: Select all

score  (m = opening/middlegame, e = endgame)

 ^
 |  ( bending == 0 )
 e                                  m
 |   e                           m
 |       e                   m
 +           e           m
 |               e . m
 |               m | e
 +           m     |     e
 |       m         |         e
 |   m             |             e
 m-----------+-----|-----+-----------e---> gamephase (0...128 => endgame...middlegame)
                <- M -> (middle-endgame turning point)

 ^
 |   ( bending == 0.5 )
 e                                   m
 |         e               m
 |             e       m
 +               e   m
 |                e.m
 |                m|e
 +               m | e
 |             m   |   e
 |         m       |       e
 m-----------+-----|-----+-----------e---> gamephase (0...128 => endgame...middlegame)
                <- M -> (middle-endgame turning point)

 ^
 |   ( bending == 1.0 )
 eeeeeeeeeeeeeeeeee mmmmmmmmmmmmmmmm
 |                 |
 |                 |
 +                 |
 |                 .
 |                 |
 +                 |
 |                 |
 |                 |                 
 mmmmmmmmmmmmmmmmmm|eeeeeeeeeeeeeeee---> gamephase (0...128 => endgame...middlegame)
                <- M -> (middle-endgame turning point)
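A minimal sketch of such a bendable interpolation (my own made-up parametrization, only to make the curves above concrete - not from any engine):

Code: Select all

#include <math.h>

/* blend endgame and middlegame scores with an adjustable sigmoid:
   bending == 0.0 gives the straight Fruit-like interpolation,
   bending == 1.0 approaches a hard switch at gamephase == M       */
int taperedScore(int mg, int eg, int gamephase /* 0..128 */,
                 double M, double bending)
{
    double t = (gamephase - M) / 64.0;            /* ~[-1,1] around M  */
    if (t < -1.0) t = -1.0; else if (t > 1.0) t = 1.0;
    double b = 1.0 - bending;
    double y = (t == 0.0) ? 0.0 : t / sqrt(b + (1.0 - b)*t*t);
    double w = 0.5 * (1.0 + y);                   /* middlegame weight */
    return (int)(w * mg + (1.0 - w) * eg);
}
With bending == 0 and M == 64 this reduces exactly to the Fruit-like linear formula above.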
Scaling the bending, or shifting a sigmoid function, is only a matter of adding to or multiplying x or y. So the sigmoid functions might be the same in all hidden units of the net, since scaling and bias are considered elsewhere.

Code: Select all

                    +1 +    ++++++++++++++++++B
                       |  b
                       | +
                       |+
-----------------------+-----------------------
                      +|
                     + |
                    a  |
A++++++++++++++++++    + -1
Performing this

Code: Select all

  y = x / sqrt(1.0 + x*x)
sigmoid function with SSE2 vectors of four 32-bit floats in one xmm register might give some performance boost - especially with 4*4 sigmoid mappings in parallel.

Code: Select all

instruction   K10 latency   throughput
MULPS              4            1/1
ADDPS              4            1/1
RSQRTPS            3            1/1
MULPS              4            1/1
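Written out as SSE2 intrinsics, a sketch of exactly that instruction sequence (note RSQRTPS is only an ~12-bit approximation - whether that is accurate enough here is another question):

Code: Select all

#include <xmmintrin.h>

/* four sigmoids y = x / sqrt(1 + x*x) in parallel:
   MULPS, ADDPS, RSQRTPS, MULPS - matching the table above */
static inline __m128 sigmoid4(__m128 x) {
    __m128 xx = _mm_mul_ps(x, x);                   /* x*x           */
    __m128 d  = _mm_add_ps(_mm_set1_ps(1.0f), xx);  /* 1 + x*x       */
    __m128 r  = _mm_rsqrt_ps(d);                    /* ~1/sqrt(...)  */
    return _mm_mul_ps(x, r);                        /* x * rsqrt     */
}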
chrisw

Re: designing neural networks

Post by chrisw »

Gerd Isenberg wrote: I am interested in applying neural networks, e.g. to scale some eval terms, but I am a total newbie in that field. What would be a good didactical, eval-related example to play with, to get first experience?

Might this be a practical example?

I would like to scale some eval terms to a final score - more or less depending on gamestate {early, late}X{opening, middlegame, endgame}, the closeness of the pawn structure, and the pawn-versus-piece ratio. I can feed in total material, material balance, pawn-piece ratio and some "heuristic" numbers somehow representing a pawn-structure measure of the close-/openness (and/or the ability to open) of the center and both wings (e.g. based on the number of pawns, the number of open and half-open (per side) files, the number of closed files with rams, and the number of closed or half-open files with levers and lever options, etc.).

Code: Select all

                  input  hidden    output layer
  side to move       []\
  white material sum []\\=X[~]\
  black material sum []\\\X[~]\\
  closeness-center   []\\\X[~]\\\
  ...                []>==X[~]-===>[]  score
  pawn/piece ratio   []///X[~]///
  eval-term 1        []///X[~]//
  ....               []//=X[~]/
  eval-term N        []/
What are the techniques to design such stuff? The topology of the net, the number of neurons, the initial weights, the characteristic curves and the backpropagation algorithm to adjust the weights after some training? I guess the design of the hidden layer depends on how the input neurons interact, and which of them are independent. Is it about sets of equations and desired characteristic curves in the first place?

Thanks for any help and advice!

Cheers,
Gerd

Hi Gerd,

I did some work with ANNs and chess several years ago. Chess is a very difficult game for an ANN to deal with. A complex position that is a win for white might be a win for black if the side to move changed, or if some insignificant pawn were on a different square. The full game position is thus really difficult to find spatial patterns in, let alone to produce an evaluation from.

Of course the other problem is that the number of floating-point calculations makes the thing just way off the page compared to hand-written evaluation code, if used as the nodal evaluation.

My work now concentrates on stock market data, where ANNs work well.

However, I did find one area where ANNs work for chess.

Think of the root moves. If unsorted, you should find 'bestmove' 50% of the time in the first half of the list. The further down the list 'bestmove' (or any change in 'bestmove' as a result of search) sits, the longer the search is going to take to find it.

Hence you can effectively speed up your rate of finding a new 'bestmove' by having candidates higher up the root list.

The somewhat blundering technique I tried was to take every test suite I could find, present the positions one by one to the ANN (you could use whatever way you think best to present the board to the ANN) and train the output (I used the desired 'to' square) on the test-suite answers.

I used 'to' because I figured tactical shots into the 'right' area were most likely to be found this way.

I forget exactly how well this worked, but I think I could push 'bestmove', on average, into the first 25% of the root list this way. Of course, the ANN calculation at the root takes essentially zero percent of your search time.

Best wishes,

ChrisW
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: designing neural networks

Post by Gerd Isenberg »

chrisw wrote: Hi Gerd,

I did some work with ANNs and chess several years ago. [...] I forget exactly how well this worked, but I think I could push 'bestmove', on average, into the first 25% of the root list this way. Of course, the ANN calculation at the root takes essentially zero percent of your search time.

Best wishes,

ChrisW
Hi Chris,

thanks for your suggestions and welcome!

Root-move sorting based on ANNs seems indeed an interesting application. After the PV move, and possibly previous best moves sorted in front, I still use the number of nodes searched in the previous iteration as a measure. One problem with that: during the game, with filled hash tables, you often start the first "real" iteration already at some higher depth, which takes considerable time.

What I had in mind - besides a didactical sample with some pseudo-code snippets to get some experience - was a more sigmoid middlegame- versus endgame-scaling of certain eval terms, instead of the Fruit-like linear scaling on the sum of material.

I think that fp or integer dot products became quite interesting with SIMD instructions - for instance a 64-bit * 64-byte dot product will take less than 20 cycles on a K10 with SSE2.
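A rough, untuned sketch of such a bitboard times byte-weights dot product (unsigned byte weights for simplicity; not a verified 20-cycle implementation):

Code: Select all

#include <emmintrin.h>
#include <stdint.h>

/* dot product of a 64-bit bitboard with 64 byte weights: expand each
   bit to a 0x00/0xFF byte mask, AND with the weights, sum via PSADBW */
int dotProduct64(uint64_t bb, const uint8_t weights[64]) {
    const __m128i sel = _mm_set_epi8(
        (char)0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01,
        (char)0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
    __m128i zero = _mm_setzero_si128();
    __m128i acc  = zero;
    for (int i = 0; i < 4; ++i) {
        char b0 = (char)(bb >> (16*i));      /* bitboard bytes 2i   */
        char b1 = (char)(bb >> (16*i + 8));  /* and 2i+1            */
        __m128i bits = _mm_set_epi8(b1,b1,b1,b1,b1,b1,b1,b1,
                                    b0,b0,b0,b0,b0,b0,b0,b0);
        /* 0xFF where the corresponding bit is set, 0x00 elsewhere  */
        __m128i mask = _mm_cmpeq_epi8(_mm_and_si128(bits, sel), sel);
        __m128i w    = _mm_loadu_si128((const __m128i*)(weights + 16*i));
        /* PSADBW against zero sums the selected bytes per 64-bit lane */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(_mm_and_si128(mask, w), zero));
    }
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8)); /* fold halves */
    return _mm_cvtsi128_si32(acc);
}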

Cheers,
Gerd
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: designing neural networks

Post by Volker Annuss »

Hello Gerd,

in Hermann I use neural networks for timing and for material evaluation.

I have written something about timing in the old but still available newsticker at http://www.playwitharena.com. See Nr. 86 there. (You cannot put a direct link to the newsticker archive here.)
The way I use neural networks for timing has changed since then.

Material evaluation is very complicated and does not give much strength.

See you on Friday
Volker
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: designing neural networks

Post by Gerd Isenberg »

Volker Annuss wrote: Hello Gerd,

in Hermann I use neural networks for timing and for material evaluation. [...] Material evaluation is very complicated and does not give much strength.

See you on Friday
Volker
Hi Volker,

interesting - yes, considering the node count of root moves seems a good measure to decide about easy moves and timing issues. What other factors do you feed into your neural timing network? Amount, trend and/or oscillation of the root scores? Or whether the best move so far looks somehow suspicious due to immediate irreversible long-term defects in pawn-structure or king-safety - according to some "analysis" aka pre-processing of the root position?

One problem with root-move sorting or easy-move "singularity" based on node count occurs as follows. I can force a draw, e.g. by a piece sac and perpetual queen checks. That takes only a small, narrow subtree (let's say < 0.1%) and may persist in the hash after some depth. One other move has a huge subtree (say > 60..90%) and so far keeps a small plus score, leading to unclear, volatile positions. Using this kind of node-count "singularity" to terminate earlier seems dangerous to me.

Also - if this "singular" move suddenly fails below zero for the first time, I would like to try the forced draw-saving move before other moves, which may otherwise start to take a lot of time without solving the fail low.

See you in Leiden,
Are you ready to win the 200€? ;-)

Cheers,
Gerd
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: designing neural networks

Post by bob »

I tend to agree with your overall assessment. Many years ago we had an ANN project here at UAB, dealing with target tracking using RADAR data in an attempt to track targets from one frame to the next. It sounds easy enough until you see how far individual "things" can move from one frame to another. Even using speed, altitude, heading, rate of climb/descent and the laws of physics (inertia), it was a difficult problem, and each new set of training data brought new difficulties to the table and caused problems with prior results that some thought were "solved".

I think the case of a single tiny change altering the game outcome will turn out to be the hurdle that is difficult to overcome, unfortunately.

BTW good to see you posting here again...
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: designing neural networks

Post by Volker Annuss »

Hello Gerd,

Hermann no longer uses node counts for timing. (The public version 2.0 still does.)

It depends too much on what one finds in the hash table, and besides the draw problems you mentioned, it does not work very well with ponder on, where after a ponder miss you may still have many hash hits in some lines and very few in others. It becomes even worse with my multiprocessor implementation with a shared hash table.

My new implementation uses
- some properties of the first moves of the principal variation, such as the moving and captured pieces
- differences and transpositions between the first moves of the PVs of the last two iterations
- scores from the last two iterations
- number of possible moves


Every programmer has a chance to win the 200€ in Leiden. Chances increase for newcomers and for programs that have made much progress, but one still needs some luck.

Greetings
Volker