Ralf Müller wrote: Thanks for the reply!
...
But depending on ck, the outcome is very different (e.g. with ck at 32). If I understand correctly, the outcome should always be the same, and only the speed of convergence should vary. Isn't that correct?
Unfortunately not. If you choose wrong parameters, you can mess it up completely. Furthermore, the SPSA algorithm is not guaranteed to find the global optimum. If you set up a problem that has multiple local extrema, you must expect to get different results between runs.
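To see why SPSA is only a local method, here is a minimal 1-D sketch in Python. It is a toy, not the zamar/spsa code: the objective f(x) = (x^2 - 1)^2, the function name `spsa_minimize` and the gain constants are hypothetical choices for illustration. f has two minima, at -1 and +1, and which one SPSA reaches depends on where it starts. (In one dimension with exact measurements the random sign cancels, so this reduces to finite-difference gradient descent; with noisy game results, the noise can also push different runs into different basins.)

```python
import random

def f(x: float) -> float:
    # Toy objective with two local minima, at x = -1 and x = +1
    return (x * x - 1.0) ** 2

def spsa_minimize(x0, iterations=1000, a=0.2, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    rng = random.Random(seed)
    x = x0
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha    # step-size schedule (Spall's form)
        c_k = c / (k + 1) ** gamma        # perturbation schedule
        delta = rng.choice((-1.0, 1.0))   # random +/-1 perturbation direction
        # Two measurements at x + c_k*delta and x - c_k*delta
        g = (f(x + c_k * delta) - f(x - c_k * delta)) / (2.0 * c_k * delta)
        x -= a_k * g                      # descend along the gradient estimate
    return x

print(spsa_minimize(0.5))   # ends near +1
print(spsa_minimize(-0.5))  # ends near -1
```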
Let's take a closer look. There are several parameters to play with.
In the example configuration file
https://github.com/zamar/spsa/blob/master/aggr.conf we find:
Iterations = 50000
A = 5000
Gamma = 0.101
Alpha = 0.602
Alpha and Gamma are set to the values recommended by Spall, so maybe we should leave them as they are. The "asymptotically optimal values" (i.e. for very long experiments) are Alpha = 1.0 and Gamma = 1/6.
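You must have Alpha > Gamma; otherwise the values simply blow up in magnitude. The step size (exponent Alpha) would then decay more slowly than the perturbation size (exponent Gamma), and since the gradient estimate divides by the perturbation, the change per iteration would keep growing.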
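A quick way to check the Alpha > Gamma condition numerically. This sketch assumes Spall's standard gain sequences a_k = a/(k+1+A)^Alpha and c_k = c/(k+1)^Gamma (zamar/spsa may index k differently), and sets a = c = 1 because only the trend of a_k/c_k matters here. With A = 5000 the ratio actually rises a little during the first few thousand iterations (A keeps a_k nearly flat while c_k already shrinks) before it decays like k^(Gamma - Alpha).

```python
# Assumed gain sequences (Spall's form):
#   a_k = a / (k + 1 + A)**alpha   -> step size
#   c_k = c / (k + 1)**gamma       -> perturbation size
# The gradient estimate divides by c_k, so a parameter moves roughly by a_k / c_k.

def effective_step(k, alpha=0.602, gamma=0.101, A=5000.0, a=1.0, c=1.0):
    a_k = a / (k + 1 + A) ** alpha
    c_k = c / (k + 1) ** gamma
    return a_k / c_k

for k in (0, 5000, 50000, 500000):
    ok = effective_step(k)                             # Alpha > Gamma
    bad = effective_step(k, alpha=0.101, gamma=0.602)  # Alpha < Gamma
    print(f"k={k:>6}  alpha>gamma: {ok:.5f}   alpha<gamma: {bad:.3f}")
```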
Decide on the number of iterations you want to have. One iteration consists of two games with reversed colors.
Spall recommends setting A to about 10% of the number of iterations, which is presumably why A = 5000 is preset here (0.1 * 50000).
Next, look at the parameters for each variable in
https://github.com/zamar/spsa/blob/master/aggr.var
Aggressiveness,30,0,200,10,0.0020,0
They are documented in section 4 of the README:
https://github.com/zamar/spsa
Variables file (name of the file is defined in configuration file) is a comma separated (CSV) file.
Columns are defined as follows:
Column 0: Variable name (alphanumeric string)
Column 1: Variable initial value (float)
Column 2: Variable minimum value (float)
Column 3: Variable maximum value (float)
Column 4: Perturbation "ck" for the last iteration (float)
Column 5: Relative apply factor "Rk" for the last iteration (float)
Column 6: For simulation mode, this defines the ELO decrease for point x = (+/-) 100 compared to point x = 0 (optimum) (float).
Notes:
- When ck is defined for the last iteration and the number of iterations is known, it's easy to derive a value for c.
- When ck and Rk are defined for the last iterations, it's easy to derive ak for the last iteration. Based on that we can derive value for a.
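The README notes only say that deriving a and c is easy. Here is a minimal Python sketch of that derivation, under assumptions not stated in the README excerpt: the schedules have Spall's form c_k = c/(k+1)^Gamma and a_k = a/(k+1+A)^Alpha with k = 0 ... N-1, and Rk is tied to the step size by a_k = Rk * ck^2. The `Variable` class and the function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    init: float
    minimum: float
    maximum: float
    ck_last: float   # column 4: perturbation c_k at the last iteration
    rk_last: float   # column 5: relative apply factor R_k at the last iteration
    elo_drop: float  # column 6: only used in simulation mode

def parse_vars(text):
    rows = csv.reader(io.StringIO(text))
    return [Variable(r[0], *map(float, r[1:7])) for r in rows if r]

def derive_a_c(v, iterations, A, alpha, gamma):
    # Assumed: c_k = c/(k+1)^gamma and a_k = a/(k+1+A)^alpha, last k = N-1
    c = v.ck_last * iterations ** gamma
    # Assumed relation between the step size and Rk: a_k = R_k * c_k^2
    a_last = v.rk_last * v.ck_last ** 2
    a = a_last * (iterations + A) ** alpha
    return a, c

(v,) = parse_vars("Aggressiveness,30,0,200,10,0.0020,0\n")
a, c = derive_a_c(v, iterations=50000, A=5000, alpha=0.602, gamma=0.101)
print(f"c = {c:.2f}, a_last = {v.rk_last * v.ck_last ** 2:.3f}, a = {a:.1f}")
```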
We learn that the floats ck and Rk refer to c_k and R_k of the last iteration: c_k is the step size tried for the parameter, and R_k is the relative fraction of that step actually applied in the direction of the winner.
Therefore, in this example, at the end of the run you try a (clipped) step of 10 in each direction and then move 0.002*10 = 0.02 in the direction of the winner.
You therefore need a net difference of 50 won games to gain one point of "Aggressiveness".
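The arithmetic above can be sketched as one SPSA iteration for a single variable. This is a hypothetical Python sketch, not the zamar/spsa source: `play_pair` stands in for the two games with reversed colors and returns wins minus losses of the `plus` setting, and ck/Rk are held at the last-iteration values from aggr.var, whereas the real schedule changes them every iteration.

```python
import random

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def spsa_iteration(theta, c_k, r_k, lo, hi, play_pair, rng):
    delta = rng.choice((-1, 1))                # random direction
    plus = clip(theta + c_k * delta, lo, hi)   # clipped trial values
    minus = clip(theta - c_k * delta, lo, hi)
    net_wins = play_pair(plus, minus)          # wins - losses of `plus` over 2 games
    # Move R_k * c_k per net win, toward the side that won
    return clip(theta + r_k * c_k * net_wins * delta, lo, hi)

def prefers_aggression(plus, minus):
    # Hypothetical toy "engine": the more aggressive setting wins both games
    return 2 if plus > minus else -2

rng = random.Random(1)
theta = 30.0
for _ in range(25):  # 25 game pairs = 50 net wins
    theta = spsa_iteration(theta, 10.0, 0.0020, 0.0, 200.0, prefers_aggression, rng)
print(theta)  # about 31.0
```

Whatever sign `delta` takes, the toy engine always rewards the higher value, so each pair moves theta up by 2 * 0.02 = 0.04, and 50 net wins add one point.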