RKISS

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: RKISS

Post by mcostalba »

Don wrote: It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be
Don, it can because is not an one hour effort: it has been troughly tested and verified against standard PRNG metric tools and has been proven to be a good PRNG, is not a casual attempt but a piece of non trivial intelectual property.

So I think is misleading looking at how many lines of code it has. Also an "Hello world" C++ program as more or less the same lines of code but the _effort_ to produce and validate it is completely different.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: RKISS

Post by Don »

mcostalba wrote:
Don wrote: It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be
Don, it can because is not an one hour effort: it has been troughly tested and verified against standard PRNG metric tools and has been proven to be a good PRNG, is not a casual attempt but a piece of non trivial intelectual property.

So I think is misleading looking at how many lines of code it has. Also an "Hello world" C++ program as more or less the same lines of code but the _effort_ to produce and validate it is completely different.
Are you saying that part of the GPL wording is how much time and effort you spent verifying and thinking about the code? I don't see that anywhere in the GPL.

If I have a piece of code in my program that says:

s.a = 0xf1ea5eed;

but it's not related to random number generation, am I volating GPL? This line of code is in Stockish.

I'm sure the answer is no. But what if the next line just happens to be the same line of code that is in stockfish after the s.a line?

What if I just said, "x = 0xf1ea5eed;" and used different variable names everywhere else?

These are rhetorical questions and I don't expect an answer. But code size must surely be part of the picture. Also, the intellectual effort that went into this was not done by one person - I don't think Heinz van Saanen invented KISS.

So I suppose that I could change a constant and copyright it myself? And who would know how much intellectual effort I put into it? Is it suddenly valid or invalid based on this? I don't think so.

I think you are wrong about the intellectual effort - I think I could write 1000 lines of crap if I wanted to and put a GPL license on it, and it would be perfectly valid (even if nobody cared about it.)

I'm not trying to be confrontational, it just seems to me that there should be some kind of reasonable limits on what you can copyright - especially when you are building strongly on the works of others because shifting 4 numbers does not seem to be something that you should be able to forbid others from doing unless they follow your rules.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: RKISS

Post by mcostalba »

Don wrote: Are you saying that part of the GPL wording is how much time and effort you spent verifying and thinking about the code? I don't see that anywhere in the GPL.
Actually if we speak about GPL then things are much simpler than that: you are not allowed to import _any_ piece of GPL code without being forced to open your sources.

GPL license, as you know very well, does not mention in any way the maximum size of GPL code you can copy and still be allowed to avoid the GPL consequences. This is a very known fact and was designed exactly to be so just to avoid discussions like "yes, but I copied only 2 lines" and so on...

If 2 lines of code seems not enough for you then simply do _not_ import them in your engine and there is no issue at all.


P.S: BTW I think is clear to everybody, when driven by intelectual honesty, that copying some code and then raname the variables does not avoids you to escape GPL, if the origin code was GPL released. Here the key word is "derived". Stockfish is derived from Glaurung and so must obey GPL now and in the future even if the current 2.0.1 code is practicaly almost 100% changed from the original Glaurung one.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: RKISS

Post by Don »

mcostalba wrote:
Don wrote: Are you saying that part of the GPL wording is how much time and effort you spent verifying and thinking about the code? I don't see that anywhere in the GPL.
Actually if we speak about GPL then things are much simpler than that: you are not allowed to import _any_ piece of GPL code without being forced to open your sources.

GPL license, as you know very well, does not mention in any way the maximum size of GPL code you can copy and still be allowed to avoid the GPL consequences. This is a very known fact and was designed exactly to be so just to avoid discussions like "yes, but I copied only 2 lines" and so on...

If 2 lines of code seems not enough for you then simply do _not_ import them in your engine and there is no issue at all.
I think that every program violates GPL then. I think you can find 1 line of code in stockfish that is identical to a line of code in some other licensed program such as "x = 0;"

If you consider renaming of variable to "get around" the licence as you are saying, then even MORE lines of code are violation. If I have "x = 0;" then
EVERY line of code where I say some variable equal 0 is a violation.

At some point you have to consider the nature and quantity of the change and some human will have to make a judgement call. For example it seems pretty ludicrous to me that NO program can ever assign the value 0xd4e12c77 to a variable because RKISS has done it first. It's a forgone conclusion that if yo assign the value 0 to a variable, or some other low value less than 10 you are violating some GPL code somewhere.

Do you honestly think that is reasonable?

P.S: BTW I think is clear to everybody, when driven by intelectual honesty, that copying some code and then raname the variables does not avoids you to escape GPL, if the origin code was GPL released.
What about when someone completely rewrites a program by using a different program as their template? They just use their own style of programming but essentially they are copying the program they are based on. Is it ok if you make a number of gratuitous (and perhaps a few useful) changes?

This seems to get around the GPL and most any other license. For most people it isn't about what is honest, it's about what they can get away with and what behavior others will accept from them.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: RKISS

Post by mcostalba »

Don wrote: I think that every program violates GPL then. I think you can find 1 line of code in stockfish that is identical to a line of code in some other licensed program such as "x = 0;"
I think the key point here is the origin of that line of code. I think the program violates GPL if its programmer has copied/derived that line of code form a GPL source, not if just "it happens" that two lines of code in different programs are equal. I mean what is important here is that one programmer read some GPL sources and _then_ copies some code from them. Not that two lines of code are the same by accident, as could happen.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: RKISS

Post by bob »

Don wrote:
bob wrote:The PRNG I use (from the book "Numerical Recpies") uses an array of numbers that get twisted in odd ways. But a PRNG is _supposed_ to produce the same string of numbers each time it is initialized.

The easiest way to select random book moves is to use the PRNG to initialize everything, then obtain the system time in microseconds or something that is high-resolution, extract the rightmost N bits, and loop calling the PRNG that many times before you start to play the game. Then you start the same PRNG sequence as the PRNG is supposed to deliver, but you skip N numbers before using them for book.
That seems incredible inefficient if you want to have more than a few possible seed states but if your software only initializes the RNG once at program startup (which most but not all software needs to do) and speed is not an issue, you could probably still have at least a few thousands states without increasing startup time by more than a fraction of a second.
The problem would be that each "initial state" would have to be thoroughly tested for randomness. Which is a big effort. A good PRNG has a huge period. Sucking away the first 1000 values for the Zobrist keys (I have mine statically defined so that I can change the PRNG if I want) and then sucking away N where N is a random number from either /dev/random, the low N bits of the wall clock time or whatever, seems quite reasonable, as opposed to taking the chance of choosing a poor initial starting state that has lousy properties...

For selecting opening book moves only, this is not an issue of course. But it makes the generator impractical as a general purpose RNG.

It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be initialized in constant time and it can only be initialized to a point very close to the beginning of it's enormous period length. O(n) puts a severe limit on the number of practical starting states you can have and it's not even distributed over the entire sequence or period of the generator!

In any application requiring a significant number of RNG calls it's going to always visit the same sequences once you are beyond first few thousand that you initialized it with. Here is an example:

Suppose I wanted to simulate 100 billion dice throws and I use the slow O(n) initializer to provide 1 million starting states. No matter how many times I run the simulation, I'm STILL going to generate almost the same results - the only difference being the tiny fraction of differences caused by where I start and end in the sequence.
I am not following. One can, once one has a reliable PRNG with a long period, discard the first N numbers to produce a different sequence. If you are talking about running a trial twice with two different sequences, with a 64 bit PRNG you can certainly discard the first 1,000,000,000 numbers and use that as a starting point for the second sequence.

That is way less dangerous that arbitrarily choosing seeds that might be very bad...

Anyway, don't get me wrong - I'm not being critical, now that I understand it I can see that it's practical and fine for what it's being used for.

These types of PRNGs are harder to "seed", although one can (in the PRNG I use) alter the initial values of the two array indices, which will certainly change the sequence. But the released version has been tested quite thoroughly, and such a change might easily shorten the PRNG period or randomness, which is undesirable.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: RKISS

Post by Don »

bob wrote:
Don wrote:
bob wrote:The PRNG I use (from the book "Numerical Recpies") uses an array of numbers that get twisted in odd ways. But a PRNG is _supposed_ to produce the same string of numbers each time it is initialized.

The easiest way to select random book moves is to use the PRNG to initialize everything, then obtain the system time in microseconds or something that is high-resolution, extract the rightmost N bits, and loop calling the PRNG that many times before you start to play the game. Then you start the same PRNG sequence as the PRNG is supposed to deliver, but you skip N numbers before using them for book.
That seems incredible inefficient if you want to have more than a few possible seed states but if your software only initializes the RNG once at program startup (which most but not all software needs to do) and speed is not an issue, you could probably still have at least a few thousands states without increasing startup time by more than a fraction of a second.
The problem would be that each "initial state" would have to be thoroughly tested for randomness. Which is a big effort. A good PRNG has a huge period. Sucking away the first 1000 values for the Zobrist keys (I have mine statically defined so that I can change the PRNG if I want) and then sucking away N where N is a random number from either /dev/random, the low N bits of the wall clock time or whatever, seems quite reasonable, as opposed to taking the chance of choosing a poor initial starting state that has lousy properties...
Yes, I fully understand WHY it's done that way for this particular random number generator. However a good general purpose RNG needs to have a way to provide millions or billions of seeds and the "state" should be distributed well over all possible states. The only way to seed this generator is with an O(n) function that moves to state extremely near the front of it's massive period.

For selecting opening book moves only, this is not an issue of course. But it makes the generator impractical as a general purpose RNG.

It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be initialized in constant time and it can only be initialized to a point very close to the beginning of it's enormous period length. O(n) puts a severe limit on the number of practical starting states you can have and it's not even distributed over the entire sequence or period of the generator!

In any application requiring a significant number of RNG calls it's going to always visit the same sequences once you are beyond first few thousand that you initialized it with. Here is an example:

Suppose I wanted to simulate 100 billion dice throws and I use the slow O(n) initializer to provide 1 million starting states. No matter how many times I run the simulation, I'm STILL going to generate almost the same results - the only difference being the tiny fraction of differences caused by where I start and end in the sequence.
I am not following. One can, once one has a reliable PRNG with a long period, discard the first N numbers to produce a different sequence. If you are talking about running a trial twice with two different sequences, with a 64 bit PRNG you can certainly discard the first 1,000,000,000 numbers and use that as a starting point for the second sequence.

That is way less dangerous that arbitrarily choosing seeds that might be very bad...
I'm not arguing that point, I agree with you that you cannot reseed without really knowing what you are doing. What I'm arguing is that it's not a good general purpose generator. However I think it's perfect for what it's being used for, I just didn't understand this at first.

You gave 1 billion, but I specified 100 billion. But how long to skip even 1 billion states? There will be this big delay at the start of the run while the RNG generates 1 billion random numbers. That is not a desirable characteristic of good general purpose random number generator.

I did a test using a very fast KISS style generator, not the stockish one but a long period generator that Marsaglia posted. It took about 15 seconds to generate 1 billion random numbers. So it's not a very good seeding algorithm is it?

So seeding based on generating the first N is O(n) and very expensive. But even more to the point you are not getting a fresh set of random numbers, you should be able to set the seed to 1, or to 2 and expect to get a completely different set of numbers, but in this case you would get all the same numbers except the one you skipped and the one at the end. That's HORRIBLE behavior for a general purpose RNG but this RKISS is very good for what it's being used for.


Anyway, don't get me wrong - I'm not being critical, now that I understand it I can see that it's practical and fine for what it's being used for.

These types of PRNGs are harder to "seed", although one can (in the PRNG I use) alter the initial values of the two array indices, which will certainly change the sequence. But the released version has been tested quite thoroughly, and such a change might easily shorten the PRNG period or randomness, which is undesirable.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: RKISS

Post by bob »

Don wrote:
bob wrote:
Don wrote:
bob wrote:The PRNG I use (from the book "Numerical Recpies") uses an array of numbers that get twisted in odd ways. But a PRNG is _supposed_ to produce the same string of numbers each time it is initialized.

The easiest way to select random book moves is to use the PRNG to initialize everything, then obtain the system time in microseconds or something that is high-resolution, extract the rightmost N bits, and loop calling the PRNG that many times before you start to play the game. Then you start the same PRNG sequence as the PRNG is supposed to deliver, but you skip N numbers before using them for book.
That seems incredible inefficient if you want to have more than a few possible seed states but if your software only initializes the RNG once at program startup (which most but not all software needs to do) and speed is not an issue, you could probably still have at least a few thousands states without increasing startup time by more than a fraction of a second.
The problem would be that each "initial state" would have to be thoroughly tested for randomness. Which is a big effort. A good PRNG has a huge period. Sucking away the first 1000 values for the Zobrist keys (I have mine statically defined so that I can change the PRNG if I want) and then sucking away N where N is a random number from either /dev/random, the low N bits of the wall clock time or whatever, seems quite reasonable, as opposed to taking the chance of choosing a poor initial starting state that has lousy properties...
Yes, I fully understand WHY it's done that way for this particular random number generator. However a good general purpose RNG needs to have a way to provide millions or billions of seeds and the "state" should be distributed well over all possible states. The only way to seed this generator is with an O(n) function that moves to state extremely near the front of it's massive period.
Easier said than done. For example, for the classic multiplicative congruential types, choosing an even number for a seed is a problem, since any even number * any other number is an even number which eliminates 1/2 the possible numbers and shortens the period by 1/2. There have been lots of discussions in the past about the problem of users choosing "arbitrary seeds" and the inherent dangers that incurs...


For selecting opening book moves only, this is not an issue of course. But it makes the generator impractical as a general purpose RNG.

It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be initialized in constant time and it can only be initialized to a point very close to the beginning of it's enormous period length. O(n) puts a severe limit on the number of practical starting states you can have and it's not even distributed over the entire sequence or period of the generator!

In any application requiring a significant number of RNG calls it's going to always visit the same sequences once you are beyond first few thousand that you initialized it with. Here is an example:

Suppose I wanted to simulate 100 billion dice throws and I use the slow O(n) initializer to provide 1 million starting states. No matter how many times I run the simulation, I'm STILL going to generate almost the same results - the only difference being the tiny fraction of differences caused by where I start and end in the sequence.
I am not following. One can, once one has a reliable PRNG with a long period, discard the first N numbers to produce a different sequence. If you are talking about running a trial twice with two different sequences, with a 64 bit PRNG you can certainly discard the first 1,000,000,000 numbers and use that as a starting point for the second sequence.

That is way less dangerous that arbitrarily choosing seeds that might be very bad...
I'm not arguing that point, I agree with you that you cannot reseed without really knowing what you are doing. What I'm arguing is that it's not a good general purpose generator. However I think it's perfect for what it's being used for, I just didn't understand this at first.

You gave 1 billion, but I specified 100 billion. But how long to skip even 1 billion states? There will be this big delay at the start of the run while the RNG generates 1 billion random numbers. That is not a desirable characteristic of good general purpose random number generator.

I did a test using a very fast KISS style generator, not the stockish one but a long period generator that Marsaglia posted. It took about 15 seconds to generate 1 billion random numbers. So it's not a very good seeding algorithm is it?

So seeding based on generating the first N is O(n) and very expensive. But even more to the point you are not getting a fresh set of random numbers, you should be able to set the seed to 1, or to 2 and expect to get a completely different set of numbers, but in this case you would get all the same numbers except the one you skipped and the one at the end. That's HORRIBLE behavior for a general purpose RNG but this RKISS is very good for what it's being used for.


Anyway, don't get me wrong - I'm not being critical, now that I understand it I can see that it's practical and fine for what it's being used for.

These types of PRNGs are harder to "seed", although one can (in the PRNG I use) alter the initial values of the two array indices, which will certainly change the sequence. But the released version has been tested quite thoroughly, and such a change might easily shorten the PRNG period or randomness, which is undesirable.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: RKISS

Post by Don »

bob wrote:
Don wrote:
bob wrote:
Don wrote:
bob wrote:The PRNG I use (from the book "Numerical Recpies") uses an array of numbers that get twisted in odd ways. But a PRNG is _supposed_ to produce the same string of numbers each time it is initialized.

The easiest way to select random book moves is to use the PRNG to initialize everything, then obtain the system time in microseconds or something that is high-resolution, extract the rightmost N bits, and loop calling the PRNG that many times before you start to play the game. Then you start the same PRNG sequence as the PRNG is supposed to deliver, but you skip N numbers before using them for book.
That seems incredible inefficient if you want to have more than a few possible seed states but if your software only initializes the RNG once at program startup (which most but not all software needs to do) and speed is not an issue, you could probably still have at least a few thousands states without increasing startup time by more than a fraction of a second.
The problem would be that each "initial state" would have to be thoroughly tested for randomness. Which is a big effort. A good PRNG has a huge period. Sucking away the first 1000 values for the Zobrist keys (I have mine statically defined so that I can change the PRNG if I want) and then sucking away N where N is a random number from either /dev/random, the low N bits of the wall clock time or whatever, seems quite reasonable, as opposed to taking the chance of choosing a poor initial starting state that has lousy properties...
Yes, I fully understand WHY it's done that way for this particular random number generator. However a good general purpose RNG needs to have a way to provide millions or billions of seeds and the "state" should be distributed well over all possible states. The only way to seed this generator is with an O(n) function that moves to state extremely near the front of it's massive period.
Easier said than done. For example, for the classic multiplicative congruential types, choosing an even number for a seed is a problem, since any even number * any other number is an even number which eliminates 1/2 the possible numbers and shortens the period by 1/2. There have been lots of discussions in the past about the problem of users choosing "arbitrary seeds" and the inherent dangers that incurs...
Yes, I am aware that this is a difficult problem. There are some very good random number generators that have had faulty code for the seeding part. I think an earlier version of MT was found to have a pretty limited number of starting states (even though it could be seeded with a 32 bit number) due to an initialization bug if I am remembering this correctly.

The lagged fibonacci RNG is one that can produce bad numbers if it's not initialized properly and it's very sensitive to this.

But still, for a good general purpose RNG this is a necessary feature. So certain types of RNG's, although good in themselves may not be appropriate.

Maybe that's why operating system libraries have always had crappy generators.

For selecting opening book moves only, this is not an issue of course. But it makes the generator impractical as a general purpose RNG.

It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be initialized in constant time and it can only be initialized to a point very close to the beginning of it's enormous period length. O(n) puts a severe limit on the number of practical starting states you can have and it's not even distributed over the entire sequence or period of the generator!

In any application requiring a significant number of RNG calls it's going to always visit the same sequences once you are beyond first few thousand that you initialized it with. Here is an example:

Suppose I wanted to simulate 100 billion dice throws and I use the slow O(n) initializer to provide 1 million starting states. No matter how many times I run the simulation, I'm STILL going to generate almost the same results - the only difference being the tiny fraction of differences caused by where I start and end in the sequence.
I am not following. One can, once one has a reliable PRNG with a long period, discard the first N numbers to produce a different sequence. If you are talking about running a trial twice with two different sequences, with a 64 bit PRNG you can certainly discard the first 1,000,000,000 numbers and use that as a starting point for the second sequence.

That is way less dangerous that arbitrarily choosing seeds that might be very bad...
I'm not arguing that point, I agree with you that you cannot reseed without really knowing what you are doing. What I'm arguing is that it's not a good general purpose generator. However I think it's perfect for what it's being used for, I just didn't understand this at first.

You gave 1 billion, but I specified 100 billion. But how long to skip even 1 billion states? There will be this big delay at the start of the run while the RNG generates 1 billion random numbers. That is not a desirable characteristic of good general purpose random number generator.

I did a test using a very fast KISS style generator, not the stockish one but a long period generator that Marsaglia posted. It took about 15 seconds to generate 1 billion random numbers. So it's not a very good seeding algorithm is it?

So seeding based on generating the first N is O(n) and very expensive. But even more to the point you are not getting a fresh set of random numbers, you should be able to set the seed to 1, or to 2 and expect to get a completely different set of numbers, but in this case you would get all the same numbers except the one you skipped and the one at the end. That's HORRIBLE behavior for a general purpose RNG but this RKISS is very good for what it's being used for.


Anyway, don't get me wrong - I'm not being critical, now that I understand it I can see that it's practical and fine for what it's being used for.

These types of PRNGs are harder to "seed", although one can (in the PRNG I use) alter the initial values of the two array indices, which will certainly change the sequence. But the released version has been tested quite thoroughly, and such a change might easily shorten the PRNG period or randomness, which is undesirable.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: RKISS

Post by bob »

Don wrote:
bob wrote:
Don wrote:
bob wrote:
Don wrote:
bob wrote:The PRNG I use (from the book "Numerical Recpies") uses an array of numbers that get twisted in odd ways. But a PRNG is _supposed_ to produce the same string of numbers each time it is initialized.

The easiest way to select random book moves is to use the PRNG to initialize everything, then obtain the system time in microseconds or something that is high-resolution, extract the rightmost N bits, and loop calling the PRNG that many times before you start to play the game. Then you start the same PRNG sequence as the PRNG is supposed to deliver, but you skip N numbers before using them for book.
That seems incredible inefficient if you want to have more than a few possible seed states but if your software only initializes the RNG once at program startup (which most but not all software needs to do) and speed is not an issue, you could probably still have at least a few thousands states without increasing startup time by more than a fraction of a second.
The problem would be that each "initial state" would have to be thoroughly tested for randomness. Which is a big effort. A good PRNG has a huge period. Sucking away the first 1000 values for the Zobrist keys (I have mine statically defined so that I can change the PRNG if I want) and then sucking away N where N is a random number from either /dev/random, the low N bits of the wall clock time or whatever, seems quite reasonable, as opposed to taking the chance of choosing a poor initial starting state that has lousy properties...
Yes, I fully understand WHY it's done that way for this particular random number generator. However a good general purpose RNG needs to have a way to provide millions or billions of seeds and the "state" should be distributed well over all possible states. The only way to seed this generator is with an O(n) function that moves to state extremely near the front of it's massive period.
Easier said than done. For example, for the classic multiplicative congruential types, choosing an even number for a seed is a problem, since any even number * any other number is an even number which eliminates 1/2 the possible numbers and shortens the period by 1/2. There have been lots of discussions in the past about the problem of users choosing "arbitrary seeds" and the inherent dangers that incurs...
Yes, I am aware that this is a difficult problem. There are some very good random number generators that have had faulty code for the seeding part. I think an earlier version of MT was found to have a pretty limited number of starting states (even though it could be seeded with a 32 bit number) due to an initialization bug if I am remembering this correctly.

The lagged fibonacci RNG is one that can produce bad numbers if it's not initialized properly and it's very sensitive to this.

But still, for a good general purpose RNG this is a necessary feature. So certain types of RNG's, although good in themselves may not be appropriate.

Maybe that's why operating system libraries have always had crappy generators.

For selecting opening book moves only, this is not an issue of course. But it makes the generator impractical as a general purpose RNG.

It makes me wonder even more how a few lines of code like this can be copyrighted, when it's not even a very usable general purpose RNG if it cannot be initialized in constant time and it can only be initialized to a point very close to the beginning of it's enormous period length. O(n) puts a severe limit on the number of practical starting states you can have and it's not even distributed over the entire sequence or period of the generator!

In any application requiring a significant number of RNG calls it's going to always visit the same sequences once you are beyond first few thousand that you initialized it with. Here is an example:

Suppose I wanted to simulate 100 billion dice throws and I use the slow O(n) initializer to provide 1 million starting states. No matter how many times I run the simulation, I'm STILL going to generate almost the same results - the only difference being the tiny fraction of differences caused by where I start and end in the sequence.
I am not following. One can, once one has a reliable PRNG with a long period, discard the first N numbers to produce a different sequence. If you are talking about running a trial twice with two different sequences, with a 64 bit PRNG you can certainly discard the first 1,000,000,000 numbers and use that as a starting point for the second sequence.

That is way less dangerous that arbitrarily choosing seeds that might be very bad...
I'm not arguing that point, I agree with you that you cannot reseed without really knowing what you are doing. What I'm arguing is that it's not a good general purpose generator. However I think it's perfect for what it's being used for, I just didn't understand this at first.

You gave 1 billion, but I specified 100 billion. But how long to skip even 1 billion states? There will be this big delay at the start of the run while the RNG generates 1 billion random numbers. That is not a desirable characteristic of good general purpose random number generator.

I did a test using a very fast KISS style generator, not the stockish one but a long period generator that Marsaglia posted. It took about 15 seconds to generate 1 billion random numbers. So it's not a very good seeding algorithm is it?

So seeding based on generating the first N is O(n) and very expensive. But even more to the point you are not getting a fresh set of random numbers, you should be able to set the seed to 1, or to 2 and expect to get a completely different set of numbers, but in this case you would get all the same numbers except the one you skipped and the one at the end. That's HORRIBLE behavior for a general purpose RNG but this RKISS is very good for what it's being used for.


Anyway, don't get me wrong - I'm not being critical, now that I understand it I can see that it's practical and fine for what it's being used for.

These types of PRNGs are harder to "seed", although one can (in the PRNG I use) alter the initial values of the two array indices, which will certainly change the sequence. But the released version has been tested quite thoroughly, and such a change might easily shorten the PRNG period or randomness, which is undesirable.
Actually, an OS might well have a good one. Linux and /dev/random come to mind. But useless for Zobrist numbers since this is an unseedable string of random numbers that will start at a different place each time the system starts...

Certainly bad for Zobrist numbers and for debugging.