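import random
import sys


def split_huge_file(file, out1, out2, percentage=0.75, seed=42):
    """Send roughly `percentage` of the lines of `file` to out1, the rest to out2."""
    random.seed(seed)  # fixed seed, so the same split can be reproduced
    with open(file, 'r') as fin, \
         open(out1, 'w') as foutBig, \
         open(out2, 'w') as foutSmall:
        # Stream line by line so the whole file never has to fit in memory.
        for line in fin:
            r = random.random()
            if r < percentage:
                foutBig.write(line)
            else:
                foutSmall.write(line)


# Arguments: input file, first output, second output, percentage, seed
if __name__ == "__main__":
    split_huge_file(sys.argv[1], sys.argv[2], sys.argv[3],
                    float(sys.argv[4]), int(sys.argv[5]))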
The file has been updated: after removing duplicates it now contains 35 million positions (instead of 38 million).
Very nice indeed. Would you mind sharing your modified version of lichess-quiet?
File is there.
I am afraid it is not your typical EPD encoding; there are historical (dumb) reasons why it is not. It should still be pretty easy to parse: just split every line at "|" and you can extract the FEN and the game result.
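For anyone who wants a starting point, here is a minimal sketch of the split-at-"|" approach in Python. The field positions are an assumption on my side (FEN in the first field, game result in the last), and `parse_position`, `fen_index` and `result_index` are hypothetical names; the sample line is made up for illustration and not taken from the file, so check a few real lines and adjust the indices.

```python
def parse_position(line, fen_index=0, result_index=-1):
    """Split one '|'-separated line and return (fen, result).

    ASSUMPTION: the FEN is the first field and the game result the last.
    Change fen_index / result_index if the real file is laid out differently.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    return fields[fen_index], fields[result_index]


# Made-up sample line, only to show the call.
sample = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1|1-0\n"
fen, result = parse_position(sample)
print(fen)
print(result)
```

Applied line by line while iterating over the open file, this streams the data the same way the splitting script does, so memory use stays flat.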
fabianVDW wrote: Sun Aug 11, 2019 4:11 pm
I am afraid it is not your typical EPD encoding; there are historical (dumb) reasons why it is not. It should still be pretty easy to parse: just split every line at "|" and you can extract the FEN and the game result.
Thank you. I ran a few search-and-replace operations, and now it's all set.