Felicity EGTB generators and probers

hgm · Post by **hgm** » Mon Jul 08, 2024 4:02 pm

In my experience it is counterproductive to try making small savings on index space during generation; it merely slows down the generation, and doesn't make the difference between fitting in RAM or not. It only becomes interesting when you can save a factor 2 or more.

Besides, most of the 'broken positions' can be very usefully employed during generation. When a white and a black piece coincide in a wtm position, you can identify it as a position where that white piece is not present, but has just been captured by the black position. By 'seeding' those positions from the smaller EGT before starting generation, you don't need any special code for handling captures.

The best way of indexing on 8x8 boards is to simply concatenate all the 6-bit square numbers. For Xiangqi it is a different matter, due to the very small fraction of the board accessible to defensive pieces. There I use a Palace state 0-118 and an Elephant state 0-28, and (tabulated) transitions between those states for move generation.

phhnguyen · Post by **phhnguyen** » Tue Jul 09, 2024 3:45 am

Since EGTB generators could create massive data, I have been looking for some ways to store, backup, and distribute data between me and whoever needs it. I have just checked and found the price of a USB stick 2 TB with Lenovo branch is amazingly cheap, under 12 USD (16.89 Australian dollars). That price is significantly cheaper than similar ones from SanDisk, Kingston... It says ambiguously that it has the speed of USB 3.0 but I doubt it is USB 3.0. BTW, that size is enough for Nalimov 6 men (1.2 TB) as well as an Xiangqi EGTB I may generate myself with my current hardware. Not sure if it’s good enough but I consider to give it a try.

Any suggestions and/or other solutions?

phhnguyen · Post by **phhnguyen** » Wed Jul 10, 2024 4:16 am

Attempt 20: Xiangqi generator

The code for the Xiangqi generator has been added. It can generate endgames with a side with up to 2 attackers and one side with no attacker (armless). E.g., kraabkaabb, kcnaabbkaabb, kcpakaab. It could work with more attackers but I haven't checked them yet. Both sides with attackers will be later work.

To generate those endgames, set the parameter name as 1, 2-0, cn-0. E.g.:

Code: Select all

fegtbxq.exe -d c:\xqegtb -n 2-0 -g -core 10

Here are some stats for all endgames with only one attacker (with all configurations of defenders):

Code: Select all

Total endgames: 324
Index space: 2'017'954'701 (2 G)
Data size: 392 MB
Compression ratio: 0.392 / (2 G x 2) = 0.1

The compression ratio of 0.1 (or 10 times smaller) is much better than that of chess (0.42). The data size is reasonably small.

However, compared with the data size of the first release one 6 years ago (2018), when the size of 1 attacker EGTB is only 14 MB, the current one is 28 times bigger (392 MB vs 14 MB).

Did we make an improvement or a serious disimprovements???

hgm · Post by **hgm** » Wed Jul 10, 2024 10:38 am

The number of board positions is the set of 1-vs-0 EGT is about 3.5*90*(119*29)^2 ~ 3.75e9, which with a 0.1 compression factor indeed would reduce to ~375MB. There is a wtm and a btm position for each, but the left-right symmetry earns back that factor 2.

What is your explanation for this factor 28 larger size? The number of positions in the EGT is a given, so it can only be due to compression, right? This previous release must then have had a compression facttor 280 instead of your 10. How is that possible? Do these EGTs have some regularity properties that your compression scheme failed to exploit?

Thinking about this I identified the following issues that could help compression:

* The more defenders, the larger the number of positions. Only 9 of the ~3.5k possible Palace+Elephant constellations have a bare King, only 102 have a King + 1 defender. Most of the attackers are not very strong, and could never beat a full defense on their own, no matter how unfavorably the defenders are scattered. At best they gobble up a number of defenders. But a single remaining Elephant is already enough to draw against Pawn or Horse. So it seems EGTs like KPdKAAEE and KHdKAAEE (and even KPdKEE and KHdKEE) where 'd' stands for any combination of A and E only contain draw positions. Which means it should be possible to compress those to almost nothing.

* Comparatively few constellations of the defenders of the strong side would obstruct that side's King or attacking piece. There are 1 + 7 + 21 = 29 ways to position 0, 1 or 2 Elephants, only 7 of those would have an Elephant at the 'eye'. An attacker that is already across the River would not be hindered by any of the Elephant constellations, and for constellations with the Elephants not in the most advanced position that holds for an even larger fraction of the board. The EGT would be exactly the same as that of the (29 times smaller) EGT without any Elephants on the strong side.

You could make use of the latter by introducing a special DTM code meaning 'as if no Elephants'. If the probe would return that DTM, you would do a re-probe for the position without any Elephants to get the actual DTM. You might argue that size-reducing schemes that require re-probing do not count as compression, and that re-probing in general is a no-no because probing is already slow. But this depends on how you organize the index. In EGTs one compresses small blocks independently, so that a probe that misses the EGT cache would not require unduly large disk-access and decompression time. But this depends on how you calculate the index. One could make the index such that the 29 possible Elephant constellations for a given constellation of the other pieces are adjacent, and always reside in the same compression block. Then the 'no-Elephants' DTM is always available in that block, and replacing the special DTM code that any other of the 28 Elephant constellations might have by the no-Elephants DTM can be considered part of the decompression operation for that block.

Because the special DTM code would be very abundant, the normal compression scheme should then be able to achieve very high compression factors. E.g. using run-length encoding for long stretches of this DTM code. This could be helped by ordering the Elephant constellations such that the least obstructive are adjacent (e.g. c0, g0, c0+g0, a2, i2, a2+c0, a2+g0, i2+c0, i2+g0, a2+i2, ..., e2, e2+c0, e2+g0, e2+a2, e2+i2, e2+c4, e2+g4). Then the DTM for each Elephantless position would typically be followed by a long stretch of these special codes indicating they have that DTM too, and the compression would only have to indicate how many of these there are. For the positions with the attacker across the River (which is half the positions) that would typically be 20 times, so that a Huffman like compression scheme could reduce that to a single bit.

I suppose that the positions with an Elephant on e2 in general will have a DTM that is one larger, to move that Elephant away. So the (uncompressed) EGT would consist mostly out of a value repeated 21 times, followed by that same value increased by one 7 times. A compression scheme should be able to exploit that, by introducing a single code for such a partly-increased 20+7 repeat of the previous DTM, before the normal compression.

Of course many Palace constellations would be equivalent to the bare-King case as well; you could use a similar trick for those. E.g. by using the strong-side's Palace constellation as the next-higher contribution to the index, and order the least-obstructive Palace positions first, it could be that an entire section of (29) Elephant constellations can be repeated 12 times. Again you could introduce a (frequently occurring) symbol for such a 'block-repeat'.

phhnguyen · Post by **phhnguyen** » Fri Jul 12, 2024 1:46 pm

hgm wrote: ↑Wed Jul 10, 2024 10:38 am How is that possible?

The old EGTB omitted one side of the data, thus it was actually 14 times smaller than the current new one (which has both sides of the data). It also combined multiple configurations of elephants of the attacker side into one, say, combines krabkaabb, krabbkaabb… into kramkaabb. It used a small lookup table to solve some issues with that kind of combination.

That way worked well with the EGTB of one attacker only. However, it didn't work well with more attackers (the data is still smaller but not much as I wish). I will verify that method later.

The old code, data and old posts are still there:

https://github.com/nguyenpham/FelicityEgtb/releases
viewtopic.php?t=66337

hgm · Post by **hgm** » Fri Jul 12, 2024 5:34 pm

I don't understand what you mean by a 'side of the data', nor why doubling the number of sides would multiply the size by 14.

It is true that using tricks with the strong side's defensive pieces to reduce their number of essentially different constellation only works for end-games where only one side has attackers. But some of these can still be very large, in particular the 3-0. KdPPPKd and KdHPPKd should be of interest.

One can gain an enormous factor by mapping equivalent Palace+Elephant states to the same index. And I don't think you have to break the uniformity of the way you treat the end-games for this. It is just a matter of clever indexing and clever compression. Design the index such that positions that could be equivalent are contiguous, and make sure the compression makes uses an especially compact way for representing stretches of 22 or 7 identical DTM, or other patterns that are likely to arise because of a possible equivalence. Then it will use that in end-games where it can, in the parts of it where it can to get a huge compression for those.

tmokonen · Post by **tmokonen** » Sat Jul 13, 2024 10:57 pm

phhnguyen wrote: ↑Tue Jul 09, 2024 3:45 am Since EGTB generators could create massive data, I have been looking for some ways to store, backup, and distribute data between me and whoever needs it. I have just checked and found the price of a USB stick 2 TB with Lenovo branch is amazingly cheap, under 12 USD (16.89 Australian dollars). That price is significantly cheaper than similar ones from SanDisk, Kingston... It says ambiguously that it has the speed of USB 3.0 but I doubt it is USB 3.0. BTW, that size is enough for Nalimov 6 men (1.2 TB) as well as an Xiangqi EGTB I may generate myself with my current hardware. Not sure if it’s good enough but I consider to give it a try.

Any suggestions and/or other solutions?

That deal sounds too good to be true. Buyer beware.

phhnguyen · Post by **phhnguyen** » Mon Jul 15, 2024 4:06 am

hgm wrote: ↑Fri Jul 12, 2024 5:34 pm I don't understand what you mean by a 'side of the data', nor why doubling the number of sides would multiply the size by 14.

I store an endgame in 2 files, one for the White side turn and the other file for the Black side turn. One of them could be omitted and use search 1 fly instead.

phhnguyen · Post by **phhnguyen** » Mon Jul 15, 2024 4:16 am

tmokonen wrote: ↑Sat Jul 13, 2024 10:57 pm
phhnguyen wrote: ↑Tue Jul 09, 2024 3:45 am Since EGTB generators could create massive data, I have been looking for some ways to store, backup, and distribute data between me and whoever needs it. I have just checked and found the price of a USB stick 2 TB with Lenovo branch is amazingly cheap, under 12 USD (16.89 Australian dollars). That price is significantly cheaper than similar ones from SanDisk, Kingston... It says ambiguously that it has the speed of USB 3.0 but I doubt it is USB 3.0. BTW, that size is enough for Nalimov 6 men (1.2 TB) as well as an Xiangqi EGTB I may generate myself with my current hardware. Not sure if it’s good enough but I consider to give it a try.

Any suggestions and/or other solutions?
That deal sounds too good to be true. Buyer beware.

Yes, it looks like a scam!

BTW, I gave it a try by ordering it and it will come to me within a few days. We need some different ways to distribute huge data. I think there are some scenarios of being scammed:
- Feek branch: not easy to detect
- Feek speed: it says USB3 speed but it may be slower. Not easy to detect
- Feek volume: I will copy data to full to check and I may format it too

tmokonen · Post by **tmokonen** » Mon Jul 15, 2024 10:47 am

phhnguyen wrote: ↑Mon Jul 15, 2024 4:16 am Yes, it looks like a scam!

BTW, I gave it a try by ordering it and it will come to me within a few days. We need some different ways to distribute huge data. I think there are some scenarios of being scammed:
- Feek branch: not easy to detect
- Feek speed: it says USB3 speed but it may be slower. Not easy to detect
- Feek volume: I will copy data to full to check and I may format it too

From what I've read, it's a fairly common scam on online shopping sites. A cheap flash drive with only a few gigabytes of space has its controller reprogrammed, or is specially formatted, to fool the OS into thinking the drive has more capacity than what it can actually hold. Copying seems to work at first, but once you exceed the real capacity of the drive, files are lost, sometimes with no error messages.

I even saw a video on YouTube about someone buying one of these drives, knowing it was a scam, and when he plugged it into his computer, the drive turned out to have ransomware pre-installed on it. The guy at least had the good sense not to plug the device into his main rig. Of course, YouTube isn't exactly a paragon of truth and accuracy, so who knows in regards to malware, but seeing that is what made me concerned about the safety of your data.

Also, I went on Lenovo's home page, and while they sell a variety of flash drives, none of them were Lenovo brand.

Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Attempt 20: Xiangqi generator

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers

Re: Felicity EGTB generators and probers