The generation process is actually not that difficult.DTM50 conceptually stores a distance-to-mate value for each pair (position, rule50_counter). So that is 100x as much raw data as for DTM. The sequence of values for a particular position is far from random, but it is not immediately obvious how to find an efficient encoding that fits into a good compression scheme. And generating these values efficiently is another problem.
Let's write DTM(p, n) for the distance to mate of position p if there are n ply left on the 50-move counter.
So DTM(p, 1) = k > 0 if position p has a mate in 1 (in which case k=1) or if position p has a capture or pawn move leading to a losing position for the other side to move.
DTM(p, 1) = k < 0 if the player to move in position p cannot avoid playing a losing pawn move or capture.
This means that a table of all DTM(p, 1) values can be generated by 1 ply of retrograde analysis from all mate positions and all positions after a capture or pawn move.
Doing another ply of retrograde analysis leads to a table of all DTM(p, 2) values.
Yet another ply will give a table of all DTM(p, 3) values. In this step, some of the DTM(p, 1) winning values can be overwritten by a smaller winning value. Actually, this can already happen during the generation of the DTM(p, 2) table: instead of playing a winning pawn move (1 ply to a 50-move reset), in rare position there will be a shorter path to mate by first sacking the queen (2 plies to a 50-move rest). Or instead of a capture (1 ply to a reset), the shortest mate might require a check that forces the losing side to move a pawn (2 plies to a reset).
Repeating this until DTM(p, 100), we end up with the distance to mate for each position while respecting the 50-move rule, assuming the 50-move counter has just been reset. This is a very useful table, because entering the table is possible only through a capture, and that capture resets the 50-move counter. The only problem of the table is that it does not contain sufficient information to find the complete DTM50-optimal mating line. For that we also need the DTM(p, c) tables for 1 <= c <= 99.
So we need to save DTM(p, 100) and we need to save information describing exactly the values that were overwritten with smaller values in the process of generating DTM(p, 1), DTM(p, 2), DTM(p, 3), ..., DTM(p, 100).
The DTM(p, 100) tables are the tables that will be probed during the search (because they will be probed immediately after a capture) and they are the tables that are used by the generator to seed the capture positions. Their size will be comparable to normal DTM tables and they can be in the same format as the existing Syzygy tables.
The information encoding the values being overwritting during the generation of DTM(p, 1), ..., DTM(p, 100) would only be probed at the root. So access time can be longer, which means we can allow its decompression to be more expensive, and it can also be stored on a slower disk if necessary. But I don't know yet what kind of format would be best.
The generation process I describe above will need 2 bytes per position. (My (unreleased) DTM generator only needs 1 byte per position but it uses an approach that does not keep track of the number of plies since the last zeroing move.) For 6-piece tables using 2 bytes is not a problem. For 7-piece tables it is a bit more problematic but not necessarily fatal (a more efficient indexing scheme during generation will reduce requirements again).