To start, partition the problem in half. Using kbbk as the example, take the downloaded versions of the file pair, decompress them, and compare the uncompressed files with the ones you have generated afresh. If they match, the problem is not in tbgen but in datacomp.
And that's what I find.
$ ls -l OLD
total 2076
-rw-rw-r-- 1 jkominek jkominek 873642 May 4 11:54 kbbk.nbb
-r--r--r-- 1 jkominek jkominek 205549 Feb 24 2002 kbbk.nbb.emd
-rw-rw-r-- 1 jkominek jkominek 789885 May 4 11:54 kbbk.nbw
-r--r--r-- 1 jkominek jkominek 249216 Feb 24 2002 kbbk.nbw.emd
$ ls -l NEW
total 2152
-rw-rw-r-- 1 jkominek jkominek 873642 May 4 11:40 kbbk.nbb
-rw-rw-r-- 1 jkominek jkominek 176132 May 4 11:41 kbbk.nbb.emd
-rw-rw-r-- 1 jkominek jkominek 789885 May 4 11:40 kbbk.nbw
-rw-rw-r-- 1 jkominek jkominek 289158 May 4 11:41 kbbk.nbw.emd
-rw-rw-r-- 1 jkominek jkominek 1386 May 4 11:40 kbbk.tbs
-rw-rw-r-- 1 jkominek jkominek 28644 May 4 11:40 kbk.nbb
-rw-rw-r-- 1 jkominek jkominek 27243 May 4 11:40 kbk.nbw
-rw-rw-r-- 1 jkominek jkominek 99 May 4 11:40 kbk.tbs
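The listing only shows that the uncompressed .nbb/.nbw sizes agree while the .emd sizes differ; a byte-by-byte comparison is what actually establishes that the uncompressed files match. cmp(1) does this directly. A minimal portable C equivalent is sketched below; the function name first_difference is hypothetical, not something from tbgen or datacomp.

```c
#include <stdio.h>

/* Compare two files byte by byte, in the spirit of cmp(1).
   Returns -1 if the files are identical, otherwise the 0-based offset of
   the first differing byte (if one file is a prefix of the other, that is
   the length of the shorter file). Returns -2 if either file can't be opened. */
static long first_difference(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0;
    long result = -1;

    if (a == NULL || b == NULL) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -2;
    }
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {          /* differing byte, or one file ended early */
            result = offset;
            break;
        }
        if (ca == EOF)           /* both ended at the same point: identical */
            break;
        offset++;
    }
    fclose(a);
    fclose(b);
    return result;
}
```

For example, first_difference("OLD/kbbk.nbw", "NEW/kbbk.nbw") returns -1 exactly when the downloaded-and-decompressed file and the freshly generated one are identical.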
The Kadatch compression code is difficult to read and comprehend, at least for me. It would take a programmer of the caliber of Dann or Ronald to narrow down and debug the issue. Not that I would expect either of them to try, but this fragment from complib.c caught my eye, even if it's not the culprit.
if (p->bits > 15)
{
  /* I am lazy bastard -- package-merge algorithms is too complicated */
  /* so I'll simply scale frequences down; sooner or later maximal */
  /* bit length will not exceed 31 */
  /* In the production-quality library I used [-1 +3 -2] transform */
  /* but as long as 31-bit boundary is unlikely to be reached it is */
  /* not necessary to bother about better [but complicated] solutions */
  assert (p == first_sorted);
  q = p;
  do
    q->freq = (q->freq + 1) >> 1;
  while ((q = q->son[0]) != 0);
  goto restart;
}
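What the fragment does, instead of a proper length-limited algorithm such as package-merge, is halve every frequency (rounding up, so a nonzero count never drops to zero) and rebuild the tree until the longest code fits. Flattening the distribution this way shortens the deepest branches. Below is a toy sketch of that idea, not Kadatch's code: it assumes a naive O(n^2) Huffman build, a hypothetical 8-symbol alphabet (NSYM), and hypothetical function names huffman_lengths and limited_lengths.

```c
#include <assert.h>

#define NSYM 8   /* hypothetical alphabet size for this sketch */

/* Huffman code lengths for freq[0..n-1] into len[] by repeatedly merging the
   two lightest live nodes. Zero-frequency symbols get length 0 (and, in this
   toy, so does a lone live symbol). Returns the maximum code length. */
static int huffman_lengths(const unsigned long *freq, int n, int *len)
{
    unsigned long weight[2 * NSYM];
    int parent[2 * NSYM], alive[2 * NSYM];
    int nodes = n, live = 0, max_len = 0;

    assert(n <= NSYM);
    for (int i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        alive[i] = freq[i] > 0;
        live += alive[i];
    }
    while (live > 1) {
        int a = -1, b = -1;              /* lightest and second-lightest */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) b = i;
        }
        weight[nodes] = weight[a] + weight[b];   /* new internal node */
        parent[nodes] = -1;
        alive[nodes] = 1;
        alive[a] = alive[b] = 0;
        parent[a] = parent[b] = nodes;
        nodes++;
        live--;
    }
    for (int i = 0; i < n; i++) {        /* code length = depth of the leaf */
        int depth = 0;
        if (freq[i] > 0)
            for (int p = i; parent[p] >= 0; p = parent[p]) depth++;
        len[i] = depth;
        if (depth > max_len) max_len = depth;
    }
    return max_len;
}

/* Scale frequencies down with f -> (f + 1) >> 1, as in the complib.c
   fragment, until no code is longer than max_bits. Modifies freq[].
   Returns -1 if the limit can never be met (too many live symbols). */
static int limited_lengths(unsigned long *freq, int n, int max_bits, int *len)
{
    int live = 0, max_len;
    for (int i = 0; i < n; i++) live += freq[i] > 0;
    if ((unsigned long)live > (1UL << max_bits)) return -1;
    while ((max_len = huffman_lengths(freq, n, len)) > max_bits)
        for (int i = 0; i < n; i++)
            freq[i] = (freq[i] + 1) >> 1;
    return max_len;
}
```

With Fibonacci-like counts {1, 1, 2, 3, 5, 8, 13, 21} the unconstrained tree is 7 levels deep; calling limited_lengths with max_bits = 4 halves the counts once, to {1, 1, 1, 2, 3, 4, 7, 11}, which already fits in 4 bits. The trade-off is the one the original comment admits: simple, but the scaled code is no longer optimal for the true frequencies.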
Jonathan, are you sure you want to generate the tables rather than just download them?