www.data-compression.info The Data Compression Resource on the Internet
Distance Coding (DC)
Distance Coding (DC) is an algorithm proposed by Edgar Binder in 2000. There is no official paper published about DC yet, but some messages from Edgar Binder in the comp.compression newsgroup are available. DC is a replacement of the Move To Front stage (MTF) within the Burrows-Wheeler Compression Algorithm. Similar to the IF stage, the output of the DC stage consits of distances of indices, which can be greater than 255. The DC stage scans the input sequence sequentially for each symbol C and outputs the distance between the current index and the next occurence of C.
This publication of Sebastian Deorowicz in "Software-Practice and Experience" from 2002 give a quite complete overview of the post BWT stages used within the Burrows-Wheeler Compression Algorithm. Besides his own Weighted Frequency Count Algorithm and other post BWT stages, Sebastian describes briefly but clearly variantions of the Move To Front scheme, the Inversion Frequencies algorithm and the Distance Coding algorithm with the 3 properties of the newsgroup posting. This is one of my favourite BWCA papers. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.
A quite similar publication from 2000 to the article "Second step algorithms in the Burrows-Wheeler compression algorithm" from Sebastian Deorowicz, describing his Weighted Frequency Count Algorithm. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.