www.data-compression.info The Data Compression Resource on the Internet
Run Length Encoding (RLE)
Run Length Encoding (RLE) is a simple and popular data compression algorithm. It is based on the idea to replace a long sequence of the same symbol by a shorter sequence and is a good introduction into the data compression field for newcomers. The sequence of length l of a repeated symbol 's' is replaced by a shorter sequence, usually containing one or more symbols of 's', a length information and sometimes an escape symbol. RLE algorithms differ from each other mainly in three points: the threshold t, the marking of the start of a run and the coding of the length information. If l is smaller than t, the run stays unchanged, and if l is greater or equal t, the run is replaced. The start of a run can be indicated by a threshold run or an escape symbol 'c'. If a threshold run is used the start is characterized by a small sequence of 's', which has a length greater or equal than t. If an escape symbol 'c' indicates the start of a run, 's' is normally put behind 'c' in order to characterize the run symbol. 'c' must not be an element of the alphabet or occurrences of 'c' have to be coded in a way, that they are not mixed up with the start of a run.
Another important paper of Bernhard Balkenhol and Yuri Shtarkov from 1999 with focus on improvements of the Burrows-Wheeler Compression Algorithm (BWCA). It includes some interesting aspects for the RLE field too. An important property of the output of the BWT is the presents of many runs, which results in overestimating the probability of symbols outside the run. This problem is called "Pressure of Runs" and can be decreased by RLE. Besides that, a modification of the MTF algorithm is described which moves the next symbol to the second place in the list instead of the first place, except the old position of the new symbol was 0 or 1. If the old position of the new symbol was 1 it is moved to the first place only if the last output number was different from 0. This paper is my favourite paper from Bernhard. This BWCA approach achieves a compression rate of 2.26 bps for the Calgary Corpus.
The Data Compression Reference Center presents a tutorial from 2000 about RLE, which is based on an escape symbol. The tutorial includes a flow chart of the RLE algorithm together with some commercial examples.
Michael Maniscalco describes in 2000 a RLE algorithm for usage within the Burrows-Wheeler Compression Allgorithm, which is based on a fixed length threshold run and a symbol run which is about half as long as the original run.
This RLE algorithm from 2001 by Michael Maniscalco is based on a variable length threshold run, which defines the length of the binary representation of the threshold run and a mantissa part, which is stored in a separate data buffer.
Bernhard studied mathematics at the University of Bielefeld and works for the mediaWays GmbH in Germany. He is interested in information theory, data compression and encryption and has published several compression articles, which are available on his internet site.