www.data-compression.info The Data Compression Resource on the Internet
Move To Front (MTF)
The basic idea of the MTF scheme is to maintain a list, which represents the symbols of an alphabet. Frequenty used symbols are located near the front of the list. A symbol is encoded as the current index of that symbol in the list, id est as the number of symbols that precede it in the list. Every time the symbol occurs the index is output and the symbol is moved to the front of the list, id est to index 0. The MTF scheme is a typical representative of a List Update Algorithm (LUA) and is used as a Global Structure Transformation (GST) inside the Burrows-Wheeler Compression Algorithm (BWCA).
This paper of Bernhard Balkenhol, Stefan Kurtz and Yuri Shtarkov from 1999 describes a modification of the MTF algorithm, which moves the next symbol to the second place in the list instead of the first place, except the old position of the new symbol was 0 or 1, then it is moved to the first place. This BWCA approach achieves a compression rate of 2.30 bps for the Calgary Corpus.
Another important paper of Bernhard Balkenhol and Yuri Shtarkov from 1999 with focus on improvements of the Burrows-Wheeler Compression Algorithm (BWCA). It includes some interesting aspects for the RLE field too. An important property of the output of the BWT is the presents of many runs, which results in overestimating the probability of symbols outside the run. This problem is called "Pressure of Runs" and can be decreased by RLE. Besides that, a modification of the MTF algorithm is described which moves the next symbol to the second place in the list instead of the first place, except the old position of the new symbol was 0 or 1. If the old position of the new symbol was 1 it is moved to the first place only if the last output number was different from 0. This paper is my favourite paper from Bernhard. This BWCA approach achieves a compression rate of 2.26 bps for the Calgary Corpus.
The original MTF paper of Jon Bentley, Daniel Sleator, Robert Tarjan and Victor Wei from 1986. The paper descibes a word based MTF scheme as a data compression scheme, that exploits locality of reference. It is based on a simple heuristic for self-organizing sequential search.
Michael Burrows and David Wheeler introduce the Burrows-Wheeler Transform (BWT) and the Burrows-Wheeler Compression Algorithm (BWCA) in a research report from 1994 at the Digital Systems Research Center. The paper explains the transformation and why the output sequence compresses well. Some variants of the algorithm, such as a modified MTF, are named too. This paper is a "must" for everyone interested in BWCAs. This BWCA approach achieves a compression rate of 2.43 bps for the Calgary Corpus.
This publication of Sebastian Deorowicz in "Software-Practice and Experience" from 2002 give a quite complete overview of the post BWT stages used within the Burrows-Wheeler Compression Algorithm. Besides his own Weighted Frequency Count Algorithm and other post BWT stages, Sebastian describes briefly but clearly variantions of the Move To Front scheme, the Inversion Frequencies algorithm and the Distance Coding algorithm with the 3 properties of the newsgroup posting. This is one of my favourite BWCA papers. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.
This technical paper is one of Peter’s most important BWCA papers published in 1996. Peter describes the basic transformation by an example and several improvements of the basic algorithm, like hierarchical coding models, structured coding models and Wheeler’s Run Length Encoding for zeros (RLE0). He also gives some sort and MTF improvements and a interpretation of the BWT output sequence. This is a "must", if you are interested in the BWCA field. This BWCA approach achieves a compression rate of 2.34 bps for the Calgary Corpus.
The paper from 2003 by Peter Fenwick, Mark Titchener and Michelle Lorenz describes a cached MTF modeling scheme with a foreground (more frequent symbols) and background (less frequent symbols) model. It also introduces the so called T-entropy as a measurement for entropies after each BWCA stage.
Michael Schindler developed a sorting algoritm, which had a limited order context in order to speed up the sorting process in his paper from 1996. He also describes a MTF scheme, which works like the MTF-1 scheme of Bernhard Balkenhol, Stefan Kurtz and Yuri Shtarkov from 1999.
Bernhard studied mathematics at the University of Bielefeld and works for the mediaWays GmbH in Germany. He is interested in information theory, data compression and encryption and has published several compression articles, which are available on his internet site.
Beside 200 published articles Jon Bentley is the author of the famous programming books "Programming Pearls", which is a collection of 13 essays that were published in Communications of the ACM in the early 80s, and "More Programming Pearls". Jon Bentley received in 2000 the Dr. Dobb's "Excellence in Programming Award". The award is presented to individuals who, in the spirit of innovation and cooperation, have made significant contributions to the advancement of software development.
Peter is a Professor at the University of Auckland, New Zealand. His primary interests are in computer architecture (including arithmetic), communications and networks, and text compression. Beside many other papers Peter has published several papers for "symbol ranking" text compression and in the Burrows-Wheeler compression field with a very nice paper "Block Sorting Text Compression - Final Report".
Mark Titchener works at the Computer Science Department, University of Auckland, New Zealand. He is interested in deterministic information theory, measurement of entropy, data communications and coding and Data compression.