|
Logo
|
Title
|
Description
|
|
|
Grundlagen des Burrows-Wheeler-Kompressionsalgorithm us
|
A German article from 2003 by Jürgen Abel about the principles of the Burrows-Wheeler Compression Algorithm, written for the journal "Informatik - Forschung und Entwicklung", Springer-Verlag Heidelberg, Germany, Association for Informatics (Gesellschaft für Informatik eV.). The article describes a BWCA implementation with a RLE and WFC stage. The BWCA approach PIK03 achieves a compression rate of 2.25 bps for the Calgary Corpus.
|
|
|
Verlustlose Datenkompression auf Grundlage der Burrows-Wheeler-Transformation
|
Another German article from 2003 by Jürgen Abel about the basics of the Burrows-Wheeler Compression Algorithm, written for the journal "PIK - Praxis der Informationsverarbeitung und Kommunikation", Saur Verlag, Germany, Association for Informatics (Gesellschaft für Informatik eV.). The paper describes a simple but fast BWCA implementaion PIK03 with a MTF, RLE0 and EC stage, which has the same compression speed as GZIP and BZIP2. The BWCA approach PIK03 achieves a compression rate of 2.35 bps for the Calgary Corpus.
|
|
|
Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages
|
A BWCA paper from 2003 by Jürgen Abel, which gives a short introduction into the BWCA field and proposes several improvements for the WFC stage and the IF stage and introduces a new RLE scheme for bypassing the run length information around the WFC stage. This paper is the basis of the program ABC. This BWCA approach achieves a compression rate of 2.24 bps for the Calgary Corpus.
|
|
|
Universal Text Preprocessing for Data Compression
|
This paper from 2003 by Jürgen Abel and Bill Teahan presents several preprocessing algorithms for textual data, which work with BWT, PPM and LZ based compression schemes. The algorithms need no external dictionary and are language independent. The average compression gain is in the range of 3 to 5 percent for the text files of the Calgary Corpus and between 2 to 9 percent for the text files of the large Canterbury Corpus.
|
|
|
Record Preprocessing for Data Compression
|
Another preprocessing paper from 2003 by Jürgen Abel. The paper reveals a preprocessing algorithm which exploits the structure of record based files. The compression rate of these files can be enhanced if the file is byte-wise transposed before compression and after decompression by the record length. The approach is able to detect files with such a structure and to determine the corresponding record length.
|
|
|
DNA sequence compression using the Burrows-Wheeler Transform
|
Don Adjeroh, Yong Zhang, Amar Mukherjee, Matt Powell and Tim Bell present in 2002 a proposal for compressing DNA sequences by using an off-line dictionary oriented approache in combination with a BWCA.
|
|
|
Text Compression with the Burrows-Wheeler Transform
|
James Allen gives a introduction of the BWT and presents a simple BWCA implementation with source code in C on his internet site from 2002. He uses two coding stages I haven't seen anywhere else in the BWCA field: Zero Running (ZR) after the MTF stage and Elias's gamma coding as the EC stage.
|
|
|
Lexical Permutation Sorting Algorithm
|
This article by Ziya Arnavut and Spyros Magliveras from 1996 describes a "Lexical Permutation Sorting Algorithm" (LPSA), which is a generalization of the Burrows-Wheeler Compression Algorithm (BWCA).
|
|
|
Block Sorting and Compression
|
This article by Ziya Arnavut and Spyros Magliveras from 1997 describes a "Lexical Permutation Sorting Algorithm" (LPSA), which is a generalization of the Burrows-Wheeler Compression Algorithm (BWCA), and a new post-BWT stage, called Inversion Frequencies (IF).
|
|
|
Move-to-Front and Inversion Coding
|
This conference proceeding paper from Ziya Arnavut at DCC 2000 compares the MTF stage with the IF stage. The IF stage, called inversion coding (ranking) technique, yields better compression gain than the recency ranking (MTF coder) for almost all the test data files.
|
|
|
Investigation of Block-Sorting of Multiset Permutations
|
Ziya and Meral Arnavut investigate in this 2004 article special cases of permutations, called multiset permutations, together with a transform different to the BWT, called Linear Ordering Transformation (LOT). LOT sorts rows of a cyclic rotated matrix with respect to the first element of each row.
|
|
|
Inversion Coder
|
A paper from 2004 by Ziya Arnavut presents and compares different inversion ranking schemes (IF) with recency ranking schemes (MTF). For large files, image and DNA files, IF achieves better compression rate than MTF schemes.
|
|
|
LIPT: A reversible lossless text transform to improve compression performance
|
In this paper from 2001 by Fauzia Awan, Nan Zhang, Nitin Motgi, Raja Iqbal and Amar Mukherjee a text pre-processing algorithm is described for PPM and BWCA, which leads to a better compression rate. The gain for BWCA is around 5% on the Calgary Corpus. The algorithm replaces words by sequneces of "*" and letters. By using an extern dictionary it is dependent on a specific language.
|
|
|
A new text preprocessing algorithm for bzip2 and PPM
|
This is the same article from 2001 than "LIPT: A reversible lossless text transform to improve compression performance" by Fauzia Awan, Nan Zhang, Nitin Motgi, Raja Iqbal and Amar Mukherjee about text pre-processing for PPM and BWCA.
|
|
|
Selective Application of Burrows-wheeler Transformation for Enhancement of JPEG Entropy Coding
|
Hyun-Kee Baik, Dong Ha, Hyun-Gyoo Yook, Sung-Chul Shin and Myong-Soon Park describe in their paper from 1999 a variant of a BWCA in order to improve jpg-image compression.
|
|
|
A New Method to Improve the Performance of JPEG Entropy Coding Using Burrows-Wheeler Transformation
|
Another paper by Hyun-Kee Baik, Hyun-Gyoo Yook, Sung-Chul Shin, Myong-Soon Park and Dong Sam Ha from 1999 about using BWCA for jpg-image compression.
|
|
|
Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice
|
Bernhard Balkenhol and Stefan Kurtz describe in their article from 1998 a BWCA which uses a BWT, MTF, RLE0 and EC stage. This BWCA approach achieves a compression rate of 2.32 bps for the Calgary Corpus.
|
|
|
Modifications of the Burrows and Wheeler Data Compression Algorithm
|
This paper of Bernhard Balkenhol, Stefan Kurtz and Yuri Shtarkov from 1999 describes a modification of the MTF algorithm, which moves the next symbol to the second place in the list instead of the first place, except the old position of the new symbol was 0 or 1, then it is moved to the first place. This BWCA approach achieves a compression rate of 2.30 bps for the Calgary Corpus.
|
|
|
One attempt of a compression algorithm using the BWT
|
Another important paper of Bernhard Balkenhol and Yuri Shtarkov from 1999 with focus on improvements of the Burrows-Wheeler Compression Algorithm (BWCA). It includes some interesting aspects for the RLE field too. An important property of the output of the BWT is the presents of many runs, which results in overestimating the probability of symbols outside the run. This problem is called "Pressure of Runs" and can be decreased by RLE. Besides that, a modification of the MTF algorithm is described which moves the next symbol to the second place in the list instead of the first place, except the old position of the new symbol was 0 or 1. If the old position of the new symbol was 1 it is moved to the first place only if the last output number was different from 0. This paper is my favourite paper from Bernhard. This BWCA approach achieves a compression rate of 2.26 bps for the Calgary Corpus.
|
|
|
Tree Source Identification with the Burrows Wheeler Transform
|
Dror Baron and Yoram Bresler study in their article from 2000 the identification of a tree source model from a given sequence by using the BWT as a sorting transformation for states in the tree.
|
|
|
A locally adaptive data compression scheme
|
The original MTF paper of Jon Bentley, Daniel Sleator, Robert Tarjan and Victor Wei from 1986. The paper descibes a word based MTF scheme as a data compression scheme, that exploits locality of reference. It is based on a simple heuristic for self-organizing sequential search.
|
|
|
Distance Coding newsgroup message
|
The famous newsgroup posting from 2000 by Edgar Binder, where he describes the DC algorithm by an example together with 3 important properties of the DC output sequence.
|
|
|
Inverting the Burrows-Wheeler Transform
|
Richard Bird and Shin-Cheng Mu describe in their brief article from 2001 the inverse transformation of the BWT.
|
|
|
A Block-Sorting Lossless Data Compression Algorithm
|
Michael Burrows and David Wheeler introduce the Burrows-Wheeler Transform (BWT) and the Burrows-Wheeler Compression Algorithm (BWCA) in a research report from 1994 at the Digital Systems Research Center. The paper explains the transformation and why the output sequence compresses well. Some variants of the algorithm, such as a modified MTF, are named too. This paper is a "must" for everyone interested in BWCAs. This BWCA approach achieves a compression rate of 2.43 bps for the Calgary Corpus.
|
|
|
On variants of block-sorting compression using context from both the left and right
|
Michael Burrows and Li Zhang describe a variation of BWT, which sorts contexts from left and right. The problem is, that the output of the new transfrom is not unambiguously, meaning that different input strings are transformed to the same output string. In order to unambiguously decode the output, one has to add log n-bits to the output.
|
|
|
BWT, a transformation algorithm
|
A brief article by Anturo San Emeterio Campos from 1999, describing the basics of the BWT.
|
|
|
Higher Compression from the Burrows-Wheeler Transform by Modified Sorting
|
Brenton Chapin and Stephen Tate show in their article from 1998, that the sorting order of the alphabet has a significant impact on the compression rate of a BWCA. They present a heuristic manuel alphabet permutation and a calculated alphabet permutation. The prsented calculated alphabet permutation does not improve the compression rate in all cases. For text files their heuristic manuel alphabet permutation achieves better results.
|
|
|
Switching Between Two On-Line List Update Algorithms for Higher Compression of Burrows-Wheeler Transformed Data
|
In his article from 2000 Brenton Chapin describes a switching scheme between two list update algorithms for a BWCA, a MTF scheme and a so called "Best x of 2x - 1" algorithm. This BWCA approach achieves a compression rate of 2.33 bps for the Calgary Corpus.
|
|
|
Higher Compression from the Burrows-Wheeler Transform with new Algorithms for the List Update Problem
|
The doctoral thesis of Brenton Chapin describes not only different LUAs but also some preprocessing schemes like alphabet reordering and variations of the BWT (Gray code sort) in order to achieve better compression rates.
|
|
|
The Burrows-Wheeler block sorting algorithm (long)
|
A short introduction into the BWT can be found at the comp.compression FAQ from 1999. The internet site has a little example and describes briefly the BWCA.
|
|
|
An analysis of the second step algorithms in the Burrows-Wheeler compression algorithm
|
A quite similar publication from 2000 to the article "Second step algorithms in the Burrows-Wheeler compression algorithm" from Sebastian Deorowicz, describing his Weighted Frequency Count Algorithm. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.
|
|
|
Improvements to Burrows-Wheeler Compression Algorithm
|
Sebastian Deorowicz describes in his article from 2000 a BWCA, based on a modified MTF and RLE0 stage. He also uses an adapted EC stage. This BWCA approach achieves a compression rate of 2.27 bps for the Calgary Corpus.
|
|
|
Second step algorithms in the Burrows-Wheeler compression algorithm
|
This publication of Sebastian Deorowicz in "Software-Practice and Experience" from 2002 give a quite complete overview of the post BWT stages used within the Burrows-Wheeler Compression Algorithm. Besides his own Weighted Frequency Count Algorithm and other post BWT stages, Sebastian describes briefly but clearly variantions of the Move To Front scheme, the Inversion Frequencies algorithm and the Distance Coding algorithm with the 3 properties of the newsgroup posting. This is one of my favourite BWCA papers. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.
|
|
|
Universal lossless source coding with the Burrows Wheeler transform
|
The work from 1999 of Michelle Effros, Karthik Visweswariah, Sanjeev Kulkarni, Sergio Verdú gives a theoretical evaluation of BWT-based coding. It prooves that BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in LZ-style codes.
|
|
|
Textkomprimierung - Multimedia-Seminar
|
Stefan Fabrewitz describes in his German seminar paper from 2000 several compression methods for text compression. He also has a brief chapter about the BWT.
|
|
|
Experiments with a Block Sorting Text Compression Algorithm
|
In his article form 1995 Peter Fenwick desribes several variations of BWCAs, One variation is a modified MTF scheme, which does not allways move the current symbol to the very front of the list but a little bit behind (index greater than 0). These appraoches achieved normaly a reduction in compression performance. He also describes some sorting improvements. This BWCA approach achieves a compression rate of 2.43 bps for the Calgary Corpus.
|
|
|
Improvements to the Block Sorting Text Compression Algorithm
|
Peter Fenwick introdues 1995 some BWCA improvements and variations like sort improvements, LZ77 preprocessing, a delta model and the direct coding of the BWT output. This BWCA approach achieves a compression rate of 2.36 bps for the Calgary Corpus.
|
|
|
Block Sorting Text Compression - Final Report
|
This technical paper is one of Peter’s most important BWCA papers published in 1996. Peter describes the basic transformation by an example and several improvements of the basic algorithm, like hierarchical coding models, structured coding models and Wheeler’s Run Length Encoding for zeros (RLE0). He also gives some sort and MTF improvements and a interpretation of the BWT output sequence. This is a "must", if you are interested in the BWCA field. This BWCA approach achieves a compression rate of 2.34 bps for the Calgary Corpus.
|
|
|
Block Sorting Text Compression / Proceedings of the 19th Australian Computer Science Conference 1996
|
This paper is similar to Peter’s "Block Sorting Text Compression - Final Report" and shows about the same several variants and improvements to BWCAs. This BWCA approach achieves a compression rate of 2.34 bps for the Calgary Corpus.
|
|
|
Burrows Wheeler - Alternatives to Move to Front
|
The paper from 2003 by Peter Fenwick, Mark Titchener and Michelle Lorenz describes a cached MTF modeling scheme with a foreground (more frequent symbols) and background (less frequent symbols) model. It also introduces the so called T-entropy as a measurement for entropies after each BWCA stage.
|
|
|
Burrows-Wheeler Compression with Variable Length Integer Codes
|
Peter describes in his paper from 2002 an alternative to arithmetic or Huffman coding at the last stage of a BWCA by using variable length integer codes (Elias Gamma and Fraenkel-Klein Fibonacci codes). Furthermore a sticky MTF stage is revealed, which move a accessed symbol back to 40% where it come from except if the symbol is accessed twice at once.
|
|
|
An experimental study of an opportunistic index
|
In the article from 2001 Paolo Ferragina and Giovanni Manzini combine the BWCA with with the suffix array data structure in order to support data indexing.
|
|
|
Lossless, Reversible Transformations that Improve Text Compression Ratios
|
In their paper form 2000 Robert Franceschini, Holger Kruse, Nan Zhang, Raja Iqbal and Amar Mukherjee desribe 3 text preprocessing techniques for BWCAs and PPM: LPT, RLPT and SCLPT. The achieve a compression gain of 7.1% over BZIP. The disadvantages of their method are, that they are language dependent and that they need an extern dictionary.
|
|
|
Compression, part 3: Burrows-Wheeler transform
|
A short introduction into the BWT with some examples is presented by Ole Friis on his internet site from 2001.
|
|
|
Burrows-Wheeler Transform in image compression
|
Markus Gärtner and David Havelin present on their internet site from 2000 a project for compression of images by using a transformation of the image, like DCT or DWT, before using the BWCA. They also plan to adept the BWT in order to process 2-dimensional data.
|
|
|
A space-economic suffix tree construction algorithm (Powerpoint Slides)
|
This powerpoint slide from 1997 by Robert Giegerich and Stefan Kurtz explains the suffix tree construction algorithm of McCreight from 1976.
|
|
|
Efficient Implementation of Lazy Suffix Trees
|
Robert Giegerich, Stefan Kurtz and Jens Stoye present 1999 time and space efficient suffix sorting algorithms.
|
|
|