Grundlagen des Burrows-Wheeler-Kompressionsalgorithmus

A German article from 2003 by Jürgen Abel about the principles of the Burrows-Wheeler Compression Algorithm, written for the journal "Informatik - Forschung und Entwicklung", Springer-Verlag Heidelberg, Germany, Association for Informatics (Gesellschaft für Informatik e.V.). The article describes a BWCA implementation with an RLE and a WFC stage, which achieves a compression rate of 2.25 bps for the Calgary Corpus.


Verlustlose Datenkompression auf Grundlage der Burrows-Wheeler-Transformation

Another German article from 2003 by Jürgen Abel about the basics of the Burrows-Wheeler Compression Algorithm, written for the journal "PIK - Praxis der Informationsverarbeitung und Kommunikation", Saur Verlag, Germany, Association for Informatics (Gesellschaft für Informatik e.V.). The paper describes a simple but fast BWCA implementation PIK03 with an MTF, RLE0 and EC stage, which has the same compression speed as GZIP and BZIP2. The BWCA approach PIK03 achieves a compression rate of 2.35 bps for the Calgary Corpus.


Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages

A BWCA paper from 2003 by Jürgen Abel, which gives a short introduction to the BWCA field, proposes several improvements for the WFC and IF stages, and introduces a new RLE scheme for bypassing the run length information around the WFC stage. This paper is the basis of the program ABC. This BWCA approach achieves a compression rate of 2.24 bps for the Calgary Corpus.


Universal Text Preprocessing for Data Compression

This paper from 2003 by Jürgen Abel and Bill Teahan presents several preprocessing algorithms for textual data, which work with BWT, PPM and LZ based compression schemes. The algorithms need no external dictionary and are language independent. The average compression gain is in the range of 3 to 5 percent for the text files of the Calgary Corpus and between 2 to 9 percent for the text files of the large Canterbury Corpus.


Record Preprocessing for Data Compression

Another preprocessing paper from 2003 by Jürgen Abel. The paper presents a preprocessing algorithm which exploits the structure of record based files. The compression rate of these files can be enhanced if the file is transposed byte-wise according to the record length before compression and transposed back after decompression. The approach is able to detect files with such a structure and to determine the corresponding record length.
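The transposition itself is easy to sketch. A minimal illustration (not the paper's record detection logic; it assumes the file length is a multiple of the record length):

```python
def transpose(data: bytes, record_length: int) -> bytes:
    # Interpret data as consecutive records of record_length bytes and
    # emit all first bytes, then all second bytes, and so on.
    n = len(data) // record_length  # number of records
    return bytes(data[r * record_length + c]
                 for c in range(record_length)
                 for r in range(n))

def untranspose(data: bytes, record_length: int) -> bytes:
    # Transposing the resulting layout back just swaps the dimensions:
    # the "records" are now as long as the original number of records.
    return transpose(data, len(data) // record_length)
```

For example, with record length 2 the file b"abcdef" becomes b"acebdf", grouping the similar columns together before the BWT sees them.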


Post-BWT stages of the Burrows-Wheeler compression algorithm

A paper from 2010 by Jürgen Abel describing several post-BWT stages including a new context based approach. The article includes comparisons of compression rates and compression times of the different algorithms. Software: Practice and Experience, 40(9), 751-777, 2010.


DNA sequence compression using the Burrows-Wheeler Transform

Don Adjeroh, Yong Zhang, Amar Mukherjee, Matt Powell and Tim Bell presented in 2002 a proposal for compressing DNA sequences by using an offline, dictionary oriented approach in combination with a BWCA.


On Compressibility of Protein Sequences

Don Adjeroh and Fei Nan wrote a paper about the compression of protein sequences at the DCC 2006. The algorithm used is based on a BWT followed by a dictionary stage.


Text Compression with the Burrows-Wheeler Transform

James Allen gives an introduction to the BWT and presents a simple BWCA implementation with source code in C on his internet site from 2002. He uses two coding stages I haven't seen anywhere else in the BWCA field: Zero Running (ZR) after the MTF stage and Elias gamma coding as the EC stage.


Lexical Permutation Sorting Algorithm

This article by Ziya Arnavut and Spyros Magliveras from 1996 describes a "Lexical Permutation Sorting Algorithm" (LPSA), which is a generalization of the Burrows-Wheeler Compression Algorithm (BWCA).


Block Sorting and Compression

This article by Ziya Arnavut and Spyros Magliveras from 1997 describes a "Lexical Permutation Sorting Algorithm" (LPSA), which is a generalization of the Burrows-Wheeler Compression Algorithm (BWCA), and a new post-BWT stage, called Inversion Frequencies (IF).
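One common formulation of the IF idea (variants in the literature differ in details; this sketch simply records, for each occurrence of a symbol, how many greater symbols have passed since its previous occurrence):

```python
def if_encode(data):
    # For each alphabet symbol c (smallest first), emit for every
    # occurrence of c the number of symbols greater than c seen
    # since the previous occurrence of c.
    alphabet = sorted(set(data))
    out = []
    for c in alphabet:
        skip = 0
        for s in data:
            if s == c:
                out.append(skip)
                skip = 0
            elif s > c:
                skip += 1
    return alphabet, [data.count(c) for c in alphabet], out

def if_decode(alphabet, counts, values):
    # Fill positions smallest symbol first; the emitted numbers say
    # how many still-unfilled (i.e. larger-symbol) slots to skip.
    free = list(range(sum(counts)))
    result = [None] * len(free)
    it = iter(values)
    for c, k in zip(alphabet, counts):
        cur = 0
        for _ in range(k):
            cur += next(it)
            result[free.pop(cur)] = c
    return "".join(result)
```

For "banana" the encoder produces the value stream [1, 1, 1, 0, 0, 0], which the decoder turns back into the original string.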


Move-to-Front and Inversion Coding

This conference paper by Ziya Arnavut at DCC 2000 compares the MTF stage with the IF stage. The IF stage, called the inversion coding (ranking) technique, yields better compression gains than recency ranking (the MTF coder) for almost all the test data files.


Investigation of Block-Sorting of Multiset Permutations

Ziya and Meral Arnavut investigate in this 2004 article special cases of permutations, called multiset permutations, together with a transform different from the BWT, called the Linear Ordering Transformation (LOT). LOT sorts the rows of a cyclically rotated matrix with respect to the first element of each row.


Inversion Coder

A paper from 2004 by Ziya Arnavut presents and compares different inversion ranking schemes (IF) with recency ranking schemes (MTF). For large files, image files and DNA files, IF achieves better compression rates than MTF schemes.


LIPT: A reversible lossless text transform to improve compression performance

In this paper from 2001 by Fauzia Awan, Nan Zhang, Nitin Motgi, Raja Iqbal and Amar Mukherjee, a text preprocessing algorithm is described for PPM and BWCA, which leads to a better compression rate. The gain for BWCA is around 5% on the Calgary Corpus. The algorithm replaces words by sequences of "*" and letters. By using an external dictionary it is dependent on a specific language.


A new text preprocessing algorithm for bzip2 and PPM

This is the same article from 2001 as "LIPT: A reversible lossless text transform to improve compression performance" by Fauzia Awan, Nan Zhang, Nitin Motgi, Raja Iqbal and Amar Mukherjee about text preprocessing for PPM and BWCA.


Selective Application of Burrows-Wheeler Transformation for Enhancement of JPEG Entropy Coding

Hyun-Kee Baik, Dong Ha, Hyun-Gyoo Yook, Sung-Chul Shin and Myong-Soon Park describe in their paper from 1999 a variant of a BWCA in order to improve JPEG image compression.


A New Method to Improve the Performance of JPEG Entropy Coding Using Burrows-Wheeler Transformation

Another paper by Hyun-Kee Baik, Hyun-Gyoo Yook, Sung-Chul Shin, Myong-Soon Park and Dong Sam Ha from 1999 about using a BWCA for JPEG image compression.


Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice

Bernhard Balkenhol and Stefan Kurtz describe in their article from 1998 a BWCA which uses a BWT, MTF, RLE0 and EC stage. This BWCA approach achieves a compression rate of 2.32 bps for the Calgary Corpus.


Modifications of the Burrows and Wheeler Data Compression Algorithm

This paper by Bernhard Balkenhol, Stefan Kurtz and Yuri Shtarkov from 1999 describes a modification of the MTF algorithm, which moves the next symbol to the second place in the list instead of the first place, unless the old position of the symbol was 0 or 1, in which case it is moved to the first place. This BWCA approach achieves a compression rate of 2.30 bps for the Calgary Corpus.


One attempt of a compression algorithm using the BWT

Another important paper by Bernhard Balkenhol and Yuri Shtarkov from 1999 with focus on improvements of the Burrows-Wheeler Compression Algorithm (BWCA). It includes some interesting aspects for the RLE field too. An important property of the output of the BWT is the presence of many runs, which results in overestimating the probability of symbols outside the run. This problem is called "Pressure of Runs" and can be decreased by RLE. Besides that, a modification of the MTF algorithm is described which moves the next symbol to the second place in the list instead of the first place, unless the old position of the symbol was 0 or 1. If the old position of the symbol was 1, it is moved to the first place only if the last output number was different from 0. This paper is my favourite paper from Bernhard. This BWCA approach achieves a compression rate of 2.26 bps for the Calgary Corpus.
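One reading of that modified MTF rule can be sketched as follows (a sketch, not the authors' code; the initial value of the "previous output" and the tie-breaking at position 1 are assumptions):

```python
def mtf2_encode(data, alphabet):
    # Modified MTF: an accessed symbol moves to the second position;
    # a symbol found at position 1 moves to the front only if the
    # previous output number was nonzero; position 0 stays put.
    table = list(alphabet)
    out = []
    prev = 1  # assumption: treat the nonexistent previous output as nonzero
    for s in data:
        p = table.index(s)
        out.append(p)
        if p >= 2:
            table.pop(p)
            table.insert(1, s)
        elif p == 1 and prev != 0:
            table.pop(1)
            table.insert(0, s)
        prev = p
    return out
```

Compared with plain MTF, this keeps the current front symbol in place through single interruptions, which suits the long runs typical of BWT output.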


Tree Source Identification with the Burrows Wheeler Transform

Dror Baron and Yoram Bresler study in their article from 2000 the identification of a tree source model from a given sequence by using the BWT as a sorting transformation for states in the tree.


A locally adaptive data compression scheme

The original MTF paper of Jon Bentley, Daniel Sleator, Robert Tarjan and Victor Wei from 1986. The paper describes a word based MTF scheme as a data compression scheme that exploits locality of reference. It is based on a simple heuristic for self-organizing sequential search.


Distance Coding newsgroup message

The famous newsgroup posting from 2000 by Edgar Binder, where he describes the DC algorithm by an example together with 3 important properties of the DC output sequence.


Inverting the Burrows-Wheeler Transform

Richard Bird and Shin-Cheng Mu describe in their brief article from 2001 the inverse transformation of the BWT.


A Block-Sorting Lossless Data Compression Algorithm

Michael Burrows and David Wheeler introduce the Burrows-Wheeler Transform (BWT) and the Burrows-Wheeler Compression Algorithm (BWCA) in a research report from 1994 at the Digital Systems Research Center. The paper explains the transformation and why the output sequence compresses well. Some variants of the algorithm, such as a modified MTF, are mentioned as well. This paper is a "must" for everyone interested in BWCAs. This BWCA approach achieves a compression rate of 2.43 bps for the Calgary Corpus.
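The transform can be sketched naively by sorting all rotations (real implementations use suffix sorting; this sketch also appends a sentinel character instead of transmitting the row index as the original report does):

```python
def bwt(s):
    # Forward BWT: append a unique sentinel, sort all cyclic
    # rotations, and take the last column.
    s += "\x00"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def ibwt(last):
    # Naive inverse: repeatedly prepend the last column and sort,
    # rebuilding the sorted rotation matrix column by column.
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith("\x00"))
    return row[:-1]
```

Both directions are quadratic or worse and serve only to make the definition concrete; the suffix sorting papers listed below are about doing the forward transform fast.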


On variants of blocksorting compression using context from both the left and right

Michael Burrows and Li Zhang describe a variation of the BWT, which sorts contexts from the left and the right. The problem is that the output of the new transform is not unambiguous, meaning that different input strings are transformed to the same output string. In order to decode the output unambiguously, one has to add log n bits to the output.


BWT, a transformation algorithm

A brief article by Arturo San Emeterio Campos from 1999, describing the basics of the BWT.


Higher Compression from the Burrows-Wheeler Transform by Modified Sorting

Brenton Chapin and Stephen Tate show in their article from 1998 that the sorting order of the alphabet has a significant impact on the compression rate of a BWCA. They present a heuristic manual alphabet permutation and a calculated alphabet permutation. The presented calculated alphabet permutation does not improve the compression rate in all cases. For text files their heuristic manual alphabet permutation achieves better results.


Switching Between Two On-Line List Update Algorithms for Higher Compression of Burrows-Wheeler Transformed Data

In his article from 2000 Brenton Chapin describes a switching scheme between two list update algorithms for a BWCA, an MTF scheme and a so-called "Best x of 2x - 1" algorithm. This BWCA approach achieves a compression rate of 2.33 bps for the Calgary Corpus.


Higher Compression from the Burrows-Wheeler Transform with new Algorithms for the List Update Problem

The doctoral thesis of Brenton Chapin describes not only different LUAs but also some preprocessing schemes like alphabet reordering and variations of the BWT (Gray code sort) in order to achieve better compression rates.


The Burrows-Wheeler block sorting algorithm (long)

A short introduction into the BWT can be found at the comp.compression FAQ from 1999. The internet site has a little example and describes briefly the BWCA.


An analysis of the second step algorithms in the BurrowsWheeler compression algorithm

A publication from 2000 quite similar to the article "Second step algorithms in the Burrows-Wheeler compression algorithm" by Sebastian Deorowicz, describing his Weighted Frequency Count algorithm. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.


Improvements to Burrows-Wheeler Compression Algorithm

Sebastian Deorowicz describes in his article from 2000 a BWCA, based on a modified MTF and RLE0 stage. He also uses an adapted EC stage. This BWCA approach achieves a compression rate of 2.27 bps for the Calgary Corpus.


Second step algorithms in the Burrows-Wheeler compression algorithm

This publication of Sebastian Deorowicz in "Software: Practice and Experience" from 2002 gives a quite complete overview of the post-BWT stages used within the Burrows-Wheeler Compression Algorithm. Besides his own Weighted Frequency Count algorithm and other post-BWT stages, Sebastian describes briefly but clearly variations of the Move To Front scheme, the Inversion Frequencies algorithm and the Distance Coding algorithm with the 3 properties of the newsgroup posting. This is one of my favourite BWCA papers. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.


Universal lossless source coding with the Burrows Wheeler transform

The work from 1999 of Michelle Effros, Karthik Visweswariah, Sanjeev Kulkarni and Sergio Verdú gives a theoretical evaluation of BWT-based coding. It proves that BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in LZ-style codes.


Textkomprimierung - Multimedia-Seminar

Stefan Fabrewitz describes in his German seminar paper from 2000 several compression methods for text compression. He also has a brief chapter about the BWT.


Experiments with a Block Sorting Text Compression Algorithm

In his article from 1995 Peter Fenwick describes several variations of BWCAs. One variation is a modified MTF scheme, which does not always move the current symbol to the very front of the list but a little bit behind (index greater than 0). These approaches normally caused a reduction in compression performance. He also describes some sorting improvements. This BWCA approach achieves a compression rate of 2.43 bps for the Calgary Corpus.


Improvements to the Block Sorting Text Compression Algorithm

Peter Fenwick introduces in 1995 some BWCA improvements and variations like sort improvements, LZ77 preprocessing, a delta model and the direct coding of the BWT output. This BWCA approach achieves a compression rate of 2.36 bps for the Calgary Corpus.


Block Sorting Text Compression - Final Report

This technical paper is one of Peter’s most important BWCA papers, published in 1996. Peter describes the basic transformation by an example and several improvements of the basic algorithm, like hierarchical coding models, structured coding models and Wheeler’s Run Length Encoding for zeros (RLE0). He also gives some sort and MTF improvements and an interpretation of the BWT output sequence. This is a "must" if you are interested in the BWCA field. This BWCA approach achieves a compression rate of 2.34 bps for the Calgary Corpus.
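Wheeler's RLE0 is commonly described as writing each zero run's length in a bijective base-2 numeral over two run symbols (often called 0a and 0b) while all other ranks are shifted up by one; a sketch under that common description, not a transcription of Wheeler's code:

```python
def rle0_encode(ranks):
    # Zero runs become their length in bijective base 2, emitted
    # least-significant digit first with "0a" = 1 and "0b" = 2;
    # nonzero ranks are shifted up by one to stay distinct.
    out = []
    i = 0
    while i < len(ranks):
        if ranks[i] == 0:
            n = 0
            while i < len(ranks) and ranks[i] == 0:
                n += 1
                i += 1
            while n > 0:
                if n % 2 == 1:
                    out.append("0a"); n = (n - 1) // 2
                else:
                    out.append("0b"); n = (n - 2) // 2
        else:
            out.append(ranks[i] + 1)
            i += 1
    return out

def rle0_decode(symbols):
    # Rebuild run lengths from the 0a/0b digits, undo the +1 shift.
    out = []
    i = 0
    while i < len(symbols):
        if symbols[i] in ("0a", "0b"):
            n, place = 0, 1
            while i < len(symbols) and symbols[i] in ("0a", "0b"):
                n += place * (1 if symbols[i] == "0a" else 2)
                place *= 2
                i += 1
            out.extend([0] * n)
        else:
            out.append(symbols[i] - 1)
            i += 1
    return out
```

The bijective numeral makes every run length representable without an explicit terminator, since the first non-run symbol ends the count.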


Block Sorting Text Compression / Proceedings of the 19th Australian Computer Science Conference 1996

This paper is similar to Peter’s "Block Sorting Text Compression - Final Report" and shows about the same several variants and improvements to BWCAs. This BWCA approach achieves a compression rate of 2.34 bps for the Calgary Corpus.


Burrows Wheeler - Alternatives to Move to Front

The paper from 2003 by Peter Fenwick, Mark Titchener and Michelle Lorenz describes a cached MTF modeling scheme with a foreground (more frequent symbols) and a background (less frequent symbols) model. It also introduces the so-called T-entropy as a measurement of the entropy after each BWCA stage.


Burrows-Wheeler Compression with Variable Length Integer Codes

Peter describes in his paper from 2002 an alternative to arithmetic or Huffman coding for the last stage of a BWCA by using variable length integer codes (Elias Gamma and Fraenkel-Klein Fibonacci codes). Furthermore, a sticky MTF stage is presented, which moves an accessed symbol back to about 40% of the position it came from, unless the symbol is accessed twice in a row.


An experimental study of an opportunistic index

In this article from 2001 Paolo Ferragina and Giovanni Manzini combine the BWCA with the suffix array data structure in order to support data indexing.


Lossless, Reversible Transformations that Improve Text Compression Ratios

In their paper from 2000 Robert Franceschini, Holger Kruse, Nan Zhang, Raja Iqbal and Amar Mukherjee describe 3 text preprocessing techniques for BWCAs and PPM: LPT, RLPT and SCLPT. They achieve a compression gain of 7.1% over BZIP. The disadvantages of their methods are that they are language dependent and that they need an external dictionary.


Compression, part 3: Burrows-Wheeler transform

A short introduction into the BWT with some examples is presented by Ole Friis on his internet site from 2001.


Burrows-Wheeler Transform in image compression

Markus Gärtner and David Havelin present on their internet site from 2000 a project for the compression of images by using a transformation of the image, like DCT or DWT, before applying the BWCA. They also plan to adapt the BWT in order to process 2-dimensional data.


A space-economic suffix tree construction algorithm (PowerPoint Slides)

These PowerPoint slides from 1997 by Robert Giegerich and Stefan Kurtz explain the suffix tree construction algorithm of McCreight from 1976.


Efficient Implementation of Lazy Suffix Trees

Robert Giegerich, Stefan Kurtz and Jens Stoye presented in 1999 time and space efficient suffix sorting algorithms.


Text Preprocessing for Burrows-Wheeler Block-Sorting Compression

This paper from 1999 by Szymon Grabowski is an interesting text preprocessing paper for BWCAs. It describes capital conversion, space stuffing, phrase substitution, alphabet reordering and EOL coding. A drawback is the dependence on the English language.


Parsing Strategies for BWT Compression

Yugo Isal and Alistair Moffat describe in their paper from 2001 several parsing preprocessing schemes for BWCAs. They use word and n-gram techniques with implicit and explicit dictionaries. In all cases the dictionary information is coded and transmitted too, so their techniques remain language independent.


Enhanced Word-Based Block-Sorting Text Compression

This paper from 2002 by Yugo Isal, Alistair Moffat and Alwin Ngai explains word based preprocessing schemes and a ranking scheme for the GST, called DSEG, which uses search trees to calculate proper ranking values.


An Efficient Method for Construction of Suffix Arrays

This paper from 1999 about suffix sorting is an important BWT paper by Hideo Itoh and Hozumi Tanaka. It describes the two-stage suffix sort, which is a very fast sorting algorithm for suffixes. It can be used to calculate the BWT very efficiently.


Simple Linear Work Suffix Array Construction

A very interesting paper by Juha Kärkkäinen and Peter Sanders from 2003 at the Max-Planck-Institut für Informatik about simple suffix sorting in linear time. It recursively sorts the suffixes beginning at positions i mod 3 <> 0, then sorts the remaining suffixes and merges both groups at the end.


Improving SuffixArray Construction Algorithms with Applications

One of my favourite BWT sorting articles, a master's thesis from 2001 by Tsai-Hsing Kao. He describes an improved version of the two-stage suffix sort of Hideo Itoh and Hozumi Tanaka by going through the sorting array both from left to right and from right to left. The thesis also explains related algorithms like Sadakane’s sorting.


Preprocessing Text to Improve Compression Ratios

This paper from 1997 by Holger Kruse and Amar Mukherjee describes a text preprocessing technique for BWCAs, called star encoding. The presented method is language dependent and needs an external dictionary.


Improving Text Compression Ratios with the Burrows-Wheeler Transform

Holger Kruse and Amar Mukherjee describe in their paper from 1999 three text preprocessing schemes: alphabet reordering, bigram encoding and dictionary encoding. The presented methods are language dependent and need an external dictionary.


Reducing the Space Requirement of Suffix Trees

Stefan Kurtz introduces in his article from 1998 a suffix tree algorithm which needs only about half as much space (10 - 20 bytes per input symbol) as previous algorithms.


Space Efficient Linear Time Computation of the Burrows and Wheeler Transformation

Stefan Kurtz and Bernhard Balkenhol present in their paper from 1999 a suffix tree algorithm for the BWT, which needs little space and is linear in time.


Extended application of suffix trees to data compression

A method for constructing a suffix tree in linear time within a sliding window is presented by Jesper Larsson in 1998. It has some relations to LZ, PPM* and ACB by George Buyanovsky.


The Context Trees of Block Sorting Compression

Jesper Larsson describes in his paper from 1998 a method, which calculates a suffix tree, prunes the tree, encodes the tree and uses the transmitted tree information for a GST stage inside the BWCA. This BWCA approach achieves a compression rate of 2.65 bps for the Calgary Corpus.


Notes On Suffix Sorting

Jesper Larsson reveals in his paper from 1998 some improvements over Sadakane’s suffix sort algorithm, which can be used to calculate the BWT. He also gives some theoretical time complexity estimates of the algorithm.


Faster Suffix Sorting

Jesper Larsson and Kunihiko Sadakane describe a fast algorithm for sorting suffix arrays in their paper from 1999, which can be used to calculate the BWT. This is one of my favourite articles about suffix sorting for the BWT.


Structures of String Matching and Data Compression

The doctoral thesis of Jesper Larsson from 1999 about suffix trees, suffix arrays, context trees, the BWT and their relations. Important for the BWT area is the chapter about the fast construction of a suffix array, where Jesper Larsson reveals several algorithm refinements together with examples of source code. This work is highly recommended for the calculation of the BWT.


Datenkompression (German Script)

Maciej Liskiewicz and Henning Fernau describe in their German paper from 1999 several compression techniques both for lossless and for lossy compression. The paper contains a very brief chapter about BWT.


A Run Length Encoding Scheme For Block Sort Transformed Data

Michael Maniscalco describes in 2000 an RLE algorithm for usage within the Burrows-Wheeler Compression Algorithm, which is based on a fixed length threshold run and a symbol run which is about half as long as the original run.


A Second Modified Run Length Encoding Scheme For Block Sort Transformed Data

This RLE algorithm from 2001 by Michael Maniscalco is based on a variable length threshold run, which defines the length of the binary representation of the threshold run and a mantissa part, which is stored in a separate data buffer.


An Analysis of the BurrowsWheeler Transform

A theoretical paper from 1999 by Giovanni Manzini, which analyses the properties of a BWCA with a MTF stage and a BWCA with a MTF and RLE0 stage.


The BurrowsWheeler Transform: Theory and Practice

Another paper from 1999 by Giovanni Manzini, which analyses the BWCA and several variations like some implementations of Schindler and Sadakane.


Engineering a Lightweight Suffix Array Construction Algorithm

Giovanni Manzini and Paolo Ferragina introduce in their paper from 2002 an improved suffix array construction algorithm, called Deep-Shallow Suffix Sorting, based on the Copy algorithm of Seward. The algorithm is able to calculate the BWT very fast.


Algorithms for Text and Image Compression

This paper from 1999 by Amar Mukherjee, Holger Kruse and Kunal Mukherjee describes algorithms for text and for image compression. The text compression algorithm is a BWCA with text preprocessing based on the Star (*) Encoding technique. The presented method is language dependent and needs an external dictionary.


Data Compression with the Burrows-Wheeler Transform

Mark Nelson explains in his Dr. Dobb's Journal article the basics of the BWT by some examples and presents a reference implementation of a BWCA. He also offers free source code for his BWCA. This article is highly recommended for BWT beginners.


Protein is incompressible

The paper which introduced the 4 protein files, by Craig Nevill-Manning and Ian Witten from the DCC 1999. They state that protein files are hardly compressible with PPM like algorithms and introduce a special protein compression program called CP. Further investigation will have to find out how much potential BWCAs have in the field of protein compression.


Comparison among Suffix Array Construction Algorithms

Kunihiko Sadakane compares different algorithms for constructing suffix arrays and introduces his new fast algorithm, which is effective if the length of repeated strings inside the text is large. This paper from 1997 is a basic paper about calculating the BWT by suffix sorting.


Improvements of Speed and Performance of Data Compression Based on Dictionary and Context Similarity

The master's thesis of Kunihiko Sadakane from 1997 introduces the new compression algorithm Recency Rank Code with Context (RRC), which is related to context similarity schemes and BWCAs. RRC achieves a compression rate of 2.48 bps for the Calgary Corpus.


Text Compression using Recency Rank with Context and Relation to Context Sorting, Block Sorting and PPM*

Kunihiko Sadakane describes in his paper from 1997, which is similar to his master's thesis, the compression algorithm RRC, which is related to blocksorting and context similarity schemes.


A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation

This article by Kunihiko Sadakane from 1998 is very similar to his paper "Comparison among Suffix Array Construction Algorithms", and compares different algorithms for constructing suffix arrays. It explains his fast algorithm for constructing suffix arrays.


On Optimality of Variants of the Block Sorting Compression

Kunihiko Sadakane proposes in this overview from 1998 two online algorithms based on blocksorting. In order to use the first algorithm, one has to know the order of the source; the second algorithm divides the input into pieces with the same contexts and codes the pieces with separate models.


A Modified Burrows-Wheeler Transformation for Case-insensitive Search with Application to Suffix Array Compression

This paper from 1999 by Kunihiko Sadakane introduces a BWT variant in order to support case-insensitive searches inside the text.


Unifying Text Search And Compression - Suffix Sorting, Block Sorting and Suffix Arrays

The doctoral thesis of Kunihiko Sadakane from 2000 proposes a unified method for improving the efficiency of search and compression for huge text data. Besides many other aspects, this paper includes the best and most extensive explanation of his famous suffix sorting algorithm. Therefore I highly recommend this paper for information about Sadakane's fast algorithm for constructing suffix arrays.


Constructing Suffix Arrays of Large Texts

Kunihiko Sadakane and Hiroshi Imai describe in their brief paper from 1998 the algorithm of Sadakane for constructing suffix arrays.


A Fast Blocksorting Algorithm for lossless Data Compression

Michael Schindler proposes in his article from 1996 a variant of the BWT. The execution speed of his approach does not depend on the block size used.


Two New Families of List Update Algorithms

Frank Schulz presents in his paper from 1998 two list update algorithm families for the on-line list accessing problem: Sort-By-Rank and Sort-By-Delay.


On the performance of BWT sorting algorithms

Julian Seward proposes in his paper from 2000 several very fast BWT sorting algorithms, called COPY and CACHE, which are used in his fast BWCA implementation BZIP2. This article is highly recommended for the fast calculation of the BWT by suffix arrays.


Space-time tradeoffs in the Inverse BW Transform

Several fast variants of calculating the inverse BWT are discussed in the article from 2001 by Julian Seward. This article is highly recommended for inverse BWT calculation.


Packen wie noch nie

A German article from the c't magazine 2000 about a basic BWCA implementation by Michael Tamm.


The Entropy of English using PPM-Based Models

William Teahan and John Cleary estimate in their paper from 1996 the entropy of English text. For BWCAs this paper is interesting too, since it proposes bigram based text preprocessing, which can also be used in the BWCA field in order to improve the compression ratio.


Models of English Text

This article from 1997 by William Teahan and John Cleary describes models of English text. The word based model can be used as a text preprocessing approach in the BWCA field in order to improve the compression ratio.


Modelling English text

In his doctoral thesis Bill describes properties of the English language and adaptive models of the English language for PPM based compression. The PPM* algorithm is described too. For the BWCA field, the word based model can be used as a text preprocessing approach in order to improve the compression ratio.


Switching Between Two Universal Source Coding Algorithms

Paul Volf and Frans Willems present in their paper the basics of the switching method, which switches between two sequential compression algorithms in order to get a better compression rate than either algorithm achieves separately. Three algorithms are proposed: the switching algorithm, which stores all states; the Snake algorithm, which stores two states; and the reduced complexity algorithm, a compromise which stores a fixed number of states. The switching method can be applied in many cases, for example with PPM variants but also with some GST stages like MTF and WFC within BWCAs. This switching approach achieves a compression rate of 2.14 bps for the Calgary Corpus.


The Switching Method: Elaborations

Paul Volf and Frans Willems extend the switching method by combining CTW with PPM variations in 1998. This switching approach achieves a compression rate of 2.12 bps for the Calgary Corpus.


An implementation of block coding

This paper from 1995 by David Wheeler is a description of the program BRED, a BWCA implementation of David Wheeler using Huffman coding as the EC stage. A primed BWCA approach of BRED achieves a compression rate of 2.29 bps for the Calgary Corpus.


Upgrading bred with multiple tables

In his paper from 1997 David Wheeler describes improvements to his program BRED by using multiple tables for adaption to changing statistics. This BWCA approach of BRED achieves a compression rate of 2.36 bps for the Calgary Corpus.


Can we do without ranks in Burrows Wheeler transform compression?

Anthony Ian Wirth and Alistair Moffat discuss in 2001 direct symbol encoding instead of ranks from the GST stage. They use a hierarchical model similar to the one of Balkenhol and Shtarkov. This BWCA approach achieves a compression rate of 2.312 bps for the Calgary Corpus.


Symbol-driven compression of Burrows Wheeler transformed text

Anthony Ian Wirth wrote a master's thesis in 2001 about direct symbol encoding after the BWT stage. He describes a model for encoding similar to PPM. This BWCA approach achieves a compression rate of 2.312 bps for the Calgary Corpus.


An Efficient Method For Compressing Test Data

This paper by Takahiro Yamaguchi, Marco Tilgner, Masahiro Ishida and Dong Ha from 1997 proposes a method for compressing test data by using the BWT and an RLE stage.


Approximate Pattern Matching Over the Burrows-Wheeler Transformed Text

Nan Zhang, Amar Mukherjee, Don Adjeroh and Tim Bell discuss approximate pattern matching on the output of the BWT stage in their paper from 2003.
