www.data-compression.info The Data Compression Resource on the Internet

Context Tree Weighting (CTW)

The CTW-algorithm is a binary technique, invented by Frans Willems, Yuri Shtarkov and Tjalling Tjalkens, which was later extended for byte processing. The main idea is to weight the probabilities of several branches inside a tree in order to get a good probability estimation for the next symbol. The algorithm is optimal in the sense that it achieves the Rissanen (1984) lower bound.

ATTENTION: In contrast to PPM or BWCA, the CTW-method is patented!!! The patents are assigned to Koninklijke KPN NV. More information about the Intellectual Property Rights can be found at http://www.ele.tue.nl/ctw/ipr.html. Therefore, if you plan to develop compression algorithm based on the CTW-method for commercial utilization, get in contact with KPN NV, or maybe use a patentfree algorithm.

This paper from the DCC 1997 by Jan Aberg and Yuri Shtarkov describes different modifications of the context tree weighting algorithm. The combination of CTW with PPM is studied and a gain of 0.091 bps on the Calgary Corpus compared with PPMD achieved.

The paper by Ron Begleiter, Ran El-Yaniv and Golan Yona from 2004 compares several lossless compression algorithms, including LZ, PPM, CTW and PST on the example of biological, textual and musical data.

This paper by Ron Begleiter and Ran El-Yaniv from 2006 presents theoretical and experimental results for hierarchical applications of a state-of-the-art CTW variant (decomposed CTW).

Kunihiko Sadakane, Takumi Okazaki and Hiroshi Imai extend in their article from 2000 the CTW-method for processing multi-alphabet sequences. Furthermore, a method to optimize a parameter for binary alphabets is reveiled. This CTW approach achieves a compression rate of 2.27 bps for the Calgary Corpus.

An arithmetic coding procedure adapted for the CTW-method is presenmted by Tjalling Tjalkens and Frans Willems in thier paper from 1997. Instead of an estimated probability Pe(s) and a weighted probability Pw(s) the logarithm of the ratio Pe(s)/Pw(os)Pw(1s) is stored in each node s of the context tree, which leads to a considerable storage reduction.

Tjalling Tjalkens, Paul Volf and Frans Willems extend the CTW-method for compressing ASCII- and Byte-files. The ideas that make this possible, i.e. binary decomposition, zero-redundancy estimator, and weighting only at symbol-boundaries, were presented in the DCC paper from 1997.

In his paper from 1994 Frans Willems modifies the basic binary CTW-method such that the past symbols are not needed by the encoder and the decoder. He describes how to make the context-tree depth D infinite, which results in optimal redundancy behavior for all tree sources, while the number of records in the context tree is not larger than 2T-1, where T is the length of the source sequence. For this extended context-tree weighting algorithm he shows that with probability one the compression ratio is not larger than the source entropy for source sequence length T approaching infinity for stationary and ergodic sources.

The basic and theoretical principles of the CTW-method are described in this paper from 1995 by Frans Willems, Yuri Shtarkov and Tjalling Tjalkens. Here CTW is used for binary sources. Frans Willems, Yuri Shtarkov and Tjalling Tjalkens received for this paper the 1996 Paper Award of the IEEE Information Theory Society. The award was presented at the IEEE International Symposium on Infomation Theory in Ulm, Germany, 29.06.1997.

Frans Willems, Yuri Shtarkov and Tjalling Tjalkens extend in their paper from 1996 the CTW-method to sources with structures,which are more complex than tree structures. The generalized algorithms are again recursive, and they also achieve optimal redundancy behavior for sources in the model class for which the algorithm is designed.

Frans Willems extends the CTW-method in his article from 1996 for efficient coding of sequences generated by piecewise stationary sources. The transition pattern can be considered as part of the mechanism that selects the parameters of the source.

Frans Willems, Yuri Shtarkov and Tjalling Tjalkens have been asked by Michelle Effros, editor of the IEEE Information Theory Newsletter, to write a reflection contribution for their 1996 Paper Award of the IEEE Information Theory Society. The paper "Reflections on: Context-Tree Weighting: Basic Properties" contains a mini-course on universal source coding and an introduction to the CTW-method.

Frans Willems and Tjalling Tjalkens present in their aerticle from 1997 a method to decrease the storage and communication complexity of the CTW-method. The method is based on combining the estimated probability of a node in the context tree and weighted probabilities of its children in one single variable. The variable is represented by its logarithm.

Ran is a professor at the Technion - Israel Institute of Technology, Israel. His research interests include machine learning and data mining, online algorithms, competitive analysis and computational finance.

Yuri Shtarkov is a researcher at the Institute for Problems of Information Transmission (IPPI), Russia. His current interests include source coding and data compression.

Tjalling Tjalkens works at the Electrical Engineering Department of the University of Eindhoven, Netherlands. His research interests are Source coding and Channel coding.

Frans Willems works at the Electrical Engineering Department of the University of Eindhoven, Netherlands. His research interests are Source coding, Channel coding and Coding for Networks.

The research interests of Golan, assistant professor at the Technion - Israel Institute of Technology, Israel, and adjunct assistant professor at Cornell University, United States of America, are computational molecular biology and machine learning.