www.data-compression.info
The Data Compression Resource on the Internet

Contents

 Adaptive Model


The process of entropy coding can be split into two parts: modeling and coding. Modeling assigns probabilities to the symbols, and coding produces a bit sequence from these probabilities. As established in Shannon's source coding theorem, there is a relationship between a symbol's probability and its corresponding bit sequence: a symbol with probability p gets a bit sequence of length -log2(p) bits.
In order to achieve a good compression rate, an exact probability estimation is needed. Since the model is responsible for the probability of each symbol, modeling is one of the most important tasks in data compression.
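
As a small illustration of this relationship, here is a minimal Python sketch (the probabilities are invented for the example) that computes the ideal code length -log2(p) for each symbol and the resulting average, which is the entropy of the distribution:

    import math

    # Invented symbol probabilities from some model (example values only).
    probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    # Shannon's source coding theorem: a symbol with probability p
    # should be coded with -log2(p) bits.
    for symbol, p in probabilities.items():
        print(f"{symbol}: p = {p:<6} ideal code length = {-math.log2(p):.2f} bits")

    # The expected code length is the entropy of the distribution.
    entropy = sum(-p * math.log2(p) for p in probabilities.values())
    print(f"entropy = {entropy:.3f} bits/symbol")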

An adaptive model is a model which changes the symbol probabilities during the compression process in order to adapt to the changing contexts. Initially the compression process starts with a basic model, so the model does not need to be transmitted. During the process, the model is updated using the symbols which have already been transmitted. It is important that the model is updated only by symbols which have already been transmitted, since the decoder must be able to update the model in the same way during decompression. Because the decoder knows the previously transmitted symbols, it can update the model in the same way as the coder did.
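
The following Python sketch shows this idea for a hypothetical adaptive order-0 model based on simple frequency counts (class and method names are illustrative, not taken from any of the papers below): coder and decoder start from the same initial counts and update them only after a symbol has been coded or decoded, so both sides always compute identical probabilities.

    class AdaptiveOrder0Model:
        """Order-0 adaptive model: probabilities from running symbol counts."""

        def __init__(self, alphabet):
            # Start with count 1 for every symbol (the shared basic model),
            # so no model has to be transmitted.
            self.counts = {s: 1 for s in alphabet}
            self.total = len(self.counts)

        def probability(self, symbol):
            # Current probability estimate for a symbol.
            return self.counts[symbol] / self.total

        def update(self, symbol):
            # Called only AFTER the symbol has been coded/decoded, so the
            # decoder can perform exactly the same update later.
            self.counts[symbol] += 1
            self.total += 1

    # Coder and decoder run the identical sequence of updates:
    model = AdaptiveOrder0Model("ab")
    for symbol in "abba":
        p = model.probability(symbol)   # used by the entropy coder
        model.update(symbol)            # adapt after (not before) coding
        print(symbol, round(p, 3))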

An interesting overview of different techniques for using compression in cache and main memory systems is presented in a paper by Sparsh Mittal and Jeffrey Vetter (see the Publications section below).

 Publications


Modeling for Text Compression

Although it dates from 1988, this paper by Timothy Bell, Ian Witten and John Cleary is one of my favourites. It is easy to read, well structured, and explains all the important details.
Models are best formed adaptively, based on the text seen so far. This paper surveys successful strategies for adaptive modeling which are suitable for use in practical text compression systems. The strategies fall into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one; finite-state modeling, in which the distribution is conditioned by the current state (and which subsumes finite-context modeling as an important special case); and dictionary modeling, in which strings of characters are replaced by pointers into an evolving dictionary. A comparison of different methods on the same sample texts is included, along with an analysis of future research directions.
 

Compression: Algorithms: Statistical Coders

A good introduction to entropy coding is this article by Charles Bloom from 1996. The process of statistical coding is explained with many simple examples.
 

Solving the Problems of Context Modeling

This 1998 paper by Charles Bloom is about the PPMZ algorithm. It covers local order estimation and secondary escape estimation.
 

New Techniques in Context Modeling and Arithmetic Encoding

In 1996 Charles Bloom presented several new techniques for high order context modeling, low order context modeling, and order-0 arithmetic coding. Emphasis is placed on economy of memory and speed. Performance is found to be significantly better than that of previous methods.
 

Inductive Modeling for Data Compression

John Cleary and Ian Witten wrote this fundamental paper about modeling, parsing, prediction, context and state in 1987.
 

Context Modelling for Text Compression

Several modeling strategies and algorithms are presented in this 1992 paper by Daniel Hirschberg and Debra Lelewer. It contains a very interesting blending strategy.
 

The Design and Analysis of Efficient Lossless Data Compression Systems

Paul Howard's 1993 thesis about data compression algorithms, with emphasis on arithmetic coding, text compression and image compression.
 

Data Compression (Tutorial)

A basic paper by Debra Lelewer and Daniel Hirschberg about fundamental concepts of data compression, intended as a tutorial, from 1987. It contains many small examples.
 

Streamlining Context Models for Data Compression

This 1991 paper by Debra Lelewer and Daniel Hirschberg is about context modeling using self-organizing lists to speed up the compression process.
 

A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems

A 2015 survey by Sparsh Mittal and Jeffrey Vetter of different techniques for using compression in cache and main memory systems, discussing similarities and differences of compression inside CPUs and GPUs, conventional and non-volatile memory systems, and 2D and 3D memory systems.
 

Lossless Compression for Text and Images

Another fundamental paper about modeling and coding, with models for text and image compression, written by Alistair Moffat, Timothy Bell and Ian Witten in 1995.
 

Arithmetic Coding + Statistical Modeling = Data Compression

Mark Nelson's article about arithmetic coding from 1991. The concepts are easy to understand and are accompanied by a simple "BILL GATES" example. Source code for Billyboy is available.
 

Arithmetic Coding for Data Compression

This ACM paper from 1987, written by Ian Witten, Radford Neal and John Cleary, is the definitive front-runner among arithmetic coding papers. The article is quite short but comes with full source code for the famous CACM87 arithmetic coding implementation.
 

 People



Timothy Bell

Timothy Bell works at the University of Canterbury, New Zealand, and is the "father" of the Canterbury Corpus. His research interests include compression, computer science for children, and music.
 

Charles Bloom

Charles Bloom has published many papers about data compression and is the author of PPMZ2, a very strong compression algorithm (2.141 bps on the Calgary Corpus).
 

John Cleary

John Cleary works at the University of Waikato, New Zealand, and has published several well-known papers together with Ian Witten and Timothy Bell.
 

Daniel Hirschberg

Daniel Hirschberg works at the University of California, United States of America. He is interested in the theory of design and analysis of algorithms.
 

Paul Howard

Paul Howard works at Eastern Michigan University, United States of America, and has been engaged in the arithmetic coding field for ten years.
 

Sparsh Mittal

Sparsh Mittal is a postdoctoral research associate in the Future Technologies Group at the Oak Ridge National Laboratory, USA.
 

Alistair Moffat

Alistair Moffat works at the University of Melbourne, Australia. Together with Ian Witten and Timothy Bell he is an author of the book "Managing Gigabytes".
 

Radford Neal

Radford Neal works at the University of Toronto, Canada. He is one of the authors of the CACM87 implementation, which set the standard in arithmetic coding.
 

Mark Nelson

Mark Nelson is the author of the famous compression site www.datacompression.info and has published articles in the data compression field for over ten years. He is an editor of Dr. Dobb's Journal and the author of the book "The Data Compression Book". He lives in the friendly Lone Star State of Texas ("All My Ex's"...).
 

Jeffrey Vetter

Jeffrey Vetter is the group leader of the Future Technologies Group at the Oak Ridge National Laboratory, USA.
 

Ian Witten

Ian Witten works at the University of Waikato, New Zealand. Together with John Cleary and Timothy Bell he published "Modeling for Text Compression".
 

 Source Code



CACM87

The standard CACM 1987 implementation of arithmetic coding, in three different versions, from John Cleary, Radford Neal and Ian Witten.
 

Arithmetic Coding + Statistical Modeling = Data Compression

The source code for Mark Nelson's arithmetic coding article.
 

 

Copyright © 2002-2022 Dr.-Ing. Jürgen Abel, Lechstraße 1, 41469 Neuß, Germany. All rights reserved.