www.data-compression.info The Data Compression Resource on the Internet
Arithmetic Coding (AC)
Arithmetic coding (AC) is a special kind of entropy coding. Unlike Huffman coding, arithmetic coding does not spend a whole number of bits on each symbol it compresses. For every source it comes very close to the optimal compression in the sense of Shannon's source coding theorem, and it is well suited to adaptive models. Its biggest drawback is its low speed, caused by the multiplications and divisions needed for each symbol.
The main idea behind arithmetic coding is to assign an interval to each symbol. Starting with the interval [0, 1), each interval is divided into several subintervals whose sizes are proportional to the current probabilities of the corresponding symbols of the alphabet. The subinterval of the coded symbol then becomes the interval for the next symbol. The output is the interval of the last symbol: any number inside it identifies the whole message, provided the decoder knows the message length or the alphabet contains an end-of-message symbol. Practical implementations write out the leading bits of this interval as soon as they are certain, that is, as soon as further symbols can no longer change them.
A faster variant of arithmetic coding that needs fewer multiplications and divisions is the range coder, which outputs whole bytes instead of single bits. Its compression rate is only slightly worse than that of pure arithmetic coding, and in many real implementations the difference is not noticeable.
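The interval subdivision described above can be tried out with exact rational arithmetic. This is a minimal sketch, not a practical coder: it assumes a static model given as exact probabilities, uses Python's `fractions.Fraction` so that no rounding or renormalisation is needed, and assumes the decoder is told the message length. The function names (`build_model`, `encode_interval`, `shortest_binary`, `decode_interval`) are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, Tuple

Model = Dict[str, Tuple[Fraction, Fraction]]

def build_model(probs: Dict[str, Fraction]) -> Model:
    """Give each symbol a subinterval [start, end) of [0, 1) of size p(symbol)."""
    model: Model = {}
    start = Fraction(0)
    for sym, p in probs.items():
        model[sym] = (start, start + p)
        start += p
    if start != 1:
        raise ValueError("probabilities must sum to 1")
    return model

def encode_interval(message: str, model: Model) -> Tuple[Fraction, Fraction]:
    """Narrow [0, 1) once per symbol and return the final interval [low, high)."""
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_lo, s_hi = model[sym]
        low += width * s_lo          # move to the symbol's subinterval ...
        width *= s_hi - s_lo         # ... whose size is width * p(sym)
    return low, low + width

def shortest_binary(low: Fraction, high: Fraction) -> Tuple[int, int]:
    """Find the shortest binary fraction m / 2**k with low <= m / 2**k < high."""
    k = 0
    while True:
        m = math.ceil(low * 2**k)
        if Fraction(m, 2**k) < high:
            return m, k              # k is the number of output bits
        k += 1

def decode_interval(x: Fraction, length: int, model: Model) -> str:
    """Undo the narrowing: find the subinterval containing x, then rescale x."""
    out = []
    for _ in range(length):
        for sym, (s_lo, s_hi) in model.items():
            if s_lo <= x < s_hi:
                out.append(sym)
                x = (x - s_lo) / (s_hi - s_lo)   # map back onto [0, 1)
                break
    return "".join(out)

# Usage: p(a) = 1/2, p(b) = p(c) = 1/4
model = build_model({"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)})
low, high = encode_interval("aab", model)   # [1/8, 3/16)
m, k = shortest_binary(low, high)           # m = 1, k = 3, i.e. binary 0.001
assert decode_interval(Fraction(m, 2**k), 3, model) == "aab"
```

The width of the final interval is the product of the probabilities of all coded symbols, and any half-open interval of width w contains a binary fraction with at most ceil(-log2(w)) bits. This is why the cost is counted for the message as a whole rather than as a whole number of bits per symbol.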
In this 1996 paper, Charles Bloom presents several new techniques for high-order context modeling, low-order context modeling, and order-0 arithmetic coding, with an emphasis on economy of memory and speed. He reports performance significantly better than that of previous methods.
A well-structured description of the ideas, background and implementation of arithmetic coding, written in German in 2002 by Eric Bodden, Malte Clasen and Joachim Kneis. It gives a good explanation of the renormalisation process and includes complete source code. Highly recommended for German readers.
Together with the CACM87 paper, this 1998 paper by Alistair Moffat, Radford Neal and Ian Witten is one of the best-known papers on arithmetic coding. It improves on the CACM87 implementation by using fewer multiplications and supporting a wider range of symbol probabilities.
A very interesting and useful site from 2007 by Andrew Polar about range encoding, covering technical details of arithmetic and range encoders as well as some patent issues. It links to the source code of RanCode.cpp, a range encoder written for research purposes.
This ACM paper from 1987, written by Ian Witten, Radford Neal and John Cleary, is the definitive classic among arithmetic coding papers. The article is fairly short but includes the full source code of the famous CACM87 AC implementation.
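Writing bits "as soon as they are certain" is what the classic integer implementations do. The sketch below follows the well-known bit-oriented scheme that CACM87 made famous: integer bounds `low` and `high`, doubling of the interval after each emitted bit, and a counter of pending bits for the underflow case where the interval straddles the midpoint. It is written from scratch for illustration and is not the paper's code; the 32-bit precision, the static frequency table and all names (`ac_encode`, `ac_decode`, `_cumulative`) are assumptions.

```python
from typing import List, Sequence

PRECISION = 32
TOP = (1 << PRECISION) - 1          # largest code value (assumed 32-bit precision)
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)

def _cumulative(freqs: Sequence[int]) -> List[int]:
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)
    return cum

def ac_encode(symbols: Sequence[int], freqs: Sequence[int]) -> List[int]:
    """Encode symbol indices with a static frequency table; return a list of bits."""
    cum = _cumulative(freqs)
    total = cum[-1]
    if total > QUARTER:
        raise ValueError("total frequency too large for this precision")
    low, high, pending = 0, TOP, 0
    bits: List[int] = []

    def emit(bit: int) -> None:
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)   # resolve deferred underflow bits
        pending = 0

    for s in symbols:
        span = high - low + 1
        high = low + span * cum[s + 1] // total - 1
        low = low + span * cum[s] // total
        while True:
            if high < HALF:                # both bounds in lower half: 0 is certain
                emit(0)
            elif low >= HALF:              # both bounds in upper half: 1 is certain
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1               # straddles the midpoint: defer the bit
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1   # renormalise: zoom in by 2
    pending += 1                           # two final bits select a quarter
    emit(0 if low < QUARTER else 1)
    return bits

def ac_decode(bits: Sequence[int], n: int, freqs: Sequence[int]) -> List[int]:
    """Decode n symbols; bits past the end of the stream are read as 0."""
    cum = _cumulative(freqs)
    total = cum[-1]
    stream = iter(bits)
    value = 0
    for _ in range(PRECISION):
        value = 2 * value + next(stream, 0)
    low, high = 0, TOP
    out: List[int] = []
    for _ in range(n):
        span = high - low + 1
        target = ((value - low + 1) * total - 1) // span
        s = 0
        while cum[s + 1] <= target:        # find s with cum[s] <= target < cum[s+1]
            s += 1
        out.append(s)
        high = low + span * cum[s + 1] // total - 1
        low = low + span * cum[s] // total
        while True:                        # mirror the encoder's renormalisation
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next(stream, 0)
    return out
```

The renormalisation loop runs once per output bit, which, together with the per-symbol multiplications and divisions, is the speed cost that faster variants try to reduce.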
Malte Clasen is a student at RWTH Aachen, Germany, and is known as "the update" in the demoscene, a community whose members demonstrate their coding, drawing and composing skills in small programs called demos, which have no purpose other than showing off.
Mark is the author of the well-known compression site www.datacompression.info and has published articles on data compression for over ten years. He is an editor at Dr. Dobb's Journal and the author of "The Data Compression Book". He lives in Texas, the friendly Lone Star State ("All My Ex's"...).
Amir is a senior researcher at Hewlett-Packard Laboratories in Palo Alto, United States of America, working on imaging, image and video coding, signal processing, and security. He is a coauthor of the Lossless Compression Handbook and has released free C++ source code for a fast arithmetic coding implementation.
Range coder source code by Michael Schindler, one of my favourite range coder implementations. A range coder works much like an arithmetic coder but needs fewer renormalisations and writes whole bytes, which makes its output faster.
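The byte-oriented idea can be sketched as follows. This is a generic range coder written for illustration, not Michael Schindler's code. It assumes a static frequency table with a total of at most 2^16, a 32-bit `low`, renormalisation whenever the range drops below 2^24, and carry propagation into bytes that have already been written; the constants and names (`range_encode`, `range_decode`) are assumptions.

```python
from typing import List, Sequence

TOP = 1 << 24                 # renormalise when the range drops below 2^24 (assumed)
MASK = (1 << 32) - 1          # 32-bit window for low and code

def _cumulative(freqs: Sequence[int]) -> List[int]:
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)
    return cum

def range_encode(symbols: Sequence[int], freqs: Sequence[int]) -> bytes:
    cum = _cumulative(freqs)
    total = cum[-1]
    if not 0 < total <= 1 << 16:
        raise ValueError("total frequency must be in 1..65536")
    out = bytearray()
    low, rng = 0, MASK
    for s in symbols:
        if freqs[s] <= 0:
            raise ValueError("cannot encode a symbol with zero frequency")
        r = rng // total                    # one division per symbol
        low += r * cum[s]
        rng = r * freqs[s]
        if low > MASK:                      # carry out of the 32-bit window:
            i = len(out) - 1                # add 1 to the bytes already written
            while out[i] == 0xFF:
                out[i] = 0
                i -= 1
            out[i] += 1
            low &= MASK
        while rng < TOP:                    # shift out a whole byte at a time
            out.append(low >> 24)
            low = (low << 8) & MASK
            rng <<= 8
    out += low.to_bytes(4, "big")           # flush: low lies inside the final range
    return bytes(out)

def range_decode(data: bytes, n: int, freqs: Sequence[int]) -> List[int]:
    cum = _cumulative(freqs)
    total = cum[-1]
    code = int.from_bytes(data[:4], "big")  # code tracks (value - low)
    pos, rng = 4, MASK
    result: List[int] = []
    for _ in range(n):
        r = rng // total
        v = min(code // r, total - 1)
        s = 0
        while cum[s + 1] <= v:              # linear search, fine for small alphabets
            s += 1
        result.append(s)
        code -= r * cum[s]
        rng = r * freqs[s]
        while rng < TOP:                    # bytes past the end are read as 0
            code = (code << 8) | (data[pos] if pos < len(data) else 0)
            pos += 1
            rng <<= 8
    return result
```

Compared with a bit-oriented coder, renormalisation here runs once per output byte instead of once per bit, and no underflow counter is needed because carries are propagated directly into the output. The truncation in `rng // total` wastes a tiny fraction of the range on each symbol, which matches the small loss in compression rate mentioned at the top of the page.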