www.data-compression.info: The Data Compression Resource on the Internet

Entropy Coding (EC)

The process of entropy coding (EC) can be split into two parts: modeling and coding. Modeling assigns probabilities to the symbols, and coding produces a bit sequence from these probabilities. As established in Shannon's source coding theorem, there is a relationship between a symbol's probability and its optimal bit sequence: a symbol with probability p gets a bit sequence of length -log2(p) bits.
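The relationship above can be sketched in a few lines of Python; the function name ideal_code_length is an illustrative choice, not from the original text:

```python
import math

def ideal_code_length(p):
    """Ideal code length in bits for a symbol with probability p,
    per Shannon's source coding theorem: -log2(p)."""
    return -math.log2(p)

# A symbol occurring half the time ideally needs 1 bit; rarer symbols need more.
print(ideal_code_length(0.5))   # 1.0
print(ideal_code_length(0.25))  # 2.0
print(ideal_code_length(0.1))   # about 3.32 bits
```

Summing p * ideal_code_length(p) over all symbols gives the entropy of the source, the lower bound on the average number of bits per symbol.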
In order to achieve a good compression ratio, an exact probability estimation is needed. Since the model is responsible for the probability of each symbol, modeling is one of the most important tasks in data compression.
Entropy coding can be achieved by different coding schemes. A common scheme, which uses an integer number of bits for each symbol, is Huffman coding. A different approach is arithmetic coding, which outputs a bit sequence representing a point inside an interval. The interval is built recursively from the probabilities of the encoded symbols.
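Both schemes can be illustrated with a minimal sketch in Python. The names huffman_codes and narrow are illustrative, not from the original: the first builds a prefix-free code table from symbol frequencies, and the second shows the recursive interval subdivision that arithmetic coding is based on.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table (symbol -> bit string) from the
    symbol frequencies in text. Minimal sketch; a single-symbol
    input yields an empty code word."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

def narrow(interval, p_low, p_high):
    """One arithmetic-coding step: shrink the current interval to the
    sub-range [p_low, p_high) assigned to the encoded symbol."""
    lo, hi = interval
    width = hi - lo
    return (lo + p_low * width, lo + p_high * width)
```

For example, huffman_codes("aaaabbc") assigns a 1-bit code to the frequent symbol "a" and 2-bit codes to "b" and "c", matching the idea that rarer symbols get longer bit sequences. In arithmetic coding, repeatedly calling narrow for each encoded symbol shrinks the interval, and any point inside the final interval identifies the whole message.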

 Publications