Notes

Chapter 10: Processes of Perception and Analysis

Section 5: Data Compression

Arithmetic coding

Consider dividing the interval from 0 to 1 into a succession of bins, with each bin having a width equal to the probability for some sequence of blocks to occur. The idea of arithmetic coding is to represent each such bin by the digit sequence of the shortest number within the bin—after trailing zeros have been dropped. For any sequence s this can be done using

Module[{c, m = 0}, Map[c[#] = {m, m += Count[s, #]/Length[s]} &, Union[s]]; Function[x, (First[RealDigits[2^# Ceiling[2^-# Min[x]], 2, -#, -1]] &)[Floor[Log[2, Max[x] - Min[x]]]]][Fold[(Max[#1] - Min[#1]) c[#2] + Min[#1] &, {0, 1}, s]]]

Huffman coding of a sequence containing a single 0 block together with n 1 blocks will yield output of length about n; arithmetic coding will yield length about Log[n]. Compression in arithmetic coding still relies, however, on unequal block probabilities, just like in Huffman coding. Originally suggested in the early 1960s, arithmetic coding reemerged in the late 1980s when high-speed floating-point computation became common, and is occasionally used in practice.