, , , , ,

Compressing data is all about replacing long pieces of information with shorter pieces. At the same time, we want to make the process that performs the compression (and decompression) fast enough to be practical. If too much time is taken, the solution tends to be only useful for archiving.

Run Length Encoding (RLE) replaces contiguous “runs” of similar values with a tiny code that says “repeat the value X, Y times”. It is useful for simple cartoon-like images but bad for text. Simple image formats like TGA and BMP use RLE in certain cases.

Lempel-Ziv compression looks for more complex patterns that repeat. This is better for text, for example, because certain words like “the” occur frequently, and can be substituted by a token which looks up the word in a dictionary. Image formats like GIF and PNG use a form of this compression.

Huffman (and arithmetic) encoding try to store values with fewer bits, by exploiting that some number values occur more frequently than others. It is common practice for image and video codecs to use Huffman coding at their final stages after figuring out which numbers to use as part of their own compression. For example, a fractal compressor may build a fractal code that contains a color shift value of 2, but because the value 2 occurs frequently, it can be squeezed further by being Huffman coded.

Bitwise streaming is used to save bits that are never used. For example, if we are storing an integer value that ranges from 0 to 63, we only need six bits. This is far fewer than the thirty-two bits integers normally take, and even fewer than the eight bits that a byte occupies. Bitwise streaming is often used when Huffman coding is used. The Adobe Flash format used bitwise streaming to more efficiently encode scripts and fonts.