Fractal image compression and other block-comparison schemes assume that an image contains sufficient self-similarity: that for any given block to be encoded, there will be one or more other blocks that look similar enough that their pixel content can be exchanged for a much smaller referential description. This differs from DCT-based coding, which does not search for matches at all, but instead re-expresses each block as a weighted sum of fixed cosine waveforms.
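To make the contrast concrete, here is a minimal sketch of an orthonormal 2D DCT-II in Python (numpy only; the name `dct2` and the sample gradient block are my own illustrations, not taken from any particular codec):

```python
import numpy as np

# Minimal orthonormal 2D DCT-II. Instead of hunting for a similar block
# elsewhere in the image, DCT re-expresses a block's pixels as weights
# on fixed cosine basis waveforms.
def dct2(block):
    N = block.shape[0]
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)  # DC row scaling for orthonormality
    return C @ block @ C.T

# A smooth 8 x 8 gradient block: its energy collapses into a handful of
# low-frequency coefficients, which is what DCT coders then quantize.
block = np.tile(np.linspace(0.0, 255.0, 8), (8, 1))
coeffs = dct2(block)
significant = int(np.sum(np.abs(coeffs) > 1.0))
```

For a smooth block like this one, only a few of the 64 coefficients are significant; the rest are near zero and compress away, with no search involved.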
Anyone who has assembled a jigsaw puzzle notices right away how many puzzle pieces look similar. Even different pieces can share many common features. It is tempting to think that all the pieces forming the sky in a landscape photo, for example, could be efficiently described. But sadly, there are pitfalls.
In video, self-similarity occurs heavily between successive frames; this temporal redundancy is the logical basis for P-frame compression, and every serious video codec exploits it. So we will focus on the problems with self-similarity within a single frame (or within key frames, if you prefer).
Self-similarity occurs in inverse proportion to block size. The smaller the blocks, the more likely it is that a match can be found among the other blocks, but there are then more blocks to encode. Large blocks are fewer, but finding a match becomes exponentially more difficult as the pixel count per block grows.
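The tradeoff can be put in rough numbers. Assuming a hypothetical 512 x 512 grayscale image (the dimensions are illustrative):

```python
# Back-of-envelope tradeoff: smaller blocks make matches likelier but
# multiply the number of references to encode; larger blocks are fewer
# but have exponentially more possible contents to match against.
width = height = 512
tradeoff = {}
for size in (2, 4, 8, 16):
    blocks = (width // size) * (height // size)  # references to encode
    exponent = size * size                       # contents = 256 ** exponent
    tradeoff[size] = (blocks, exponent)
    print(f"{size:2d}x{size:<2d}: {blocks:6d} blocks, 256^{exponent} possible contents")
```

Going from 2 x 2 to 16 x 16 blocks cuts the reference count 64-fold, but the space a candidate must land in grows from 256^4 to 256^256.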
The most fundamental error is that typical images (or at least the images people find useful) do not actually contain much exploitable self-similarity. This is probably why Barnsley’s examples focus almost entirely on idealized pictures of ferns and fractals. In the real world, even a photo of a fern does not contain much usable self-similarity, and a cursory review of an average photo yields abundant exceptions. Even if an object in the scene has a highly regular repeating texture, it may lie at an angle unfavorable to the block grid, be distorted by perspective, or be lit unevenly across its surface. Because the block-matching scheme only sees pixels, all of these things (and more, such as dirt) easily frustrate it.
Another problem is that, in order to find a match, we cannot be choosy about where we look. That means that, at a minimum, when a match is found, we must encode the location of the matching block. Needless to say, exhaustive block searching is also slow.
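A quick estimate of that bookkeeping cost, again assuming an illustrative 512 x 512 frame tiled into 8 x 8 blocks:

```python
import math

# Cost of encoding a match location in a 512 x 512 frame of 8 x 8 blocks.
blocks_per_side = 512 // 8                        # 64
candidates = blocks_per_side * blocks_per_side    # 4096 candidate positions
location_bits = math.ceil(math.log2(candidates))  # bits per reference

# A raw 8 x 8 grayscale block costs 64 * 8 = 512 bits, so a reference is
# only a win if the location pointer plus any correction data stays well
# under that. Exhaustive search is quadratic: every block compared
# against every candidate.
comparisons = candidates * candidates
print(location_bits, comparisons)
```

Twelve bits per pointer is cheap next to 512 bits of raw pixels, but the roughly 16.8 million block comparisons per frame show why naive search is slow.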
We are also limited in where we can search, because the decode process cannot copy from blocks which have not yet been drawn onto the frame. If we are drawing blocks from top to bottom, we can only fetch blocks above the current drawing position. This makes the first set of rows compress poorly as well.
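Assuming a plain top-to-bottom, left-to-right block order (an assumption for illustration; codecs vary), the legal search region can be sketched as a generator over already-decoded positions:

```python
# Sketch of a causal search window: during decode, only blocks already
# written to the frame can be copied from. A block at (bx, by) may only
# reference blocks strictly before it in raster order.
def causal_candidates(bx, by, blocks_per_row):
    """Yield (x, y) block positions decoded before block (bx, by)."""
    for y in range(by + 1):
        # On the current row, stop before the block being decoded.
        max_x = blocks_per_row if y < by else bx
        for x in range(max_x):
            yield (x, y)

print(len(list(causal_candidates(0, 0, 64))))  # 0: first block has no candidates
print(len(list(causal_candidates(5, 2, 64))))  # 133: two full rows plus five
```

The very first block has nothing to reference at all, which is exactly why the opening rows compress poorly.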
Matches are almost impossible to find unless we allow for color shifting. This is a value (which is another encoding cost) that alters the brightness of a block in order to improve the match. If we also transform (rotate, flip, etc.) a block to improve a match, then the code number of the transform must also be encoded.
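A sketch of such a match test, using a mean-based brightness shift and the eight flip/rotate variants (the function names are mine, and real fractal coders typically also scale contrast, which this omits):

```python
import numpy as np

# Match one target block against a candidate, allowing a brightness
# shift and the eight axis-aligned rotations/flips. The shift and the
# transform index both become extra symbols in the encoded stream.
def transforms(block):
    """Yield (index, variant) for the 8 rotate/flip variants."""
    for i in range(4):
        rotated = np.rot90(block, i)
        yield 2 * i, rotated
        yield 2 * i + 1, np.fliplr(rotated)

def best_match(target, candidate):
    """Return (error, transform_index, shift) of the best fit found."""
    best = (np.inf, -1, 0.0)
    for idx, variant in transforms(candidate):
        shift = float(np.mean(target) - np.mean(variant))  # brightness offset
        err = float(np.mean((target - (variant + shift)) ** 2))
        if err < best[0]:
            best = (err, idx, shift)
    return best

target = np.array([[10.0, 20.0], [30.0, 40.0]])
candidate = np.array([[80.0, 100.0], [70.0, 90.0]])  # rotated, brighter copy
err, idx, shift = best_match(target, candidate)
# err → 0.0, idx → 6 (three rot90 steps), shift → -60.0
```

Note the cost: on top of the match location, each reference now carries a transform index (3 bits for 8 variants) and a quantized brightness shift.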
Matches also require lossy compression, because exact matches are astronomically unlikely. Even a tiny 2 x 2 grayscale block has over four billion (256^4) possible pixel combinations, and a 4 x 4 grayscale block has over 3.4 × 10^38. This combinatorial problem is also why quadtree schemes (like TRUDEF) often split a large block that fails to match into smaller blocks that are easier to match.
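The quadtree fallback can be sketched as a recursive split on poor fits (this is an illustrative sketch, not TRUDEF's actual algorithm; the toy matcher only accepts flat blocks, to force splitting):

```python
import numpy as np

# If no candidate fits within tolerance, split the block into four
# quadrants and encode each separately, down to a 2 x 2 floor.
def encode_block(block, tolerance, find_match):
    """Return a match, or a nested list of matches for split quadrants."""
    match, error = find_match(block)
    if error <= tolerance or block.shape[0] <= 2:
        return match
    h = block.shape[0] // 2
    quadrants = (block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:])
    return [encode_block(q, tolerance, find_match) for q in quadrants]

# Toy matcher: "matches" a block with its mean value, so only flat
# blocks fit within a tight tolerance.
def flat_match(block):
    mean = float(block.mean())
    return mean, float(((block - mean) ** 2).mean())

frame = np.zeros((8, 8))
frame[4:, 4:] = 200.0  # one bright quadrant
tree = encode_block(frame, tolerance=1.0, find_match=flat_match)
# tree → [0.0, 0.0, 0.0, 200.0]: the full block fails, its quadrants fit
```

The whole 8 x 8 block mixes two brightness levels and fails the tolerance, so it splits once; each 4 x 4 quadrant is flat and encodes directly.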
It would be fitting to say that the devil is in the details. We look at all the puzzle pieces of a grassy field and find them similar, but upon closer examination, each contains a unique pattern. It is easy to mistake the superficial similarity for the kind of similarity that can be efficiently compressed. If we were allowed to assemble our puzzle pieces in any order, we would find the resulting scene unacceptable, because even though all grass pieces are green, we would easily see a discordant random jumble. The details matter, and must be encoded.