In the previous post, we went over the upscaling process a little. Now we will go into Dimension’s patent 8,639,053 in more detail.
Side note: as in the prior posts, the meanings of “domain” and “range” are reversed. This occurs because Barnsley (and the later patents) used “domain” to refer to the nonoverlapping blocks being processed, while I use “domain” to refer to the blocks being searched, which is the nomenclature in other papers.
The patent most likely describes the upscaling method used by the SoftVideo/Trudef encoder. Its computational demands closely match those of the Iterated VDK’s compressor application when zooming is specified. It is also influenced by PIFS, and its time-saving compromises relative to PIFS are similar to those made by the compressor.
A key feature will be to not take too long to upscale the current video frame, since the time must be added to whatever time the encoder itself is taking to encode the image.
Another key feature is that the upscaler must make some attempt at making the frame sharper than it would be with traditional means such as bicubic upsampling, otherwise the whole point of “fractal zooming” is lost.
The patent is presented as upscaling standard definition video to high quality, but since the VDK can be fed a frame of any size, the upscaler must handle a range of frame sizes, including HD to 4K. Dimension touts going even higher. The patent’s abstract claims this generality, and further in, several examples of different resolution mappings are given, including 4K and larger sizes.
The patent goes to considerable lengths to talk about marketing opportunities, which is odd. I get the impression that Dimension is overly eager to promote its technology.
Overall, the upscaler tries to replace each small block with a larger block that looks similar but sharper than the small block would if it were merely enlarged bicubically. To do this, it divides the frame into blocks and then, for each one, searches for an appropriate larger block. Block comparisons are performed in YCbCr color space, and the final upscaled image is converted back to RGB.
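As a rough illustration of the color space round trip, here is a minimal Python sketch. Nothing above says which YCbCr variant is used, so the full-range BT.601 (JFIF) coefficients are my assumption, and the function names are hypothetical.

```python
# Assumption: full-range BT.601 (JFIF) YCbCr. The variant actually used
# by the upscaler is not stated, so treat these constants as illustrative.

def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (0..255 floats) to YCbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Convert one YCbCr pixel back to RGB (no clamping or rounding)."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


# Round trip for a single pixel.
pixel = (200.0, 50.0, 100.0)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*pixel))
```

The point of working in YCbCr is that brightness (Y) is separated from the two color channels, so a block comparison can treat them differently if it wants to.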
Video input is not necessary to the process, since only an image is being upscaled, except if one wants to extend block searches to the previous frame. This does not appear to be a preferred method. Other features, such as deinterlacing the input video prior to upscaling, are mentioned but appear to be at the choice of the implementer.
The preferred upscaling factor is 200%. In cases where a higher scale must be used, it is recommended to simply upscale the result as many times as needed, and then to downscale (if necessary) to fit the target resolution. To upscale a 720 x 480 frame to HD, for example, one would first upscale to 1440 x 960, then upscale again to 2880 x 1920, then downsample to 1920 x 1080. This is very different from PIFS (which only needs to upscale once), and must be computationally expensive. Merely upscaling twice takes five times as long as a single pass, since the second pass involves four times as many pixels as the first.
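The arithmetic can be checked with a small Python sketch. It assumes that the work in each pass is proportional to the number of pixels fed into it, which is my simplification, and `plan_upscale` is a hypothetical helper name.

```python
def plan_upscale(src, dst, factor=2):
    """Repeatedly scale src by `factor` until it covers dst.

    Returns (intermediate sizes, relative cost, whether a downscale is needed).
    Cost is in units of "one pass over the source frame", assuming the work
    per pass is proportional to that pass's input pixel count.
    """
    w, h = src
    base_pixels = w * h
    sizes = []
    cost = 0.0
    while w < dst[0] or h < dst[1]:
        cost += (w * h) / base_pixels  # this pass processes w*h input pixels
        w, h = w * factor, h * factor
        sizes.append((w, h))
    needs_downscale = (w, h) != tuple(dst)
    return sizes, cost, needs_downscale


sizes, cost, needs_downscale = plan_upscale((720, 480), (1920, 1080))
print(sizes)            # [(1440, 960), (2880, 1920)]
print(cost)             # 5.0
print(needs_downscale)  # True
```

Going from 720 x 480 to 4K (3840 x 2160) under the same assumption would need a third pass, for a relative cost of 1 + 4 + 16 = 21.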
Each range block is normally 1 x 1 or 2 x 2 in size, but includes surrounding pixels, making it effectively 3 x 3 or larger. So the range blocks are effectively 3 x 3 and overlap one another. The 2 x 2 range block case does not appear to be mentioned further.
Only sixteen 6 x 6 domain blocks are searched, all above and to the left of the current range block. This is much faster than PIFS, but so small a search area must also produce far fewer good matches.
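To make the overlap concrete, here is a minimal Python sketch of pulling the 3 x 3 neighborhood around a 1 x 1 range pixel. The edge handling (repeating the border pixel) and the name `range_block` are my assumptions, not something taken from the patent.

```python
def range_block(plane, x, y, radius=1):
    """Return the (2*radius+1)-square neighborhood centered on (x, y).

    `plane` is a list of rows of one color channel. Coordinates that fall
    outside the frame are clamped to the nearest edge pixel (an assumption).
    """
    h, w = len(plane), len(plane[0])
    return [
        [plane[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


plane = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
a = range_block(plane, 1, 1)  # [[0, 1, 2], [4, 5, 6], [8, 9, 10]]
b = range_block(plane, 2, 1)  # [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
```

Blocks `a` and `b` belong to adjacent pixels and share two of their three columns, which is the overlap described above.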
Spatial transforms are not used. Instead, domain blocks have their color multiplied and shifted. Since nearby blocks tend to share the same overall color, the color transforms involved need not be large. This probably also explains why the search range is so small: farther out, the blocks would diverge more in color, and a wider range of time-consuming color transforms would be required.
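The multiply-and-shift matching can be sketched in Python as follows. This is not the patent's procedure: I am assuming each 6 x 6 domain block is shrunk to 3 x 3 by 2 x 2 averaging before comparison, and that the multiplier and shift are chosen by an ordinary least-squares fit, as is common in fractal compression. All function names are hypothetical, and the layout of the sixteen candidates is left to the caller.

```python
def shrink(block6):
    """Average each 2x2 cell of a 6x6 block into a 3x3 block (assumed step)."""
    return [
        [(block6[2 * r][2 * c] + block6[2 * r][2 * c + 1]
          + block6[2 * r + 1][2 * c] + block6[2 * r + 1][2 * c + 1]) / 4.0
         for c in range(3)]
        for r in range(3)
    ]


def fit_scale_offset(domain3, range3):
    """Least-squares s, o minimizing sum((s*d + o - r)^2). Returns (s, o, err)."""
    d = [v for row in domain3 for v in row]
    r = [v for row in range3 for v in row]
    n = len(d)
    mean_d = sum(d) / n
    mean_r = sum(r) / n
    var_d = sum((x - mean_d) ** 2 for x in d)
    cov = sum((x - mean_d) * (y - mean_r) for x, y in zip(d, r))
    s = cov / var_d if var_d > 0 else 0.0  # flat domain block: shift only
    o = mean_r - s * mean_d
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def best_match(range3, candidates6):
    """Return (index, s, o, err) of the candidate with the smallest error."""
    best = None
    for i, block6 in enumerate(candidates6):
        s, o, err = fit_scale_offset(shrink(block6), range3)
        if best is None or err < best[3]:
            best = (i, s, o, err)
    return best


# Two candidates: a flat block and a ramp that matches after s=2, o=10.
flat = [[5] * 6 for _ in range(6)]
ramp = [[(r // 2) * 3 + (c // 2) for c in range(6)] for r in range(6)]
target = [[10, 12, 14], [16, 18, 20], [22, 24, 26]]
index, s, o, err = best_match(target, [flat, ramp])
```

With only sixteen candidates per range block, the loop in `best_match` is tiny, which fits the speed argument above. The cost is that a candidate with a small fitted error may simply not exist in so small a neighborhood.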
Two artifact reduction filters are applied to the pixels of the best matching block. The first filter keeps pixels from being too different from their immediate neighbors. The second filter keeps the average of the upscaled block from being too different from the center pixel of the original block. Both of these filters introduce blur, but they must have been deemed important. From the sound of it, the original process introduced too much noise or color divergence.
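Here is one way the two filters could look in Python. The exact rules and thresholds are not given above, so the clamping scheme, the `max_diff` and `max_drift` parameters, and the function names are all my assumptions.

```python
def limit_to_neighbors(block, max_diff):
    """Clamp each pixel to within max_diff of the mean of its 4-neighbors."""
    h, w = len(block), len(block[0])
    out = [row[:] for row in block]
    for y in range(h):
        for x in range(w):
            nb = [block[j][i]
                  for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                  if 0 <= j < h and 0 <= i < w]
            if not nb:  # a 1x1 block has no neighbors to compare with
                continue
            avg = sum(nb) / len(nb)
            out[y][x] = min(max(block[y][x], avg - max_diff), avg + max_diff)
    return out


def limit_mean_drift(block, center, max_drift):
    """Shift the block so its mean lies within max_drift of `center`."""
    n = sum(len(row) for row in block)
    mean = sum(sum(row) for row in block) / n
    drift = mean - center
    allowed = max(-max_drift, min(drift, max_drift))
    excess = drift - allowed  # zero when the mean is already close enough
    return [[v - excess for v in row] for row in block]


upscaled = [[100, 100], [100, 200]]  # one outlier pixel
step1 = limit_to_neighbors(upscaled, max_diff=30)
step2 = limit_mean_drift(step1, center=100, max_drift=10)
```

In this example the outlier of 200 is pulled down to 130 while two of its neighbors are pulled up to 120, so contrast inside the block drops. That is the blur cost mentioned above, in miniature.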
After the frame is scaled and converted to RGB, a smoothing filter is applied to reduce noise further. However, this would also further reduce the sharpness gained.
No PIFS-like fractal interpolation is used, as there is no program loop performing affine mapping iterations. Like the compressor, blocks are simply compared, color shifted, and copied. With the small block search range, lack of spatial transforms, and smoothing filters, the results would be hard to distinguish from normal upscaling except perhaps in high-contrast cases. However, the vast majority of video is photographic in nature, and therefore mostly continuous-tone rather than high-contrast.