Google Wins



I discussed before how, if one wanted to meaningfully upscale images, one would need the computer to be an artist, inserting whole new details in order to truly create new pixels.

It now appears that Google has done just that: their Google Brain project upscales an 8 x 8 image 400% to create a 32 x 32 image with creatively added details. It uses a machine learning system to estimate what types of facial features, for example, to draw based on a low-resolution image that is assumed to be a face.


The system is in its early days, so the results are not quite yet ready for prime time, but the provided examples do a good job of demonstrating what the system will become. This is the kind of research that, on top of RAISR (mentioned in a recent post), really shuts the door on everyone else. Google is simply blowing right by its competitors.

I can even tell you what Google is going to develop next, and it will be amazing. I can do this not because I know anyone at Google, but because it is the obvious next step. Ready?

People have immediately pointed out, as I have done with fractals, that this new upscaler is not suitable for law enforcement because new pixels are dreamed up and therefore cannot implicate a particular person with certainty. They are correct, but this can be overcome.

A single frame of a surveillance video providing a low-resolution image of a person’s face is not enough, but if one takes several frames, then the information to reconstruct the face is available. Methods like SuperRes use optical flow but the pixel data must not move much from one frame to the next. What Google Brain can do, however, is use multiple frames to provide error correction even if the pixels have moved considerably. All the user needs to do is to tell the system how the person is oriented in each frame. The system makes an initial estimate, then sees if its guesses can be used on subsequent frames, and when those do not work, it revises prior estimates until its guess works for all the given frames. The basic idea is that, while many high-resolution images can be downsampled to produce one low-resolution image and thus produce an ambiguous solution, only a far smaller set of high-resolution images can match a similarly numbered set of low-resolution images. The more frames, the better. One may not even require frames from the same video.
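The ambiguity argument can be illustrated with a toy example. The sketch below is not Google's method; it is a minimal Python illustration under made-up assumptions: the "high-resolution image" is an 8-sample binary signal, each "frame" is that signal circularly shifted by a known amount (standing in for the user-supplied orientation) and then downsampled by summing groups of four samples. The names FACTOR, downsample and count_consistent are hypothetical. It counts by brute force how many candidate signals remain consistent as more frames are added.

```python
from itertools import product

FACTOR = 4  # assumption: each low-res sample sums 4 neighbouring high-res samples

def downsample(hr, shift):
    """Circularly shift the signal, then sum each group of FACTOR samples."""
    moved = hr[shift:] + hr[:shift]
    return tuple(sum(moved[i:i + FACTOR]) for i in range(0, len(moved), FACTOR))

def count_consistent(truth, shifts):
    """Count the binary signals that reproduce every observed low-res frame."""
    observed = [downsample(truth, s) for s in shifts]
    return sum(
        all(downsample(cand, s) == obs for s, obs in zip(shifts, observed))
        for cand in product((0, 1), repeat=len(truth))
    )

truth = (0, 1, 1, 0, 1, 0, 0, 0)  # hypothetical "high-resolution" signal
# Use 1, 2, 3 and then 4 frames, each with a different known shift.
counts = [count_consistent(truth, list(range(n + 1))) for n in range(4)]
print(counts)  # [24, 3, 2, 1]
```

With one frame, 24 different signals explain the observation; with four frames, only the true one does. Real images are vastly larger and noisier, so this only shows the direction of the effect, not its practicality.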

Needless to say, law enforcement will be interested, but Google has that market wrapped up. Competitors need not apply.


Review of the 2016 TMMI patents



I was asked to evaluate three patents that TMMI filed in 2016, specifically, patents US20160078601A1, WO2016186925A1, and WO2016040939A1.

They are worthless, and the patent office would be justified in rejecting them. Either TMMI’s management does not understand that the patents describe useless technology, or they filed them just so that they could claim that they have patents. Each patent also tries hard to cover alternative formulations, which is laughable given that such alternatives include considerable prior art.

I will now go over them individually.

Patent US20160078601A1: Image upsampling using local adaptive weighting

This is basically a knockoff of any image upsampler that performs edge detection and weights interpolated pixels accordingly to make enlarged edges sharp. This is probably the algorithm that TRUPIX uses, and it is no better than NEDI, which has been around for many years. But with methods like RAISR, the industry has already moved beyond this patent anyway.

Bonus goof: the patent talks about how mobile devices and TV sets may not be powerful enough to do upscaling. Someone needs to let TMMI know what year this is.

Patent WO2016186925A1: Systems and methods for digital video sampling and upscaling

This one has to be read to be believed. All it describes is a general method to decode a video, upscale the frames, and then encode the upscaled frames into a new, larger video. People have been doing this for a long time, so it is a perfect case of prior art. Due to its generality, I cannot fathom how this even qualifies as a patent. It is more like those garbage patents that attempt to grant a monopoly for a mere business process.

Patent WO2016040939A1: Systems and methods for subject-oriented compression

This one describes how a computer can identify and track regions of interest in a video in order to compress them with higher fidelity than other (background) regions. No real specific algorithms mentioned, and it relies on user input. The point is to compress video by compressing the background regions more, but no mention is made of any specific compression method.

Someone needs to remind TMMI that video codecs use keyframes to avoid needlessly encoding static backgrounds, and modern codecs use interframe motion detection to avoid encoding parts of images that are simply moving. Why compress background regions at lower quality when you can have them at high quality anyway? Given how much user input this patent’s method requires, the meager compression savings are not worth it.

Overall, it sounds like TMMI wanted to get some patents to impress people, so they just looked around and filed whatever was “close enough to sound patent-ish.” They would be better off refunding investors’ money and calling it a day. They do not understand that a technology company needs to have, well, technology.

TMMI and Dimension left behind



Back in June 2016, Twitter bought a company named Magic Pony Technology. TechCrunch has a nice article here:

Twitter pays up to $150M for Magic Pony Technology, which uses neural networks to improve images

In a fraction of the time of TMMI’s existence, Magic Pony did what TMMI or Dimension never could: find a way to upscale images and video with decent quality. They got $150 million and more.

The idea is to use machine learning: a neural network is trained using a large set of image pairs. Each pair has a low resolution and a high resolution version of the same image, and the computer learns how to generate detail to do the upscaling.
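As a rough illustration of what such image pairs look like, here is a minimal Python sketch. It assumes, purely for illustration, that the low-resolution half of each pair is made by averaging 2 x 2 blocks of the high-resolution image; the source does not say how Magic Pony built its training data, and the function names are hypothetical.

```python
def downsample_2x(image):
    """Average each 2 x 2 block of a grayscale image given as a list of rows."""
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, len(image[0]) - 1, 2)
        ]
        for y in range(0, len(image) - 1, 2)
    ]

def make_training_pairs(high_res_images):
    """Return (low_res, high_res) pairs: the network sees the first, targets the second."""
    return [(downsample_2x(img), img) for img in high_res_images]

# One tiny hypothetical 2 x 2 "image" collapses to a single averaged pixel.
pairs = make_training_pairs([[[0, 2], [4, 6]]])
print(pairs[0][0])  # [[3.0]]
```

A network trained on many such pairs learns a mapping from the small image back toward the large one; the training loop itself is omitted here.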

But that is not all. In November 2016, Google announced RAISR, short for Rapid and Accurate Image Super Resolution. Google has a page here:

And their research paper with the algorithm is available for free here:


There is a nice video explaining RAISR here:


If you prefer text, there are these links:

The last link is perhaps the best because it goes into the algorithm more. There are also lots of images showing what the effect looks like; Google is not trying to keep anything secret.

RAISR is good, and it is fast. It is so fast that Google is positioning it for use in smartphones, which means it will probably be fast enough to also use in televisions. This is not the NEDI-style technology that TMMI was imitating last year; it is better. It is produced by a big company and it is open and free.

Whatever TMMI and Dimension are planning, they will need something that leaps ahead of RAISR, not just technically but also in terms of openness and zero cost. Given their history, it is unlikely. As all businesses know, the world does not wait. The future is finally here, and Google delivered it.

The end of an era?



TMMI recently announced that it is dropping its lawsuit against Dimension regarding ownership of fractal video technology, because it is pursuing a different approach.

This is a welcome development, and hopefully Dimension will follow suit and also cease promoting fractal technology, as it has long been known to be stillborn, especially for video. We all benefit from closing the book on a technology that never really went anywhere.

Why are fractals hopeless? Because of one simple reason: the inverse problem has no solution in practical time. This was known from the outset, but TMMI kept beating a dead horse, and Dimension still does.

What is the inverse problem? In fractal imaging, a picture with self-similar features (e.g. small and large circles) can be efficiently described using simple formulas that are repeatedly applied. The problem lies with determining what the exact formulas are, given only the original image. The image must be painstakingly searched with subregions compared to other regions. Hence, the inverse problem. Decoding a fractal file is easy and relatively fast, but encoding? Anything but.

Barnsley’s graduate student Arnaud Jacquin simplified the inverse problem by dividing the image into smaller pieces, but this introduced quality issues: when the size of the blocks is large enough to make compression efficient, the seams between adjacent blocks become more visible. Also, the range of efficient fractal formulas goes down enormously, so the full power of fractal compression never gets utilized. Even with his method, encoding time is still too long. Finally, fractals are a poor method for general-purpose image compression, because they rely so heavily on self-similarity, and there is no guarantee that enough self-similar frames will exist in a video.
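To get a feel for why block-based encoding is still slow, here is a back-of-the-envelope Python estimate of an exhaustive block search. All parameters are assumptions chosen for illustration, not figures from any vendor: 8 x 8 target blocks, source blocks twice that size taken at every pixel position, and 8 flips and rotations tried per comparison. The function name search_cost is hypothetical.

```python
def search_cost(width, height, block=8, step=1, transforms=8):
    """Estimate block comparisons for one frame under an exhaustive search.

    Assumptions (illustrative only): the frame is tiled into block x block
    targets, each target is compared against every source block of twice
    that size on a grid with the given step, under every transform.
    """
    targets = (width // block) * (height // block)
    source = 2 * block
    sources = ((width - source) // step + 1) * ((height - source) // step + 1)
    return targets * sources * transforms

# One 1920 x 1080 frame with the default (assumed) parameters.
print(f"{search_cost(1920, 1080):.2e} block comparisons")
```

Under these assumptions a single HD frame needs roughly 5 x 10^11 block comparisons, each touching 64 pixels. Practical encoders prune the search heavily, but the number shows why realtime encoding is such a problem.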

So it is good that at least one vendor has woken up and finally admitted what everyone else already knew. If the other can too, we can move on and stop wasting investors’ time and money.

Adobe already has the video upscaling market



It turns out that Adobe uses their “Preserve Details” resampler in not just Photoshop, but also in their After Effects video editing software. That means that with TRUPIX, TMM is competing with a well-established company that has been producing and selling the technology for years and has already dominated the video upscaling market.

Adobe’s resampler is also fast. In their demonstration videos and tutorials, the effect occurs as soon as the user clicks OK to proceed. It is also fast enough to be shown in a live preview area of the resampler’s user interface. It also offers noise reduction.

There has been some conjecture that any slowness in TRUPIX is due to the need to write processed video frames back to permanent storage such as a hard drive. However, there is no indication of this. The site’s chart ambiguously says “Processing Time Result” only and clearly defines the test platform as having 16 GB of RAM and a 1 TB SSD drive. Given that Windows uses write caching, and SSD drives are very fast, it is reasonable to assume that TRUPIX is not a realtime upscaler. If it is, then surely this would be a feature point that the company would have mentioned. We either have a slow product or poor marketing.

The minimum system specifications are odd. RAM is not mentioned, but apparently a 1 TB SSD drive is required. Does TRUPIX require some sort of massive disk-based database in order to work?

Another concern is whether TRUPIX is based on fractals at all. After all, much was claimed of fractals’ superiority over other methods. The processing speed suggests that fractals have been abandoned, as the requisite block searching is extremely slow. At the very least, the company owes it to its shareholders to explain whether fractals are used or not, and if not, why not. It is possible that TRUPIX is some variation on NEDI (new edge-directed interpolation). Since NEDI is well documented and can be implemented by anyone, the market for TRUPIX is even more questionable.

Finally, it would be nice to see non-zoomed, high-resolution versions of their test images, so that TRUPIX upscaling could be compared to actual high-resolution content.

TMMI releases TRUPIX



TMMI announced a new product last Friday called TRUPIX. It upscales video frames and offers several integer zoom levels, e.g. 2x, 3x, 4x, etc. They also have downloadable still images in BMP format showing how well TRUPIX performs.

I suppose that simply making such data available is a good sign. Before, the company was notorious for keeping all of their research and results hidden. Hopefully more transparency will be forthcoming, such as a detailed explanation of the upscaling algorithm.

The introductory image on the company’s website for TRUPIX could be construed as misleading: it shows a 540 pixel image on top of a 720 pixel image, and so on, each larger image showing successively more detail. One might be led to believe that the tiniest image can be upscaled to be as detailed as the largest. This, of course, is not the case.

Below are details of two of their example images: a motorcycle upsampled 4x. I chose an area that contains text as this is one of the hardest tests for any upsampler. On the left side is the original image upsampled 4x using a traditional bicubic algorithm, and on the right side is the same image upsampled 4x using TRUPIX.

TRUPIX 4x upscaling comparison

The TRUPIX upsampling is noticeably sharper, but identical to long-existing technologies such as Perfect Resize and its predecessor, Genuine Fractals. Because the former can easily be configured to process multiple frames, there is little advantage to using TRUPIX. Perfect Resize is also a far more robust product including a full graphical preview system and many more filtering options. There are of course other products: Adobe Photoshop CC includes a “Preserve Details” option in its resampler, and Alien Skin offers their vectorizing upsampler BlowUp, etc.

The text in the image shows that no actual new detail is added to the TRUPIX image; it simply does the best it can with the pixels given to it. Edges are crisper, but the text does not resolve into perfect legibility. In technical terms, the information limits imposed by Shannon entropy are not overcome.

In the upper part of the image, the aliasing artifacts immediately below the white rectangle are not successfully smoothed. So the TRUPIX algorithm operates solely on local pixel areas and has insufficient knowledge of what the image is about in order to sharpen that region the way a human artist would.

To make matters worse, TRUPIX does not work in realtime but is instead an order of magnitude slower even when running on high-end equipment. So it cannot be used for live broadcasting, security applications, live surveillance, etc. In contrast, the existing digital zoom in one’s smartphone offers similar edge sharpening and easily records video in realtime.

Overall, TRUPIX is welcome in that TMMI is finally releasing something, but in practical terms, it is a textbook case of too little, too late. Is this what the company’s followers and investors have been waiting decades for?

TMM finally has something?



A few days ago, TMM issued a long letter to its shareholders claiming that it has a new, patent pending algorithm. They did not say what this algorithm did, or what methods (fractal, DCT, etc.) it uses, except that it would enhance the video market. They also did not say if it was from their collaboration with Raytheon.

Oddly enough, there was a big stress on how TMM is playing for the long term, with their future divided into categories of crawl, walk, and run. Does this mean that their new algorithm is just a baby step? Will it need a lot more work before anything useful comes out? Or is it something so domain-specific that while useful for a special case of image data, it needs more work to be useful on mainstream material?

At the very least, TMM is now on the hook for a deliverable, so hopefully in a few weeks they will show some plausible demonstrations. NAB 2015 is coming up soon, so they have another chance to demo there too. If they do, however, it needs to be a real public demo with proper third-party validation instead of behind closed doors.

TMM said the new algorithm is proprietary, which is unfortunate. It runs counter to the way every other codec is developed, and hints at onerous licencing conditions, which was one of the key downfalls of Iterated’s work.

The investment community took the news with cautious optimism. The stock is up 2-3 cents and seems to be holding there, but is still under ten cents. The previous press releases are probably to blame; they moved the needle for only a day or two before having it plunge back. Too much wolf has been cried, so everyone wants to see real evidence of progress this time around.

In the meantime, I played with the popular VLC media player, which has a handy sharpen filter. It works very well. You can even toggle it during playback to easily see how well it works. If you want to save a lot of bandwidth on a 4K TV, just play a regular HEVC-encoded HD video upscaled 200% and sharpen it. When I saw how good it was, I understood why nobody cares about TRUDEF.

Another window closes



Several days ago, Apple released the iPhone 6. It did not go unnoticed that this phone uses HEVC (also known as H.265) to encode FaceTime video calls, which is a great help for customers on cellular networks since it reduces bandwidth usage by up to half.

This is the kind of thing Dimension and TMMI were supposed to address. Fractals were supposed to shrink video in amazing ways and boost network capacities.

Apple is no stranger to video (they developed QuickTime, and were key in popularizing H.264), and they carefully scrutinize available technologies in order to stay on top. So they must have known about TRUDEF and decided it was not for them. Which was not a hard call to make; any codec that cannot encode in realtime is a non-starter.

On the Android side, Intel has been making great strides with the Strongene HEVC codec. Now that Apple has it on the iPhone 6, the pressure on Android is even greater. I see the future playing out like this:

  • Apple upgrades iPhones to encode non-FaceTime videos in HEVC also.
  • Android phones and tablets support HEVC.
  • Apple adds HEVC support to their iPad tablets.
  • With mobile users creating HEVC content on 4K devices such as the Droid Turbo, 4K TV sales increase.
  • More HEVC and/or VP9 videos appear on YouTube.
  • Netflix expands its 4K catalog.
  • Apple releases iPhone 7 which has true 4K resolution.

So what we are seeing now is the start of HEVC getting an absolute lock on the market. Whatever Dimension and TMMI are hoping to accomplish, they need to do it soon, or risk being niche players.

Problems with self-similarity as a basis for image compression



Fractal image compression and other block-comparison schemes assume that an image contains sufficient self-similarity; that for any given block to be encoded, there will be one or more other blocks that look similar enough that their pixel content can be exchanged for a much smaller referential description. This is different from DCT, which simply finds the component waveform signals for each block.

Anyone who has assembled a jigsaw puzzle notices right away how many puzzle pieces look similar. Even different pieces can share many common features. It is tempting to think that all the pieces forming the sky in a landscape photo, for example, could be efficiently described. But sadly, there are pitfalls.

In video, self-similarity occurs heavily between successive frames, which is the logical basis for P-frame compression, and all proper video codecs do this. So we will focus on the problems with self-similarity for just one frame (or for key frames, if you prefer).

Self-similarity occurs in inverse proportion to block size. The smaller the blocks, the more likely a match can be found with other blocks. However, there are now more blocks to encode. Large blocks are fewer, but finding a match becomes exponentially more difficult.

The most fundamental error is that typical images (or images that people find useful) do not actually contain useful self-similarity. This is probably why Barnsley’s examples focus almost entirely on ideal pictures of ferns and fractals. In the real world, even a photo of a fern does not contain large self-similarity. A cursory review of an average photo yields abundant exceptions. Even if an object in the scene contains a highly regular repeating texture, it may lie at an angle unfavorable to the block grid, or be distorted by perspective, or the lighting may differ across it. Because the block matching scheme only sees pixels, all these things (and more, like dirt) easily frustrate it.

Another problem is that, in order to find a match, we cannot be choosy about which block we are comparing. That means that, at a minimum, when a match is found, we must encode the location of the block. Needless to say, block searching is also slow.

We are also limited in where we can search, because the decode process cannot copy from blocks which have not yet been drawn onto the frame. If we are drawing blocks from top to bottom, we can only fetch blocks above the current drawing position. This makes the first set of rows compress poorly as well.

Matches are almost impossible to find unless we allow for color shifting. This is a value (which is another encoding cost) that alters the brightness of a block in order to improve the match. If we also transform (rotate, flip, etc.) a block to improve a match, then the code number of the transform must also be encoded.
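The matching step described above can be sketched in a few lines of Python. This is a generic illustration, not TRUDEF's or anyone else's actual encoder: blocks are small square lists of grayscale rows, the brightness shift is the single offset that minimizes squared error (the difference of the block means), and the eight transforms are the four rotations with and without a horizontal flip. All function names are hypothetical.

```python
def rotate90(block):
    """Rotate a square block (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]

def flip(block):
    """Mirror a block left to right."""
    return [row[::-1] for row in block]

def isometries(block):
    """Return 8 variants; a variant's index is the transform code to encode."""
    variants, current = [], block
    for _ in range(4):
        variants.append(current)
        variants.append(flip(current))
        current = rotate90(current)
    return variants

def best_offset_and_error(target, candidate):
    """Brightness offset minimizing squared error, and the error that remains."""
    t = [p for row in target for p in row]
    c = [p for row in candidate for p in row]
    offset = (sum(t) - sum(c)) / len(t)
    error = sum((cp + offset - tp) ** 2 for tp, cp in zip(t, c))
    return offset, error

def best_match(target, candidates):
    """Exhaustively search candidates; return (error, index, code, offset)."""
    best = None
    for index, cand in enumerate(candidates):
        for code, variant in enumerate(isometries(cand)):
            offset, error = best_offset_and_error(target, variant)
            if best is None or error < best[0]:
                best = (error, index, code, offset)
    return best

target = [[10, 20], [30, 40]]
candidates = [[[15, 5], [35, 25]]]  # the target mirrored and darkened by 5
print(best_match(target, candidates))  # (0.0, 0, 1, 5.0)
```

Every field of the result except the error has to be stored for each block: which block matched, which transform was applied, and what brightness offset was used. That is the encoding overhead the paragraph above refers to.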

Matches also require lossy compression, because exact matches are astronomically unlikely. A tiny 2 x 2 block of 8-bit grayscale pixels can have over four billion possible color combinations, and a 4 x 4 block has over 3.4 x 10 to the 38th power. This combinatorial problem is also why quadtree schemes (like TRUDEF) often split large blocks to encode more smaller blocks.
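Those figures are easy to check. The short Python snippet below assumes 8-bit grayscale, that is, 256 levels per pixel, and simply raises 256 to the number of pixels in the block; the helper name block_combinations is hypothetical.

```python
def block_combinations(side, levels=256):
    """Number of distinct side x side blocks with the given levels per pixel."""
    return levels ** (side * side)

for side in (2, 4):
    print(f"{side} x {side}: {block_combinations(side):.3e} possible blocks")
# 2 x 2: 4.295e+09 possible blocks
# 4 x 4: 3.403e+38 possible blocks
```

Going from 2 x 2 to 4 x 4 quadruples the pixel count but raises the number of possible blocks to the fourth power, which is why exact matches vanish so quickly as blocks grow.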

It would be fitting to say that the devil is in the details. We look at all the puzzle pieces of a grassy field and find them similar, but upon closer examination, each contains a unique pattern. It is easy to mistake the superficial similarity for the kind of similarity that can be efficiently compressed. If we were allowed to assemble our puzzle pieces in any order, we would find the resulting scene unacceptable, because even though all grass pieces are green, we would easily see a discordant random jumble. The details matter, and must be encoded.