I discussed before how, if one wanted to meaningfully upscale images, one would need the computer to be an artist, inserting whole new details in order to truly create new pixels.
It now appears that Google has done just that: the Google Brain team has built a system that enlarges an 8 x 8 image fourfold along each side to create a 32 x 32 image, which has 16 times as many pixels, with creatively added details. It uses machine learning to estimate what to draw, such as which facial features to add when the low-resolution image is assumed to be a face.
The system is in its early days, so the results are not yet ready for prime time, but the provided examples do a good job of demonstrating what the system will become. This is the kind of research that, on top of RAISR (mentioned in a recent post), really shuts the door on everyone else. Google is simply blowing right by its competitors.
I can even tell you what Google is going to develop next, and it will be amazing. I can do this not because I know anyone at Google, but because it is the obvious next step. Ready?
People have immediately pointed out — as I have done with fractals — that this new upscaler is not suitable for law enforcement because new pixels are dreamed up and therefore cannot implicate a particular person with certainty. They are correct, but this can be overcome.
A single frame of a surveillance video providing a low-resolution image of a person’s face is not enough, but if one takes several frames, then the information to reconstruct the face is available. Methods like SuperRes use optical flow, but they only work when the pixel data does not move much from one frame to the next. What Google Brain can do, however, is use multiple frames to provide error correction even if the pixels have moved considerably. All the user needs to do is to tell the system how the person is oriented in each frame. The system makes an initial estimate, then checks whether its guesses also fit the subsequent frames, and when they do not, it revises its prior estimates until one guess works for all the given frames. The basic idea is that, while many high-resolution images can be downsampled to produce the same low-resolution image and thus leave the solution ambiguous, far fewer high-resolution images can match an entire set of low-resolution frames at once. The more frames, the better. One may not even require frames from the same video.
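To make the counting argument concrete, here is a toy sketch in Python. It is not Google's method, just an illustration under heavy assumptions: the "image" is a one-dimensional row of 8 binary pixels, downsampling sums each block of 4 pixels, and each frame is the same row circularly shifted by a known amount (standing in for the user telling the system how the subject is positioned). All names here (`downsample`, `consistent_candidates`, `truth`) are made up for the example.

```python
from itertools import product

FACTOR = 4   # downsampling factor per side, as in 8 x 8 -> 32 x 32
HR_LEN = 8   # length of the toy high-resolution row (binary pixels)


def downsample(signal, shift=0):
    """Circularly shift the row by `shift` pixels, then sum each block of FACTOR pixels."""
    n = len(signal)
    shifted = [signal[(i + shift) % n] for i in range(n)]
    return tuple(sum(shifted[i:i + FACTOR]) for i in range(0, n, FACTOR))


def consistent_candidates(observations):
    """Brute force: every binary row that reproduces all (shift, low_res) observations."""
    return [
        candidate
        for candidate in product((0, 1), repeat=HR_LEN)
        if all(downsample(candidate, s) == low_res for s, low_res in observations)
    ]


# Hypothetical "true" high-resolution row and four frames with known shifts.
truth = (0, 1, 1, 0, 1, 0, 0, 0)
shifts = [0, 1, 2, 3]
observations = [(s, downsample(truth, s)) for s in shifts]

# Number of candidate rows still consistent after 1, 2, 3 and 4 frames.
counts = [
    len(consistent_candidates(observations[:k]))
    for k in range(1, len(shifts) + 1)
]
print(counts)  # [24, 3, 2, 1]
```

With one frame, 24 different rows are equally plausible; by the fourth frame only the true row survives. A real face is vastly harder, since the motion is not a clean known shift and the search space cannot be brute-forced, but the shrinking candidate set is the same principle.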
Needless to say, law enforcement will be interested, but Google has that market wrapped up. Competitors need not apply.