Monday, April 7, 2008

Exploiting Temporal Coherence

There are a few big-picture concepts I remember fondly from my first reading of 'Computer Graphics: Principles and Practice' in college. Two of the most important are the related concepts of spatial and temporal coherence:

Real-world sequences of moving images exhibit high local coherence in both space and time.

Modern GPU architectures are built around parallelisim and spatial image coherence - exploited at least in terms of mapping memory access locality to image spatial locality through texture caches and the like. Its possible to go even further with spatial and temporal coherence optimization to reduce computation.

The insight of above really hits home when you see how well modern video compression takes advantage of this principle to achieve nearly three orders of magnitude in coding effeciency. The optimizations are both spatial in terms of static image compression techniques, and temporal through motion compensation. Exploiting spatial coherence through pushing lower frequency sub-computations to lower image sampling frequencies is the subject of some recent and interesting work, such as this. I have been thinking of a related idea more closely based on image compression, which is another whole topic. I think there has been some past work on wavelet basis shading, but I've been thinking about simpler, faster techniques based on block image coding (DXTC) principles.

But anyway, temporal coherence is the other half of the picture, and here motion compensation again offers much inspiration. Way back in college I thought up a complicated system to catch sub-shader expressions in textures, to automatically factor out time-invariant sub portions of a complex shader, and catch them in quadtree textures. I didn't get very far into this is as an actual research project, mainly because 1.) its complicated, and 2.) there wasn't enough video memory to make this viable yet. Later, while at pandemic, probably after reading about more caching alternatives, such as the render cache and its ilk, and deferred shading, I stumbled across the strikingly simple idea of resuing past frame G-buffers as a cache. Its a very attractive cache as well, because its nearly fully memory effecient compared to textures or other storage domains which are not anisotropically aligned to pixels, haven't been fully filtered, etc. Screen space is pretty ideal in that sense! I actually implemented this and my coworker later found this sketch, which then became this paper, which pretty much describes exactly what I had tried and done. And building further, getting into complicated territory, another researcher combined that with the auto-compilation approach that originally fascinated me.

Tthe basic idea - which I called cached deferred shading - is simple. You just treat previous frames as a reusable cache, and backproject to find useable samples. I quickly found that error accumulates very fast, and using key frames and interpolated frames, just as they do in mpeg, makes lots of sense. I use stencil masks to keep track of dirty pixels. I originally tried both random and tiled refresh regions, just as Nehab et al did, forced to align to stencil blocks. I then tried key-frame based refreshing and was much more satisified with the results. Refreshing scattered sub-tiles has various problems at different tile sizes: larger than a stencil block is too obvious (blocky artifacts), while tile size or smaller appeared to blur out too quickly, probably because bilinear filtering quickly smooths out the image. With key framing, you are always filtering from a pristine copy, so this isn't an issue. Its basically artifact free, once tuned. I don't even bilinear filter the depth as they do, which would probably further improve quality at some cost (perhaps you can combine it with DoF or something?).

Overall it works pretty well. There are some issues: frame rate spikes on key frames, ineffecient texture cache utilization for random noisy error pixels, and spikes on rapid motion. I have some ideas on improvements for each - I intend to smooth out the overall frame rate by using this technique for even more shading steps, and spreading out the key frames for different steps across frames to even the load (ie, when you have numerous systems that spike, you can average them all out!). The reduced speedup on rapid motion is much less of a problem than you'd think in our games (3rd person action), but it can be improved with more advanced filtering and sampling techniques, where you only compute a subset of the samples each frame and take advantage of the fact that fast motion blurs out any high frequency spatial details - so it almost completely counter balances out.

Taking the spatial-temporal coherence optimization idea to its logically conclusion results in a very interesting radical way to approach a renderer. Really what you want is something like a real-time video compression loop, but without the encoding step. When you need new fresh image macrotiles, you invoke the rasterizer, sometimes rendering them at reduced spatial resolution depending on the amount of motion blur, and then these are fed into a forward reprojection system which composites most of the image, along with some holes and errors introduced by new/unpredicted moving objects, lighting changes, etc. But that would be an entire new line of research, and is more suited to a ray tracer or really tight tree-based micro-rasterizer that can effeciently handle micro draw call invocations. Another story for another day, and probably for another generation. But today, on the current hardware, exploiting temporal coherence for at least g-buffer reprojection is quite viable and useful.