Saturday, December 13, 2008

Forward Reprojection - current console hardware

Lately I have been thinking a lot about motion-compensation-inspired forward reprojection schemes, both for current console generation level hardware and for the next generation cone traced engine plans.

For the current generation, I think a reasonable approach is to store and track image macrotiles, just like MPEG does, at say 8x8 pixel granularity. The rasterizer (preferably an octree hybrid point-splatting/polygon renderer) would be designed to be very efficient at culling scene geometry down to this fine level of granularity. Tiles, once generated, are reused across frames and projected on the GPU, which can render them quite easily as simple quads with a relatively simple pixel shader filter. To handle depth edges, I would render each tile as two quads, one covering the near pixels, the other the far pixels (otherwise these edges would result in large stretched quads, rather than two small quads which move apart). The tiles would be managed with a caching policy, with each tile weighted by the number of correct pixels it provides each frame. Old invalidated tiles would then be evicted to make way for new tile generation.
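
The caching policy above could be sketched roughly as follows. This is only a minimal illustration under my own assumptions: the names `CachedTile`, `TileCache`, `report` and `insert` are hypothetical, the weight is simply the last frame's count of correct pixels, and the lowest-weighted tile is evicted first.

```python
from dataclasses import dataclass


@dataclass
class CachedTile:
    tile_id: int
    valid_pixels: int  # correct pixels this tile supplied in the latest frame


class TileCache:
    """Fixed-capacity tile store that evicts the least useful tile first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tiles = {}

    def report(self, tile_id, valid_pixels):
        # Called once per frame after reprojection with the measured coverage.
        if tile_id in self.tiles:
            self.tiles[tile_id].valid_pixels = valid_pixels

    def insert(self, tile_id, valid_pixels=64):
        # Assumption: a freshly rasterized 8x8 tile starts fully valid (64 pixels).
        evicted = None
        if tile_id not in self.tiles and len(self.tiles) >= self.capacity:
            victim = min(self.tiles.values(), key=lambda t: t.valid_pixels)
            evicted = victim.tile_id
            del self.tiles[evicted]
        self.tiles[tile_id] = CachedTile(tile_id, valid_pixels)
        return evicted
```

A real policy would probably also age the weights over several frames rather than trusting a single frame's count.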

The resulting image will have a small number of error regions that then need to be corrected - some pixels won't be hit by any previous tiles, because new regions of the scene have been revealed by camera motion and occlusion changes. If you also track motion vectors for the tiles, you can have some regions that need to be invalidated because of animation. In some cases moving tiles will happen to project into an occlusion gap but are actually behind something else in the scene (false occlusion). For a static scene, you can treat the result of the forward projection as a conservative z-buffer. Animation errors can then be handled by simply not projecting stored tiles that have too much animation error. A coarse z-pyramid rendering can then reliably identify new screen tiles which need to be regenerated.
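
As a rough CPU-side illustration of the forward projection step (the real thing would run on the GPU as quads), the sketch below scatters already-transformed samples into a depth buffer, keeps the nearest one per pixel, and reports the pixels nothing landed on. The function name `forward_project` and the `(x, y, depth)` sample layout are assumptions made for the example.

```python
import math


def forward_project(width, height, samples):
    """Scatter (x, y, depth) samples, already transformed into the new frame.

    Returns the nearest-depth buffer and a mask of pixels no sample hit.
    """
    far = float("inf")
    zbuf = [[far] * width for _ in range(height)]
    for x, y, depth in samples:
        xi = math.floor(x + 0.5)  # snap to the nearest pixel centre
        yi = math.floor(y + 0.5)
        if 0 <= xi < width and 0 <= yi < height and depth < zbuf[yi][xi]:
            zbuf[yi][xi] = depth  # standard nearest-wins depth test
    holes = [[zbuf[y][x] == far for x in range(width)] for y in range(height)]
    return zbuf, holes
```

The `holes` mask corresponds to the newly revealed regions; false occlusion and animation error are not detected by this sketch and would need the extra tests described above.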

The stencil and z-buffers are used to track invalid regions of the screen, and a low-res version of these maps is read back to the CPU. The CPU then does a hierarchical intersection of the image z/stencil pyramid with the scene octrees to rasterize out new tiles which need to be generated. A bias is used to avoid re-rendering onto valid reprojected screen tiles - essentially it searches for octree cells that intersect error pixels or have moved in front of the previous projection. This is particularly well suited to the PS3's SPUs, but could also work reasonably well on the 360 with slightly different tile size tradeoffs. The key is that it is relatively coarse, operating at the coarser levels of an image pyramid and an octree.
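
A minimal sketch of the pyramid test, assuming a square power-of-two depth map in which error pixels are stored as infinity, and assuming each octree cell has already been reduced to a screen rectangle plus a nearest depth. The names `build_max_pyramid` and `cell_needs_render` are hypothetical, and the octree traversal itself is left out.

```python
def build_max_pyramid(depth):
    """depth: square 2^n x 2^n grid. Returns levels from full res down to 1x1.

    Each coarse texel keeps the FARTHEST depth of its four children, so a
    hole (infinity) anywhere below keeps the coarse texel open.
    """
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return levels


def cell_needs_render(levels, level, rect, cell_near, bias):
    """rect = (x0, y0, x1, y1), inclusive texel coordinates at `level`.

    The cell is flagged when it could be in front of what is stored, by
    more than `bias`, somewhere inside its footprint.
    """
    x0, y0, x1, y1 = rect
    grid = levels[level]
    return any(cell_near + bias < grid[y][x]
               for y in range(y0, y1 + 1)
               for x in range(x0, x1 + 1))
```

A cell lying on an already valid surface fails the biased test and is skipped, while any cell overlapping a hole passes regardless of its depth.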

When combined with deferred point splat filtering, this system can tolerate and cover over small occlusion errors of a few pixels, which can improve performance at some small potential error cost. You want to avoid rasterizing a whole tile just because of a couple of error pixels. The screen interpolation filtering for the point splatting would fill in missing z-information by propagating splat surfaces using a hierarchical push-pull algorithm. In essence, small gaps of a couple of pixels caused by a moving edge would be filled in to match the background surface, and because even texturing is deferred in such a scheme, there would be no noticeable smearing - small gaps can thus easily be filled in. What you're left with then is a more coherent error mask and fewer tiles that need to be refreshed.
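
For illustration, here is a bare-bones push-pull fill over a square power-of-two depth grid, with `None` marking a hole. It is a sketch under my own assumptions, not a description of any particular renderer: the pull step keeps the farthest valid depth of each 2x2 block so that gaps inherit the background surface, and the function name `push_pull_fill` is hypothetical.

```python
def push_pull_fill(depth):
    """depth: square 2^n x 2^n grid, None marks a hole. Returns a filled copy."""
    levels = [[row[:] for row in depth]]

    # Pull: build coarser levels from whatever valid samples exist below.
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        coarse = []
        for y in range(n):
            row = []
            for x in range(n):
                kids = [prev[2 * y + dy][2 * x + dx]
                        for dy in (0, 1) for dx in (0, 1)]
                valid = [k for k in kids if k is not None]
                # Assumption: the farthest depth stands in for the background.
                row.append(max(valid) if valid else None)
            coarse.append(row)
        levels.append(coarse)

    # Push: walk back down, filling holes from the parent texel.
    for i in range(len(levels) - 2, -1, -1):
        fine, coarse = levels[i], levels[i + 1]
        for y in range(len(fine)):
            for x in range(len(fine)):
                if fine[y][x] is None:
                    fine[y][x] = coarse[y // 2][x // 2]
    return levels[0]
```

A production filter would blend between neighbouring parents and cap the gap size it is willing to fill; this version just copies the parent value.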

Lighting changes would be handled separately with deferred shading. Static light interactions could be cached in the g-buffer and greatly benefit from the forward projection. So your deferred shading system could separate static and dynamic lights. Static lights could use the screen error mask so they only need to be recomputed for the small number of new pixels. There is one slight complication, which is moving shadow casters, but this can be handled by a rough low-res shadow map lookup which identifies screen regions that are shadowed by dirty regions of the shadow map, and thus need to be resampled.
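
The static/dynamic split might look something like this in outline. It assumes the cached static term has already been reprojected along with the g-buffer, treats lighting as a single scalar per pixel, and uses hypothetical callbacks `static_light` and `dynamic_light`; the moving shadow caster case would simply add more pixels to `error_mask`.

```python
def shade_frame(static_cache, error_mask, static_light, dynamic_light):
    """Recompute static lighting only where the error mask is set.

    static_cache is updated in place; returns (final_colors, recomputed_count).
    """
    recomputed = 0
    final = []
    for i, dirty in enumerate(error_mask):
        if dirty:
            static_cache[i] = static_light(i)  # expensive path, new pixels only
            recomputed += 1
        # Dynamic lights are evaluated for every pixel, every frame.
        final.append(static_cache[i] + dynamic_light(i))
    return final, recomputed
```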

This type of fine-grained micro-culling architecture is also exactly what you need to do really high quality outdoor shadows through quadtree shadow maps, and the reprojection scheme can also be employed to speed up the shadow map generation. In this case, there is even more temporal coherence for typical outdoor scenes, as the sun can be treated as a static light (even if time-of-day is simulated, this would only change the projection every minute or so - not an issue). For this case, the only shadow map regions that need to be regenerated are new quadtree cells as they are expanded in response to scene updates, and cells which overlap moving objects. There are some crappy cases like a windy day in a jungle, but for most typical scenes this could be a vastly faster shadowing system that could scale to the ultra-detailed geometry of a frame-coherent point-splatting renderer.
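
The regeneration rule for the shadow quadtree can be written down in a few lines. In this sketch the cells and the moving object bounds are plain light-space rectangles, and the names `overlaps` and `cells_to_regenerate` are made up for the example.

```python
def overlaps(a, b):
    # Rectangles are (x0, y0, x1, y1) in light space, half-open on the max side.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def cells_to_regenerate(active_cells, cached_ids, moving_bounds):
    """active_cells: {cell_id: rect} for the quadtree leaves in use this frame.

    A cell is dirty if it was not cached last frame (newly expanded) or if
    any moving object's bounds touch it.
    """
    dirty = []
    for cell_id, rect in active_cells.items():
        is_new = cell_id not in cached_ids
        touched = any(overlaps(rect, b) for b in moving_bounds)
        if is_new or touched:
            dirty.append(cell_id)
    return dirty
```

Everything else in the shadow map is reused as-is, which is where the temporal coherence pays off.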
