On 360, the memory situation is even worse because of EDRAM. Even though the bandwidth while rendering is nearly free, the geometry overhead adds cost and all the extra resolves eat up time. The tile padding ends up using even more total bandwidth. And during resolve, the rest of the system is essentially idle, destroying parallelism, so it's actually somewhat worse than on PS3.
Newer GPUs can do MSAA compression while rendering, using simple block-based schemes that store a couple of real samples per pixel (like the min/max of DXTC), and then a large number of 'coverage samples' per pixel, which are simply a few bits each to select an exemplar sample. This compression takes advantage of the fact that the high-frequency information we really care about is coverage, and it's somewhat wasteful to store all of our buffers at multi-sample resolution.
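To make the compression idea concrete, here is a minimal sketch of a per-pixel layout with a few stored samples plus small per-coverage-sample indices. The function names, the 32-bit color size and the index packing are assumptions for illustration only, not any particular GPU's actual format.

```python
def resolve_pixel(stored_colors, coverage_indices):
    """Resolve one pixel: each coverage sample just points at a stored color."""
    total = sum(stored_colors[i] for i in coverage_indices)
    return total / len(coverage_indices)


def compressed_bits(num_stored, num_coverage, bits_per_color=32):
    """Storage for one pixel: a few real samples plus one small index per coverage sample."""
    index_bits = max(1, (num_stored - 1).bit_length())
    return num_stored * bits_per_color + num_coverage * index_bits


def uncompressed_bits(num_samples, bits_per_color=32):
    """Storage for one pixel when every sample keeps its own full color."""
    return num_samples * bits_per_color


# An edge pixel where 3 of 4 coverage samples see color 0.0 and 1 sees color 1.0.
edge = resolve_pixel([0.0, 1.0], [0, 0, 0, 1])
```

With these assumed sizes, 4 stored colors and 16 coverage samples cost 160 bits per pixel, against 512 bits for 16 full samples.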
So based on that, I have an MSAA deferred shading idea that uses 2 passes. Let's call this technique MSAA Z-prepass. One pass is rendered with z-only, 2x or even 4x MSAA, and no other buffers active - an MSAA z-prepass essentially. The second pass is rendered with your typical DS buffers, but no MSAA. You then perform lighting/shading as normal, resulting in a pre-final non-MSAA framebuffer. As a final step, you use a bilateral filter on depth to fill in the missing information and up-sample to MSAA resolution, which can then be resolved back down for the final buffer - naturally this can all be combined into one fast step. I'm assuming familiarity with bilateral filtering - but basically here I'm using it as a depth-sensitive up-sample. The results should be very similar to full MSAA on all the buffers, but without the memory/bandwidth cost - as it uses the same compression principle the newer coverage-based MSAA techniques use.
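The final up-sample-and-resolve step could look roughly like the CPU-side sketch below. It is only an illustration under stated assumptions: scalar colors, a 3x3 neighborhood, a Gaussian depth weight with a made-up `sigma_z`, and a simple inverse-distance spatial falloff. None of these choices come from a tested implementation.

```python
import math


def bilateral_resolve(color, depth, msaa_depth, sigma_z=0.01):
    """Depth-sensitive up-sample to MSAA resolution, then resolve, in one step.

    color[y][x]      : shaded non-MSAA value (a float, for simplicity)
    depth[y][x]      : non-MSAA depth from the second pass
    msaa_depth[y][x] : list of per-sample depths from the MSAA z-only pass
    """
    h, w = len(color), len(color[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            samples = msaa_depth[y][x]
            resolved = 0.0
            for z in samples:
                total, weight_sum = 0.0, 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if not (0 <= ny < h and 0 <= nx < w):
                            continue
                        dz = depth[ny][nx] - z
                        # Neighbors at a similar depth to this sample count most.
                        wgt = math.exp(-(dz * dz) / (2.0 * sigma_z * sigma_z))
                        # Assumed spatial falloff: nearer pixels count more.
                        wgt /= 1.0 + dx * dx + dy * dy
                        total += wgt * color[ny][nx]
                        weight_sum += wgt
                # If no neighbor matches this depth, fall back to the pixel's own color.
                resolved += total / weight_sum if weight_sum > 0.0 else color[y][x]
            out[y][x] = resolved / len(samples)
    return out


# A foreground/background edge: the middle pixel was shaded as foreground,
# but one of its two MSAA depth samples actually sees the background.
color = [[0.0, 0.0, 1.0]]
depth = [[0.2, 0.2, 0.8]]
msaa_depth = [[[0.2, 0.2], [0.2, 0.8], [0.8, 0.8]]]
result = bilateral_resolve(color, depth, msaa_depth)
```

In this toy case the middle pixel picks up half of the background color from its right neighbor, which is the anti-aliased edge that full MSAA shading would have produced.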
With a careful z-downsample/resolve pass, you can probably use the 1st Z pass to populate the Hi-Z for the 2nd pass and speed up rendering. It still requires 2 render passes, which is suboptimal, as I described in a previous post.
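A rough sketch of what a conservative downsample might mean here, assuming smaller depth means nearer and a LESS depth test, so the safe value to keep is the farthest one. The function names and the 2x2 tile size are hypothetical. A real version would likely also need a bias, since the pixel-center depth in the 2nd pass is not guaranteed to lie inside the range of the MSAA samples.

```python
def conservative_depth(msaa_depth):
    """Collapse MSAA depth samples to one depth per pixel, keeping the farthest."""
    return [[max(samples) for samples in row] for row in msaa_depth]


def hiz_tiles(depth, tile=2):
    """Build a coarse Hi-Z style buffer: the farthest depth in each tile."""
    h, w = len(depth), len(depth[0])
    return [
        [
            max(
                depth[y][x]
                for y in range(ty, min(ty + tile, h))
                for x in range(tx, min(tx + tile, w))
            )
            for tx in range(0, w, tile)
        ]
        for ty in range(0, h, tile)
    ]


# 2x2 pixels with 2 depth samples each.
msaa = [[[0.2, 0.8], [0.3, 0.3]],
        [[0.5, 0.4], [0.1, 0.9]]]
per_pixel = conservative_depth(msaa)
coarse = hiz_tiles(per_pixel, tile=2)
```

Taking the maximum means a fragment is only rejected early if it is behind every sample in the tile, so the coarse test never culls something the fine test would have kept.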
This is still on my wishlist, not something I've had the time to implement, but there was a recent paper by some guys at Volition that uses the same principles to combine MSAA with 2-pass deferred lighting. They decided this warranted a whole new name, dubbing it inferred lighting.
They modify deferred lighting to render at different resolutions in the two passes. Specifically, they use a reduced frame buffer (40% or so) for the 1st depth/normal pass, and then use a full-size MSAA buffer for the 2nd pass. The light buffers are up-sampled using a bilateral technique in the shader of the 2nd pass. By extending this up-sampling technique with some dither knowledge, they also do stippled alpha rendering to get some level of order-independent translucency with deferred shading. Not enough that you could render a full particle system with that path, but enough for a few layers of glass windows or whatnot.
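As a way to picture the stipple idea, here is a small sketch of a generic screen-door pattern in which each translucent layer owns one pixel of every 2x2 quad. The 2x2 pattern, the slot numbering and the function names are assumptions for illustration and are not taken from the paper.

```python
def stipple_slot(x, y):
    """Index 0..3 of a pixel within its 2x2 quad."""
    return (y & 1) * 2 + (x & 1)


def writes_pixel(x, y, layer_slot):
    """Opaque geometry (layer_slot None) writes everywhere;
    a translucent layer writes only the quad pixel matching its slot."""
    return layer_slot is None or stipple_slot(x, y) == layer_slot


# Count how many pixels of a 4x4 region a single translucent layer touches.
covered = sum(writes_pixel(x, y, 3) for y in range(4) for x in range(4))
```

Under this assumed pattern a layer keeps a quarter of the samples, and the up-sampling filter in the 2nd pass would have to pick only the samples belonging to the layer being shaded. That limited sample budget is one way to see why only a few layers are practical.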
Their solution is interesting, but it makes the performance problems with alpha-tested stuff even worse (see my earlier post on Deferred Shading vs Deferred Lighting), as the 2nd pass has now gotten considerably more expensive due to the bilateral up-sampling filter. And worse, since the buffer resolutions don't match, they cannot easily use the 1st pass output to prime the z-buffer for the 2nd pass. (It's probably possible to do a conservative screen pass just to populate the Hi-Z, but I'm not sure if they are doing that.) Notably, Red Faction takes place on Mars, so their engine doesn't have to deal with foliage.
Accumulating lighting at lower resolution (or actually better - multiple resolutions) is something I've been thinking about for a while, and it is already well tested, at least for AO. They are using this to great effect in Red Faction to get lots of lights per pixel at speed.
But anyway, this motivated me to try to improve my MSAA Z-prepass idea to get it down to a single pass with deferred shading, and also to find a fast method of bilateral or depth-sensitive filtering.