Sunday, April 4, 2010

Latency & Human Response Time in current and future games

I'm still surprised at how many gamers and even developers aren't aware that typical games today have total response latencies ranging anywhere from 60-200ms. We tend to think of latency in terms of pings, and the idea that the response time or 'ping' of a computer or console five feet away can be comparable to that of a server a continent away seems unnatural.

Yet even though it seems odd, it's true.

I just read through "Console Gaming: The Lag Factor", a recent blog article on Eurogamer which follows up on Mick West's original Gamasutra article that pioneered measuring the actual response times of games using a high-speed digital camera. For background, I earlier wrote a GDM article (Gaming in the Cloud) that referenced that data and showed how remotely rendered games running in the cloud have the potential to at least match the latency of local console games, primarily by running at a higher FPS.

The Eurogamer article alludes to this idea:

In-game latency, or the level of response in our controls, is one of the most crucial elements in game-making, not just in the here and now, but for the future too. It's fair to say that players today have become conditioned to what the truly hardcore PC gamers would consider to be almost unacceptably high levels of latency to the point where cloud gaming services such as OnLive and Gaikai rely heavily upon it.

The average videogame runs at 30FPS, and appears to have an average lag in the region of 133ms. On top of that is additional delay from the display itself, bringing the overall latency to around 166ms. Assuming that the most ultra-PC gaming set-up has a latency less than one third of that, this is good news for cloud gaming in that there's a good 80ms or so window for game video to be transmitted from client to server

It's really interesting to me that the author assumes the "ultra-PC" gaming set-up has a latency less than one third of a console's - even though the general model developed in the article posits that there is no fundamental difference between PCs and consoles in terms of latency, other than framerate.

In general, the article shows that games have inherent delay measured in frames - the minimum seems to be about 3 frames of delay, but it can go up to 5 for some games. The total delay in time units is simply N/F, the number of frames of delay divided by the frame rate. A simple low-delay app will typically have the minimum of about 3 frames, which maps to around 50ms at 60fps and 100ms at 30fps.
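The frames-of-delay model is easy to sketch as a quick calculation (the frame counts and framerates here are just the examples from the article):

```python
def latency_ms(frames_of_delay, fps):
    """Total response latency: N frames of internal delay at F frames/sec."""
    return frames_of_delay / fps * 1000.0

# Minimal ~3-frame pipeline, at 60fps and 30fps:
print(latency_ms(3, 60))  # 50.0 ms
print(latency_ms(3, 30))  # 100.0 ms
```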

There is no fundamental difference between consoles and PCs in this regard other than framerate - the PC version of a game running at 60fps will have the same latency as its console sibling running at 60fps. Of course, take a 30fps console game and run it at 60fps and you halve the latency - and yes, this is exactly what cloud gaming services can exploit.

The Eurogamer article was able to actually measure just that, proving this model with some real-world data. The author used the vsync feature in BioShock to measure the response difference between 59fps and 30fps, and as expected, the 59fps mode had just about half the latency.

The other assertion of the article - or rather the whole point of it - was that low response times are really important for the 'feel' of a game. So I'd like to delve into this in greater detail. As a side note, the fact that a game's delay must be measured before anyone can even guess at its response time tells you something.

Firstly, on the upper end of the spectrum, developers and gamers know from first-hand experience that there definitely is an upper window of tolerable latency, although it depends on the user action. For most games, controlling the camera with a joypad or mouse feels responsive with a latency of up to 150ms. You might think that mouse control would be considerably more demanding in this regard, but the data does not back that up - I assert that PC games running at 30fps have latencies in the same 133-150ms window as 30fps console games, and are quite playable at that framerate (and some have even shipped capped at 30fps).

There is a legitimate reason for a PC gamer to minimize their system latency as much as possible for competitive multiplayer gaming, especially twitch shooters like Counter-Strike. A system running with vsync off at 100fps might have latencies under 50ms and will give you a considerable advantage over an opponent running at 30fps with 133-150ms of base system latency - no doubt about that.

But what I'm asserting is that most gamers will barely - if at all - be able to notice delay times under 100ms in typical scenarios in FPS and action games, whether using a gamepad or a mouse and keyboard. As delay times exceed some threshold they become increasingly noticeable - 200ms of delay is noticeable to most users, and 300ms becomes unplayable. That being said, variation in the delay is much more noticeable. The difference between a perfectly consistent 30fps and 60fps is difficult to perceive, but an inconsistent 30fps is quite noticeable - the spikes or changes in response time from frame to frame are themselves neurologically relevant and quite detectable. This is why console developers spend a good deal of time optimizing the spike frames to hit a smooth 30fps.

There is, however, a class of actions that has a significantly lower latency threshold - the simple action of moving a mouse cursor around on the screen! Here I have some first-hand data. A graphics app which renders its own cursor, has little buffering, and runs at 60fps will have about 3 frames, or about 50ms, of lag, and at that low level of delay the cursor feels responsive. However, if you take that same app and slow it down to 30fps, or even add just a few more frames of latency at 60fps, the cursor suddenly seems to lag behind. The typical solution is to use the hardware cursor feature, which short-circuits the whole rendering pipeline and provides a direct fast path to pipe mouse data to the display - which seems to be under 50ms. For the low-latency app running at 60fps, the hardware cursor isn't necessary, but it becomes suddenly important at some threshold around 70-90ms.

I think that this is the real absolute lower limit of human ability to perceive delay.

Why is there such a fundamental limit? In short: the limitations of the brain.

Ponder for a second what it actually means for the brain to notice delay in a system. The user initiates an action and sometime later this results in a response, and if that response is detected as late, the user will notice delay. Somewhere, a circuit (or potentially circuits, but for our purposes this doesn't matter) in the brain makes a decision to initiate an action; this message must then propagate down to muscles in the hand, where it enters the game system through the input device. Meanwhile in the brain, the decision circuit must also send messages to the visual circuits of the form "I have initiated an action and am expecting a response - please look for this and notify me immediately on detection". It's easier to imagine the brain as a centralized system like a single CPU, but it is in fact the exact opposite - massively distributed - a network really on the scale of the internet itself - and curiously for our discussion, with comparable latencies.

Neurons typically fire no faster than about once every 10ms, perhaps as quickly as every 5ms in some regions. The fastest neural conduits - myelinated fibers - can send signals from the brain to the fingertip (one way) in about 20ms. So now imagine using these slow components to build a circuit that could detect a timing delay as small as 60ms.

Let's start with the example of firing a gun. At a minimum, we have some computation to decide to fire; once this happens, the message can be sent down to the fingertip to pull the trigger and start the process. At the same time, for the brain to figure out whether the gun actually fired in time, a message must also be sent to the visual circuits, which must process the visual input stream and determine whether the expected response exists (a firing gun). This information can then be sent to some higher circuit, which computes whether the visual response (the gun-firing pattern exists or not at this moment in time) matches the action initiated (the brain sent a firing signal to the finger at this moment in time).

Built out of slow 10ms neurons, this circuit is obviously going to have a lot of delay of its own, which is going to place some limits on its response time and ability to detect delay. Thinking of the basic neuron firing rate as the 'clock rate' and the brain as a giant computer (which it is in the abstract sense), it appears that the brain can compute some of these quick responses in as little as around a dozen 'clock cycles'. This is pretty remarkable, even given that the brain has trillions of parallel circuits. But anyway, the brain could detect even instantaneous responses if it had the equivalent of video buffering. In other words, if the brain could compensate for its own delay, it could detect delays in the firing response on timescales shorter than its own response time. For this to happen, though, the incoming visual data would need to be buffered in some form. The visual circuits, instead of being instructed to signal upon detection of a firing gun, could be instructed to search for a gun firing X ms in the past. However, to do this they would need some temporal history - the equivalent of a video buffer. There are reasons to believe some type of buffering does exist in the brain, but with limitations - it's nothing like a computer video buffer.

The other limitation to the brain's ability to detect delays is the firing times of neurons themselves which make it difficult to detect timings on scales approaching the neuron firing rate.
But getting back to the visual circuits: the brain did not evolve to detect lag in video games or other systems. Just because it's theoretically possible that a neural circuit built out of relatively slow components could detect fast responses by compensating for its own processing delay does not mean that the brain actually does this. The quick 'twitch' circuits we are talking about evolved to make rapid decisions - things like: detect creature, identify as prey or predator, and initiate fight or flight. These quick responses involve rapid pattern recognition, classification, and decision making, all in real time. However, the quick response system is not especially concerned with detecting exactly when an event occurred; it's optimized for the problem of reacting to events rapidly and correctly. Detecting whether your muscles reacted to the run command at the right time is not the primary function of these circuits - their job is to detect the predator threat and initiate the correct run response rapidly. The insight and assertion I'm going to make is that our ability to detect delays in other systems (such as video games) is only as good as our brain's own quick response time - because it uses the same circuits. Psychological tests show measured response times of around ~200ms for many general tasks, probably getting a little lower for game-like tasks with training. A lower bound of around 100-150ms for complex actions like firing guns and moving cameras seems reasonable for experienced players.

For moving a mouse cursor, the response time appears to be lower, perhaps 60-90ms. From this brain model, we can expect that for a few reasons. Firstly, the mouse cursor is very simple and very small, and once the visual system is tracking it, we can expect that detecting changes in its motion (to verify that it's moving as intended) is computationally simple and can be performed in a minimal number of steps. Detecting that the entire scene moved in the correct direction, or that the gun is in its firing animation state, are far more complex pattern recognition tasks, and we can expect they would take more steps. So detecting mouse motion represents the simplest and fastest type of visual pattern recognition.

There is another factor at work here as well: rapid eye saccades. The visual system actually directs the eye muscles on frame-by-frame timescales that we don't consciously perceive. When recognizing a face, you may think you are looking someone right in the eye, but if you watched a high-res video feed of yourself and zoomed in on your eyes in slow motion, you'd see that your eyes are actually making many rapid jumps - leaping from the eyebrow to the lips to the nose and so on. Presumably, when moving a mouse cursor around, some of these saccades are directed to predicted positions of the cursor, making it easy for the visual system to detect its motion (and thus detect if it's lagging).

So in summary, experimental data (from both games and psychological research) leads us to expect that the threshold for human delay detection is around:

300ms - games become unpleasant, even unplayable
200ms - delay becomes palpable
100-150ms - limit of delay detection for full-scene actions - camera panning and so on
50-60ms - absolute limit of delay detection - small-object tracking - mouse cursors

Delay is a strongly non-linear phenomenon: undetectable below a certain threshold, then ramping up to annoying and then deal-breaking soon after. It's not a phenomenon where less is always better - less beyond a certain point doesn't matter from a user experience point of view. (Of course, for competitive twitch gaming, having less delay is definitely advantageous even when you can't notice it - but this isn't relevant for console-type systems where everyone has the same delay.)

So, getting back to the earlier section of this post: if we run a game on a remote PC, what can we expect the total delay to be?

The cloud system has several additional components that can add delay on top of the game itself: video compression, the network, and the client which decompresses the video feed.

Without getting into specifics, what can we roughly expect? Well, even a simple client which just decompresses video is likely to exhibit the typical minimum of roughly 3 frames of lag. Let's assume the video compression can be done in a single frame and the network and buffering add another; we are then looking at roughly 5 frames of additional lag with a low ping to the server - with some obvious areas that could be trimmed further.

If everything is running at 60fps, a low-latency game (3 frames of internal lag) might exhibit around 8/60, or 133ms, of latency, and a higher-latency game (5 frames of internal lag) might exhibit 10/60, or 166ms. So it seems reasonable to expect that games running at 60fps remotely can have latencies similar to local games running at 30fps. Ping to the server then does not represent even the majority of the lag, but it can obviously push the total delay into the unplayable range as the ping grows - and naturally, every frame of delay saved allows the game to remain playable at the same quality at increasingly greater distances from the server.
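As a rough sketch, that budget can be itemized like so (the per-stage frame counts are the assumptions from this post, not measured values):

```python
def cloud_latency_ms(game_frames, fps, ping_ms=0.0,
                     encode_frames=1, network_frames=1, client_frames=3):
    """Estimated end-to-end latency for a remotely rendered game.

    The stage costs are assumptions: ~1 frame to compress the video,
    ~1 frame of network buffering, ~3 frames for the decode client.
    """
    total_frames = game_frames + encode_frames + network_frames + client_frames
    return total_frames / fps * 1000.0 + ping_ms

# Low-latency game (3 internal frames) streamed at 60fps, negligible ping:
print(round(cloud_latency_ms(3, 60)))  # 133 ms (8 frames total)
# Higher-latency game (5 internal frames):
print(round(cloud_latency_ms(5, 60)))  # 167 ms (10 frames total)
```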

What are the next obvious areas of improvement? You could squeeze and save additional frames here and there (the client could perhaps be optimized down to 2 frames of delay - something of a lower limit, though), but the easiest way to further lower the latency is just to double the FPS again.

120fps may seem like a lot, but it also happens to be something of a requirement for 3D gaming, and it is the direction all new displays are moving. At 120fps, the base lag in such an example would be around 8/120 to 10/120, or around 66ms to 83ms of latency - comparable to 60fps console games running locally. This also hints that a remotely rendered mouse cursor would be viable at such high FPS. At 120fps, you could have a ping as high as 100ms and still get an experience comparable to a local console.
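Plugging 120fps into the same frame-budget arithmetic (same assumed 8-10 pipeline frames) shows the headroom this leaves for ping:

```python
def frames_to_ms(frames, fps):
    """Convert a count of pipeline frames to milliseconds of latency."""
    return frames / fps * 1000.0

base_low = frames_to_ms(8, 120)    # ~66.7 ms
base_high = frames_to_ms(10, 120)  # ~83.3 ms
print(round(base_low), round(base_high))  # 67 83
# Even a 100ms ping keeps the total near a local 30fps console (~166ms):
print(round(base_low + 100))  # 167
```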

This leads to some interesting rendering directions if you start designing for 120fps and 3D, instead of the 30fps games are typically designed for now. The obvious optimization for 120fps and 3D is to take advantage of the greater inter-frame coherence. Reusing shading, shadowing, lighting and all that jazz from frame to frame has proportionately greater advantage at high FPS, as the scene changes proportionately less between frames. Likewise, the video compression work and bitrate scale sublinearly, increasing surprisingly slowly as you double the framerate.