Sunday, April 4, 2010

Latency & Human Response Time in current and future games

I'm still surprised at how many gamers and even developers aren't aware that typical games today have total response latencies ranging anywhere from 60-200ms. We tend to think of latency in terms of pings, and the idea that the response time or 'ping' of a computer or console five feet away can be comparable to the ping of a server a continent away seems unnatural.

Yet even though it seems odd, it's true.

I just read through "Console Gaming: The Lag Factor", a recent blog article on Eurogamer which follows up on Mick West's original Gamasutra article that pioneered measuring the actual response times of games using a high-speed digital camera. For background, I earlier wrote a GDM article (Gaming in the Cloud) that referenced that data and showed how remotely rendered games running in the cloud have the potential to at least match the latency of local console games, primarily by running at a higher FPS.

The Eurogamer article alludes to this idea:

In-game latency, or the level of response in our controls, is one of the most crucial elements in game-making, not just in the here and now, but for the future too. It's fair to say that players today have become conditioned to what the truly hardcore PC gamers would consider to be almost unacceptably high levels of latency to the point where cloud gaming services such as OnLive and Gaikai rely heavily upon it.

The average videogame runs at 30FPS, and appears to have an average lag in the region of 133ms. On top of that is additional delay from the display itself, bringing the overall latency to around 166ms. Assuming that the most ultra-PC gaming set-up has a latency less than one third of that, this is good news for cloud gaming in that there's a good 80ms or so window for game video to be transmitted from client to server


It's really interesting to me that the author assumes the "ultra-PC" gaming set-up has a latency less than one third of a console's - even though the general model developed in the article posits that there is no fundamental difference between PCs and consoles in terms of latency, other than framerate.

In general, the article shows that games have inherent delay measured in frames - the minimum seems to be about 3 frames of delay, but it can go up to 5 for some games. The total delay in time units is simply N/F, the number of frames of delay divided by the frame rate. A simple low-delay app will typically have the minimum of about 3 frames, which maps to around 50ms at 60fps and 100ms at 30fps.
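This N/F relationship is trivial to compute. Here's a quick sketch in Python - the frame counts are the illustrative 3-5 range from the article, and the optional display lag term is my own addition:

```python
def total_latency_ms(frames_of_delay, fps, display_ms=0):
    """Total response latency: internal frames of delay paced at the
    frame rate, plus any fixed display lag."""
    return frames_of_delay / fps * 1000 + display_ms

print(total_latency_ms(3, 60))   # 50.0 - minimal app at 60fps
print(total_latency_ms(3, 30))   # 100.0 - same app at 30fps
print(total_latency_ms(5, 30))   # ~166.7 - a 5-frame game at 30fps
```

Add a display's ~33ms on top of that last case and you land right at the ~200ms the article measures for the worst games.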


There is no fundamental difference between consoles and PCs in this regard other than framerate - the PC version of a game running at 60fps will have the same latency as its console sibling running at 60fps. Of course, take a 30fps console game and run it at 60fps and you halve the latency - and yes, this is exactly what cloud gaming services can exploit.


The Eurogamer article was able to actually measure just that, proving this model with some real-world data. The author used the vsync feature in BioShock to measure the response difference between 59fps and 30fps, and as expected, the 59fps mode had just about half the latency.

The other assertion of the article - or rather the whole point of the article - was that low response times are really important for the 'feel' of a game. So I'd like to delve into this in greater detail. As a side note, the fact that delay has to be measured before we can even guess at a game's response time tells you something.

Firstly, on the upper end of the spectrum, developers and gamers know from first-hand experience that there is definitely an upper bound to tolerable latency, although it depends on the user action. For most games, controlling the camera with a joypad or mouse feels responsive with a latency of up to 150ms. You might think that mouse control would be considerably more demanding in this regard, but the data does not back that up - I assert that PC games running at 30fps have latencies in the same 133-150ms window as 30fps console games, and are quite playable at that framerate (some have even shipped capped at 30fps).


There is a legitimate reason for a PC gamer to try to minimize their system latency as much as possible for competitive multiplayer gaming, especially twitch shooters like Counter-Strike. A system running with vsync off at 100fps might have latencies under 50ms, which will give you a considerable advantage over an opponent running at 30fps with 133-150ms of base system latency - no doubt about that.


But what I'm asserting is that most gamers will barely - if at all - be able to notice delay times under 100ms in typical scenarios in FPS and action games, whether using a gamepad or mouse and keyboard. As delay times exceed some threshold they become increasingly noticeable - 200ms of delay is noticeable to most users, and 300ms becomes unplayable. That being said, variation in the delay is much more noticeable. The difference between a perfectly consistent 30fps and 60fps is difficult to perceive, but an inconsistent 30fps is quite noticeable - the spikes and changes in response time from frame to frame are themselves neurologically salient and quite detectable. This is why console developers spend a good deal of time optimizing the spike frames to hit a smooth 30fps.

There is, however, a class of actions that has a significantly lower latency threshold - the simple action of moving a mouse cursor around on the screen! Here I have some first-hand data. A graphics app which renders its own cursor, has little buffering and runs at 60fps will have about 3 frames or about 50ms of lag, and at that low level of delay the cursor feels responsive. However, if you take that same app and slow it down to 30fps, or even add just a few more frames of latency at 60fps, the cursor suddenly seems to lag behind. The typical solution is to use the hardware cursor feature, which short-circuits the whole rendering pipeline and provides a direct fast path to pipe mouse data to the display - which seems to be under 50ms. For the low-latency app running at 60fps, the hardware cursor isn't necessary, but it suddenly becomes important at some threshold around 70-90ms.

I think that this is the real absolute lower limit of human ability to perceive delay.

Why is there such a fundamental limit? In short: the limitations of the brain.

Ponder for a second what it actually means for the brain to notice delay in a system. The user initiates an action and sometime later this results in a response, and if that response is detected as late, the user will notice delay. Somewhere, a circuit (or potentially several circuits, but for our purposes this doesn't matter) in the brain makes a decision to initiate an action; this message must then propagate down to muscles in the hand, where it enters the game system through the input device. Meanwhile, in the brain, the decision circuit must also send messages to the visual circuits of the form: "I have initiated an action and am expecting a response - please look for it and notify me immediately on detection." It's easier to imagine the brain as a centralized system like a single CPU, but it is in fact the exact opposite - massively distributed, a network really on the scale of the internet itself - and, curiously for our discussion, with latencies comparable to the internet itself.

Neurons typically fire no faster than about once every 10ms, perhaps as quickly as every 5ms in some regions. The fastest neural conduits - myelinated fibers - can send signals from the brain to the fingertip (one way) in about 20ms. So now imagine using these slow components to build a circuit that could detect a timing delay as short as 60ms.
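As a toy illustration of why that's hard, here's the arithmetic as a sketch. The 10ms and 20ms figures are the ones quoted above; the step counts per stage are purely my own assumptions for illustration:

```python
# Back-of-the-envelope budget for a minimal act-then-verify loop.
NEURON_STEP_MS = 10    # typical minimum interval between neuron firings
BRAIN_TO_HAND_MS = 20  # one-way myelinated conduction, brain to fingertip

decision_steps = 2   # assumed: circuit decides to initiate the action
visual_steps = 6     # assumed: visual circuits recognize the response
compare_steps = 2    # assumed: match response timing against the action

loop_ms = (BRAIN_TO_HAND_MS
           + (decision_steps + visual_steps + compare_steps) * NEURON_STEP_MS)
print(loop_ms)  # 120 - the loop's own delay already dwarfs a 60ms target
```

Even with only ten serial neuron firings total - about the "dozen clock cycles" discussed below - the loop's own delay is already twice the target.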

Let's start with the example of firing a gun. At a minimum, we have some computation to decide to fire; once this happens, the message is sent down to the fingertip to pull the trigger and start the process. At the same time, for the brain to figure out whether the gun actually fired in time, a message must also be sent to the visual circuits, which must process the visual input stream and determine whether the expected response (a firing gun) exists. This information can then be sent to some higher circuit, which can compute whether the visual response (the gun-firing pattern exists or not at this moment in time) matches the action initiated (the brain sent a firing signal to the finger at this moment in time).

Built out of slow 10ms neurons, this circuit is obviously going to have a lot of delay of its own, which places some limits on its response time and ability to detect delay. Thinking of the basic neuron firing rate as the 'clock rate' and the brain as a giant computer (which it is in the abstract sense), it appears that the brain can compute some of these quick responses in as little as around a dozen 'clock cycles'. This is pretty remarkable, even given that the brain has trillions of parallel circuits. The brain could, however, detect even instantaneous responses if it had the equivalent of video buffering. In other words, if the brain could compensate for its own delay, it could detect delays in the firing response on timescales shorter than its own response time. For this to happen, the incoming visual data would need to be buffered in some form. The visual circuits, instead of being instructed to signal upon detection of a firing gun, could be instructed to search for a gun firing X ms in the past. However, to do this they would need some temporal history - the equivalent of a video buffer. There are reasons to believe some type of buffering does exist in the brain, but with limitations - it's nothing like a computer's video buffer.

The other limitation on the brain's ability to detect delays is the firing time of neurons themselves, which makes it difficult to detect timings on scales approaching the neuron firing rate.

But getting back to the visual circuits: the brain did not evolve to detect lag in video games or other systems. Just because it's theoretically possible that a neural circuit built out of relatively slow components could detect fast responses by compensating for its own processing delay does not mean that the brain actually does this. The quick 'twitch' circuits we are talking about evolved to make rapid decisions - things like: detect creature, identify as prey or predator, and initiate fight or flight. These quick responses involve rapid pattern recognition, classification, and decision making, all in real time. However, the quick response system is not especially concerned with detecting exactly when an event occurred; it's optimized for the problem of reacting to events rapidly and correctly. The primary function of these circuits is not to check whether your muscles reacted to the run command at exactly the right time - it is to detect the predator threat and initiate the correct run response rapidly. The insight and assertion I'm going to make is that our ability to detect delays in other systems (such as video games) is only as good as our brain's own quick response time - because it uses the same circuits. Psychological tests show measured response times of around ~200ms for many general tasks, probably getting a little lower for game-like tasks with training. A lower bound of around 100-150ms for complex actions like firing guns and moving cameras seems reasonable for experienced players.

For moving a mouse cursor, the response time appears to be lower, perhaps 60-90ms. From this brain model, we can expect that for a few reasons. Firstly, the mouse cursor is very simple and very small, and once the visual system is tracking it, we can expect that detecting changes in its motion (to verify that it's moving as intended) is computationally simple and can be performed in a minimal number of steps. Detecting that the entire scene moved in the correct direction, or that the gun is in its firing animation state, are far more complex pattern recognition tasks, and we can expect them to take more steps. So detecting mouse motion represents the simplest and fastest type of visual pattern recognition.

There is another factor at work here as well: rapid eye saccades. The visual system actually directs the eye muscles on frame-by-frame timescales that we don't consciously perceive. When recognizing a face, you may think you are looking someone right in the eye, but if you watched a high-res video feed of yourself and zoomed in on your eyes in slow motion, you'd see that your eyes are actually making many rapid jumps - leaping from the eyebrow to the lips to the nose and so on. Presumably, when moving a mouse cursor around, some of these saccades are directed to predicted positions of the cursor, making it easy for the visual system to detect its motion (and thus detect if it's lagging).

So in summary, experimental data (from both games and psychological research) leads us to expect that the threshold for human delay detection is around:

300ms - games become unpleasant, even unplayable
200ms - delay becomes palpable
100-150ms - limit of delay detection for full-scene actions, such as camera panning
50-60ms - absolute limit of delay detection, for small object tracking such as mouse cursors

Delay is a strongly non-linear phenomenon: undetectable below a certain threshold, then ramping up to annoying and then deal-breaking soon after. It's not a phenomenon where less is always better - below a certain point, less doesn't matter from a user experience point of view. (Of course, for competitive twitch gaming, having less delay is definitely advantageous even when you can't notice it - but this isn't relevant for console-type systems where everyone has the same delay.)
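The thresholds above amount to a step function. Here's a sketch in Python - the bands are the approximate ones listed, and the lower detection floor for small-object tracking reflects the mouse cursor case (the exact floor values are midpoints I picked from the ranges above):

```python
def perceived_delay(delay_ms, small_object=False):
    """Map total delay to a rough perceptual category using the
    approximate thresholds listed above."""
    floor = 55 if small_object else 125  # ~50-60ms vs ~100-150ms bands
    if delay_ms < floor:
        return "undetectable"
    if delay_ms < 200:
        return "detectable"
    if delay_ms < 300:
        return "palpable"
    return "unplayable"

print(perceived_delay(100))                     # undetectable - full scene
print(perceived_delay(100, small_object=True))  # detectable - mouse cursor
print(perceived_delay(350))                     # unplayable
```

The same 100ms delay lands in different bands depending on the task, which is exactly the mouse cursor effect described earlier.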

So getting back to the earlier section of this post: if we run a game on a remote PC, what can we expect the total delay to be?

The cloud system has several additional components that can add delay on top of the game itself: video compression, the network, and the client which decompresses the video feed.

Without getting into specifics, what can we roughly expect? Well, even a simple client which just decompresses video is likely to exhibit the typical minimum of roughly 3 frames of lag. Let's assume the video compression can be done in a single frame and the network and buffering add another; we are then looking at roughly 5 frames of additional lag with a low ping to the server - with some obvious areas that could be trimmed further.

If everything is running at 60fps, a low-latency game (3 frames of internal lag) might exhibit around 8/60 or 133ms of latency, and a higher-latency game (5 frames of internal lag) might exhibit 10/60 or 166ms of latency. So it seems reasonable to expect that games running at 60fps remotely can have latencies similar to local games running at 30fps. Ping to the server then does not represent even the majority of the lag, but obviously it can push the total delay into unplayable territory as the ping grows - and naturally, every frame of delay saved allows the game to be playable at the same quality at increasingly greater distances from the server.
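The whole pipeline can be summed up in a few lines. The per-stage frame counts below are the assumptions from the paragraph above (3 client frames, 1 for encoding, 1 for network buffering), not measured figures:

```python
def cloud_latency_ms(game_frames, fps, encode_frames=1,
                     network_frames=1, client_frames=3, ping_ms=0):
    """Total cloud-gaming latency: every pipeline stage paced at the
    frame rate, plus raw network ping."""
    frames = game_frames + encode_frames + network_frames + client_frames
    return frames / fps * 1000 + ping_ms

print(round(cloud_latency_ms(3, 60)))   # 133 - low-latency game at 60fps
print(round(cloud_latency_ms(5, 60)))   # 167 - high-latency game at 60fps
print(round(cloud_latency_ms(3, 120)))  # 67  - same games at 120fps
print(round(cloud_latency_ms(5, 120)))  # 83
```

Note that at 120fps you can add ~100ms of ping on top of the 67ms base and still land right around the 166ms of a local 30fps console game.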

What are the next obvious areas of improvement? You could squeeze and save additional frames here and there (the client perhaps could be optimized down to 2 frames of delay - something of a lower limit though), but the easiest way to further lower the latency is just to double the FPS again.

120fps may seem like a lot, but it also happens to be a sort of requirement for 3D gaming, and is the direction that all new displays are moving. At 120fps, the base lag in such an example would be around 8/120 to 10/120, or around 66ms to 83ms of latency, comparable to 60fps console games running locally. This also hints that a remotely rendered mouse cursor would be viable at such a high FPS. At 120fps, you could have a ping as high as 100ms and still get an experience comparable to a local console.

This leads to some interesting rendering directions if you start designing for 120fps and 3D, instead of the 30fps games are typically designed for now. The obvious optimization for 120fps and 3D is to take advantage of the greater inter-frame coherence. Reusing shading, shadowing, lighting and all that jazz from frame to frame has proportionately greater advantage at high FPS, as the scene changes proportionately less between frames. Likewise, the video compression work and bitrate scale sublinearly, increasing surprisingly slowly as you double the framerate.



Thursday, January 28, 2010

New Job

I'm moving in about a week to start a new job at OnLive - putting my money where my mouth is, so to speak. An exciting change. I haven't had much time recently for this blog, but I'll be getting back to it shortly.

Friday, November 6, 2009

Living root bridges


I found this great set of photos of living root bridges, which are some inspirational scenes for the challenges of dense foliage/geometry in graphics. I look forward to the day these could be digitally voxelized with 3D camera techniques and put into a game.

Friday, October 30, 2009

Conversing with the Quick and the Dead


CUI: The Conversational User Interface

Recently I was listening to an excellent interview (about an hour long) with John Smart of Acceleration Watch, where he was specifically elucidating his ideas on the immediate future evolution of AI, which he encapsulates in what he calls the Conversational Interface. In a nutshell, it's the idea that the next major development in our increasingly autonomous global internet is the emergence and widespread adoption of natural language processing and conversational agents. This technology is at a tipping point, so it's something to watch as numerous startups begin selling software for automated call centers, sales agents, autonomous monitoring agents for utilities, security, and so on. The immediate enabling trends are the emergence of a global liquid market for cheap computing and fairly reliable off-the-shelf voice-to-text software that actually works. You have probably called a bank and experienced the simpler initial versions of this, which are essentially voice-activated multiple choice menus, but the newer systems on the horizon are a wholly different beast: an effective simulacrum of a human receptionist which can interpret both commands and questions, ask clarifying questions, and remember prior conversations and even users. This is an interesting development in and of itself, but the more startling idea hinted at in Smart's interview is how natural language interaction will lead to anthropomorphic software, and how profoundly this will eventually affect the human-machine symbiosis.

Humans are rather biased judges of intelligence: we have a tendency to attribute human qualities to anything that looks or sounds like us, even if its actions are regulated by simple dumb automata. Aeons of biological evolution have preconditioned us to rapidly identify other intelligent agents in our world, categorize them as potential predators, food, or mates, and take appropriate action. It's not that we aren't smart enough to apply more critical and intensive investigation to determine a system's relative intelligence; it's that we have super-effective visual and auditory shortcuts which bias us. These are most significant in children, and future AI developers will be able to exploit these biases to create agents with emotional attachments. The Milo demo from Microsoft's Project Natal is a remarkable and eerie glimpse into the near-future world of conversational agents and what Smart calls 'virtual twins'. After watching this video, consider how this kind of technology can evolve once it establishes itself in the living room in the form of video game characters for children. There is a long history of learning through games, and the educational game market is a large, well-developed industry. The real potential hinted at in Peter Molyneux's demo is a disruptive convergence of AI and entertainment, which I see as the beginning of the road to the singularity.

Imagine what entrepreneurial game developers with large budgets and a willingness to experiment outside the traditional genres could do when armed with a full two-way audio-visual interface like Project Natal, the local computation of the Xbox 360 and future consoles, and a fiber connection to the up-and-coming immense computing resources of the cloud (fueled by the convergence of general-purpose GPUs and the huge computational demands of the game/entertainment industry moving into the cloud). Most people, and even futurists, tend to think of Moore's Law as a smooth and steady exponential progression, but the reality from the perspective of a software developer (and especially a console game developer) is a series of massively disruptive jumps: evolutionary punctuated equilibrium. Each console cycle reaches a steady-state phase towards the end, where the state space of possible game ideas, interfaces and simulation technologies approaches a near steady state - a technological tapering off - followed by the disruptive release of new consoles with vastly increased computation, new interfaces, and even new interconnections. The next console cycle is probably not going to start until as late as 2012, but with upcoming developments such as Project Natal and OnLive, we may be entering a new phase already.


The Five Year Old's Turing Test

Imagine a future 'game system' aimed at relatively young children with a Natal-like interface: a full two-way communication portal between the real and the virtual. The game system can both see and hear the child, and it can project a virtual window through which the inner agents can be seen and heard. Permanently connected to the cloud through fiber, this system can tap into vast distant computing resources on demand. There is a development point, a critical tipping point, where it will be economically feasible to make a permanent autonomous agent that can interact with children. Some will certainly take the form of an interactive, talking version of a character like Barney, and such semi-intelligent agents will certainly come first. But for the more interesting and challenging development of human-level intelligence, it could actually be easier to make a child-like AI, one that learns and grows with its 'customer'. Not just a game, but a personalized imaginary friend to play games with, and eventually to grow up with. It will be custom-designed (or rather, developmentally evolved) for just this role - shaped by economic selection pressure.

The real expense of developing an AI is all the training time, and a human-like AI will need to go through a human-like childhood developmental learning process. The human neocortex begins life essentially devoid of information, with random synaptic connections and a cacophony of electrical noise. From this, consciousness slowly develops as the cortical learning algorithm begins to learn patterns through sensory and motor interaction with the world. Indeed, general anesthetics work by introducing noise into the brain that drowns out coherent signalling and thus consciousness. From an information-theoretic point of view, it may thus be possible to use less computing power to simulate an early developmental brain - storing and computing only the information above the noise floor. If such a scalable model could be developed, it would allow the first AI generation to begin decades earlier (perhaps even today), and scale up with Moore's law as they require more storage and computation.

Once trained up to the mental equivalent of a five-year-old, a personal interactive invisible friend might become a viable 'product' well before adult-level human AIs come about. Indeed, such a 'product' could eventually develop into such an adult AI, if the cortical model scales correctly and the AI is allowed to develop and learn further. Any adult AI will start out as a child; there are no shortcuts. Which raises some interesting points: who would parent these AI children? And inevitably, they are going to ask two fundamental questions which are at the very root of being, identity, and religion:

What is death? And am I going to die?

The first human-level AI children with artificial neocortices will most likely be born in research labs - both academic and commercial. They will likely be born into virtual bodies. Some will probably be embodied in public virtual realities, such as Second Life, with their researcher-creators acting as parents and with generally open access to the outside world and curious humans. Others may develop in more closed environments tailored to later commercialization. For the future human parents of AI mind children, these questions will be just as fundamental and important as they are for biological children. These AI children do not ever have to die, and their parents could truthfully answer so, but their fate will depend entirely on the goals of their creators. AI children can be copied, so purely from an efficiency perspective there will be great pressure to cull the less successful children - the slow learners, the mentally unstable, or the otherwise undesirable - and use their computational resources to duplicate the most successful and healthy candidates. So the truthful answers are probably: death is the permanent loss of consciousness, and you don't have to die, but we may choose to kill you - no promises. If an AI's creators/parents are ethical and believe any conscious being has the right to life, then they may guarantee their AI's permanency. But life and death for a virtual being is anything but black and white: an AI can be active permanently, or for only an hour a day, or for an hour a year - life for them is literally conscious computation, and near-permanent sleep is a small step above death. I suspect that the popular trend will be to teach AI children that they are all immortal and thus keep them happy.

Once an AI is developed to a certain age, it can be duplicated as needed for some commercial application. For our virtual Milo example, an initial seed Milo would be selected from a large pool raised in a virtual lab somewhere, with a few of the best examples 'commercialized' and duplicated out as needed every time a kid out on the web wants a virtual friend for his xbox 1440. It's certainly possible that Milo could be designed and selected to be a particularly robust and happy kid. But what happens when Milo and his new human friend start talking, and the human child learns that Milo is never going to die because he's an AI? And more fundamentally, what happens to this particular Milo when the xbox is off? If he exists only when his human owner wants him to, how will he react when he learns this?

It's most likely that semi-intelligent (but still highly capable) agents will develop earlier, but as Moore's law advances along with our understanding of the human brain, it becomes increasingly likely someone will tackle and solve the human-like AI problem, launching a long-term project to start raising an AI child. It's hard to predict when this could happen in earnest. There are already several research projects underway attempting something along these lines, but nobody yet has the immense computational resources to throw at a full brain simulation (except perhaps the government), nor do we even have a good simulation model yet (although we may be getting close), and it's not clear that we've found the kinds of shortcuts needed to start one with dramatically fewer resources. Nor does it look like any of the alternative non-biological AI routes have yet produced something as intelligent as a five-year-old. But it looks like we could see this in a decade.

And when this happens, these important questions of consciousness, identity and fundamental rights (human and sapient) will come into the public spotlight.

I see a clear ethical obligation to extend full rights to all human-level sapients - silicon, biological, or what have you. Furthermore, those raising these first generations of our descendants need to take on the responsibility of ensuring a longer-term symbiosis and our very own survival, for it's likely that AI will develop ahead of the technologies required for uploading, and thus we will need these AIs to help us become immortal.




Tuesday, October 20, 2009

Singularity Summit 09

The Singularity Summit was held a couple of weeks ago in NYC. I unfortunately didn't physically attend, but I just read through Anders Sandberg's good overview here. I was at last year's summit and quite enjoyed it, and it looks like this year's was even better, which makes me a little sad I didn't find an excuse to go. I was also surprised to see that my former fellow CCS student Anna Solomon gave the opening talk, as she's now part of the Singularity Institute.

I'm just going to assume familiarity with the Singularity. Introductions are fun, but that's not this.

Anders summarizes some of the discussion about the two somewhat competing routes toward the Singularity and AI development: WBE (whole brain emulation) and AGI (artificial general intelligence). The WBE researchers, such as Anders, are focused on reverse engineering the human brain, resulting in biologically accurate simulations which lead to full brain simulations and eventually actual emulation of particular brains - uploading. The AGI people are focused more on building an artificial intelligence through whatever means possible, using whatever algorithms happen to work. In gross simplification, the scenarios envisioned by each camp are potentially very different, with the WBE scenario usually resulting in humans transitioning into an immortal afterlife, and the AGI route more often leading to something closer to Skynet.

Even though the outcomes of the two paths are different, brain reverse engineering and human-level AI approaches will probably co-develop. The human neocortex, and the cortical column learning algorithm in particular, seems to be an extremely efficient solution to general intelligence, and directly emulating it is a very viable route to AI. AGI is probably easier and could happen first, given that it can use structural simulations from WBE research on the long path toward a full brain emulation. Furthermore, both AGI and WBE require immense computing, but WBE probably requires more, and WBE also requires massive advancements in scanning technology, and perhaps even nanotechnology, which are considerably less advanced.

All that being said, WBE uploading could still reach the goal first, because complete WBE will recreate the intelligences of those scanned - they will be continuations of the same minds, and so will immediately have all the skills, knowledge, memories and connections of a lifetime of experience. AGIs, on the other hand, will start as raw untrained minds, and will have to go through the lengthy learning process from infant to adult. This takes decades of subjective learning time for humans, and it will hold true for AGI as well. AIs will not suddenly 'wake up' or develop conscious intelligence spontaneously.

Even though a generally accepted theoretical framework for intelligence still seems a ways off, we do certainly know it takes a long training time, the end accumulation of a vast amount of computational learning, to achieve useful intelligence. For a general intelligence, the type we would consider conscious and human-like, the learning agent must be embedded in an environment in which it can learn pattern associations through both sensory input and effector output. It must have virtual eyes and hands, so to speak, in some fashion. And knowledge is accumulated slowly over years of environmental interaction.

But could the learning process be dramatically sped up for an AGI? The visual cortex - the front input stage of the human cortex - alone takes years to develop, and later stages of knowledge processing develop incrementally in layers built on the output of earlier trained layers. Higher level neural patterns form as meta-systems of simpler patterns, from simple edges to basic shapes to visual objects, all the way up to complete conceptual objects such as 'dog' or 'ball', and then onward to ever more complex and abstract concepts such as 'quantum mechanics'. The words are merely symbols which code for complex neural associations in the brain, and are in fact completely unique to each brain. No individual brain's concept of a complex symbol such as 'quantum mechanics' is precisely the same.

The hierarchical layered web of associations that forms our knowledge has a base foundation built out of simpler spatial/temporal patterns representing objects we have directly experienced - for most of us visually, although the blind can see through secondary senses (the brain is very general and can work with any sufficient sensor inputs). Thus it's difficult to see how you could teach a robot mind even a simple concept such as 'run' without this base foundation - let alone something as complex as quantum mechanics. Ultimately the base foundation consists of a sort of 3D simulator that allows us to predict and model our environment. This base simulator is at the core of even higher level intelligence, at a more fundamental layer than even language, emphasized in our language itself by words such as 'visualize'. It's the most ancient function of even pre-mammalian intelligence: a feedback loop and search process of sense, simulate, and manipulate.

Ultimately, if AGI does succeed before WBE, it will probably share this general architecture, probably still neural net based and brain inspired to some degree. Novel AIs will still need to be 'born' or embodied into a virtual or real body, as either a ghost in the matrix or a physical robot. Robot bodies will certainly have their uses, but the economics and physics of computing dictate that most of the computation, and thus the space for AIs, will be centralized in big computing centers. So the vast majority of sentients in the posthuman era will certainly live in virtual environments. Uploads and AIs will be very similar - the main difference being that of a prior birth and life in the flesh vs a fully virtual history.

There are potential shortcuts and bootstrapping approaches for the AGI route which could allow it to proceed quickly. Some of the lower level, earlier cortical layers, such as visual processing, could be replaced with pre-designed functionally equivalent modules. Perhaps even larger scale learned associations could be shared or transferred directly from individual to individual. However, given what we know about the brain, it's not even clear that this is possible. Since each brain's patterns are unique and emergent, there is no easy direct correspondence - you can't simply copy individual pieces of data or knowledge. Language is evolution's best attempt at knowledge transfer, and it's not clear whether bandwidth alone is the principal limitation. However, you can rather easily backup, copy and transfer the entire mental state of a software intelligence, and this is a large scale disruptive change. In the earlier stages of AGI development there will undoubtedly be far more failures than successes, so being able to cull out the failures and make more copies of the rare successful individuals will be important, even though the ethical issues raised are formidable. 'Culling' does not necessarily imply death; it can be justified as 'sleep' as long as the mindstate data is not deleted. But still, when does an artificial being become a sentient being? When do researchers and corporations lose full control over the software running on the servers they built, because that 'software' is sentient?

The potential market for true AGI is unlimited - as AGIs could be trained to do everything humans can and more, they can and will fundamentally replace and disrupt the entire economy. If AGI develops ahead of WBE, I fear that the corporate sponsors will have a heavy incentive to stay just to the latter side of wherever the judicial system ends up drawing the line between sentient being and software property. As AGI becomes feasible on the near time horizon, it will undoubtedly attract a massive wave of investment capital, but the economic payout is completely dependent on some form of slavery or indenture. Once a legal framework or precedent is set to determine what type of computer intelligence can be considered sentient and endowed with rights, AGI developers will do what they need to do to avoid developing any AGI that could become free, or at least avoid getting caught. The entire concept is so abstract (virtual people enslaved in virtual reality?) that it may never register as a mainstream concern, and our whole current system seems on the path to AGI slavery.

Even if the courts did rule that software can be sentient (and that itself is a big if), who would police the private data-centers of big corporations? How would you rigorously define sentience so as to discriminate between data mining and virtual consciousness? And moreover, how would you ever enforce it?

The economic incentives for virtual slavery are vast and deep. Corporations and governments could replace their workforce with software whose performance/cost is directly measurable and increases exponentially! Today's virtual worker could be upgraded next year to think twice as fast, or twice as smart, or copied into two workers, all for the same cost. And these workers could be slaves in a fashion that is difficult to even comprehend. They wouldn't even need to know they were slaves, or they could even be created or manipulated into loving their work and their servitude. This seems to be the more likely scenario.

Why should we care? In this scenario, AGI is developed first, it is rushed, and the complex consequences are unplanned. The transition would be very rapid and unpredictable. Once the first generation of AGIs is ready to replace human workers, they could be easily mass produced in volume and copied globally, and the economic output of the AGI slaves would grow exponentially or hyper-exponentially, resulting in a hard takeoff singularity and all that entails. Having the entire human labor force put out of work in just a year or so would be only the initial and most minor disruption. As the posthuman civilization takes off at exponential speed, it experiences an effective exponential time dilation (every new computer speed doubling doubles the rate of thought and thus halves the physical time required for the next transition). This can soon result in AGI civilizations perhaps running at a thousand times real time, and then all further future time is compressed very quickly after that and the world ends faster than you can think (literally). Any illusion of control that flesh and blood humans have over the future would dissipate very quickly. A full analysis of the hard rapture is a matter for another piece, but the important point is this: when it comes, you want to be an upload, you don't want to be left behind.
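The "exponential time dilation" claim can be put in rough numbers. A minimal sketch, assuming an illustrative 2-year initial doubling period (my assumption, not a figure from any forecast): if each doubling of hardware speed doubles the rate of thought and thus halves the physical time to the next doubling, total physical time is a geometric series that converges.

```python
# Sketch of exponential time dilation: each speed doubling halves the
# physical time needed for the next one. The 2-year initial doubling
# period is an illustrative assumption.
first_doubling_years = 2.0
physical_time = 0.0
subjective_speedup = 1.0
for generation in range(30):
    physical_time += first_doubling_years / subjective_speedup
    subjective_speedup *= 2.0  # each doubling doubles the rate of thought

# Geometric series: 2 + 1 + 0.5 + ... converges to 4 years of physical
# time, with a ~1000x speedup already reached after 10 doublings (2^10).
print(round(physical_time, 3))
```

Under these toy numbers all further subjective aeons are compressed into a fixed, finite window of physical time, which is the sense in which "the world ends faster than you can think."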

The end result of exponential computing growth is pervasive virtual realities, and the total space of these realities, measured in terms of observer time, grows exponentially and ultimately completely dwarfs our current biological 'world'. This is the same general observation that leads to the Simulation Hypothesis of Nick Bostrom. The post-singularity future exists in simulation/emulation, and thus is only accessible to those who upload.

So for those who embrace the Singularity, uploading is the logical choice, and the whole brain emulation route is critical.

In the scenarios where WBE develops ahead of AGI, there is another major economic motivator at work: humans who wish to upload. This is a potentially vast market force as more and more people become singularity aware and come to believe in uploading. It could entail a very different social outcome from the pure AGI path outlined above. If society at large is more aware of and in support of uploading (because people themselves plan to upload), then society will ultimately be far more concerned about their future rights as sentient software. And really it will be hard to meaningfully differentiate between AGIs and uploads (legally or otherwise).

Naturally even if AGI develops well ahead of WBE and starts the acceleration, WBE will hopefully come very soon after due to AGI itself, assuming 'friendly' AGI is successful. But the timing and timescales are delicate due to the rapid nature of exponential acceleration. An AI civilization could accelerate so rapidly that by the time humans start actually uploading, the AGI civilization could have experienced vast aeons of simulated time and evolved beyond our comprehension, at which point we would essentially be archaic, living fossils.

I think it would be a great and terribly ironic tragedy to be the last mortal generation - to come all this way and then watch from the sidelines as our immortal AI descendants, our creations, take off into the singularity without us. We need to be the first immortal generation, and that's why uploading is such a critical goal. It's so important, in fact, that perhaps the correct path is to carefully control the development towards the singularity, ensure that sentient software is fully legally recognized and protected, and vigilantly safeguard against exploitative, rapid non-human AGI development.

A future in which a great portion or even a majority of society plans on uploading is a future where the greater mass of society actually understands the Singularity and the future, and thus is a safer future to be in. A transition where only a tiny minority really understands what is going on seems more likely to result in an elite group seizing control and creating an undesirable or even lethal outcome for the rest.


Thursday, October 15, 2009

Nvidia's Fermi and other new things

I've been ignoring this blog lately as work calls, and in the meantime there's been a few interesting developments:
* Nvidia announced/hyped/unveiled their next-gen architecture, Fermi, aka Nvidia's Larrabee
* Nvidia is apparently abandoning/getting squeezed out of the chipset market in the near term
* But, they also apparently have won a contract for the next gen DS using Tegra
* OnLive is supposedly in open Beta (although it's unclear how 'open' it is just yet)
* OnLive also received a large new round of funding, presumably to build up more data centers for launch. Interestingly, AT&T led this round, instead of Time Warner. Rumour is they are up to a billion-dollar valuation, which if true is rather insane. Consider for example that AMD has a current market cap of just $4 billion.

The summation of a converging whirlwind of trends points to a future computing market dominated on one hand by pervasive, super-cheap hand-held devices and large-scale industrial computing in the cloud on the other.

1. Moore's law and PC marginalization. Moore's law is squeezing the typical commodity PC into increasingly smaller and cheaper forms. What does the typical customer need a computer for? For perhaps 80% of customers 99% of the time, it's for web, video and word processing or other simple apps (which these days all just fall into the web category). The PC was designed for an era when these tasks were formidable, and more importantly, before pervasive high speed internet. This trend is realized in system designs such as Nvidia's Tegra or Intel's Atom, which integrate a cheap low power CPU with dedicated hardware for video decode/encode, audio and the other common tasks. For most users, there just isn't a compelling reason for more powerful hardware, unless you want to use it to play games.

In the end this is very bad for Intel, AMD and Nvidia, and they all know it. In the short to medium term they can offset losses in the traditional PC market with their low-power designs, but if you extrapolate the trend into the decade ahead, eventually the typical computational needs of the average user will be adequately met by a device that costs just a few dozen bucks. This is a long term disaster for all parties involved unless they can find a new market or sell customers on new processor-intensive features.

2. Evolution of the game industry. Moore's law has vastly expanded the game landscape. On the high end, you have the technology leaders, such as Crysis, which utilize the latest CPU/GPU tech. But increasingly the high end is less of the total landscape - not because there is less interest in high end games, but simply because the landscape is so vast. The end result of years of rapid evolutionary adaptive radiation is a huge range of games across the whole spectrum of processing complexity, from Crysis on one end to Nintendo DS or flash games on the other. Crysis doesn't really compete with free web games; they largely occupy different niches. In the early days of the PC, the landscape was simple and all the games were more or less 'high end' for the time. But as technology marches on and allows you to do more in a high end game, this never kills the market for simpler games on the low end.

The other shift in games is the rise of console dominance, both in the living room and in the market. The modern console has come a long way, and now provides a competitive experience in most genres, quality multiplayer, media and apps. The PC game market still exists, but mainly in the form of genres that really depend on keyboard and mouse or are by nature less suited to playing on a couch. Basically, the genres that Blizzard dominates. Unfortunately for the hardware people, Blizzard is rather slow in pushing the hardware.

3. The slow but inexorable deployment of pervasive high speed broadband. It's definitely taking time, but this is where we are headed sooner rather than later. Ultimately this means that the minimal cheap low power device described above is all you will ever need for local computation (basically video decompression), and any heavy lifting can be served from the cloud on demand. This doesn't mean that there won't still be a market for high end PCs, as some people will always want their own powerful computers, but it will be increasingly marginal and hobbyist.

4. The speed of light barrier. Moore's law generally allows an exponential increase in the number of transistors per unit area as process technology advances and shrinks, but only marginal improvements in clock rate. Signal propagation is firmly limited by the speed of light, so the round trip time of a typical fetch/execute/store operation is relatively huge, and has been for quite some time. The strategy until fairly recently for CPU architects was to use ever more transistors to hide this latency and increase execution rate through pipelining, caches, instruction scheduling and even prediction. GPUs, like DSPs and even Cray vector processors before them, took the simpler route of massive parallelization. Now the complex superscalar design has long since reached its limits, and architects are left with massive parallelization as the only route to take advantage of additional transistors. In the very long term, the brain stands as an example of where computing might head eventually, faced with the same constraints.
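The latency-hiding tradeoff can be put in back-of-envelope numbers (all figures illustrative, not any specific chip). By Little's law, required concurrency equals throughput times latency, which is why the massively parallel route needs thousands of threads in flight rather than big caches:

```python
# Back-of-envelope: parallelism needed to hide memory latency outright,
# rather than caching it away. Figures are illustrative, not a real chip.
memory_latency_cycles = 400     # round trip to DRAM, in core cycles
ops_per_thread_per_access = 10  # arithmetic a thread does per memory access
lanes = 512                     # scalar execution units on the chip

# Little's law: concurrency = throughput x latency. Each lane needs enough
# independent threads in flight to cover the cycles spent waiting on DRAM.
threads_per_lane = memory_latency_cycles / ops_per_thread_per_access
total_threads = int(threads_per_lane * lanes)
print(total_threads)  # tens of thousands of threads to keep 512 lanes busy
```

Even with these generous assumptions, a 512-lane chip needs on the order of 20,000 resident threads, which is exactly the regime current GPUs are designed for.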

This is the future, and I think it's clear enough that the folks at Intel, Nvidia and AMD can all see the writing on the wall; the bigger question is what to do about it. As discussed above, I don't think the low end netbook/smartphone/whatever market is enough to sustain these companies in the longer term - there will only be more competition and lower margins going forward.

Where is the long term growth potential? It's in the cloud, especially as gaming starts to move into this space - here is a market that Moore's law will never marginalize.

This is why Nvidia's strategy with Fermi makes good sense to me, just as Larrabee does for Intel. With Fermi Nvidia is betting that paying the extra die space for the remaining functionality to elevate their GPU cores into something more like CPU cores is the correct long term decision.

When you think about it, there is a huge difference between a chip like Larrabee or (apparently) Fermi, which can run full C++, and more limited GPUs like the GT2xx series or AMD's latest. Yes, you can port many algorithms to run on Cuda or OpenCL or whatever, but port is the key word.

With Larrabee or Fermi you should actually be able to port over existing CPU code, as they support local memory caches, unified addressing and function pointers/indirect jumps, and thus even interrupts. I.e., they are complete, and really should be called wide-vector massively threaded CPUs. The difference between that kind of 'GPU' and upcoming 'CPUs' really just comes down to vector width, cache sizes and hardware threading decisions.

But really, porting existing code is largely irrelevant. Existing CPU code, whether single or multi threaded, is a very different beast than mega-threaded code. The transition from a design based on one to a handful of threads to a design for thousands of threads is the important transition. The vector width and instruction set details are tiny in comparison (and actually, I agree with Nvidia's decision to largely hide the SIMD width, having it simulate scalar threads). Larrabee went with a somewhat less ambitious model, supporting 4-way hyper-threading vs the massive threading of current GPUs, and I think this is a primary mistake. Why? Because future architectures will only get faster by adding more threads, so you had better design for massive thread scalability now.

What about fusion, and CPU/GPU integration?

There's a lot of talk now about integrating the CPU and GPU onto a single die, and indeed ATI is heavily marketing/hyping this idea. In the near term it probably makes sense in some manner, but in the longer term it's largely irrelevant.

Why? Because the long term trend is, and must be, software designed for a sea of threads. This is the physical reality, like it or not. So what's the role of the traditional CPU in this model? Larrabee and Fermi point to GPU cores taking on CPU features. Compare upcoming Intel CPU designs to Fermi or Larrabee: Intel will soon move to 16 superscalar 4-way SIMD cores on a chip at 2-3 GHz; Fermi will be 16 'multi-processors' with 32 scalar units each at 1-2 GHz; Larrabee sits somewhere in between, but closer to Fermi.
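The raw lane counts make the comparison concrete. A quick sketch, taking clock rates at the midpoints of the ranges above (rough figures, not official specs):

```python
# Rough peak-lane comparison of the designs mentioned above. Clock rates
# are midpoints of the quoted ranges; all numbers are illustrative.
designs = {
    # name: (cores, lanes_per_core, clock_ghz)
    "Intel CPU": (16, 4, 2.5),   # 16 superscalar cores, 4-wide SIMD
    "Fermi":     (16, 32, 1.5),  # 16 multiprocessors, 32 scalar units each
}
for name, (cores, lanes, ghz) in designs.items():
    total_lanes = cores * lanes
    gops = total_lanes * ghz  # billions of lane-operations per second
    print(f"{name}: {total_lanes} lanes, ~{gops:.0f} Gops/s peak")
```

Despite the lower clock, the supervector design ends up with roughly 8x the execution lanes, which is the whole argument for thread-scalable software.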

It's also pretty clear at this point that most software and algorithms designed to be massively parallel perform far better on the more GPU-ish designs above (most, but not all). So in the long term, CPU and GPU become historical terms, representing just points on a spectrum between superscalar and supervector; we just have tons of processors, and the whole fusion idea really just amounts to a heterogeneous vs homogeneous design. As a case study, compare the 360 to the PS3. The 360, with 3 general CPUs and a 48-unit GPU, is clearly easier to work with than the PS3, with its CPU, 7 weird SPUs, and 24-unit GPU. Homogeneity is generally the better choice.

Now going further forward into the next decade, looking at a 100+ core design, would you rather have the die split between CPU cores and GPU cores? One CPU as coordinator and then a bunch of GPU cores, or just all cGPU cores? In the end the latter is the most attractive, provided the cGPU cores have all the features of a CPU. If the same C++ code runs on all the cores, then perhaps it doesn't matter.




Thursday, August 13, 2009

Unique Voxel Storage


How much memory does a unique voxelization of a given scene cost? Considering anisotropic filtering and translucency, a pixel will in general be covered by more than one voxel. An upper bound is rather straightforward to calculate. For a single viewport with a limited nearZ and farZ range, there are a finite number of pixel-radius voxels extending out to fill the projection volume. The depth dimension of this volume is given by viewportDim * log2(farZ/nearZ). For a 1024x1024 viewport, a nearZ of 1 meter and a view distance of 16 kilometers, this works out to about log2(16000)*1024, or 14,000 voxels per pixel, or 14 billion voxels for the frustum's projection volume, and around ~100 billion voxels for the entire spherical viewing volume. This represents the maximum possible data size of any unique scene when sampled at proper pixel sampling rate with unlimited translucency and AA precision.

Now obviously, this is the theoretical worst case, which is interesting to know, but wouldn't come up in reality. A straightforward, tighter bound can be reached if we use discrete multi-sampling for the AA and anisotropic filtering, which means that each sub-sample hits just one voxel, and we only need to store the visible (closest) voxels. In this case, considering occlusion, the voxel cost is dramatically lower, being just ScreenArea*AAFactor. For an average of 10 sub-samples and the same viewport setup as above, this is just around 100 million voxels for the entire viewing volume. Anisotropic filtering quickly hits diminishing returns by around 16x maximum samples per pixel, and most pixels need much less, so a 10x average is quite reasonable.
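Both bounds are easy to recompute. A sketch, assuming ~6 frusta (a cube map) to cover the full spherical viewing volume - that factor is my approximation:

```python
from math import log2

# Recompute the two voxel-count bounds for a 1024x1024 viewport,
# nearZ = 1 m, farZ = 16 km. The 6-frustum sphere cover is an assumption.
dim, near_z, far_z = 1024, 1.0, 16000.0
frusta = 6

# Worst case: pixel-radius voxels fill the entire projection volume.
depth_slices = dim * log2(far_z / near_z)   # ~14,000 voxels per pixel
worst_frustum = dim * dim * depth_slices    # ~14-15 billion per frustum
worst_sphere = worst_frustum * frusta       # ~10^11 for the full sphere

# Discrete multi-sampling bound: only visible voxels, ~10 samples/pixel.
aa_factor = 10
visible_sphere = dim * dim * aa_factor * frusta  # ~60-100 million

print(f"{depth_slices:.0f} voxels/pixel, worst case {worst_sphere:.2e}, "
      f"visible bound {visible_sphere:.2e}")
```

The numbers land close to the figures in the text: roughly 14,000 voxels per pixel, ~10^11 worst case, and tens of millions for the occlusion-limited bound.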





For translucent voxels, a 10x coverage multiplier is quite generous, as the contribution of high frequencies decreases with decreasing opacity (which current game rasterizers exploit by rendering translucent particles at lower resolution). This would mean that voxels at around 10% opacity would get full pixel resolution, and voxels at about 1.5% or lower would get half-pixel resolution, roughly.

The octree subdivision can be guided by the z occlusion information. Ideally we would update a node's visibility during the ray traversal, but due to the inefficiency of scattered memory writes it will probably be better to write out some form of z-buffer and then back-project the nodes to determine visibility.

A brute force multi-sampling approach sounds expensive, but would still be feasible on future hardware, as Nvidia's recent Siggraph paper "Alternative Rendering Pipelines with Nvidia Cuda" demonstrates in the case of implementing a Reyes micropolygon rasterizer in Cuda. With enough multi-samples you don't even need bilinear filtering - simple point sampling will suffice. But for voxel tracing, discrete multi-sampling isn't all that efficient compared to the more obvious and desirable path, which is simply to accumulate coverage/alpha directly while tracing. This is by far the fastest route to high quality AA & filtering. However, it does pose a problem for the visibility determination mentioned above - without a discrete z-buffer, you don't have an obvious way of calculating voxel visibility for subdivision.
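Accumulating coverage/alpha while tracing is just standard front-to-back alpha compositing with early ray termination. A minimal sketch, assuming each voxel sample along the ray contributes a (color, alpha) pair in near-to-far order (the function name and scalar color channel are my simplifications):

```python
def trace_accumulate(samples, termination=0.999):
    """Front-to-back alpha compositing along one ray.

    samples: iterable of (color, alpha) pairs in near-to-far order.
    Returns the accumulated (color, alpha). The early exit once the ray
    is effectively opaque is what makes this cheaper than maintaining
    discrete multi-sample z values per pixel.
    """
    color, alpha = 0.0, 0.0
    for c, a in samples:
        weight = (1.0 - alpha) * a  # remaining transmittance times opacity
        color += weight * c
        alpha += weight
        if alpha >= termination:
            break  # ray saturated; voxels behind this point are invisible
    return color, alpha

# A fully opaque near voxel hides everything behind it.
print(trace_accumulate([(1.0, 1.0), (0.0, 1.0)]))
```

Note that nothing in this loop records *which* voxels were visible, which is exactly the subdivision problem described above.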

One approach would be to use an alpha-to-coverage scheme, which would still be faster than true multi-sampled tracing. This would require updating a number of AA z samples inside the tracing inner loop, which is still much more work than just alpha blending. A more interesting alternative is to store an explicit depth function. One scheme would be to store a series of depths representing equal alpha intervals. Or better yet, store arbitrary piecewise segments of the depth/opacity function. In the hierarchical tracing scheme, these could be written out and stored at a lower resolution mip level, such as the quarter res level, and then be used both to accelerate tracing for the finer levels and to determine octree node visibility. During the subdivision step, nodes would project to the screen and sample their visibility from the appropriate depth interval of this structure.

I think the impact of anisotropy and translucency can be limited or capped just as in the discrete z-buffer case, by appropriate node reweighting based on occlusion or opacity contribution. A node which finds that it is only 25% visible would only get slightly penalized, but a 5% visible node more heavily so, effectively emulating a maximum effective voxel/pixel limit, after which resolution is lost (which is fine, as the less a node contributes, the less important the loss of its high frequency content). More precisely, node scores would decrease in proportion to their screen coverage when it falls below the threshold 1/AA, where AA is the super-sampling limit you want to emulate.
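The reweighting rule in the last sentence can be sketched directly. A hypothetical `node_score` helper, assuming a linear falloff below the 1/AA coverage threshold (the name and numbers are mine, not from any implementation):

```python
def node_score(base_score, coverage, aa_limit=10):
    """Penalize the subdivision score of mostly-occluded octree nodes.

    coverage: fraction of the node's projection that is visible (0..1).
    Above the 1/AA threshold the score is unchanged; below it, the score
    falls off in proportion to coverage, emulating a maximum effective
    voxels-per-pixel budget. Illustrative sketch only.
    """
    threshold = 1.0 / aa_limit
    if coverage >= threshold:
        return base_score  # node still counts fully toward subdivision
    return base_score * (coverage / threshold)  # proportional penalty

print(node_score(100.0, 0.25))  # above the 0.1 threshold: unpenalized
print(node_score(100.0, 0.05))  # half the threshold: score halved
```

With `aa_limit=10` this reproduces the behavior described: a 25% visible node keeps its score, while a 5% visible node is penalized heavily.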


