Showing posts with label Technology. Show all posts

Friday, October 30, 2009

Conversing with the Quick and the Dead


CUI: The Conversational User Interface

Recently I was listening to an excellent interview (about an hour long) with John Smart of Acceleration Watch, in which he elucidates his ideas on the near-term evolution of AI, which he encapsulates in what he calls the Conversational Interface. In a nutshell, it's the idea that the next major development in our increasingly autonomous global internet is the emergence and widespread adoption of natural language processing and conversational agents. This technology is right on the brink of its tipping point, so it's something to watch, as numerous startups are beginning to sell software for automated call centers, sales agents, autonomous monitoring agents for utilities, security, and so on. The immediate enabling trends are the emergence of a global liquid market for cheap computing and fairly reliable off-the-shelf voice-to-text software that actually works. You have probably called a bank and experienced the simpler initial versions of this, which are essentially voice-activated multiple choice menus, but the newer systems on the horizon are a wholly different beast: an effective simulacrum of a human receptionist which can interpret both commands and questions, ask clarifying questions, and remember prior conversations and even users. This is an interesting development in and of itself, but the more startling idea hinted at in Smart's interview is how natural language interaction will lead to anthropomorphic software, and how profoundly this will eventually affect the human-machine symbiosis.

Humans are rather biased judges of intelligence: we tend to attribute human qualities to anything that looks or sounds like us, even if its actions are regulated by simple, dumb automata. Aeons of biological evolution have preconditioned us to rapidly identify other intelligent agents in our world, categorize them as potential predators, food, or mates, and take appropriate action. It's not that we aren't smart enough to investigate a system more critically and intensively to determine its relative intelligence; it's that we have super-effective visual and auditory shortcuts which bias us. These biases are strongest in children, and future AI developers will be able to exploit them to create agents that children become emotionally attached to. The Milo demo from Microsoft's Project Natal is a remarkable and eerie glimpse into the near-future world of conversational agents and what Smart calls 'virtual twins'. After watching this video, consider how this kind of technology can evolve once it establishes itself in the living room in the form of video game characters for children. There is a long history of learning through games, and the educational game market is a large, well-developed industry. The real potential hinted at in Peter Molyneux's demo is a disruptive convergence of AI and entertainment, which I see as the beginning of the road to the singularity.

Imagine what entrepreneurial game developers with large budgets and a willingness to experiment outside the traditional genres could do when armed with a full two-way audio-visual interface like Project Natal, the local computation of the Xbox 360 and future consoles, and a fiber connection to the immense computing resources of the up-and-coming cloud (fueled by the convergence of general-purpose GPUs and the huge computational demands of the game/entertainment industry moving into the cloud). Most people, even futurists, tend to think of Moore's Law as a smooth and steady exponential progression, but the reality from the perspective of a software developer (and especially a console game developer) is a series of massively disruptive jumps: evolutionary punctuated equilibrium. Towards the end of each console cycle, the space of possible game ideas, interfaces and simulation technologies settles into a near steady state, a technological tapering off, which is followed by the disruptive release of new consoles with vastly increased computation, new interfaces, and even new interconnections. The next console cycle probably won't start until as late as 2012, but with upcoming developments such as Project Natal and OnLive, we may already be entering a new phase.


The Five Year Old's Turing Test

Imagine a future 'game system' aimed at relatively young children with a Natal-like interface: a full two-way communication portal between the real and the virtual. The game system can both see and hear the child, and it can project a virtual window through which the inner agents can be seen and heard. Permanently connected to the cloud through fiber, this system can tap into vast, distant computing resources on demand. There is a critical tipping point in development at which it will become economically feasible to make a permanent autonomous agent that can interact with children. Some will certainly take the form of an interactive, talking version of a character like Barney, and such semi-intelligent agents will certainly come first. But for the more interesting and challenging development of human-level intelligence, it could actually be easier to make a child-like AI, one that learns and grows with its 'customer'. Not just a game, but a personalized imaginary friend to play games with, and eventually to grow up with. It will be custom designed (or rather, developmentally evolved) for just this role, shaped by economic selection pressure.

The real expense of developing an AI is all the training time, and a human-like AI will need to go through a human-like childhood developmental learning process. The human neocortex begins life essentially devoid of information, with random synaptic connections and a cacophony of electrical noise. From this, consciousness slowly develops as the cortical learning algorithm begins to learn patterns through sensory and motor interaction with the world. Indeed, general anesthetics work by introducing noise into the brain that drowns out coherent signalling, and thus consciousness. From an information-theoretic point of view, it may thus be possible to use less computing power to simulate an early developmental brain, storing and computing only the information above the noise. If such a scalable model could be developed, it would allow the first AI generation to begin decades earlier (perhaps even today), and to scale up with Moore's law as it requires more storage and computation.

Once trained up to the mental equivalent of a five-year-old, a personal interactive invisible friend might become a viable 'product' well before adult-level human AIs come about. Indeed, such a 'product' could eventually develop into an adult AI, if the cortical model scales correctly and the AI is allowed to develop and learn further. Any adult AI will start out as a child; there are no shortcuts. Which raises some interesting points: who would parent these AI children? And inevitably, they are going to ask two fundamental questions which are at the very root of being, identity, and religion:

What is death? And am I going to die?

The first human-level AI children with artificial neocortices will most likely be born in research labs, both academic and commercial. They will likely be born into virtual bodies. Some will probably be embodied in public virtual realities, such as Second Life, with their researcher-creators acting as parents, and with generally open access to the outside world and curious humans. Others may develop in more closed environments tailored to later commercialization. For the future human parents of AI mind children, these questions will be just as fundamental and important as they are for biological children. These AI children never have to die, and their parents could truthfully say so, but their fate will depend entirely on the goals of their creators. Because AI children can be copied, purely from an efficiency perspective there will be great pressure to cull the less successful children (the slow learners, the mentally unstable, or the otherwise undesirable) and use their computational resources to duplicate the most successful and healthy candidates. So the truthful answers are probably: death is the permanent loss of consciousness, and you don't have to die, but we may choose to kill you; no promises. If the AI's creators/parents are ethical and believe any conscious being has the right to life, then they may guarantee their AI's permanency. But life and death for a virtual being is anything but black and white: an AI can be active permanently, or for only an hour a day, or for an hour a year. Life for them is literally conscious computation, and near-permanent sleep is a small step above death. I suspect that the popular trend will be to teach AI children that they are all immortal, and thus keep them happy.

Once an AI has developed to a certain age, it can be duplicated as needed for some commercial application. For our virtual Milo example, an initial seed Milo would be selected from a large pool raised in a virtual lab somewhere, with a few of the best examples 'commercialized' and duplicated out as needed every time a kid out on the web wants a virtual friend for his Xbox 1440. It's certainly possible that Milo could be designed and selected to be a particularly robust and happy kid. But what happens when Milo and his new human friend start talking and the human child learns that Milo is never going to die because he's an AI? And more fundamentally, what happens to this particular Milo when the Xbox is off? If he exists only when his human owner wants him to, how will he react when he learns this?

It's most likely that semi-intelligent (but still highly capable) agents will develop earlier, but as Moore's law advances along with our understanding of the human brain, it becomes increasingly likely that someone will tackle and solve the human-like AI problem by launching a long-term project to raise an AI child. It's hard to predict when this could happen in earnest. There are already several research projects underway attempting something along these lines, but nobody yet has the immense computational resources to throw at a full brain simulation (except perhaps the government), nor do we even have a good simulation model yet (although we may be getting close). It's also not clear that we've found the kinds of shortcuts needed to start one with dramatically fewer resources, and it doesn't look like any of the alternative, non-biological AI routes have produced something as intelligent as a five-year-old. Yet. But it looks like we could see this within a decade.

And when this happens, these important questions of consciousness, identity and fundamental rights (human and sapient) will come into the public spotlight.

I see a clear ethical obligation to extend full rights to all human-level sapients, silicon, biological, or what have you. Furthermore, those raising these first generations of our descendants need to take on the responsibility of ensuring a longer-term symbiosis and our very own survival, for it's likely that AI will develop ahead of the technologies required for uploading, and thus we will need these AIs to help us become immortal.




Sunday, August 2, 2009

More on grid computing costs

I did a little searching recently to see how my conjectured cost estimates for cloud gaming compared to the current market for grid computing. The prices quoted for server rentals vary tremendously, but I found this NewServers 'Bare Metal Cloud' service as an interesting example of raw compute server rental by the hour or month (same rate, apparently no bulk discount).

Their 'Jumbo' option, at 38 cents per hour, falls within my previous estimate of 25-50 cents per hour. It provides two quad-core CPUs and 8GB of RAM. It doesn't have a GPU, of course, but instead has two large drives. You could swap those drives out for a GPU and keep the cost roughly the same (using a shared network drive for every 32 or 64 servers or whatever, which they also offer). Nobody needs GPUs in server rooms right now, which is the biggest difference between a game service and anything else you'd run in the cloud, but I expect that to change in the years ahead with Larrabee and upcoming, more general GPUs (and, coming from the other angle, CPU rendering is becoming increasingly viable). These will continue to penetrate the grid space, driven by video encoding, film rendering, and yes, cloud gaming.

What about bandwidth?
Each server includes 3 GB of Pure Internap bandwidth per hour

So adequate bandwidth for live video streaming is already included. What's missing, besides the GPU? Fast, low-latency video compression, of course. It's interesting that x264, the open source encoder, can do real-time software encoding using four Intel cores (and it's certainly not the fastest encoder out there). So if you had a low-latency H.264 encoder, you could just use four of the cores for encoding and four to run the game. Low-latency H.264 encoders do exist, of course, and I suspect that is the route Dave Perry's Gaikai is taking.
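To check the claim that the included allowance is enough for video, it helps to convert "3 GB per hour" into a bitrate. A minimal sketch, assuming decimal units (1 GB = 1000 MB) and an average of about 2 Mbit/s for a 720p H.264 stream; the function names are hypothetical:

```cpp
#include <cassert>

// Converts a data allowance in gigabytes per hour into the average bitrate it can
// sustain, in megabits per second (decimal units: 1 GB = 1000 MB, 1 byte = 8 bits).
double GBPerHourToMbps(double gbPerHour) {
    assert(gbPerHour >= 0.0);
    const double megabitsPerHour = gbPerHour * 1000.0 * 8.0;
    return megabitsPerHour / 3600.0;  // 3600 seconds per hour
}

// True if the hourly allowance can carry a stream with the given average bitrate.
bool AllowanceCoversStream(double gbPerHour, double streamMbps) {
    return GBPerHourToMbps(gbPerHour) >= streamMbps;
}
```

GBPerHourToMbps(3.0) comes out to about 6.7 Mbit/s, roughly three times what a 2 Mbit/s 720p stream needs on average.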

Of course, in the near term, datacenters for cloud gaming will be custom built, as OnLive and OToy are attempting. Speaking of which, the other interesting trend is the adoption of GPUs for feature film work, as recently used in the latest Harry Potter film. OToy is banking on this trend, as their AMD-powered datacenters will provide computation for both film and games. This makes all kinds of sense, because film rendering jobs can often run at night and use otherwise idle capacity. From an economic perspective, film render farms are already well established, and charge significantly more per server hour, usually measured per GHz-hour. Typical bulk prices are around 6-12 cents per GHz-hour, which would be around one or two dollars per hour for the server example given above. I imagine this is mainly due to the software expense, which for a render server can add up to many times the hardware cost.
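The GHz-hour comparison works out as follows. This sketch assumes a clock speed of 2.5 GHz per core for the dual quad-core server, since the post does not give one; ServerHourlyPrice is a hypothetical helper:

```cpp
#include <cassert>

// Render farms typically bill in GHz-hours: cores x clock speed (GHz) x hours.
// Returns what one whole server would cost per hour at a given GHz-hour rate.
double ServerHourlyPrice(int cores, double ghzPerCore, double dollarsPerGhzHour) {
    assert(cores > 0 && ghzPerCore > 0.0 && dollarsPerGhzHour >= 0.0);
    return cores * ghzPerCore * dollarsPerGhzHour;
}
```

With 8 cores at 2.5 GHz (20 GHz in total), 6-12 cents per GHz-hour gives $1.20 to $2.40 per hour, roughly three to six times the $0.38 bare-metal rate.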

So, here are the key trends:
- GPU/CPU convergence, leading to a common general server platform that can handle film/game rendering, video compression, or anything really
- next gen game rendering moving into ray tracing and the high end approaches of film
- bulk bandwidth already fairly inexpensive for 720p streaming, and falling 30-40% per year
- steadily improving video compression tech, with H.265 on the horizon, targeting a further 50% improvement in bitrate


Will film and game rendering systems eventually unify? I think this is where we are heading. Both industries want to simulate large virtual worlds from numerous camera angles. The difference is that games are interested in live simulation and simultaneous broadcast of many viewpoints, while films aim to produce a single, very high quality two-hour viewpoint. However, live simulation and numerous camera angles are also required during a film's production, as large teams of artists each work on small pieces of the eventual film (many of which are later cut), and need to be able to preview quickly (even at reduced detail). So the rendering needs of a film production are similar to those of a live game service.

Could we eventually see unified art pipelines and render packages between games and films? Perhaps. (Indeed, the art tools are largely unified already, except that world editing is usually handled by proprietary game tools.) The current software model for high-end rendering packages is not well suited to cloud computing, but the software-as-a-service model would make a lot of sense. When a gamer logs in (through a laptop, cable box, microconsole, whatever) and starts a game, the client would connect to a service provider to find a nearby host server, which would install the rendering software as needed and stream the data (cached at each datacenter, of course). The hardware and the software could both be rented on demand. Eventually you could even create games without licensing an engine in the traditional sense, simply by using completely off-the-shelf software.
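The "find a host server nearby" step could look something like the following. This is a minimal sketch under assumed rules: pick the lowest-latency datacenter that has a free slot and meets a latency budget, preferring one that already has the software cached. Datacenter, HostChoice and FindHost are hypothetical names, and a real service would also weigh price, load and hardware type.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical view of one datacenter as seen by the service provider.
struct Datacenter {
    std::string name;
    int latencyMs;       // measured round trip from the player's device
    int freeSlots;       // server slots not currently rented out
    bool gameInstalled;  // is the rendering/game software already cached here?
};

// Result of matching a player to a host.
struct HostChoice {
    std::size_t index;   // which datacenter
    bool needsInstall;   // software must be pulled in before streaming starts
};

// Lowest latency wins; on a tie, prefer a datacenter with the software cached.
std::optional<HostChoice> FindHost(const std::vector<Datacenter>& dcs, int maxLatencyMs) {
    assert(maxLatencyMs > 0);
    std::optional<HostChoice> best;
    for (std::size_t i = 0; i < dcs.size(); ++i) {
        const Datacenter& dc = dcs[i];
        if (dc.freeSlots <= 0 || dc.latencyMs > maxLatencyMs) continue;  // unusable
        if (!best) {
            best = HostChoice{i, !dc.gameInstalled};
            continue;
        }
        const Datacenter& cur = dcs[best->index];
        bool better = dc.latencyMs < cur.latencyMs ||
                      (dc.latencyMs == cur.latencyMs && dc.gameInstalled && !cur.gameInstalled);
        if (better) best = HostChoice{i, !dc.gameInstalled};
    }
    return best;  // empty if no datacenter qualifies
}
```

For example, given a full datacenter at 40 ms, two with free slots at 25 ms (only the second with the game cached) and one at 90 ms, a 50 ms budget selects the cached 25 ms datacenter, so nothing needs installing.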






Saturday, August 1, 2009

Some thoughts on metaprogramming, reflection, and templates

The thought struck me recently that C++ templates really are a downright awful metaprogramming system. Don't get me wrong, they are very powerful and I definitely use them, but I've recently realized that whatever power they have is solely due to enabling metaprogramming, and there are numerous other approaches to metaprogramming that actually make sense and are more powerful. We use templates in C++ because that's all we have, but they are an ugly, ugly feature of the language. It would be much better to combine full reflection (like Java or C#) with the ability to invoke reflective code at compile time, to get all the performance benefits of C++. Templates do allow you to invoke code at compile time, but through a horribly obfuscated functional style that is completely out of sync with the imperative style of C++. I can see how templates probably evolved into such a mess: they started as a simple extension of the language that allowed a programmer to bind a whole set of function instantiations at compile time, then someone realized that they're Turing complete, and the end result was a metaprogramming abomination that never should have been.

Look at some typical, simple, real-world metaprogramming cases. For example, take a generic container like std::vector, where you want a type-specialized function such as a copy routine that uses copy constructors for complex types, but an optimized memcpy routine for types where that is equivalent to invoking the copy constructor. For simple types, this is quite easy to do with C++ templates. But using it with more complex user-defined structs requires a type function such as IsMemCopyable, which can determine whether the copy constructor is equivalent to a memcpy. Abstractly, this is simple: the type is mem-copyable if it has a default copy constructor and all of its members are mem-copyable. However, it's anything but simple to implement with templates, requiring all kinds of ugly functional code.
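For comparison, later C++ standards made this particular case much easier. C++11 added the std::is_trivially_copyable trait, whose definition roughly follows the recursive rule above (no user-provided copy operations, no virtual members, members that are themselves trivially copyable, plus a trivial destructor), and C++17's if constexpr removes the need for tag dispatch. The sketch below swaps in that standard trait for the hypothetical IsMemCopyable; SmartCopy and Vec3 are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Copies n elements from src to dst. For trivially copyable T, a byte-wise memcpy
// gives the same result as element-wise copy assignment, so use it; otherwise fall
// back to a loop. The branch is resolved at compile time (C++17 if constexpr).
template <typename T>
void SmartCopy(T* dst, const T* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if constexpr (std::is_trivially_copyable_v<T>) {
        if (n > 0) std::memcpy(dst, src, n * sizeof(T));  // memcpy counts bytes
    } else {
        for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
    }
}

// A user-defined struct: trivially copyable because its members are and it
// declares no copy operations of its own.
struct Vec3 { float x, y, z; };
static_assert(std::is_trivially_copyable_v<Vec3>);
static_assert(!std::is_trivially_copyable_v<std::string>);
```

Note that the trait is still a template, so the recursive member check lives inside the compiler rather than in user code written in ordinary imperative style, which is the author's broader point.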

Now keep in mind that I haven't used Java in many years, and then only briefly, so I'm not familiar with its reflection, and I know almost nothing of C#, although I understand both have reflection. In my ideal C++-with-reflection language, you could do this very simply and naturally with an imperative meta-function using reflection instead of templates (maybe this is like C#, but I digress):

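struct vector {
    generic *start, *end;   // both data members are pointers to the element type
    generic* begin() {return start;}
    generic* end() {return end;}
    int size() {return end - start;}

    // type constructor: binds the element type of both data pointers
    type vector(type datatype) {start::type = end::type = datatype*;}
};


void SmartCopy(vector& output, vector& input)
{
    if ( IsMemCopyable( typeof( *input.begin() ) ) ) {
        // memcpy takes a byte count, so scale the element count by the element size
        memcpy(output.begin(), input.begin(), input.size() * sizeof( *input.begin() ));
    }
    else {
        // fall back to a per-element copy
        for_each(output, input) {output[i] = input[i];}
    }
}

bool IsMemCopyable(type dtype) {
    // mem-copyable if the copy constructor is a plain bitwise copy...
    bool copyable = (dtype.CopyConstructor == memcpy);
    // ...and every member is itself mem-copyable
    for_each(dtype.members) {
        copyable &= IsMemCopyable(dtype.members[i]);
    }
    return copyable;
}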

The idea is that using reflection, you can unify compile-time and run-time metaprogramming into the same framework, with compile-time metaprogramming becoming just an important optimization. In my pseudo-C++ syntax, reflection is accessible through type variables, which represent types themselves: PODs, structs, classes. Generic types are specified with the 'generic' keyword instead of templates. Classes can be constructed simply through functions, and I added a new kind of constructor, a class constructor, which returns a type. This allows full metaprogramming, but all your metafunctions are still written in the same imperative language. Most importantly, the metafunctions are accessible at run time, but can also be evaluated at compile time as an optimization. For example, to construct a vector instantiation, you would do so explicitly, by invoking a function:

vector(float) myfloats;

Here vector(float) actually calls a function which returns a type, which is more natural than templates. This type constructor for vector assigns the actual types of the two data pointers, and is the largest deviation from C++:

type vector(type datatype) {start::type = end::type = datatype*;}

Everything has a ::type, which can be set and manipulated just like any other data. Also, anything can be made a pointer or reference by adding the appropriate * or &.

if ( IsMemCopyable( typeof( *input.begin() ) ) ) {

Here the * dereferences the pointer returned by begin(), so typeof sees the underlying element type rather than a pointer type.


When the compiler sees a static instantiation, such as:
vector(float) myfloats;

It knows that the type generated by vector's type constructor is static and it can optimize the whole thing, compiling a particular instantiation of vector, just as in C++ templates. However, you could also do:

type dynamictype = figure_out_a_type();
vector(dynamictype) mystuff;

Here dynamictype is a type not known at compile time; it could be determined by other functions, loaded from disk, or whatever. It's interesting to note that in this particular example, the unspecialized version is not much slower, because the branch in the copy function is evaluated only once per copy, not once per element.
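The run-time case can be imitated in present-day C++ with a hand-rolled type descriptor standing in for the imagined 'type' values. TypeDesc and DynamicCopy are hypothetical names for this sketch. The mem-copyable check runs once per call, as described above; the comments note the one extra cost a static instantiation would avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical run-time type descriptor: element size, whether a byte copy is
// valid, and a per-element copy routine for types where it is not.
struct TypeDesc {
    std::size_t size;
    bool memCopyable;
    void (*copyOne)(void* dst, const void* src);  // only used when !memCopyable
};

// Copies n elements whose type is known only at run time. The memCopyable branch
// is taken once per call, not once per element. The non-memcopyable path does pay
// an indirect call per element, which a static instantiation could inline.
void DynamicCopy(const TypeDesc& t, void* dst, const void* src, std::size_t n) {
    assert(t.memCopyable || t.copyOne != nullptr);
    if (n == 0) return;
    if (t.memCopyable) {
        std::memcpy(dst, src, n * t.size);
        return;
    }
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) t.copyOne(d + i * t.size, s + i * t.size);
}
```

A descriptor loaded from disk would simply fill in these three fields; a captureless lambda can serve as copyOne.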

My little example is somewhat contrived and admittedly simple, but the power of reflective metaprogramming can make formerly complex, large-scale systems tasks much simpler. Take, for example, the construction of a game's world editor.

The world editor of a modern game engine is a very complex beast, but at its heart is a simple concept: it exposes a user interface to all of the game's data, as well as tools to manipulate and process that data, which crunch it into an optimized form that must be streamed from disk into the game's memory and thereafter parsed, decompressed, or what have you. Reflection allows the automated generation of GUI components from your code itself. Consider a simple example where you want to add dynamic light volumes to an engine. You may have something like this:

struct ConeLight {
    HDRcolorRGB intensity_;
    BoundedFloat(0,180) angleWidth_;
    WorldPosition pos_;
    Direction dir_;
    TextureRef cookie_;
    static HelpComment description_ = "A cone-shaped light with a projected texture.";
};

The editor could then automatically connect a GUI for creating and manipulating ConeLights just based on analysis of the type. The presence of a WorldPosition member would allow it to be placed in the world, the Direction member would allow a rotational control, and the intensity would use an HDR color picker control. The BoundedFloat is actually a type constructor function, which sets custom min and max static members. The cookie_ member (a projected texture) would automatically have a texture locator control and would know about asset dependencies, and so on. Furthermore, custom annotations are possible through the static members. Complex data processing, compression, disk packing and storage, and so on could happen automatically, without having to write any custom code for each data type.

This isn't revolutionary; in fact, our game editor and generic database system are based on similar principles. The difference is that they are built on a complex, custom infrastructure that has to parse specially formatted C++ and Lua code to generate everything. I imagine most big game editors have some similar custom reflection system. It's just a shame, because it would be so much easier and more powerful if built into the language.
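Without language support, such systems usually publish a hand-written (or code-generated) table per struct, which the editor walks to pick controls. Below is a minimal sketch of that approach for ConeLight; FieldKind, FieldDesc, StructDesc, ControlFor and the control names are all hypothetical, not any particular engine's API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical hand-rolled reflection: each editable struct publishes a table.
enum class FieldKind { HDRColor, BoundedFloat, WorldPosition, Direction, TextureRef };

struct FieldDesc {
    std::string name;
    FieldKind kind;
    float minValue = 0.0f, maxValue = 0.0f;  // only meaningful for BoundedFloat
};

struct StructDesc {
    std::string name;
    std::string help;  // where an annotation like description_ would land
    std::vector<FieldDesc> fields;
};

// The generic editor never mentions ConeLight: it maps each field kind to a
// control, so a new light type needs a table but no new GUI code.
std::string ControlFor(const FieldDesc& f) {
    switch (f.kind) {
        case FieldKind::HDRColor:      return "hdr_color_picker";
        case FieldKind::BoundedFloat:
            return "slider[" + std::to_string(int(f.minValue)) + "," +
                   std::to_string(int(f.maxValue)) + "]";
        case FieldKind::WorldPosition: return "placement_gizmo";
        case FieldKind::Direction:     return "rotation_gizmo";
        case FieldKind::TextureRef:    return "texture_picker";
    }
    return "unknown";
}

std::vector<std::string> BuildPanel(const StructDesc& t) {
    std::vector<std::string> controls;
    for (const FieldDesc& f : t.fields) controls.push_back(f.name + ": " + ControlFor(f));
    return controls;
}

// The table ConeLight would publish; with real reflection this would be implicit.
StructDesc ConeLightDesc() {
    return {"ConeLight", "A cone-shaped light with a projected texture.",
            {{"intensity_", FieldKind::HDRColor},
             {"angleWidth_", FieldKind::BoundedFloat, 0.0f, 180.0f},
             {"pos_", FieldKind::WorldPosition},
             {"dir_", FieldKind::Direction},
             {"cookie_", FieldKind::TextureRef}}};
}
```

BuildPanel(ConeLightDesc()) yields five controls, with angleWidth_ getting "slider[0,180]". Keeping this table in sync with the struct by hand (or via a custom parser) is exactly the burden built-in reflection would remove.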

Just to show how powerful metaprogramming could be, let's go a step further and tackle the potentially hairy task of a graphics pipeline, from art assets down to the GPU command buffer. For our purposes, art packages expose several special asset structures, namely geometry, textures, and shaders. Materials, segments, meshes and the like are just custom structures built out of these core concepts. A GPU command buffer, on the other hand, is typically built out of fundamental render calls which look something like this (again somewhat simplified):

error GPUDrawPrimitive(VertexShader* vshader, PixelShader* pshader, Primitive* prim, vector samplers, vector vconstants, vector pconstants);

Let's start with a simpler example, a 2D screen-pass effect (which, these days, encompasses a lot).

Since this hypothetical reflexive C language could also feature JIT compilation, it could serve as our scripting language as well, so the effect could be coded entirely in the editor or art package if desired.

struct RainEffect : public FullScreenEffect {

    function(RainPShader) pshader;
};

float4 RainPShader(RenderContext rcontext, Sampler(wrap) fallingRain, Sampler(wrap) surfaceRain, AnimFloat density, AnimFloat speed)
{
    // ... do pixel shader stuff
}

// where the RenderContext is the typical global collection of stuff
struct RenderContext {
    Sampler(clamp) Zbuffer;
    Sampler(clamp) HDRframebuffer;
    float curtime;
    // etc ....
};

The 'function' keyword specifies a function object, much like a type object with the parameters as members. In this example the function object is statically bound to RainPShader. The GUI can display the appropriate interface for this shader, and it can be controlled from the editor by inspecting the parameters, including those of the function object. The base class FullScreenEffect holds the quad geometry and the other glue. The pixel shader itself would be written in this reflexive C language, with a straightforward metaprogram to convert it into HLSL/Cg and compile it as needed for the platform.

Now here is the interesting part: all the code required to actually render this effect on the GPU can be generated automatically from the parameter type information embedded in the RainPShader function object. Generating the appropriate GPUDrawPrimitive call is thus just another metaprogramming task, which uses reflection to pack all the samplers into the appropriate state, set the textures, pack all the float4s and floats into registers, and so on. For a screen effect, invoking this translator function automatically at run time wouldn't be much of a performance hit, but for lower-level draw calls you'd want to instantiate (optimize) it offline for the particular platform.
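The packing step can be sketched as a plain function over a reflected parameter list. This assumes a simplified register model (one sampler slot per sampler, one 4-component constant register per float4, scalars packed four to a register) rather than any real GPU API; ParamKind, Param, Binding and PackParams are illustrative names, and AnimFloat is treated as a plain float.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one shader parameter, as reflection on a function
// object like RainPShader might report it.
enum class ParamKind { Sampler, Float, Float4 };

struct Param {
    std::string name;
    ParamKind kind;
};

// Where a parameter ends up: a sampler slot, or a constant register + component.
struct Binding {
    std::string name;
    bool isSampler;
    int slot;       // sampler index, or constant register index
    int component;  // 0..3 within the register (0 for samplers and float4s)
};

// Samplers get consecutive sampler slots, each float4 gets a whole constant
// register, then scalar floats are packed four to a register after those.
std::vector<Binding> PackParams(const std::vector<Param>& params) {
    std::vector<Binding> out;
    int nextSampler = 0, nextReg = 0;
    for (const Param& p : params) {
        assert(!p.name.empty());
        if (p.kind == ParamKind::Sampler) out.push_back({p.name, true, nextSampler++, 0});
        else if (p.kind == ParamKind::Float4) out.push_back({p.name, false, nextReg++, 0});
    }
    int scalars = 0;
    for (const Param& p : params) {
        if (p.kind != ParamKind::Float) continue;
        out.push_back({p.name, false, nextReg + scalars / 4, scalars % 4});
        ++scalars;
    }
    return out;
}
```

For RainPShader's own parameters (the two samplers, then density and speed; rcontext is left out for brevity), fallingRain and surfaceRain land in sampler slots 0 and 1, and density and speed share constant register 0 as components 0 and 1.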

I use that example because I actually created a very similar automatic draw call generator for 2D screen effects, but did it all through templates. It ended up looking more like how CUDA is implemented, and it also allowed the code to be compiled as HLSL or C++ for debugging. It was doable, but involved a lot of ugly templates and macros. I built that system to simplify procedural surface operators for geometry image terrain.

But anyway, you get the idea now: from a screen effect you could go on to tackle 3D geometry and build a completely generic, data-driven art pipeline, all based on reflective functions that parse data and translate or reorganize it. Some art pipelines are actually built on this principle already, but oh my, wouldn't it be easier in a more advanced, reflective language?









Thursday, July 30, 2009

The Next Generation of Gaming

The current, seventh home game console generation will probably be the last. I view this as a very good thing, as it really was a tough one, economically, for most game developers. You could blame that in part on Nintendo's inordinate success this round with hardware that is essentially sixth generation, a funky controller, and fun mass market games. But that wouldn't be fair. If anything, they contributed the most to the market's expansion, and although they certainly took away a little revenue from the traditional consoles and developers, the 360 and PS3 are doing fine in both hardware and software sales. No, the real problem is our swollen development budgets, as we spend more and more money just to keep up with the competition, all fighting for a revenue pie which hasn't grown much, if at all.

I hope we can correct that over the upcoming years with the next generation. It's not that we'll spend much less on AAA titles, but we'll spend it more efficiently, produce games more quickly, and make more total revenue as we further expand the entire industry. Regaining much of the efficiency lost in the transition to the 7th generation, and more to boot, we'll be able to produce far more games and reach much higher quality bars. We can accomplish all of this by replacing home consoles with dumb terminals and moving our software out into data centers.

How will moving computation out into the cloud change everything? Really, it comes down to simple economics. In a previous post, I analyzed some of these economics from the perspective of an on-demand service like OnLive. But let's look at it again in a simpler fashion, and imagine a service that rents out servers on demand, by the hour or minute. This is the more general form of cloud computing, sometimes called grid computing, where the idea is simply to turn computation into a commodity, like power or water. A data center would then rent out its servers to the highest bidder. Economic competition would push the price of computation to settle at the data center's cost plus a reasonable profit margin. (Unlike power, water, and internet commodities, there would be less inherent monopoly risk, as no fixed lines are required beyond the internet connection itself.)

So in this model, the developer could make their game available to any gamer and any device around the world by renting computation from data centers near customers just as it is needed. The retailer, of course, is cut out. The publisher is still important as the financier and marketer, although the larger developers could take this on themselves, as some already have. Most importantly, the end consumer can play the game on whatever device they have, as the device only needs to receive and decompress a video stream. The developer/publisher then pays the data center for the rented computation, paying only as needed, as each customer comes in and jumps into a game. So how does this compare to our current economic model?

A server in a data room can be much more efficient than a home console. It only needs the core computational system: CPU/GPU (which are soon merging anyway) and RAM. Storage can be shared amongst many servers, so it is negligible (some is required per game instance, but it's reasonably minimal). So a high-end server core could be had for around $1,000 at today's prices. Even if it is active only 10 hours per day on average, that generates about 3,650 hours of active computation per year. Amortize that over three years of lifespan (still much less than a console generation), and you get about ten cents per hour of computation. Even if it burns 500 watts of power (insane) and another 500 watts for cooling, together those add only about another ten cents per hour. So it's under 25 cents per hour in intrinsic cost (and this is for a state-of-the-art rig, dual GPU, etc.; much less for lower-end hardware). This cost will hold steady into the future as games use more and more computation. Obviously the cost of running old games will decrease exponentially, but new games will always want to push the high end.
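The hourly estimate can be reproduced with two small functions. This is a sketch under stated assumptions: a $1,000 server, 10 active hours per day, 365 days a year, a three-year life, 500 W for the machine plus 500 W for cooling, and electricity at $0.10 per kWh (the post doesn't state a power price, but "another ten cents" for 1 kW implies roughly that). The function names are hypothetical.

```cpp
#include <cassert>

// Hardware price spread evenly over every active hour of the server's life.
double HardwareCostPerHour(double price, double activeHoursPerDay, double lifetimeYears) {
    assert(activeHoursPerDay > 0.0 && lifetimeYears > 0.0);
    return price / (activeHoursPerDay * 365.0 * lifetimeYears);
}

// Electricity for the machine plus its cooling, in dollars per active hour.
double PowerCostPerHour(double serverWatts, double coolingWatts, double dollarsPerKWh) {
    assert(serverWatts >= 0.0 && coolingWatts >= 0.0);
    return (serverWatts + coolingWatts) / 1000.0 * dollarsPerKWh;
}
```

HardwareCostPerHour(1000, 10, 3) comes to about $0.091 and PowerCostPerHour(500, 500, 0.10) to $0.10, so together about $0.19 per hour, under the 25-cent ceiling.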

The more variable cost is bandwidth, plus the extra computation to compress the video stream in real time. These used to be high, but are falling exponentially as video streaming comes of age. Yes, we will want to push the resolution up from 720p to 1080p, but this will happen slowly, and further resolution increases are becoming pointless for typical TV setups (for a PC monitor the point of diminishing returns is a little farther off, but still). So what is this cost right now? Bulk bandwidth costs about $10 per megabit/s of dedicated bandwidth per month, or just over three cents per hour in our model, assuming 300 active server hours in a month. To stream 720p video with H.264 compression, you need about 2 megabits per second of average bandwidth (which is what matters for the data center). The peak bandwidth requirement is higher, but that smooths out completely when you have many users. So that's only about $0.07/hour for a 720p stream, or about $0.13/hour for a 1080p stream at roughly double the bitrate. The crazy interesting thing is that these bandwidth prices ($10/Mbps per month) are as of the beginning of this year, and are falling by about 30-40% per year. So bandwidth really did become economically feasible this year, and it's only going to get cheaper. By 2012, these prices will probably have fallen by half again, and streaming even 1080p will be dirt cheap. This is critical for making any predictions or plans about where this is all heading.
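The bandwidth arithmetic, including the projected decline, can be sketched as follows. The inputs are the figures given in the paragraph ($10 per Mbit/s per month, 300 active hours per month, 2 Mbit/s for 720p, a 30-40% annual decline); the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Dollars per active hour to stream at a given average bitrate, when bandwidth is
// bought as dedicated Mbit/s per month and spread over the server's active hours.
double StreamCostPerHour(double mbps, double dollarsPerMbpsMonth, double activeHoursPerMonth) {
    assert(mbps >= 0.0 && activeHoursPerMonth > 0.0);
    return mbps * dollarsPerMbpsMonth / activeHoursPerMonth;
}

// Price after `years` of a constant fractional annual decline (0.3 = 30% per year).
double ProjectPrice(double today, double annualDecline, int years) {
    assert(annualDecline >= 0.0 && annualDecline < 1.0 && years >= 0);
    return today * std::pow(1.0 - annualDecline, years);
}
```

StreamCostPerHour(2, 10, 300) is about $0.067. ProjectPrice(10, 0.3, 3) gives about $3.43 and ProjectPrice(10, 0.4, 3) about $2.16, so three years of 30-40% annual declines would cut the $10 price by roughly two thirds or more.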

So adding up all the costs today, we get somewhere around $0.20-0.30 per hour for a high end rig streaming 720p, and 1080p would be only a little more. This means that a profitable datacenter could charge just $0.50 per hour to rent out a high end computing slot, and $0.25 per hour or a little less for more economical hardware (still many times faster than current consoles). So twenty hours of a high end graphics blockbuster shooter would cost $10 in server infrastructure costs. That's pretty cheap. I think it would be a great thing for the industry if these costs were simply passed on to the consumer, and consumers were given some choice. Without the retailer taking almost half of the revenue, the developer and publisher stand to make a killing. And from the consumer's perspective, the game could cost about the same, but you don't have any significant hardware cost; or even better, you pay for the hardware as you see fit, hourly or monthly or whatever. If you are playing 40 hours a week of an MMO or serious multiplayer game, that $0.50 per hour might be a bit much, but you could then choose to run it on lower end hardware to save some money. But actually, as I'll get to some other time, MMO engines designed for the cloud could be super efficient, so much more so than single player engines that they could use far less hardware power per player. But anyway, it'd be the consumer's choice, ideally.

This business model makes more sense from all kinds of angles. It allows big budget, high profile story driven games to release more like films, where you play them on crazy super-high end hardware, even hardware that could never exist at home (like 8 GPUs or something stupid), maybe paying $10 for the first two hours of the game to experience something insanely unique. There's so much potential, and even at the low price of $.25-$.50 per hour for a current mid-2009 high end rig, you'd have an order of magnitude more computation than we are currently using on the consoles. This really is going to be a game changer, but to take advantage of it we need to change as developers.

The main opportunity I see with cloud computing here is to reduce our costs, or rather, to improve our efficiency. We need our programmers and designers to develop more systems with less code and effort in less time, and our artists to build super detailed worlds rapidly. I think that redesigning the premises behind our core tech and tools is the route to achieve this.

The basic server setup we're looking at for this first cloud generation a few years out is going to be some form of multi-teraflop, massively multi-threaded, general GPU-ish device with gigs of RAM and, perhaps more importantly, fast access to many terabytes of shared RAID storage. If Larrabee or the rumours about NVidia's GT300 are any indication, this GPU will really just be a massively parallel CPU with wide SIMD lanes that are easy to use (or even automatic). It will probably also have a smaller number of traditional cores, possibly with access to even more memory, like a current PC. Most importantly, each of these servers will be on a very high speed network, densely packed in with dozens and hundreds of similar nearby units. Each of these capabilities by itself is a major upgrade from what we are used to, but taken all together it becomes a massive break from the past. This is nothing like our current hardware.

Most developers have struggled to get game engines pipelined across just the handful of hardware threads on current consoles. Very few have developed toolchains that embrace or take much advantage of many cores. From a programming standpoint, the key to this next generation is embracing the sea of threads model across your entire codebase, from your gamecode to your rendering engine to your tools themselves, and using all of this power to speed up your development cycle.

From a general gameplay codebase standpoint, I could see (or would like to see) traditional C++ giving way to something more powerful. At the very least, I'd like to see general databases, full reflection, and some automatic memory management, such as reference counting. Reflection alone could pretty radically alter the way you design a codebase, but that's another story for another day. We don't need these little 10% speedups anymore; we just need the single mega 10000% speedup you get from using hundreds or thousands of threads. Obviously, data parallelization is the only logical option. Modifying C++, or outright moving to a language that has these features and also compiles and links dramatically faster, could be an option.

In terms of the core rendering and physics tech, more general purpose algorithms will replace the many specialized systems that we currently have. For example, in physics, an upcoming logical direction is to unify rigid body physics with particle fluid simulation in a system that simulates both rigid and soft bodies by large collections of connected spheres, running a massive parallel grid simulation. Even without that, just partitioning space amongst many threads is a pretty straightforward way to scale physics.
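
A minimal sketch of the simplest option mentioned above, partitioning space amongst threads: bodies are bucketed into vertical slabs and each slab is integrated by a separate worker. The `Body` class, the slab scheme and the gravity-only integrator are illustrative assumptions. A real engine would also have to handle collisions across slab boundaries (for example with overlapping "ghost" regions), and in CPython the global interpreter lock means this shows the structure of the work split rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Body:
    x: float
    y: float
    vx: float
    vy: float


def partition_by_slab(bodies, world_min, world_max, num_slabs):
    """Bucket bodies into equal-width slabs along x, one slab per worker."""
    width = (world_max - world_min) / num_slabs
    slabs = [[] for _ in range(num_slabs)]
    for body in bodies:
        i = int((body.x - world_min) / width)
        slabs[min(max(i, 0), num_slabs - 1)].append(body)  # clamp edge cases
    return slabs


def integrate(slab, dt, gravity=-9.8):
    """Semi-implicit Euler step for every body in one slab (no collisions)."""
    for body in slab:
        body.vy += gravity * dt
        body.x += body.vx * dt
        body.y += body.vy * dt
    return len(slab)


def step(bodies, dt, num_slabs=4, world=(0.0, 100.0)):
    """Partition, then integrate each slab on its own worker thread."""
    slabs = partition_by_slab(bodies, world[0], world[1], num_slabs)
    with ThreadPoolExecutor(max_workers=num_slabs) as pool:
        return list(pool.map(lambda slab: integrate(slab, dt), slabs))


bodies = [Body(x=float(i), y=0.0, vx=1.0, vy=0.0) for i in range(100)]
print(step(bodies, dt=0.1))  # bodies handled per slab: [25, 25, 25, 25]
```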

For rendering, I see the many specialized subsystems of modern rasterizers, such as terrain, foliage, shadowmaps, water, decals, LOD chains, cubemaps, etc., giving way to a more general approach like octree volumes that handles many phenomena simultaneously.

But more importantly, we'll want to move to data structures and algorithms that support rapid art pipelines. This is one of the biggest current challenges in production, and where we can get the most advantage in this upcoming generation. Every artist or designer's click and virtual brush stroke costs money, and we need to allow them to do much more with less effort. This is where novel structures like octree volumes will really shine, especially combined with terabytes of server side storage, allowing more or less unlimited control of surfaces, object densities, and so on without any of the typical performance considerations. Artists will have far fewer (if any) technical constraints to worry about and can just focus on shaping the world where and how they want.
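
To make the octree volume idea a little more concrete, here is a minimal sparse octree in Python: nodes are only allocated along the path to voxels that have actually been painted, so empty space costs nothing. The class names, the cube-at-origin layout and the fixed maximum depth are assumptions for illustration, not a description of any particular engine.

```python
class OctreeNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = None  # 8 child slots, allocated on first write
        self.value = None     # payload stored at the leaf level


class SparseOctree:
    """Cube of edge `size` with its corner at the origin, split `max_depth` times."""

    def __init__(self, size, max_depth):
        self.size = size
        self.max_depth = max_depth
        self.root = OctreeNode()

    def _walk(self, x, y, z, create):
        node = self.root
        ox = oy = oz = 0.0        # lower corner of the current cell
        half = self.size / 2.0    # half the edge length of the current cell
        for _ in range(self.max_depth):
            cx, cy, cz = ox + half, oy + half, oz + half
            i = (x >= cx) | ((y >= cy) << 1) | ((z >= cz) << 2)  # octant 0..7
            if node.children is None:
                if not create:
                    return None
                node.children = [None] * 8
            if node.children[i] is None:
                if not create:
                    return None
                node.children[i] = OctreeNode()
            node = node.children[i]
            ox = cx if x >= cx else ox
            oy = cy if y >= cy else oy
            oz = cz if z >= cz else oz
            half /= 2.0
        return node

    def set(self, x, y, z, value):
        self._walk(x, y, z, create=True).value = value

    def get(self, x, y, z):
        leaf = self._walk(x, y, z, create=False)
        return None if leaf is None else leaf.value

    def node_count(self):
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += 1
            if node.children:
                stack.extend(c for c in node.children if c is not None)
        return count


world = SparseOctree(size=1024.0, max_depth=10)   # 1-unit voxels
world.set(1.0, 2.0, 3.0, "rock")
print(world.get(1.5, 2.5, 3.5))        # same voxel: rock
print(world.get(900.0, 900.0, 900.0))  # untouched space: None
print(world.node_count())              # root plus one node per level: 11
```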

Thursday, April 2, 2009

OnLive, OToy, and why the future of gaming is high in the cloud


For the last six months or so, I have been researching the idea of cloud computing for games, the technical and economic challenges, and the video compression system required to pull it off.

So of course I was shocked and elated with the big OnLive announcement at GDC.

If OnLive or something like it works and has a successful launch, the impact on the industry over the years ahead could be transformative. It would be the end of the console, or the last console. Almost everyone has something to gain out of this change. Consumers gain the freedom and luxury of instant on demand access to ultimately all of the world's games, and finally the ability to try before you buy or rent. Publishers get to cut out the retailer middle-man, and avoid the banes of piracy and used game resales.

But the biggest benefit ultimately will be for developers and consumers, in terms of the eventual game development cost reduction and quality increase enabled by the technological leap cloud computing makes possible. Finally developing for one common, relatively open platform (server-side PC) will significantly reduce the complexity of developing a AAA title. But going farther into the future, once we actually start developing game engines specifically for the cloud, we enter a whole new technological era. It's mind-boggling for me to think of what can be done with a massive server farm consisting of thousands or even tens of thousands of densely networked GPUs with shared massive RAID storage. Engines developed for this system will look far beyond anything on the market and will easily support massively multiplayer networking, without any of the usual constraints in physics or simulation complexity. Game development costs could be cut in half, and the quality bar for some AAA titles will eventually approach movie quality, while reducing technical & content costs (but that is the subject for another day).

But can it work? And if so, how well? The main arguments against, as expressed by skeptics such as Richard Leadbetter, boil down to latency, bandwidth/compression, and server economics. Some have also doubted the true value added for the end user: even if it can work technically and economically, how many gamers really want this?

Latency

The internet is far from a guaranteed delivery system, and at first the idea of sending the player's inputs across the internet, computing a frame on a server, and sending it back across the internet to the user sounds fantastical.
But to assess how feasible this is, we first have to look at the concept of delay from a psychological/neurological perspective. You press the fire button on a controller, and some amount of time later, the proper audio-visual response is presented in the form of a gunshot. If the firing event and the response event occur close enough in time, the brain processes them as a simultaneous event. Beyond some threshold, the two events desynchronize and are processed distinctly: the user notices the delay. A large amount of research on this subject has determined that the delay threshold is around 100-150ms. It's a fuzzy number obviously, but as a rule of thumb, a delay of under 120ms is essentially not noticeable to humans. This is a simple result of how the brain's parallel neural processing architecture works. It has a massive number of neurons and connections (billions and trillions respectively), but signals propagate across the brain very slowly compared to the speed of light. For more background I highly recommend "Consciousness Explained" by Daniel C. Dennett. Here are some interesting timescale factoids from his book:

Saying "one, Mississippi": 1000 ms
Unmyelinated fiber, fingertip to brain: 500 ms
Speaking a syllable: 200 ms
Starting and stopping a stopwatch: 175 ms
A frame of television (30 fps): 33 ms
Fast (myelinated) fiber, fingertip to brain: 20 ms
Basic cycle time of a neuron: 10 ms
Basic cycle time of a CPU (2009): 0.000001 ms

So the 120ms delay window fits very nicely into these stats. There are some strange and interesting consequences of these timings. In the time it takes the 'press-fire' signal to travel from the brain down to the finger muscle, internet packets can travel roughly 4,000 km through fiber! (Light moves about 200,000 km/s through fiber, so 200 km/ms × 20 ms = 4,000 km.) This is about the distance from Los Angeles to New York. Another remarkable fact is that, at about 10 ms per neural cycle, the 120 ms window means the brain processes the fire event and the response event in only about a dozen neural computation steps.
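
The fiber arithmetic is easy to check. This sketch uses the post's figure of roughly 200 km per millisecond for light in fiber and counts propagation only; real pings are considerably higher because fiber routes are not straight lines and the access network and routers add their own delay.

```python
FIBER_KM_PER_MS = 200.0  # light in fiber: roughly 200,000 km/s


def fiber_reach_km(milliseconds):
    """Distance a signal covers through fiber in the given time."""
    return FIBER_KM_PER_MS * milliseconds


def ideal_round_trip_ms(distance_km):
    """Propagation-only round trip; ignores routing, queuing and access links."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


print(fiber_reach_km(20))          # one fast-fiber nerve delay: 4000.0 km
print(ideal_round_trip_ms(4000))   # roughly LA to New York and back: 40.0 ms
```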

What really happens is something like this: some neural circuits in the user's brain "make the decision" to press the fire button (although at this moment most of the brain isn't conscious of it), and the signal travels down through the fingers to the controller and then on to the computer, which starts processing the response frame. Meanwhile, in the user's brain, the 'button press' event is propagating through the brain, and more neural circuits are becoming aware of it. Remember, each neural tick takes 10ms. Some time later, the computer displays the audio/visual response of the gunshot, and this information hits the retina/cochlea and starts propagating up into the brain. These events connect, and if they are separated by only about a dozen neural computation steps (120 ms), they are perceived as a single, simultaneous event in time. In other words, there is a window of around a dozen neural firing cycles during which events are still propagating around the brain's neural circuits: even though the event already happened, it takes time for all of the brain's circuits to become aware of it. Given the slow speed of neurons, it's simply remarkable that humans can make any kind of decisions on sub-second timescales, and the 120 ms delay window makes perfect sense.

In the world of computers and networks, 120 ms is actually a long time. Each component of a game system (input connection, processing, output display connection) adds a certain amount of delay, and the total delay must add up to around 120ms or less for good gameplay. Up to 150ms is sometimes acceptable, and beyond 200ms the user experience breaks down quickly, as every action has noticeable delay.

But how much delay do current games have? Gamasutra has a great article on this. They measure the actual delay of real world games using a high speed digital camera. Of interest for us, they find a "raw response time for GTAIV of 166 ms (200 ms on flat panel TVs)". This is relatively high, beyond the acceptable range, and GTA has received some criticism for sluggish response. And yet this is the grand blockbuster of video games, so it certainly shows that some games can get away with 150-200ms responses and the users simply don't notice or care. Keep in mind this delay time isn't when playing the game over OnLive or anything of that sort: this is just the natural delay for that game with a typical home setup.

If we break it down, the controller might add 5-20ms and the TV 10-50ms, but the bulk of the delay comes from the game console itself. Like all modern console games, the GTA engine buffers multiple frames of data for a variety of reasons, and running at 30fps, every frame buffered costs a whopping 33ms of delay. From my home DSL internet in LA, I can get pings of 10-30ms to LA locations, and 30-50ms pings to locations in San Jose. So you can see that stretching the input and video connections out across the internet is not nearly as ridiculous as it first seems. It adds additional delay, which you simply need to compensate for somewhere else.

How does OnLive compensate for this delay? The answer for existing games is deceptively simple: you run the game at a much higher FPS than the console does, and/or you reduce internal frame buffering. If the PC version of a console game runs at 120 FPS and still keeps 4 frames of internal buffering, you get a buffering delay of only about 33 ms. If you reduce the internal buffering to 2 frames, you get a delay of just 17ms! If you combine that with a very low latency controller and a newer low latency TV, suddenly it becomes realistic for me to play a game in LA from a server residing in San Jose. Not only is it realistic, but the gameplay experience could actually be better! In fact, with a fiber FIOS connection and good home equipment, you could conceivably play from almost anywhere in the US, in theory. The key reason is that many console games already push right up against (or past) the acceptable delay when running on the console, and modern GPUs are many times faster.
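
A minimal latency-budget sketch of the argument above. Each buffered frame costs one frame time (1000 / fps ms), and the other components are simply added. The component values are illustrative picks from the ranges quoted in the post (controller 5-20 ms, TV 10-50 ms, 30-50 ms pings to San Jose); the 4-frame console buffer and the 5 ms encode/decode figure are assumptions.

```python
BUDGET_MS = 120.0  # rule-of-thumb threshold from the section above


def buffering_delay_ms(buffered_frames, fps):
    """Each buffered frame costs one frame time (1000 / fps milliseconds)."""
    return buffered_frames * 1000.0 / fps


def end_to_end_ms(controller_ms, buffered_frames, fps, display_ms,
                  network_rtt_ms=0.0, codec_ms=0.0):
    """Sum of controller, frame buffering, display, network and codec delay."""
    return (controller_ms + buffering_delay_ms(buffered_frames, fps)
            + display_ms + network_rtt_ms + codec_ms)


console = end_to_end_ms(controller_ms=10, buffered_frames=4, fps=30, display_ms=30)
cloud = end_to_end_ms(controller_ms=5, buffered_frames=2, fps=120, display_ms=20,
                      network_rtt_ms=40, codec_ms=5)
for name, ms in (("console", console), ("cloud", cloud)):
    print(f"{name}: {ms:.0f} ms, within budget: {ms <= BUDGET_MS}")
```

With these numbers the local console comes out at 173 ms (outside the budget, in the same range as the GTAIV measurement), while the 120 FPS server with 2 buffered frames lands at 87 ms even after a 40 ms round trip.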

Video Compression/Bandwidth

So we can see that in principle, from purely a latency standpoint, the OnLive idea is not only possible, but practical. However, OnLive cannot send a raw, uncompressed frame buffer directly to the user (at least, not at any acceptable resolution on today's broadband). For this to work, they need to squeeze those frame buffers down to acceptably tiny sizes, and more importantly, they need to do this rapidly or near instantly. So is this possible? What is the state of the art in video compression?

For a simple, dumb solution, you can just send raw JPEGs, or better yet, wavelet-compressed frames, and perhaps get acceptable 720p images down to 1 Mbit per frame, or even 500 Kbit with more advanced wavelets, using more or less off the shelf algorithms. With a wavelet approach, this would allow you to get 10fps over a 5Mbit connection. But of course we can do much better using a true video codec like H.264, which can squeeze 720p60 video down to 5Mbit/s easily, or even considerably less, especially if we are willing to lower the fps in some places and/or the quality.
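
The frame-rate arithmetic above, as a sketch: with independent frames, frame rate is simply link rate divided by frame size, while a streaming codec has to fit each frame into the average per-frame budget. The figures are the post's; nothing here models an actual codec.

```python
def intra_only_fps(link_mbps, mbits_per_frame):
    """Frame rate if every frame is sent as an independent (JPEG/wavelet) image."""
    return link_mbps / mbits_per_frame


def average_frame_budget_kbit(link_mbps, fps):
    """Average bits available per frame for a streaming codec at a given rate."""
    return link_mbps * 1000.0 / fps


print(intra_only_fps(5.0, 0.5))               # 500 Kbit wavelet frames: 10.0 fps
print(average_frame_budget_kbit(5.0, 60.0))   # 720p60 at 5 Mbit/s: about 83 Kbit
```

At 720p60 over 5 Mbit/s, each frame gets about 83 Kbit on average, roughly a sixth of a 500 Kbit wavelet frame. Closing that gap is the job of motion compensation.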

H.264 and other modern video codecs work by sending improved JPEG-like key frames, and then sending motion vectors which allow predicted frames to be delta-encoded in far fewer bits, getting a 10-30X improvement over sending raw JPEGs, depending on the motion. Unfortunately, relying on motion compensation means spikes in the bitrate: scene cuts or frames with rapid motion receive little benefit from it.

But H.264 encoders typically buffer up multiple frames of video to get good compression, and OnLive has much less leeway here. Ideally, you would like a zero-latency encoder. H.264 and its predecessors have been designed for use in video teleconferencing systems, which demand low latency. So there is already a precedent, and a modified version of the algorithm that avoids sending complete JPEG key frame images. Instead, in this low latency mode, small blocks of the image are periodically refreshed, and a complete key frame is never sent down the pipe, as this would take too long, creating multiple frames of delay.
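
The low latency mode described above is usually called periodic intra refresh. A minimal sketch of one possible schedule, assuming 16-pixel macroblocks (80 columns across a 1280-pixel frame) and an arbitrary choice of refreshing 4 columns per frame: each column is intra-coded once every 20 frames, so the whole picture is refreshed every third of a second at 60 fps without ever sending a full key frame.

```python
def intra_refresh_columns(frame_index, mb_columns=80, columns_per_frame=4):
    """Macroblock columns to intra-code in this frame; the rest use prediction."""
    start = (frame_index * columns_per_frame) % mb_columns
    return [(start + i) % mb_columns for i in range(columns_per_frame)]


cycle = 80 // 4  # frames for the refresh wave to sweep the whole picture
covered = set()
for frame in range(cycle):
    covered.update(intra_refresh_columns(frame))
print(cycle, len(covered))  # 20 80
```

Spreading the intra-coded blocks this way keeps every frame roughly the same size, which is exactly what a low-latency stream needs.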

There are in fact some new, interesting off the shelf H.264 hardware solutions which have near-zero delay (around 1ms) and are relatively cheap (in cost and power), perhaps practical for OnLive. In particular, there is the PureVu family of video processors from Cavium Networks. I have not seen them in action, but I imagine that with 720p60 at 5Mbit/s, you are going to see some artifacts and glitches, especially with fast motion. But at least we are getting close with off the shelf solutions.

But of course, OnLive is not using an off the shelf system (they have special encoding hardware and a plugin decoder), and improved video compression specific to the demands of remote video gaming is their central tech, so you can expect they have made an advance here. It doesn't have to be revolutionary, though, as the off the shelf stuff is already close.

So the big problem is the variation in bitrate/compressibility from one frame to the next. If the user rapidly spins around, or teleports, you simply cannot do better than sending a complete frame. So you either send these 'key' frames at lower quality, or you spend a little longer on them, introducing some extra delay. In practice some combination of the two is probably ideal. With a wavelet codec or a specialized H.264 variant, key frames can simply be sent at lower resolution, and then the following frames will use motion compensation to start adding detail to the image. The appearance would be a blurred image for the first frame or so when you rapidly spin the camera, which would then quickly up-res to full detail over the next several frames. With this technique, and some trade off of lowering the frame rate or adding a bit of delay on fast motion, I think 5Mbps is not only achievable, but beatable using state of the art compression coming out of research right now.
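
The trade-off between sending a scene-cut frame at lower resolution and spending extra frames on it can be sketched as a tiny planning function. It assumes, as a simplification, that a frame's size scales with its pixel count (so with the square of the linear scale factor). `plan_keyframe` and its parameters are hypothetical names, not any real encoder's API.

```python
import math


def plan_keyframe(estimated_kbit, frame_budget_kbit, extra_delay_frames=0,
                  min_scale=0.25):
    """Linear resolution scale for a hard-to-predict frame.

    The frame may borrow the budget of `extra_delay_frames` additional frames
    (adding that much latency); whatever still does not fit is made up by
    sending it at lower resolution, assuming bits scale with pixel count.
    """
    allowance = frame_budget_kbit * (1 + extra_delay_frames)
    if estimated_kbit <= allowance:
        return 1.0
    return max(min_scale, math.sqrt(allowance / estimated_kbit))


print(plan_keyframe(500, 125))                        # 0.5: half resolution, no delay
print(plan_keyframe(500, 125, extra_delay_frames=1))  # about 0.71 with one frame of delay
```

The following predicted frames would then add back detail through motion compensation, giving the blur-then-sharpen effect described above.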

The other problem with compression is the CPU cost for compression itself. But again, if the PureVu processor is indicative, off the shelf hardware solutions are possible right now with H.264 at very low power, encoding multiple H.264 streams with near zero latency.

But here is where the special nature of game video, or computer generated graphics, allows us to make some huge efficiency gains over natural video. The most complex CPU task in video encoding is motion vector search: finding the matching image regions from previous frames that allow the encoder to send motion vectors and do efficient delta compression. But for a video stream rendered with a game engine, we can output the exact motion vectors directly. This is a potential problem in that not all games necessarily have motion vectors available, which may require modifying the game's graphics engine. However, motion blur is very common now in game engines (everybody's doing it, you know), and the motion blur image filter computes motion vectors (very cheaply). Motion blur gives an additional benefit for video compression in that it generates blurrier images during fast motion, which is the worst case for video compression.
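
To see why motion vector search dominates encoding cost, here is a minimal full-search block matcher using the sum of absolute differences (SAD). Frames are plain lists of lists of luma values; the 8x8 block and +/-4 pixel search radius are arbitrary illustrative choices, and real encoders use smarter search patterns than this brute force. A game engine that exports its motion vector buffer would let the encoder skip `full_search` entirely and go straight to coding the residual.

```python
def sad(ref, cur, bx, by, dx, dy, block):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(block) for x in range(block))


def full_search(ref, cur, bx, by, block=8, radius=4):
    """Try every displacement within `radius`; return (best vector, its SAD, candidates tried)."""
    height, width = len(ref), len(ref[0])
    best_vec, best_cost, tried = None, None, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= by + dy and by + dy + block <= height
                    and 0 <= bx + dx and bx + dx + block <= width):
                continue  # candidate block falls outside the reference frame
            cost = sad(ref, cur, bx, by, dx, dy, block)
            tried += 1
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost, tried


# Synthetic test: the current frame is the reference shifted right 3 and down 2.
ref = [[x + 100 * y for x in range(32)] for y in range(32)]
cur = [[(x - 3) + 100 * (y - 2) for x in range(32)] for y in range(32)]
print(full_search(ref, cur, bx=8, by=8))  # ((-3, -2), 0, 81)
```

Even this toy case evaluates 81 candidates × 64 pixels = 5,184 absolute differences for one 8x8 block, and a 720p frame has 14,400 such blocks.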

So if I was doing this, I would require the game to use motion blur, and output the motion vector buffer to my (specialized, not off the shelf) video encoder.

Some interesting factoids: it apparently takes roughly 2 weeks to modify a game for OnLive, and at least 2 of the 16 announced titles (Burnout and Crysis) are particularly known for their beautiful motion blur. All of them except World of Goo are recent action or racing games that probably use motion blur.

There is, however, an interesting and damning problem that I am glossing over. The motion vectors are really only valid for the opaque frame buffer. What does this mean? The automatic 'free' motion vectors are valid for the solid geometry, not for all the alpha-blended or translucent effects, such as water, fire, smoke, etc. So these become problem areas. It's interesting that several of the GDC commenters pointed out ugly compression artifacts when fire or smoke effects were prominent in BioShock running on OnLive.

However, many games already render their translucent effects at lower resolution (SD and even lower in modern console engines), so it would make sense perhaps to simply send these regions at lower resolution/quality, or blur them out (which a good video encoder would probably do anyway).

But in short, video compression is the central core tech problem, and they haven't pulled off a miracle here: at best they have some good new tech which exploits some of the special properties of game video. Furthermore, I can even see a competitor with a 2x better compression system coming along and trying to muscle them out.

There's one other little issue worth mentioning briefly, which is packet loss. The internet is not perfect, and sometimes packets are lost or late. I didn't mention this earlier because it has well known and relatively simple technical solutions for real time systems. Late packets are treated as dropped, and dropped packets and errors are corrected through redundancy. You send small packet streams in groups with added parity data (forward error correction) such that any piece of lost data can be recovered, at the cost of some extra bandwidth. For example, you send 10 packets worth of data using 11 packets, and any single lost packet can be fully reconstructed. More advanced schemes adaptively adjust the redundancy based on measured packet loss, but this tech is already standard; it's just not always used or understood. Good game networking engines already employ these packet loss mitigation techniques, and work fine today over real networks.
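
The 10-in-11 example corresponds to the simplest form of forward error correction, a single XOR parity packet per group. A minimal sketch, assuming equal-length packets (a real protocol would also need to carry sequence numbers and original packet lengths): any one lost packet in the group, including the parity packet itself, can be rebuilt by XOR-ing everything that did arrive.

```python
from functools import reduce


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Append one parity packet: the XOR of all equal-length data packets."""
    return list(packets) + [reduce(xor_bytes, packets)]


def recover(received):
    """`received` is the group with lost packets set to None; parity is last."""
    lost = [i for i, p in enumerate(received) if p is None]
    if len(lost) > 1:
        raise ValueError("one parity packet can repair only one loss per group")
    if lost:
        # XOR of all packets (data + parity) is zero, so the missing one
        # equals the XOR of the packets that arrived.
        rebuilt = reduce(xor_bytes, (p for p in received if p is not None))
        received = received[:lost[0]] + [rebuilt] + received[lost[0] + 1:]
    return received[:-1]  # drop the parity packet


data = [bytes([i] * 4) for i in range(10)]    # 10 data packets
sent = add_parity(data)                       # sent as 11 packets
arrived = sent[:3] + [None] + sent[4:]        # packet 3 lost in transit
print(recover(arrived) == data)               # True
```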

The worst case is simply a dropped connection, which you just can't do anything about - OnLive's video stream would immediately break and notify you of a connection problem. Of course, the cool thing about OnLive is that it could potentially keep you in the game or reconnect you once you get your connection back.

Server Economics

So if OnLive is at least possible from a technical perspective (which it clearly is), the real question comes down to one of economics. What is the market for this service in terms of the required customer bandwidth? How expensive are these data centers going to be, and how much revenue can they generate?

Here is where I begin to speculate a little beyond my areas of expertise, but I'll use whatever data I've been able to gather from the web.

A few Google searches will show you that US 'broadband' penetration is around 80-90%, and the average US broadband bandwidth is somewhere around 2-3 Mbps. This average is somewhat misleading, because US broadband is roughly split between cable (25 million subscribers) and DSL (20 million subscribers), with outliers like fiber (2-3 million subscribers currently), and DSL users often have several times lower bandwidth than cable users. At this point in time, the great majority of American gamers already have at least 1.5 Mbps, perhaps half have over 5 Mbps, and almost all have a 5 Mbps option in their neighborhood, if they want it. So in theory OnLive will have a large potential market; it really comes down to cost. How many gamers already have the required bandwidth? And for those who don't, how cheap is OnLive when you factor in the extra money users may have to pay to upgrade? And to point out, the upgrade really will be for the HD option, as the great majority of gamers already have 1.5 Mbps or more.

Bandwidth Caps

There's also the looming threat of American telcos moving towards bandwidth caps. As of now, Time Warner is the only American telco experimenting with caps low enough to affect OnLive (40 Gigs/Month for their highest tier). Remember that with the HD option, 5 Mbps is the peak bandwidth; the average usage is half that or less, according to OnLive. So Comcast's cap of 250 Gigs/Month isn't really relevant. Time Warner is currently still testing its new policy in only a few areas, so the future is uncertain. However, there is one interesting fact to throw into the mix: Warner Bros, the Time Warner subsidiary, is OnLive's principal investor (the other two are AutoDesk and Maverick Capital). Now consider that Warner cable is planning some sort of internet video system for television based on a new wireless cable modem, and consider that Perlman's other company was Digeo, the creator of Moxi. I think there will be more OnLive surprises this year, but suffice to say, I doubt OnLive will have to worry about bandwidth caps from Time Warner. I suspect Time Warner's caps really are more about a grand plot to control all digital services in the home, by either directly providing them or charging excess usage fees that will kill enemy services. But OnLive is definitely not their enemy. In the larger picture, the fate of OnLive is intertwined with the larger battle for net neutrality and control over the last mile pipes.
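
A quick way to gauge how much a cap bites is to convert it into hours of play at a given average stream rate. This sketch assumes decimal gigabytes (1 GB = 8,000 megabits) and the roughly 2.5 Mbps HD average implied by OnLive's "half that or less" figure; it ignores all other household traffic.

```python
def hours_under_cap(cap_gb, avg_mbps):
    """Hours of streaming per month before hitting a data cap."""
    cap_megabits = cap_gb * 8 * 1000   # decimal GB -> megabits
    return cap_megabits / avg_mbps / 3600.0


for name, cap in (("Time Warner trial, 40 GB", 40), ("Comcast, 250 GB", 250)):
    print(f"{name}: {hours_under_cap(cap, 2.5):.0f} hours/month")
```

About 36 hours a month is below the roughly 43 hours a month (10 hours a week) of an average gamer, while 222 hours leaves ample headroom.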


Bandwidth Cost

OnLive is going to have to partner with backbones and telcos, just like the big boys such as Akamai, Google and YouTube do, in what are called either transit or peering arrangements. A transit arrangement is basically wholesale bandwidth, and we'll start with that assumption. A little Google searching reveals that wholesale transit bandwidth can be had for around or under $10 per Mbps per month (comparable to end broadband customer cost, actually). Further searching suggests that in some places like LA it can be had for under $5 per Mbps per month. This is for a dedicated connection or peak usage charge.

Now we need some general model assumptions. The exact subscriber numbers don't really matter; what critically matters are a couple of stats: how many hours a month each subscriber plays, and more directly, what the typical peak fraction of users online at a given time is. The data I've found suggests that 10 hours per week is a rough gamer average (20 hours per week for an MMO), and that 10% occupancy is typical for regular games while 20% peak occupancy is typical for some MMOs. Using the 20% peak occupancy means that you need to provide enough peak bandwidth for 20% of your user base to be online at once, a worst case. In a potential worst-case scenario, every user wants HD at 5 Mbits/s and the peak occupancy is 20%, so you need essentially a dedicated 1 Megabit/s for each user, or $10/month per user in bandwidth cost alone. Assuming a perhaps more realistic scenario, where the average user bandwidth is 3Mbps (not everyone can have or wants HD) and peak occupancy is 10%, you get $3 per month in bandwidth cost per user.
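
The two scenarios reduce to one multiplication: average stream rate × peak occupancy gives the dedicated megabits per subscriber, which is then priced per Mbps per month. A minimal sketch using the post's $10/Mbps transit price as the default:

```python
def bandwidth_cost_per_user_month(avg_stream_mbps, peak_occupancy,
                                  usd_per_mbps_month=10.0):
    """Transit cost per subscriber: provision for the peak fraction online."""
    return avg_stream_mbps * peak_occupancy * usd_per_mbps_month


print(round(bandwidth_cost_per_user_month(5.0, 0.20), 2))  # worst case, all HD: 10.0
print(round(bandwidth_cost_per_user_month(3.0, 0.10), 2))  # more typical mix: 3.0
```

Passing `usd_per_mbps_month=5.0` for the cheaper LA transit price halves both figures.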

Remember, in rare peak moments, OnLive can gracefully and slowly degrade video quality, so the service will never fail if they are smart. The worst case at terrible peak times is just a little lower image quality or resolution.

So roughly, we can estimate bandwidth will cost anywhere from $3-10 per month per user with transit arrangements. What's also possible, and more complex, are peering arrangements. If OnLive partners directly with providers near its data centers, it can get substantially reduced rates (or even free bandwidth) if the traffic stays within just that provider. So realistically, I think $5 per month in bandwidth per user is a reasonable upper limit on OnLive's bandwidth charges in today's economic climate, and this will only go down. But 1080p would be significantly more expensive, and it would make sense to charge customers extra. I wouldn't be surprised if they have a tiered charge based on resolution, as most of their fixed costs scale linearly with resolution.

Dataroom Expense

The main expense is probably not the bandwidth, but the per server cost to run a game, a far more demanding task than what most servers do. Let's start with the worst case and assume that OnLive needs at least one decent CPU/GPU combination per logged on user. OnLive is not stupid, so they are not going to use typical high end, expensive big iron, but nor are they going to use off the shelf PCs. Instead I predict that, following in the footsteps of Google, they will use midrange, cheaper, power efficient components, and get significant bulk discounts. Let's start with the basic cost of a CPU/motherboard/RAM/GPU combo. You don't need a monitor, and the storage system can be shared between a very large number of servers, as they are all running the same library of installed games.

So let's take a quick look at Pricewatch:
Core 2 Quad Q6600 + CPU fan + 4GB DDR2 RAM: $260
GeForce GTX 280 1GB 512-bit DDR3 602/2214 fansink HDCP video card: $260

These components are actually high end, far more than sufficient to run the PC versions of most existing games at 90-150fps at 720p, and yes, even Crysis at near 60fps at 720p.

If we consider that they may have researched a little longer and undoubtedly get bulk discounts, we can take $500 per server unit as a safe upper limit. Amortize this over 2 years and you get $20 per month. Factor in the 20% peak demand occupancy, and we get a server cost of $4 per user per month.

This finally leaves us with power/cooling requirements. Let's make an over-assumption of 600 watts of continuous power draw. With power at about $0.10 per kilowatt-hour, and 720 hours in a month, we get roughly $40 a month per server in power draw (600 W × 720 h = 432 kWh, or about $43). Factor in the 20% peak demand occupancy, and we get $8 per user per month. However, this is an over-assumption because the servers are not constantly using power. The 20% peak demand figure means they need enough servers for 20% of their users to be logged in at once, but most of the time not all of the servers are active. The power required would scale with the average demand, not the peak, so it's closer to $4 per user per month in this example (assuming a high average occupancy of 10%). Cooling cost is harder to estimate, but some Google searching reveals it's roughly equivalent to the power cost, assuming modern datacenter design (and they are building brand new ones). So this leaves us with around $12 per user per month as an upper limit in server, power, and cooling cost.
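
The server, power and cooling estimate can be written as one small model. The defaults are the post's numbers; the key structural assumption, taken from the text, is that hardware is provisioned for peak occupancy while energy is only drawn in proportion to average occupancy, with cooling costing about as much as power.

```python
def infra_cost_per_user_month(server_usd=500.0, amortize_months=24,
                              watts=600.0, usd_per_kwh=0.10,
                              peak_occupancy=0.20, average_occupancy=0.10,
                              cooling_to_power_ratio=1.0, hours_per_month=720):
    """Monthly hardware, power and cooling cost per subscriber, in USD."""
    # Enough servers for the peak fraction of subscribers to be online at once.
    hardware = server_usd / amortize_months * peak_occupancy
    # Energy is only drawn by servers that are actually busy: scale by average load.
    power_per_server = watts / 1000.0 * hours_per_month * usd_per_kwh
    power = power_per_server * average_occupancy
    cooling = power * cooling_to_power_ratio
    return {"hardware": hardware, "power": power, "cooling": cooling,
            "total": hardware + power + cooling}


for key, value in infra_cost_per_user_month().items():
    print(f"{key}: ${value:.2f}")
```

This gives $4.17 hardware, $4.32 power, $4.32 cooling and $12.81 in total; the post's "around $12" comes from rounding the $43 power bill down to $40. Passing `watts=200` (the per-user draw estimated further below) drops power and cooling to about $1.44 each.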


However, OnLive is probably more efficient than this. My power/cooling numbers are high because OnLive probably spends a little extra on more expensive but power efficient GPUs that save power/cooling cost, to hit the right overall sweet spot. For example, Nvidia's more powerful GTX 295 is essentially two GTX 280-class GPUs on a single card. It's almost twice as expensive, but provides roughly twice the performance (so similar performance per $) and draws only a little more power (nearly twice as power efficient). Another interesting development is that Nvidia (OnLive's hardware partner) recently announced virtualization support so that multi-GPU systems can fully support multiple concurrent program instances. So what it really comes down to is how many CPU cores and/or GPU cores you need to run games at well over 60fps. Based on what I can see from recent benchmarks, two modern Intel cores and a single GPU are more than sufficient (most console games only have enough threads to push 2 CPU cores). Nvidia's server line of GPUs is more efficient and draws only 100-150 watts per GPU, so 600 watts is a high over-estimate of the power required per connected user.

But remember, you need a high frame rate to compensate for the internet latency, or you need to change the game to reduce internal buffering. There are many trade-offs here, and I imagine OnLive picked low-delay games for its launch titles. Apparently OnLive is targeting 60fps, but that probably means most games actually run at an even higher average fps to reduce delay.
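To see why frame rate matters for delay, here is a small Python sketch. It assumes, purely for illustration, that the game's internal pipeline holds a fixed number of frames (3 here, a hypothetical figure, not an OnLive spec) before a frame leaves the server, so each buffered frame costs one frame time.

```python
def buffering_delay_ms(fps: float, buffered_frames: int) -> float:
    """Latency added by a pipeline that holds `buffered_frames` frames.

    Each held frame costs one frame time (1000 / fps ms), so doubling the
    frame rate halves this part of the delay. Network and video
    encode/decode time are not included.
    """
    frame_time_ms = 1000.0 / fps
    return buffered_frames * frame_time_ms


for fps in (30, 60, 120):
    # 3 buffered frames is an illustrative assumption, not a measured value
    print(f"{fps:>3} fps: {buffering_delay_ms(fps, 3):5.1f} ms of buffering delay")
```

Going from 30 to 60fps saves 50 ms in this example, which is the kind of budget that can absorb part of a network round trip.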

Overall, I think it's reasonable, using the right combination of components (typically 2 Intel CPU cores and one modern Nvidia GPU, possibly as half of a single motherboard system using virtualization), to bring the per-user power draw down to something more like 200 watts while driving a game at 60-120fps. (Remember, almost every game today is designed primarily to run at 30fps at 720p on the Xbox 360, and a single modern Nvidia GPU is almost 4 times as powerful.) Some really demanding games (Crysis) get the whole system, 4 CPU cores and 2 GPUs, at 400 watts. This is what I think OnLive is doing.
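Since most sessions would get half a box and a few demanding titles get a whole one, the per-user draw is really a weighted average. A rough Python sketch, reusing the 200 W and 400 W figures above, $0.10/kWh, 10% average occupancy, and cooling equal to power; the share of "heavy" sessions is a made-up parameter for illustration.

```python
def avg_watts_per_session(heavy_share: float,
                          light_watts: float = 200.0,
                          heavy_watts: float = 400.0) -> float:
    """Average draw when `heavy_share` of sessions get a whole box."""
    return (1 - heavy_share) * light_watts + heavy_share * heavy_watts


def monthly_power_cooling_per_user(watts: float,
                                   price_per_kwh: float = 0.10,
                                   avg_occupancy: float = 0.10,
                                   hours: float = 720.0) -> float:
    """Power bill per subscriber, doubled on the assumption cooling ~= power."""
    kwh = watts / 1000 * hours * avg_occupancy
    return 2 * kwh * price_per_kwh


for share in (0.0, 0.1, 0.5):  # hypothetical fractions of Crysis-class sessions
    w = avg_watts_per_session(share)
    print(f"{share:4.0%} heavy: {w:3.0f} W avg -> "
          f"${monthly_power_cooling_per_user(w):.2f}/user/month")
```

Even with half the sessions on a whole box, power and cooling come to about $4.32 per user per month, roughly half of the $8.64 implied by the 600 W over-estimate.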

So adding it all up, I think $10 per month per user is a safe upper limit for OnLive's expenses, and it's perhaps as low as $5 per month or less, assuming they typically need two modern Intel CPU cores and one Nvidia GPU per user logged on, adequate bandwidth and servers for a peak occupancy of 20%, and power/cooling for an average occupancy of 10%.

Clearly, all of these numbers scale with the occupancy rates. I think this is why OnLive, at least initially, is not going for MMOs: they are too addictive and have very high occupancy. Single-player games and casual games that are played less often are a better fit. Current data suggests the average gamer plays 10 hours a week, and the average MMO player plays 20 hours per week. The average non-MMO player is thus probably playing less than 10 of the 168 hours in a week. This works out to something more like 5% typical occupancy, but we are more interested in peak occupancy, so my 10%/20% numbers are a reasonable over-estimate of average/peak. Again, you need enough hardware and bandwidth for peak occupancy, but the power and cooling cost is determined by average occupancy.
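The occupancy figures follow from dividing play time by the 168 hours in a week. A quick Python check, using the weekly play times quoted above:

```python
HOURS_PER_WEEK = 24 * 7  # 168


def average_occupancy(hours_played_per_week: float) -> float:
    """Fraction of the week an average subscriber is connected."""
    return hours_played_per_week / HOURS_PER_WEEK


for label, hours in [("average gamer", 10), ("average MMO player", 20)]:
    print(f"{label:<20} {average_occupancy(hours):.1%}")
```

About 6% for a 10-hour-a-week gamer (and less for non-MMO players) sits under the 10% average assumed above, whereas an MMO audience alone would already average close to 12%, before any evening peak.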

$10 per month may seem like a high upper limit in monthly expense per user, but even at these expense rates OnLive could be profitable, because this is still less than the cost to the user of running comparable hardware at home.

Here's the simple way of looking at it. That same $600 server rig would cost $1000-1500 for an end user, because they need extra components like a hard drive, monitor, etc., which OnLive avoids or gets cheaper, and OnLive buys in bulk. But most importantly, the OnLive hardware is amortized and shared over a number of users, while the user's high-end gaming rig sits idle most of the time. So the end user's cost to play at home on even a cheap $600 machine amortized over 2 years is still $25 per month, two and a half times the worst-case per-user expense of OnLive. And that doesn't even factor in the extra power expense of gaming at home. OnLive's total expense is probably more comparable to that of an Xbox 360: a $500 machine (including necessary peripherals) amortized over 5 years is a little under $10 per month, and Xbox Live Gold service is another $5 a month on top of that. OnLive can thus easily cover its costs and still be less expensive than the 360 and PS3, and considerably less expensive than PC gaming.
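The comparison is straight-line amortization: hardware price divided by months of use, plus any subscription. A small Python sketch using the prices from the paragraph above (the OnLive entry is simply the $10 worst case from the text, not something computed here):

```python
def monthly_cost(hardware_price: float, years: float,
                 subscription_per_month: float = 0.0) -> float:
    """Straight-line amortization of a hardware purchase plus any subscription."""
    return hardware_price / (years * 12) + subscription_per_month


scenarios = {
    "$600 PC over 2 years": monthly_cost(600, 2),
    "$500 Xbox 360 over 5 years": monthly_cost(500, 5),
    "$500 Xbox 360 + $5/mo Live Gold": monthly_cost(500, 5, 5.0),
    "OnLive worst case (from the text)": 10.0,
}
for name, cost in scenarios.items():
    print(f"{name:<34} ${cost:6.2f}/month")
```

The home PC figure ignores the power bill, so if anything it understates the gap.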


The game industry post-Cloud

In reality, I think that OnLive's costs will be considerably less than $10 per user per month, and will keep falling over time. Just as the console makers periodically revise their hardware to make the components cheaper, OnLive will be constantly expanding its server farms and always buying the current sweet-spot combination of CPUs and GPUs. And since Nvidia and Intel refresh their lineups at least twice a year, OnLive can ride Moore's law continuously. Every year OnLive will become more economical and/or provide higher FPS and less delay and/or support more powerful games.

So it seems possible, even inevitable, that OnLive can be economically viable charging a relatively low subscription fee to cover its fixed costs, comparable to Xbox Live's subscription fee (about $5/month for Xbox Live Gold). Then it makes its real profit by taking a console/distributor-like cut of each game sale or rental. For highly anticipated releases, it could even use a pay-to-play model initially, followed by traditional purchase or rental later on, just like the movie industry does. Remember the madness that surrounded the Warcraft III beta, and think how many people would pay to play StarCraft II multiplayer ahead of time. I know I would.

If you scale OnLive's investment requirements to support the entire US gaming population, you get a ridiculous hardware investment cost of billions of dollars, but this is no different from a new console launch, which is exactly how OnLive must be viewed. The Wii has sold 22 million units in the Americas, and the 360 is close behind at 17 million. I think these numbers represent majority penetration of the console market in the Americas. To scale to that user base, OnLive will need several million (virtual) servers, which may cost a billion dollars or more, but the investment will pay for itself as it goes, just as it did for Sony and Microsoft. Or OnLive may simply be bought up by some deep-pocketed entity that will provide the money, such as Google, Verizon, or Microsoft.
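A back-of-the-envelope Python sketch of that scale-up, assuming the combined Wii and 360 base (22M + 17M) as subscribers, the 20% peak occupancy used earlier, two sessions per physical box, and the $600 box price mentioned above. Two sessions per box is an assumption borrowed from the half-a-motherboard idea, not a known OnLive number. Integer arithmetic keeps the counts exact.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def fleet_size(subscribers: int, peak_percent: int,
               sessions_per_box: int) -> tuple[int, int]:
    """Return (concurrent sessions at peak, physical boxes needed)."""
    sessions = ceil_div(subscribers * peak_percent, 100)
    boxes = ceil_div(sessions, sessions_per_box)
    return sessions, boxes


subscribers = 22_000_000 + 17_000_000   # Wii + Xbox 360 sold in the Americas
sessions, boxes = fleet_size(subscribers, peak_percent=20, sessions_per_box=2)
box_price = 600                          # $ per server box, assumed
print(f"{sessions:,} virtual servers, {boxes:,} boxes, "
      f"~${boxes * box_price / 1e9:.1f}B in hardware")
```

That comes to 7.8 million virtual servers on 3.9 million boxes, or a little over $2 billion in hardware, in line with "a billion dollars or more".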




The size and number of the data centers OnLive will have to build to support even just the US gaming population is quite staggering. We are talking about perhaps millions of servers in perhaps a dozen different data center locations, drawing the combined power output of an entire large power plant. And that's just for the US. However, we already have a very successful example of a company that has built up a massive distributed network of roughly 500,000 servers in over 40 data centers.
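To put "an entire large power plant" in numbers, here is a rough Python sketch. It assumes 39 million subscribers (the Wii plus 360 base in the Americas), about 200 W per active session (the figure argued for earlier), and the 20%/10% peak/average occupancies; cooling is not included. For scale, a single large power-plant unit is on the order of 1,000 MW.

```python
def fleet_draw_megawatts(active_sessions: int, watts_per_session: float) -> float:
    """Electrical draw of all active sessions, in megawatts (cooling excluded)."""
    return active_sessions * watts_per_session / 1_000_000


SUBSCRIBERS = 39_000_000      # assumed: Wii + 360 installed base in the Americas
WATTS_PER_SESSION = 200       # assumed per-session draw from the estimate above

for label, percent in [("peak (20%)", 20), ("average (10%)", 10)]:
    sessions = SUBSCRIBERS * percent // 100
    mw = fleet_draw_megawatts(sessions, WATTS_PER_SESSION)
    print(f"{label:<14} {sessions:>10,} sessions -> {mw:,.0f} MW")
```

Under these assumptions the fleet draws about 1,560 MW at peak and 780 MW on average, before cooling.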

Yes, that company is Google.

To succeed, OnLive will have to build an even bigger supercomputer system. But I imagine Google makes less money per month for each of its servers than OnLive will eventually make for each of its gaming servers. Just how much money can OnLive eventually make? If OnLive could completely conquer the gaming market, then it stands to replace both the current console manufacturers AND the retailers. Combined, these entities take perhaps 40-50% of the retail price of a game. Even assuming OnLive takes only a 30% cut, it could thus eventually take in almost 30% of the game industry's revenue, which is estimated at around $20 billion per year in the US alone and $60 billion worldwide, eventually turning it into another Google.
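The revenue ceiling is just the cut times the market size. A small Python sketch with the 30% cut and the $20B (US) and $60B (worldwide) market estimates from the text:

```python
def platform_revenue(market_billions: float, cut: float) -> float:
    """Annual revenue (in $ billions) from taking `cut` of every sale."""
    return market_billions * cut


for region, market in [("US", 20.0), ("worldwide", 60.0)]:
    print(f"{region:<10} ${platform_revenue(market, 0.30):.0f}B per year")
```

So a complete takeover would be worth roughly $6B a year in the US and $18B worldwide.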

Another point to consider is that most high-end PCs are bought mainly for gaming, so the total real gaming market (in terms of total money people spend on gaming) is even larger, perhaps as large as $100 billion worldwide. OnLive stands to rake in a chunk of this and change the whole industry, further shrinking the consumer PC market and shifting that money into OnLive subscriptions, game charges, etc., part of which in turn covers the centralized hardware cost. NVIDIA and ATI will still get a cut, but perhaps less than they do now. In other words, in the brave new world of OnLive, gamers will only ever need a super-cheap microconsole or netbook to play games, so the money they save on consoles and rigs will let them buy more games, and all this money gets sucked into OnLive.

Now consider that the game market has consistently grown 20% per year for many years, and you can understand why investors have funnelled hundreds of millions into OnLive in order to make it work. And eventually, OnLive can find new ways to 'monetize' gaming (using Google's term), such as ads and so on. Eventually, it should make as much as or more per user-hour than television does.
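Compounding is what makes a 20% annual growth rate so attractive to investors. A short Python sketch, under the purely illustrative assumption that the rate simply continued, starting from the $60B worldwide figure:

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity growing at `annual_growth` per year to double."""
    return math.log(2) / math.log(1 + annual_growth)


def project(value: float, annual_growth: float, years: int) -> float:
    """Value after `years` of compound growth."""
    return value * (1 + annual_growth) ** years


print(f"doubling time at 20%/yr: {doubling_time_years(0.20):.1f} years")
print(f"$60B after 5 years at 20%/yr: ${project(60.0, 0.20, 5):.0f}B")
```

At that pace the market doubles roughly every four years.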

Now this is the fantasy, of course, but I doubt OnLive will grow to become a Google any time soon, mainly because Nintendo, Sony, Microsoft, and the like aren't going to suddenly disappear, which brings me to my final point.



But what about the games?

In the end, people use a console to play games, and thus the actual titles are all that really matters. In one sense, part of OnLive's pitch, 'run high-end PC games on your netbook', is a false premise. Most of OnLive's lineup is current-gen console games, and even though OnLive will probably run them at higher fps, this is mainly to compensate for latency. Video compression and all the other factors discussed above will result in an end-user experience no better than, and often worse than, simply playing the console version (especially if you are far from the data center). OnLive's one high-end PC title, Crysis, is probably twice as expensive for them to run, and will be seen as somewhat inferior by gamers who have high-end rigs and have played the game locally. It will be more like a console version of Crysis. But unfortunately, Crytek is already working on that.

This is really the main obstacle that I think could hold OnLive back: 16 titles at launch is fine, but they are already available on other platforms. Nintendo dominated this current console generation because of its cheap, innovative hardware and a lineup of unique titles that exploit it. I think Nintendo of America's president Reggie Fils-Aime was right on the money:

Based on what I’ve seen so far, their opportunity may make a lot of sense for the PC game industry where piracy is an issue. But as far as the home console market goes, I’m not sure there is anything they have shown that solves a consumer need.

What does OnLive really offer the consumer? Brag Clips? The ability to spectate any player? Try before you buy? Rentals? These are nice (especially the latter two), but can they amount to a system seller? It's a little cheaper, but is that really important, considering most gamers already have a system? It seems that PC games could be where OnLive has more potential, but how much can it currently add over Steam? If OnLive's offerings expanded to include almost all current games, then it truly could achieve high market penetration as the successor to Steam (with the ultimate advantage of free trials and rentals, which Steam can never offer). But Valve does have the significant advantage of a variety of exclusive games built on the Source Engine, which together (Left 4 Dead, Counter-Strike, Team Fortress 2, Day of Defeat, etc.) make up a good chunk of the PC multiplayer segment.

The real opportunity with OnLive is to have exclusive titles that take advantage of OnLive's unique supercomputer power to create a beyond-next-gen experience. This is the other direction in which the game industry is expanding, slowly moving into the blockbuster story experiences of movies. And this expansion is heavily tech-driven.

If such a mega-hit were made, such as a beyond-next-gen Halo or GTA, it could rapidly drive OnLive's expansion, because OnLive requires very little user investment to play. At the very least, everyone would be able to try or play the game on some sort of PC they already have, and the microconsole to play on your TV will probably cost only as much as a game itself. So this market is a very different beast from the traditional consoles, where the market for your game is determined by the number of users who own the console. Once OnLive expands its datacenter capacity sufficiently, the market for an exclusive OnLive game is essentially any gamer. So does OnLive have an exclusive in the works? That would be the true game changer.

This is also where OnLive's less flashy competitor, OToy & LivePlace, may be going in a better direction. Instead of building the cloud and a business based first on existing games, you build the cloud and a new cloud engine for a totally new, unique product, which is specifically designed to harness the cloud's super resources and has no similar competitor.

Without either exclusives or a vast, retail competitive game lineup, OnLive won't take over the industry.









