The Problem of Shards, Servers, and Queues In MMOs 253
An editorial at GamesIndustry takes a look at a couple of problems many MMOs have failed to solve as the genre has evolved over the last decade: log-in queues and a split player base. The most recent example is Aion, which launched in Europe and North America a few weeks ago. Players on some of the game's servers had to deal with lengthy queues until enough people left the starting areas and spread throughout the game. To NCSoft's credit, the queues are mostly gone already, and it wasn't simply launching with too few servers that was the problem (nor was simply launching more servers a perfect solution, as Warhammer proved). In fact, several servers had no queues at all, but many players had set their sights on the more popular ones — a problem facing other MMOs as well. At this point, it becomes a matter of programming — how can the developers for these MMOs build the networking aspect of the game such that more hardware can easily be allocated when it's needed, and also make it easier for people to play together without the restriction of different shards or servers? EVE Online has done well with a single game universe, but it's not clear how far that model can scale upwards.
Eve online runs Windows Server (Score:5, Interesting)
Re:Champions did it too (Score:3, Interesting)
I'd rather not play a game whose world is "instanced" into zones like that. You're basically in the same world, but you're not. It just complicates things; in that case multiple servers would be better.
EVE Online's one-world model would be the ideal, and it seems to work well there. Of course, it's also divided into zones, and the most popular ones can get laggy if there are lots of players and a lot going on, but it's still the same world where everyone is.
Having one single world would also make the areas with fewer players more interesting (most 10-60 areas in WoW have been quite empty for a long time).
It's not just technical scale (Score:4, Interesting)
The question of scale for an MMO applies to more than just the servers' ability to host an increasing number of simultaneous players in a single virtual world. It's also about gameplay, and the MMO paradox: the more massive the world, the less important each player. I would argue that one factor in WoW's enduring success is that Blizzard knew when to add new servers not purely for performance reasons, but also to keep the number of players on any particular server at a sweet spot.
Too few players and there's no sense of a living, persistent world; too many players and that world is stifling and uninviting.
Actually, it will be interesting to see how things play out with Sony's MAG -- an action game that sits somewhere between classic multiplayer and MMO scale.
Re:Champions did it too (Score:3, Interesting)
Re:Computational Problem (Score:3, Interesting)
The fundamental design flaw they all share is that servers represent space in the game; that's a flawed assumption about the best model to use.
I'm probably just being naive here, but isn't the flaw really that the servers represent a fixed amount of space in the game, while at the same time the amount ("density") of user activity in a given space can vary, and therefore the amount of processing and I/O needed to support that space can become more than the server can support?
If so, perhaps a solution would be to make the server's "game-space allocation" variable... i.e. if a server gets too busy, it allocates another server (from Amazon's EC2 cloud or wherever) and transfers half of its game-space (and therefore half of its load) over to the new server. Conversely, if a server becomes underutilized it could merge itself back together with another underutilized server to cut costs.
Of course that would still leave unsolved the problem of seamless interactions across neighboring servers, but it's Sunday so I'm not going to think about that.
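The split/merge idea above can be sketched roughly in Python. Everything here is invented for illustration (a 1-D game-space, the class names, the thresholds); it's not any real MMO's code:

```python
# Hypothetical sketch of load-driven splitting and merging of game-space
# partitions in one dimension; class names and thresholds are invented.

SPLIT_AT = 100   # spawn a new server above this many players
MERGE_AT = 20    # merge adjacent regions below this combined load

class Region:
    """A slice [x0, x1) of 1-D game space hosted by one server."""
    def __init__(self, x0, x1, players=None):
        self.x0, self.x1 = x0, x1
        self.players = players or []   # player x-positions in this slice

    def load(self):
        return len(self.players)

def rebalance(regions):
    """Split overloaded regions in half, then merge quiet neighbours."""
    split = []
    for r in regions:
        if r.load() > SPLIT_AT:
            mid = (r.x0 + r.x1) / 2
            split.append(Region(r.x0, mid, [p for p in r.players if p < mid]))
            split.append(Region(mid, r.x1, [p for p in r.players if p >= mid]))
        else:
            split.append(r)
    merged = []
    for r in split:
        if (merged and merged[-1].x1 == r.x0
                and merged[-1].load() + r.load() < MERGE_AT):
            prev = merged.pop()
            merged.append(Region(prev.x0, r.x1, prev.players + r.players))
        else:
            merged.append(r)
    return merged
```

Run periodically, this splits a hot region across two servers and folds quiet neighbours back together; the unsolved part, as noted above, is seamless interaction across the new boundary.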
Easy fix (Score:2, Interesting)
Re:Champions did it too (Score:4, Interesting)
Re:Computational Problem (Score:5, Interesting)
This fellow was inadvertently correct. Representing space (volume) by putting sections of it onto single computers is a bad idea. Inevitably, no matter how good your design or how well-ordered your content is, some areas are going to become more popular than others. Hence, you're going to get congestion.
A much better model is representing player (and non-player) actions as work units, distributing them evenly across a network of linked computers, and then producing an integrated result for each "region" (zone, map, city, whatever) each server frame. Run the server at something like 50 frames per second and have player actions lag about 2-3 frames behind server-side action: you'll see little delay on the client machines but help mitigate potential race conditions between player actions (e.g. both players attacking simultaneously, both reporting that they attacked on server frame 2,348,342, and both scoring a fatal blow on the other).
To mitigate player lag you can distribute update packets based on the density of the update vs the distance of the events from the player vs the player's average data rate.
Of course, that's just my two cents.
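The frame-delayed work-unit idea can be sketched as a toy; the class, the 2-frame delay, and the sort-by-actor tiebreak are all illustrative assumptions, not a real engine:

```python
# Toy sketch of the work-unit idea: actions are queued as work units and
# applied a fixed number of frames later, in a deterministic order, so
# "simultaneous" attacks on the same server frame resolve consistently.
# All names and values here are invented for illustration.

from collections import defaultdict

ACTION_DELAY = 2  # frames between receiving an action and applying it

class Server:
    def __init__(self):
        self.frame = 0
        self.pending = defaultdict(list)   # frame -> list of work units
        self.hp = {}                       # entity -> hit points

    def submit(self, actor, target, damage):
        # schedule the work unit a couple of frames ahead
        self.pending[self.frame + ACTION_DELAY].append((actor, target, damage))

    def tick(self):
        # sort by actor id so every server resolves ties identically
        for actor, target, damage in sorted(self.pending.pop(self.frame, [])):
            if self.hp.get(actor, 0) > 0:   # dead actors can't act
                self.hp[target] = self.hp.get(target, 0) - damage
        self.frame += 1
```

With this tiebreak, two 10-hp players who land fatal blows on the same frame don't both die: the unit that sorts first resolves first, and the second attacker is already dead when their unit runs. Whether mutual kills should instead both land is a design choice the deterministic ordering makes explicit.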
Re:Champions did it too (Score:3, Interesting)
Champions Online has the best strategy, in my opinion. In all the time I've played, they've never had an issue with lag, because they can cap each zone-instance's population at whatever they deem best. For the same reason, they don't have serious overpopulation problems (like being unable to do a quest because you're waiting for 50 other people to finish it). You can jump back and forth between instances by clicking one button, and instances with your friends in them are clearly marked (though it'd be nice if it told you WHICH friends).
What they are unfortunately lacking is a world-wide group-searching interface, like a global LFG channel. Currently you are forced to instance-hop when you're looking for party members.
I don't buy the "immersion-breaking" factor. What's far more immersion-breaking is finding a new friend in real life who plays the game but plays on a different server, then having to cough up some money to switch servers (in WoW's case).
Re:Champions did it too (Score:3, Interesting)
Read the anon's post. EVE is the only game able to handle tons of people in one area. WoW crashed horribly during the AQ opening... do you remember the >3000 ms pings during the event? That thing was beyond laggy, for both the PCs and the servers.
Population maximums (single server):
EVE: ~45K players at once
WoW: ~8K players at once
Champions Online: fewer.
Basically, nobody can handle the parallelism of EVE, mostly because they actually had open instances within the cluster that people could go to (deadspace, missions, etc.).
This creates way more social interaction than any other game, but gameplay itself has to be handled differently. Seriously, the social aspects of a single server are a thousandfold greater.
An MMO with that many people, as smart as it would be to see companies do such a thing with their technology, would just crash at the idea of 500 people trying to kill the same rabbit for some quest. Only now is Blizzard getting around it by letting people see their own live instancing of certain things (e.g. Cataclysm at the same time as the old Barrens).
Technology takes time. Give it 10 years or so before this is fixed.
Re:Eve online runs Windows Server (Score:3, Interesting)
Re:Champions did it too (Score:4, Interesting)
Ditto. It's really annoying, from my experience with Tabula Rasa (which uses the same scheme).
"Hey where are you?"
"City A."
"I'm in City A too, meet near the bank?"
"I'm standing in the bank."
"Oh, I am too but I don't see you..."
"Which City A are you in?"
"City A3"
"Oh damn, I'm in A6, let me find a portal so I can move from City A back into the other City A."
There are few ways to break immersion quicker than that.
Re:Computational Problem (Score:4, Interesting)
Whether it was meant that way or not, that's a good point. I wonder whether it isn't servers that represent space in the game (from a computational point of view) but rather database connections, because when you think about it, you're really just moving data structures around during gameplay.
All the clever graphics run in the memory of the gamers' PCs; details about in-game items and their status (item stats, whether they're equipped, in the backpack, where in the backpack, their state of repair, etc.) just live in little pages that move up and down the link between gamer and "gamespace".
True, there's a lot of item contention involved (who killed what mob, rolling on loot, etc.), but I wonder if the true question of managing parallel game spaces and making them work isn't embedded in the replication of tables from one database instance to another, which is a well-established technique for most databases. That could imply the need for very fast interprocessor communications, presumably because warlocks consume rowlocks.
The most complex software I've been into deeply enough to notice was VMS, some years ago - and that appeared to be a case where clever data structures did nearly all the work (that and REI of course ;) and good data structures meant a lot less algorithm. It's an approach. To recap, I think it's easier to think of "game space" as a database issue, rather than a processor one. Your Beowulf cluster of hot grits just provides CPU cycles, really, and that's not quite as difficult to share.
Re:Computational Problem (Score:3, Interesting)
They are as prepared to solve it as the business model is prepared to pay for it, and frankly, there's no proof that spending the money it would take to increase the number of concurrent players in a specific area would be worth the investment. So, why bother?
And furthermore, your presumption about the limitations being the result of a design flaw regarding servers representing game space is incorrect.
The problem is simply one of technical capability and cost: the technology isn't good enough and the cost is too high (the two are correlated). Servers are irrelevant; the problem is the network. The networking requirements to handle multiple concurrent players in a game space grow quadratically with the number of players who can interact in 'real time'.
Two players in the same area need only send and receive information about themselves and the other player. Fifty players in the same area means the server must send the actions of all 50 of those players to each and every player. In other words, as a rough maximum, that's approximately 600 times as much data going around as in the two-player scenario (2500 vs. 4).
Up this number to 250, and you have over 15,000 times as much data moving around. At 1000 players you have 250,000 times as much data moving around as with the original 2 players.
Let's imagine that with all of the overhead -- network security, game actions, other information, etc. -- each player must be sent several kilobytes a second over the net. For solo players this would be the minimum, and on a really slow connection, like dial-up, they wouldn't be able to enjoy the experience. Remember, this is for solo players who are alone in the MMO world. But what about those well-organized huge battles?
Even if only 1 KB/second of each player's network traffic were movement/action information, with 1000 players in an area you are potentially already up to 1000 KB/s of bandwidth that each player would need just to experience the event in 'real time'. And the server has to send this data to all 1000 of those players in 'real time', which means that for an epic battle in, say, Star Wars Galaxies, Rebels vs. Imperials with 500 players on each side all in a fantastic melee, the server needs an upload bandwidth of 1,000,000 KB/s -- roughly 1 GB/s.
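A quick script makes the quadratic growth concrete, using the post's own rough figure of about 1 KB/s of action data per player:

```python
# Back-of-the-envelope check of the growth argument above: with n players
# in one area, the server relays every player's actions to every other
# player, so total traffic scales roughly as n squared. The 1 KB/s
# per-player figure is the post's own rough assumption.

PER_PLAYER_KB = 1  # assumed KB/s of movement/action data per player

def server_upload_kb(n):
    """Rough total server upload: n players each receive n players' updates."""
    return n * n * PER_PLAYER_KB

print(server_upload_kb(2))     # 4 KB/s
print(server_upload_kb(50))    # 2500 KB/s
print(server_upload_kb(1000))  # 1000000 KB/s, roughly 1 GB/s
```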
Now, I don't know about you, but I think maintaining a 1 gigabyte per second upload bandwidth for what amounts to a very small percentage of the player base is not exactly feasible given the technology we have today.
disclaimer: I've used rough numbers, only to serve as examples. I believe my premise is sound, though I admit that the final numbers could work out to be much lower than what I've written here. I'm doubtful that anyone with real experience in the matter is going to come forward with 'real' numbers, but I would be grateful if they did.
Re:Nature Online (Score:4, Interesting)
NO is also the only game I know where the biggest flamewars (and even whole PvP events) revolve around whether, and to which game, you get transferred after permadeath. Curious. What other game do you know where players spend a sizable portion of their time pondering what to do when they stop playing?
Re:Eve online runs Windows Server (Score:1, Interesting)
It's also known for requiring an hour or so of downtime every single day or the entire system buckles under the pressure.
So?
A nightly downtime (only an hour -- big whoop) allows for minor software updates, hardware upgrades and changes, resolution of non-critical network stability issues, updates of in-game statistics (mechanics-wise, such as faction standings, which would cause extreme load if updated in real time), and the ability to perform backups while the databases are not being modified -- thereby making them more reliable and giving CCP the ability, if it were ever needed, to point to the exact moment the systems were restored to, rather than some random time of day when people won't remember what they were doing.
The system also performs a full reboot, completely clearing all resources and cleaning out in-game systems as people dock for the night. Can you imagine how much less laggy other MMOs would be if they all had a nightly reboot? I've played several, and -- for a very specific example -- D&D Online has several shards, and on each shard in each location there are several instances that only support a certain number of people. Even with the multi-instancing, things are still laggy as hell, and there is typically a 30-minute queue before you can log in. The only time I encounter a queue in EVE is right after downtime, and it's usually only 10 minutes or so.
There is nothing wrong with a nightly reboot, no matter what OS a game's servers use.
Re:Champions did it too (Score:3, Interesting)
NCSoft did it both ways with Aion; there are multiple servers, but "inside" each server a zone can have one or more "channels", each channel being a separate instance of the zone. The starting zones have ten channels, but the other zones typically only have 3 to 5, with the number of channels managed to keep players from being scattered too thinly across a zone or crowded too close together. NCSoft responded to the load at launch by adding two more servers; I think they would have done better to add more channels to the starting zones. However, because of the game's backstory, NCSoft was managing not only the population of each server but also the proportion of players on each side, so they had a number of variables to play with, which complicated things.
NCSoft's superhero MMORPG, City of Heroes, handles load more dynamically; as a zone gets too crowded, the server dynamically spawns new instances of that zone and prevents people from entering a particular instance once it reaches capacity. I remember during one event seeing the choices of "Atlas Park", "Atlas Park 2", on up to "Atlas Park 9" when changing zones. CoH is a bit of an oddity in that the vast majority of its content is instanced, and it doesn't matter which zone instance you're in when you enter a mission: everyone on the team enters the same mission instance, so a team scattered across Atlas Park 2, 3, 5, and 7 could all go to their mission door and be back together once inside (and then choose which zone instance to exit into when the mission is done, so they stay together).
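The capacity-capped instancing described above can be sketched in a few lines; the zone name comes from the post, but the capacity figure and the routing logic are invented for illustration:

```python
# Rough sketch of the zone-instance scheme described above: players
# entering a zone are routed to the first instance with room, and a new
# numbered instance ("Atlas Park 2", ...) is spawned when all are full.
# The capacity figure is an assumption, not a real CoH number.

ZONE_CAP = 50  # assumed per-instance player cap

class Zone:
    def __init__(self, name):
        self.name = name
        self.instances = [[]]  # list of player lists, one per instance

    def enter(self, player):
        for i, inst in enumerate(self.instances):
            if len(inst) < ZONE_CAP:
                inst.append(player)
                return f"{self.name} {i + 1}" if i else self.name
        self.instances.append([player])   # all full: spawn a fresh instance
        return f"{self.name} {len(self.instances)}"
```

With a cap of 50, the 51st player to enter "Atlas Park" lands in "Atlas Park 2" automatically, which is exactly the overflow behaviour the event above exhibited.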
Why not go fully peer-to-peer? (Score:5, Interesting)
We designed a "peer-to-peer" MMO many years ago, although I have to say we didn't implement it and the devil is definitely in the implementation. Anyway, you can read the design docs here [annexia.org]. After it was clear we weren't going to write it, I published the docs just to give a priority date (1998) to invalidate any stupid patents ...
Rich.
There's more to consider than load (Score:2, Interesting)
This article is primarily about load, but a single game world would have to fundamentally change the game design. In many MMOs, such as WoW, RuneScape and so on, it would create a gameplay problem; even with small world populations it's already a minor issue.
Fundamentally it's the suspension of disbelief that would be required.
For example: across WoW there might be one guild at a time trying to complete some über quest. But if you compressed all the US worlds into one, there would be hundreds of guilds doing it simultaneously. Now, it's just me, but it would strike me as slightly odd if 300 people all handed in Onyxia's dead body in Stormwind, one after another.
In EVE Online, nobody is a hero. Everyone is one cog that can have a 'butterfly effect'. The problem with most modern MMOs is that 99.9% of the time it's lots of people playing the same RPG, not everyone contributing in some way to an overall plot. Ahn'Qiraj is the only time WoW got near that, and as http://www.youtube.com/watch?v=71sVv__DryA perfectly demonstrates, it was frankly rubbish.
Re:Eve online runs Windows Server (Score:1, Interesting)
I don't blame CCP for running EVE on Microsoft; I would as well in that situation. Microsoft has worked closely with CCP to optimize the software to run the EVE cluster well -- something they are unlikely to get anywhere else for the same cost.
Re:Computational Problem (Score:4, Interesting)
You're close to right. There's been a lot of work on completely non-spatial distributed server architectures (mostly using distributed hash tables, and occasionally multicast between servers), but they don't scale as well as expected. There's a reason for spatial partitioning: things close to each other tend to interact more.
Fixed partitioning is the easy way to do spatial partitioning: you drop boundaries and migrate objects that cross them. Often the game design is used to let interaction between partitions be ignored, so you don't need any communication between servers besides migration. Dynamic partitioning would largely solve this problem (since, generally, you're never actually in a region where you can interact with more than several dozen other players) but is HARD if you want consistency.
Guaranteeing instantaneous consistency is impossible: you've got hundreds to thousands of servers, and keeping them all in lockstep at 30 Hz with the potential for same-frame interaction between objects just isn't going to happen. Instead, what if you thought of the game as a simulation of abstract Moore machines? Every frame, each object looks at the state of the world, then sets its state for the next frame and maybe creates other objects. Inter-object events can be modeled as objects that exist for a single frame and that the receiver looks for. This means no instantaneous inter-object communication, but that's generally acceptable and likely unavoidable.
Now, network latency between servers could still be a problem: as the system grows it's impossible to keep it in lockstep even with the slightly relaxed communication requirements. This can be solved by employing a technique used in distributed databases: eventual consistency. Let's explicitly allow inconsistent state to exist between servers momentarily, but guarantee that it will eventually be resolved. Have objects use a subscription model for their observations and send those subscriptions to all servers that might contain matching objects. When a server sees an object matching a remote subscription, it sends over the object with enough state to run a dumb predictor (one that can't look at the dynamic state of any other objects). The server with the subscribing object can then use that state and predictor to keep the object in sync and service the actual subscription. Back on the server with the object that matched the subscription, it makes a local copy of the proxy it sent, updates the copy along with the rest of the work, and compares it with the real object to detect when the remote server has made an incorrect prediction. In that event, it sends an update to the subscribing server with the new state.
Now you should say, "Wait! That update won't arrive until the server has already run the simulation for that frame!" Yep, you're probably right. However, because this subscription data is available, the server can very efficiently re-run the simulation, touching only objects that might have detected the change (and any consequential changes). This might cause further updates to other servers, but because of the generally sparse nature of interactions in games, the state on all servers for a given frame quickly becomes consistent.
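One way to picture the predictor/correction loop described above is a toy Python sketch with a constant-velocity ("dead reckoning") predictor; the names, the 1-D state, and the in-process stand-in for the remote server are all assumptions for illustration:

```python
# Toy sketch of the subscription/predictor idea above: the owning server
# ships a dumb constant-velocity predictor with the object, keeps a local
# shadow copy of it, and sends a correction only when the authoritative
# simulation diverges from what the remote side will have predicted.
# All names and the 1-D state are invented for illustration.

class Proxy:
    """What a subscribing server holds: last known state plus predictor."""
    def __init__(self, pos, vel):
        self.pos, self.vel = pos, vel

    def predict(self):
        self.pos += self.vel   # dumb predictor: assume constant velocity

class Owner:
    """The authoritative server for the object."""
    def __init__(self, pos, vel):
        self.pos, self.vel = pos, vel
        self.shadow = Proxy(pos, vel)   # local copy of what we sent out

    def step(self, accel, remote):
        self.vel += accel
        self.pos += self.vel    # authoritative simulation
        self.shadow.predict()   # what the remote side will compute
        remote.predict()        # (stands in for the remote server here)
        if abs(self.shadow.pos - self.pos) > 0:
            # the prediction was wrong: send a correction and resync
            remote.pos, remote.vel = self.pos, self.vel
            self.shadow.pos, self.shadow.vel = self.pos, self.vel
```

As long as the object coasts, no traffic flows; only an acceleration (a mispredicted frame) triggers an update, and that update is the event that forces the cheap targeted re-simulation described above.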
This was almost my thesis before I got pulled into graphics and Wii remotes. If this interests you and you'd like to see an early paper about it, drop me a line: [my_username]@[my_username].org.
Re:EVE's JITA is just as laggy as AQ (Score:3, Interesting)
More accurately, EVE partitions its blade servers slightly differently than this. A busy system like Jita will be on its own blade, probably one of the newer ones, while slow systems with fewer players may share a blade with 20 other systems.
With alliance and fleet battles, an administrator actually needs to move the busy system onto its own blade to handle the load. I'm pretty sure this isn't automatic yet; I've heard fleet commanders say they'd talked to a GM who was moving the system. Some systems are always busy, so they're already on their own blade.
Another factor is that EVE Online limits each player's bandwidth to 28.8k. When you're talking about 500 ships in the same grid, it's _impossible_ for there not to be lag. Every client needs to talk to the server, which then needs to pass on what that player is doing to everyone else in the grid. There's simply not enough bandwidth to handle the throughput of all this data past a certain point. They limit everyone to 28.8k in the interest of fairness, so people don't gain an advantage from their connection speed, and also because their total bandwidth is finite. At some point you get packet loss, which is why you activate your guns and nothing happens, or you run out of ammo and your weapons keep firing for 5 minutes while you can't control your ship. In reality you are probably already dead.
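A bit of back-of-the-envelope arithmetic shows why the 28.8k cap hurts in a 500-ship fight; the bytes-per-update figure here is an assumption for illustration, not a real EVE number:

```python
# Back-of-the-envelope arithmetic for the 28.8k cap in a 500-ship fight.
# The bytes-per-update figure is an assumption, not a real EVE number.

CAP_BITS_PER_S = 28_800   # per-client cap mentioned above
BYTES_PER_UPDATE = 50     # assumed size of one ship-state update
SHIPS = 500

cap_bytes = CAP_BITS_PER_S / 8        # 3600 bytes/s per client
snapshot = SHIPS * BYTES_PER_UPDATE   # 25000 bytes to cover every ship once
print(snapshot / cap_bytes)           # ~6.9 s between updates for any one ship
```

Even under these generous assumptions, a client can hear about each ship only once every several seconds, which matches the "fire your guns and nothing happens" experience described above.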
For the same reason when you are on teamspeak with 250 people in channel, no one is allowed to talk but officers and fleet commanders. Otherwise bandwidth gets choked and everyone is talking over each other.
Sorry, I'm an ex-EVE geek =D I still play occasionally, but it's around 1-2 hours a week mission-whoring in empire. I used to do the whole PvP/alliance thing (RAZOR, and before that fighting with IRON alliance as a 0.0 guest corp), so I do have some 3 years' experience with lag in 500+ ship alliance battles. But then I got a life ;-)