dormando (dormando) wrote in goblin_dev,

server design, why's it so weird?

There're lots of ways to approach the problems I outlined before... The "zone overcrowding" and such.

If you use a fine-tuned border system (like goblin should actually have), you could stick with the normal zone server system. The border system would then dynamically re-allocate users fully (if you can make this lightweight enough to not be noticeable). Goblin doesn't do this though... I have a few other things in mind.

The big thing is redundancy. In goblin's current design, all state data is in at least two places on different servers. If one server fails, the data can almost always be reconstructed (or at least reconstructed very near to what the state actually was), saved to the DB, and restored live. Optimally you'd have a hotspare server sitting at the side - one that doesn't take any part in the game, but has all of the server processes running in case it's needed. If a robot server were to fail, and this system were very well tuned, connected users would likely not see more than a screen hiccup, and not even get disconnected.
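To make the "state in at least two places" idea concrete, here is a rough Python sketch. It is not goblin's actual code: the `RobotServer` and `Cluster` classes, the `write` and `fail_over` methods, and the entity ids are all made up for illustration. It only shows the shape of the idea: every write lands on two different servers, and when one dies a hot spare is rebuilt from the surviving copies.

```python
class RobotServer:
    """Hypothetical robot server (illustrative, not goblin's real class)."""

    def __init__(self, name):
        self.name = name
        self.primary = {}  # entity id -> state this server owns
        self.mirror = {}   # entity id -> copy of state owned by a peer
        self.alive = True


class Cluster:
    """Tracks which server owns each entity and which one mirrors it."""

    def __init__(self):
        self.owner = {}   # entity id -> server holding the primary copy
        self.backup = {}  # entity id -> server holding the mirror copy

    def write(self, entity_id, state, owner, backup):
        # Every write lands on two different servers.
        if owner is backup:
            raise ValueError("owner and backup must be different servers")
        owner.primary[entity_id] = dict(state)
        backup.mirror[entity_id] = dict(state)
        self.owner[entity_id] = owner
        self.backup[entity_id] = backup

    def fail_over(self, dead, spare):
        """Rebuild the dead server's role on the spare from surviving copies.

        Returns the number of entities whose primary copy was recovered.
        """
        dead.alive = False
        recovered = 0
        # Entities the dead server owned: promote the peer's mirror copy.
        for entity_id, owner in list(self.owner.items()):
            if owner is dead:
                survivor = self.backup[entity_id]
                spare.primary[entity_id] = dict(survivor.mirror[entity_id])
                self.owner[entity_id] = spare
                recovered += 1
        # Entities the dead server only mirrored: re-mirror onto the spare.
        for entity_id, backup in list(self.backup.items()):
            if backup is dead:
                survivor = self.owner[entity_id]
                spare.mirror[entity_id] = dict(survivor.primary[entity_id])
                self.backup[entity_id] = spare
        return recovered
```

The sketch ignores everything hard (writes in flight when the crash happens, the DB save, keeping clients connected), which is exactly where the "reconstructed very near to what the state actually was" caveat comes from.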

When everything's on one server in the normal zone setup, a machine can crash and all current state data (anything not backed up to the DB recently, which usually means all monsters/items in the entire area) is lost forever. Having this kind of redundancy slows down the system a lot though :( Maybe I can think of something else...

Here's how I justify the current separation of processes:

The connection servers are on their own, since they're often going to be managing hundreds or thousands of socket connections. They would be spending a huge amount of time just polling sockets with select(2). Pulling that away from everything else would reduce other important latency-bound issues, and allow all of the internal systems to streamline how many sockets they use (one per server, no matter how many clients are connected to the system).
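A minimal sketch of that split, in Python for brevity: one function polls every client socket with `select` and forwards whatever arrives over a single upstream socket, so the internal servers only ever see one connection. The `pump_once` name and the framing (4-byte client id, 4-byte length, then payload) are assumptions made for this example, not goblin's wire format.

```python
import select
import socket
import struct


def pump_once(clients, upstream, timeout=0.1):
    """Poll all client sockets once and forward readable data upstream.

    clients:  dict mapping an integer client id to its socket
    upstream: the single socket leading to the internal servers
    Returns the number of frames forwarded.
    """
    by_sock = {sock: cid for cid, sock in clients.items()}
    if not by_sock:
        return 0
    readable, _, _ = select.select(list(by_sock), [], [], timeout)
    forwarded = 0
    for sock in readable:
        cid = by_sock[sock]
        data = sock.recv(4096)
        if not data:
            # Empty read means the client hung up: drop it.
            sock.close()
            del clients[cid]
            continue
        # Hypothetical frame: client id, payload length, payload.
        header = struct.pack("!II", cid, len(data))
        upstream.sendall(header + data)
        forwarded += 1
    return forwarded
```

A real connection server would loop over this forever and also pump replies back the other way; the point here is only that the socket-polling cost stays in one process while everything behind it talks over one socket.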

The central servers do a lot of stuff, and generally have the most connections. However, most of what they do isn't hugely CPU intensive (if at all). They cause more overhead for each one that's added to the system, so they get to carry all of the misc and non-memory-insane functions.

The robot servers handle the NPCs and players in a way that scales better, and allows for the redundancy I mentioned above. They're not paired with anything else since they're fairly memory and CPU intensive, and if combined with the map servers, would be reduced to the zone problems detailed before. That's not *entirely* true though... I can see the central servers doing more to keep state, then having the mapservers handle robots (which reduces latency a lot). However, switching entire robots between servers is expensive, and you can't keep the state between hops very easily.

The mapservers are on their own because they do two big things: hold the map data, and scan information for line of sight. These kill memory and CPU respectively (and the scanning also depends hugely on the system's memory bandwidth). It's likely easier to get more out of each machine by having these sit on their own, but having them abstracted increases latency and overhead quite a bit.
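For a feel of why line-of-sight scanning eats CPU and memory bandwidth, here is a small sketch of one common way to do it on a tile grid: walk a Bresenham line between two tiles and stop at the first wall. The post doesn't say how goblin actually scans, so the grid layout (a list of strings with `#` as a wall) and the `line_of_sight` function are assumptions for illustration only.

```python
def line_of_sight(grid, start, end):
    """Return True if no wall tile lies strictly between start and end.

    grid:  list of equal-length strings, '#' blocks sight (assumed layout)
    start: (x, y) tile of the viewer
    end:   (x, y) tile being looked at
    """
    x0, y0 = start
    x1, y1 = end
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    x, y = x0, y0
    while (x, y) != (x1, y1):
        # The viewer's own tile never blocks; any wall after it does.
        if (x, y) != (x0, y0) and grid[y][x] == "#":
            return False
        # Standard Bresenham step: advance along x, y, or both.
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy
    return True
```

Each check touches a string of tiles in the map, and it has to be repeated for every pair of things that might see each other, which is why it makes sense to want it next to the map data rather than behind a network hop.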

That's it, I think. I can't figure out how to make it scale better than what I have now, yet keep the redundancy without giving every zone a redundant mirror...
I think in a few future entries in here, I'm going to throw out a lot of whackjob and totally useless ideas that just use different systems. Perhaps if I make enough stupid ideas, I can draw a line between them all and get something good out of it :\ Or at least wedge my line of thought into place so I'm capable of coming up with a good solution to the problem.