Google

Medusa: A High-Performance Internet Server Architecture

What is Medusa?

Medusa is an architecture for high-performance, robust, long-running TCP/IP servers (like HTTP, FTP, and NNTP). Medusa differs from most other server architectures in that it runs as a single process, multiplexing I/O with its various client and server connections within a single process/thread.

Medusa is written in Python, a high-level object-oriented language that is particularly well suited to building powerful, extensible servers. Medusa can be extended and modified at run-time, even by the end-user. User 'scripts' can be used to completely change the behavior of the server, and even add in completely new server types.

How Does it Work?

Most Internet servers are built on a 'forking' model. ('Fork' is a Unix term for starting a new process.) Such servers actually invoke an entire new process for every single client connection. This approach is simple to implement, but does not scale very well to high-load situations. Lots of clients mean a lot of processes, which gobble up large quantities of virtual memory and other system resources. A high-load server thus needs to have a lot of memory. Many popular Internet servers are running with hundreds of megabytes of memory.

The I/O bottleneck.

The vast majority of Internet servers are I/O bound - for any one process, the CPU is sitting idle 99.9% of the time, usually waiting for input from an external device (in the case of an Internet server, it is waiting for input from the network). This problem is exacerbated by the imbalance between server and client bandwidth: most clients are connecting at relatively low bandwidths (28.8 kbits/sec or less, with network delays and inefficiencies it can be far lower). To a typical server CPU, the time between bytes for such a client seems like an eternity! (Consider that a 200 Mhz CPU can perform roughly 50,000 operations for each byte received from such a client).

A simple metaphor for a 'forking' server is that of a supermarket cashier: for every 'customer' being processed [at a cash register], another 'person' must be created to handle each client session. But what if your checkout clerks were so fast they could each individually handle hundreds of customers per second? Since these clerks are almost always waiting for a customer to come through their line, you have a very large staff, sitting around idle 99.9% of the time! Why not replace this staff with a single super-clerk , flitting from aisle to aisle ?

This is exactly how Medusa works! It multiplexes all its I/O through a single select() loop - this loop can handle hundreds, even thousands of simultaneous connections - the actual number is limited only by your operating system. For a more technical overview, see Asynchronous Socket Programming

Why is it Better?

Performance

The most obvious advantage to a single long-running server process is a dramatic improvement in performance. There are several types of overhead involved in the forking model:

  • Process creation/destruction.

    Starting up a new process is an expensive operation on any operating system. Virtual memory must be allocated, libraries must be initialized, and the operating system now has yet another task to keep track of. This start-up cost is so high that it is actually noticeable to people! For example, the first time you pull up a web page with 15 inline images, while you are waiting for the page to load you may have created and destroyed at least 16 processes on the web server.

  • Virtual Memory

    Each process also requires a certain amount of virtual memory space to be allocated on its behalf. Even though most operating systems implement a 'copy-on-write' strategy that makes this much less costly than it could be, the end result is still very wasteful. A 100-user FTP server can still easily require hundreds of megabytes of real memory in order to avoid thrashing (excess paging activity due to lack of real memory).

Medusa eliminates both types of overhead. Running as a single process, there is no per-client creation/destruction overhead. This means each client request is answered very quickly. And virtual memory requirements are lowered dramatically. Memory requirements can even be controlled with more precision in order to gain the highest performance possible for a particular machine configuration.

Persistence

Another major advantage to the single-process model is persistence. Often it is necessary to maintain some sort of state information that is available to each and every client, i.e., a database connection or file pointer. Forking-model servers that need such shared state must arrange some method of getting it - usually via an IPC (inter-process communication) mechanism such as sockets or named pipes. IPC itself adds yet another significant and needless overhead - single-process servers can simply share such information within a single address space.

Implementing persistence in Medusa is easy - the address space of its process (and thus its open database handles, variables, etc...) is available to each and every client.

Not a Strawman

All right, at this point many of my readers will say I'm beating up on a strawman. In fact, they will say, such server architectures are already available - like Microsoft's Internet Information Server. IIS avoids the above-named problems by using threads. Threads are 'lightweight processes' - they represent multiple concurrent execution paths within a single address space. Threads solve many of the problems mentioned above, but also create new ones:
  • 'Threaded' programs are very difficult to write - especially with servers that want to utilize the 'persistence' feature - great care must be taken when accessing or modifying shared resources.
  • There is still additional system overhead when using threads.
  • Not all operating systems support threads, and even on those that do, it is difficult to use them in a portable fashion.

Threads are required in only a limited number of situations. In many cases where threads seem appropriate, an asynchronous solution can actually be written with less work, and will perform better. Avoiding the use of threads also makes access to shared resources (like database connections) easier to manage, since multi-user locking is not necessary.

Note: In the rare case where threads are actually necessary, Medusa can of course use them, if the host operating system supports them. For example, an image-conversion or fractal-generating server might be CPU-intensive, rather than I/O-bound, and thus a good candidate for running in a separate thread.

Another solution (used by many current HTTP servers on Unix) is to 'pre-spawn' a large number of processes - clients are attached to each server in turn. Although this alleviates the performance problem up to that number of users, it still does not scale well. To reliably and efficiently handle [n] users, [n] processes are still necessary.

Other Advantages

  • Extensibility

    Since Medusa is written in Python, it is easily extensible. No separate compilation is necessary. New facilities can be loaded and unloaded into the server without any recompilation or linking, even while the server is running. [For example, Medusa can be configured to automatically upgrade itself to the latest version every so often].

  • Security

    Many of the most popular security holes (popular, at least, among the mischievous) exploit the fact that servers are usually written in a low-level language. Unless such languages are used with extreme care, weaknesses can be introduced that are very difficult to predict and control. One of the favorite loop-holes is the 'memory buffer overflow', used by the Internet Worm (and many others) to gain unwarranted access to Internet servers.

Such problems are virtually non-existent when working in a high-level language like Python, where for example all access to variables and their components are checked at run-time for valid range operations. Even unforseen errors and operating system bugs can be caught - Python includes a full exception-handling system which promotes the construction of 'highly available' servers. Rather than crashing the entire server, Medusa will usually inform the user, log the error, and keep right on running.

Current Features

  • The currently available version of Medusa includes integrated World Wide Web (HTTP) and file transfer (FTP) servers. This combined server can solve a major performance problem at any high-load site, by replacing two forking servers with a single non-forking, non-threading server. Multiple servers of each type can also be instantiated.

  • Also included is a secure 'remote-control' capability, called a monitor server. With this server enabled, authorized users can 'log in' to the running server, and control, manipulate, and examine the server while it is running .

  • A 'chat server' is included, as a sample server implementation. It's simple enough to serve as a good introduction to extending Medusa. It implements a simple IRC-like chat service that could easily be integrated with the HTTP server for an integrated web-oriented chat service. [For example, a small Java applet could be used on the client end to communicate with the server].

  • Several extensions are available for the HTTP server, and more will become available over time. Each of these extensions can be loaded/unloaded into the server dynamically.

    Status Extension
    Provides status information via the HTTP server. Can report on any or all of the installed servers, and on the extensions loaded into the HTTP server. [If this server is running Medusa, you should be able to see it here]
    Default Extension
    Provides the 'standard' file-delivery http server behavior. Uses the same abstract filesystem object as the FTP server. Supports the HTTP/1.1 persistent connection via the 'Connection: Keep-Alive' header.
    HTTP Proxy Extension
    Act as a proxy server for HTTP requests. This lets Medusa be used as a 'Firewall' server. Plans for this extension include cache support, filtering (to ignore, say, all images from 'http://obnoxious.blinky.advertisements.com/'), logging, etc...
    Planned
    On the drawing board are pseudo-filesystem extensions, access to databases like mSQL and Oracle, (and on Windows via ODBC), authentication, server-side includes, and a full-blown proxy/cache system for both HTTP and FTP. Feedback from users will help me decide which areas to concentrate on, so please email me any suggestions.

  • An API is evolving for users to extend not just the HTTP server but Medusa as a whole, mixing in other server types and new capabilities into existing servers. NNTP and POP3 servers have already been written, and will probably be provided as an add-on. I am actively encouraging other developers to produce (and if they wish, to market) Medusa extensions.

Where Can I Get It?

Medusa is available from http://www.nightmare.com/medusa

Feedback, both positive and negative, is much appreciated; please send email to rushing@nightmare.com.