Ashd — A Sane HTTP Daemon

Ashd is a modular HTTP server based on a multi-program architecture. Whereas most other HTTP servers are monolithic programs with, perhaps, loadable modules, Ashd is composed of several different programs, each of which handles requests in different ways, passing requests to each other over a simple protocol (not unlike Unix pipelines). The design of Ashd brings it a number of nice properties, the following being the most noteworthy ones.

Sanity of design
The separation of concerns between different, independent programs is an example of standard Unix philosophy – each program does one thing only, but does it well (I hope). The clean delineation of functions allows each program to be very small and simple – currently, each of the programs in the collection (including even the core HTTP parser program, htparser, as long as one does not count its, quite optional, SSL implementation) is implemented in less than 1,000 lines of C code (and most are considerably smaller than that), allowing them to be easily studied and understood.
Security
Since each program runs in a process of its own, it can be assigned proper permissions. Most noteworthy of all, the userplex program ensures that serving of user home directories (/~user/ URLs, if you will) only happens by code that is actually logged in as the user in question; and the htparser program, being the only program which speaks directly with the clients, can run perfectly well as a non-user (like nobody) and be chroot'ed into an empty directory.
Configuration sanity
Again, since each program only handles a simple task, its configuration can be made quite simple. There is no need for the dirplex program, which handles only service from physical directories, to care about virtual directories, virtual hosts, HTTP protocol parameters or authentication; just as there is no need for the patplex pattern matcher to know about file types or directory hierarchies. Each program's configuration file format can be kept as simple as possible, and most programs are configured simply with command-line options.
Persistence
Though Ashd is a multi-process server, it is not multi-process in the same sense as, e.g., Apache. Each request handler continues to run indefinitely and does not spawn multiple copies of itself, meaning that all process state persists between requests – session data can be kept in memory, connections to back-end services can be kept open, and so on.
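
As a rough illustration of what persistence buys a handler, here is a minimal Python sketch. It does not use the Ashd Python module or any Ashd protocol; the class name, the session counter and the SQLite "back-end" are all assumptions made up for the example. The point is only that a long-lived process can keep a dictionary of sessions and a single open connection across many requests.

```python
import sqlite3


class PersistentHandler:
    """Hypothetical long-running handler; not part of Ashd's API."""

    def __init__(self):
        self.sessions = {}   # session id -> number of requests seen
        self.db = None       # back-end connection, opened on first use
        self.connects = 0    # how many times the back-end was opened

    def backend(self):
        # Open the back-end once and reuse it for every later request.
        if self.db is None:
            self.db = sqlite3.connect(":memory:")
            self.connects += 1
        return self.db

    def handle(self, session_id):
        # State survives between calls because the process never exits.
        self.sessions[session_id] = self.sessions.get(session_id, 0) + 1
        self.backend().execute("SELECT 1")
        return self.sessions[session_id]


handler = PersistentHandler()
for sid in ("a", "a", "b"):
    print(sid, handler.handle(sid))
```

In a model where each request is served by a fresh process, both the session table and the connection would have to be rebuilt or fetched from external storage every time.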

Current Status

Ashd can still be said to be a bit immature, in that it is not yet well-tested, and still lacks certain features that are present in most other HTTP servers. However, it is perfectly usable for most everyday purposes, and it can be said to be "self-hosting" in that it hosts this website, and does so without any obvious problems.

Design Overview

Though the server as a whole is called "Ashd", there is no actual program by that name. The htparser program of Ashd implements a minimal HTTP server. It speaks HTTP (1.0 and 1.1) with clients, but it does not know the first thing about actually handling the requests it receives. Rather, having started the handler program specified on its command line, it packages the requests up and passes them (with Unix socket file-descriptor passing) to that handler program. That handler program may choose to only look at part of the URL and pass the request on to other handler programs based on what it sees. In that way, the handler programs form a tree-like structure, corresponding roughly to the URL space of the server. In order to do that, the packaged request which is passed between the handler programs contains the part of the URL which remains to be parsed, referred to as the "rest string" or the "point" (in deference to Emacs parlance).

For an actual, technical description of the architecture and protocols, see the ashd(7) manpage.
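
For readers unfamiliar with file-descriptor passing, the following Python sketch shows the general Unix mechanism (SCM_RIGHTS ancillary data on a Unix socket). It is not Ashd's wire format, which is described in ashd(7); the payload string and the helper names send_fd, recv_fd and demo are invented for illustration. A pipe stands in for the client connection that a handler would receive.

```python
import array
import os
import socket


def send_fd(sock, fd, payload):
    """Send a payload together with one file descriptor."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock, bufsize=4096):
    """Receive a payload and the single descriptor sent with it."""
    fds = array.array("i")
    payload, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return payload, fds[0]


def demo():
    # The socket pair plays the role of the link between two handlers.
    upstream, downstream = socket.socketpair()
    read_end, write_end = os.pipe()

    # "Upstream" hands the write end over, then drops its own copy.
    send_fd(upstream, write_end, b"rest=ashd/index")  # hypothetical payload
    os.close(write_end)

    # "Downstream" answers through the descriptor it was given.
    payload, passed = recv_fd(downstream)
    os.write(passed, b"hello")
    os.close(passed)

    data = os.read(read_end, 5)
    os.close(read_end)
    upstream.close()
    downstream.close()
    return payload, data


print(demo())
```

Because the receiving process gets a real descriptor rather than a copy of the data, it can reply to the client directly, which is what allows the programs further down the tree to do the actual work.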

Example

As a concrete example, here is how the request to /~fredrik/ashd/index is handled by this particular server.

  1. The request is received over HTTP by htparser. It sets the rest string to ~fredrik/ashd/index and passes it to the patplex process that it was instructed to start by way of command-line argument.
  2. The patplex program, instructed by its configuration file, recognizes the initial tilde of the rest string, strips it off, and passes the request to the userplex program. If userplex is not already running, it starts it, passing the control socket (over which requests are passed) on its standard input, with command-line arguments as specified in the patplex configuration file.
  3. The rest string at this point being fredrik/ashd/index, the userplex program strips off the rest string up to the first slash, treating the stripped-off part as a username, fredrik. Having done some tests (as configurable with command-line options) to determine that the username is valid, it checks to see if it has a request handler already running for that user. If not, it forks off, logs in as the user in question and starts a request handler. The request handler can be explicitly provided by the user by creating an executable file named ~/.ashd/handler, but is otherwise started as specified on userplex's command line; normally, and in this case, an instance of the dirplex program.
  4. The dirplex program receives the request with the rest string set to ashd/index. Having been instructed (by way of command-line arguments) to handle the physical directory ~/htpub, it starts chipping off slash-separated elements of the rest string. Starting with the ashd element, it finds a directory under htpub with that name, and interprets the next element, index, relative to it. Finding no entry by that exact name, it looks more thoroughly and finds index.html instead. Having found the physical file ~/htpub/ashd/index.html, it does pattern matching on that physical filename according to its configuration, finding that it should fork out the sendfile program to handle the request.
  5. The sendfile program handles the request by sending the file contents exactly as they are back to htparser over the socket passed between the various handler programs, and then, not being a persistent program, exits. The only thing sendfile does with the rest string is to check that it is now empty. htparser itself takes care of any chunking or other transfer encoding that might be necessary for HTTP keep-alive.
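
The way the rest string shrinks through the five steps above can be traced with a few lines of Python. This is a toy model of the walkthrough only: the function name trace is made up, it handles nothing but /~user/ URLs, and it does not touch the filesystem or start any programs.

```python
def trace(url_path):
    """Return the username and the rest string as seen by each program."""
    # htparser: the rest string is the path without its leading slash.
    rest = url_path[1:] if url_path.startswith("/") else url_path
    seen = [("patplex", rest)]

    # patplex: recognize the initial tilde and strip it off.
    if not rest.startswith("~"):
        raise ValueError("this sketch only models /~user/ URLs")
    rest = rest[1:]
    seen.append(("userplex", rest))

    # userplex: everything up to the first slash is the username.
    user, _, rest = rest.partition("/")
    seen.append(("dirplex", rest))

    # dirplex: chips off every slash-separated element while resolving
    # the file, so the program it starts sees an empty rest string.
    rest = ""
    seen.append(("sendfile", rest))
    return user, seen


user, seen = trace("/~fredrik/ashd/index")
print(user)
for program, rest in seen:
    print(program, repr(rest))
```

Each program only ever looks at the front of the string it receives, which is why none of them needs to know how the rest of the URL space is laid out.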

The Cast

The Ashd programs of primary interest are the following:

htparser
The "actual" HTTP server. htparser is the program that listens to TCP connections and speaks HTTP with the clients.
dirplex
dirplex is the program used for serving files from actual directories, in a manner akin to how most other HTTP servers work. In order to do that, dirplex maps URLs into existing physical files, and then performs various kinds of pattern-matching against the names of those physical files to determine the program to call to actually serve them.
patplex
Performs pattern matching against logical request parameters such as the rest string, URL or various headers to determine a program to pass the request to. As such, patplex can be used to implement such things as virtual directories or virtual hosts.
sendfile
A simple handler program for sending literal file contents, normally called by dirplex for serving ordinary files. It handles caching using the Last-Modified and related headers. It also handles MIME-type detection if a specific MIME-type was not specified.
callcgi
Translates an Ashd request into a CGI environment, and runs either the requested file directly as a CGI script, or an external CGI handler. Thus, it can be used to serve, for example, PHP pages.
userplex
Handles "user directories", to use Apache parlance; you may know them otherwise as /~user/ URLs. When a request is made for the directory of a specific user, it makes sure that the request handler runs as the user in question. This functionality was actually what prompted me to begin writing Ashd as a whole, since I was severely annoyed by the fact that Apache serves user directories as the www-data (or similar) user. Serving a user directory properly as its owner ensures that all dynamic content can access all the relevant files it may need, that any files it creates or modifies are owned by the right user, that no other users need access to one's home directory, and that one user cannot violate the "web space" of other users just by running PHP scripts to do so. It also relieves the web server of the various weird security considerations which come from trusting users with running code as another user.
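
To make the dirplex description above more concrete, here is a small Python sketch of the two stages it mentions: mapping a rest string onto a physical file, and then pattern-matching the file name to pick a program. It is a model under assumptions, not dirplex's actual algorithm: the HANDLERS table is hypothetical (the real rules come from dirplex's configuration), and the fallback that finds index.html for index is guessed here to be a match on the name without its extension.

```python
import fnmatch
import os

# Hypothetical rule table; real rules live in dirplex's configuration.
HANDLERS = [("*.php", "callcgi"), ("*", "sendfile")]


def resolve(root, rest):
    """Map a rest string to a physical file under root, or None."""
    path = root
    for element in rest.split("/"):
        if element in ("", ".", ".."):
            return None  # refuse empty or upward-pointing elements
        if not os.path.isdir(path):
            return None  # cannot descend into a regular file
        candidate = os.path.join(path, element)
        if not os.path.exists(candidate):
            # Assumed fallback: a file called "<element>.<anything>".
            matches = sorted(
                name for name in os.listdir(path)
                if os.path.splitext(name)[0] == element
                and os.path.isfile(os.path.join(path, name)))
            if not matches:
                return None
            candidate = os.path.join(path, matches[0])
        path = candidate
    return path if os.path.isfile(path) else None


def choose_handler(path):
    """Return the first program whose pattern matches the file name."""
    name = os.path.basename(path)
    for pattern, program in HANDLERS:
        if fnmatch.fnmatch(name, pattern):
            return program
    return None
```

With a directory tree containing ashd/index.html, resolve(root, "ashd/index") yields that file and choose_handler picks sendfile for it, mirroring step 4 of the example.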

Outside the main cast, there are also the htls, accesslog, htextauth, callscgi, callfcgi, errlogger and multifscgi programs.

There is also a Python module, which comes with the ashd-wsgi and scgi-wsgi programs for serving WSGI scripts and an undocumented program for serving files with server-side includes. It also contains rather general (documented) modules for writing custom Ashd handlers very conveniently. There are versions of the Python module and programs for both Python 2 and Python 3.

Documentation

Ashd is primarily documented in the same manual pages that this page links to. For a practical introduction, read the accompanying INSTALL file and/or see the simple configuration examples that are included in the examples directory of the source tree.

Download

The latest release of Ashd is 0.10. Download it here.

The latest release of the Python module is 0.4. Download the Python 2 version here, or the Python 3 version here.

The latest source code is available through Git at <git://git.dolda2000.com/ashd>, also viewable through Gitweb.

Performance

Ashd has, at least to my knowledge, not been extensively benchmarked, so its performance characteristics are not well known.

The closest thing I have done to benchmarking Ashd is running it to serve the moderately busy site havenandhearth.com, where most of the traffic consists of static files. There is dynamic content as well, but it receives far less traffic. On this site, Ashd serves on average about 1.5 million requests per day on about 100 simultaneous HTTP connections, with temporary peaks of slightly above 100 requests per second on 1000-1500 simultaneous connections. A good portion (I would estimate it at about 20%) of the traffic happens via HTTPS. Under these circumstances, the programs involved in the most common requests consume CPU time as follows.

Program      Average CPU usage
htparser     0.53%
patplex      0.041%
dirplex      0.12%
sendfile     1.3%
accesslog    0.036%
Total        2.0%

The above measurements are calculated from the cumulative CPU time used by the respective programs after having run for several weeks. By comparison, the PHP engine running the site's discussion forum, which receives about 100,000 requests per day, uses 5.4% CPU. The CPU is an Intel Core i7-920.
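
The figures above can be cross-checked with a little arithmetic. The Python snippet below only restates numbers given on this page; the per-request comparison at the end is a derived, rough figure, and it compares static file serving with dynamic PHP pages, so it says nothing about how the two would fare on identical work.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ashd_requests_per_day = 1_500_000
php_requests_per_day = 100_000
php_cpu_percent = 5.4

cpu_percent = {
    "htparser": 0.53,
    "patplex": 0.041,
    "dirplex": 0.12,
    "sendfile": 1.3,
    "accesslog": 0.036,
}

# Sum of the per-program figures: 2.027, which rounds to the 2.0% total.
ashd_total = sum(cpu_percent.values())

# Average request rate: about 17.4 requests per second.
average_rate = ashd_requests_per_day / SECONDS_PER_DAY

# CPU share per request, PHP forum relative to the Ashd programs:
# roughly a factor of 40.
ratio = (php_cpu_percent / php_requests_per_day) / (
    ashd_total / ashd_requests_per_day)

print(round(ashd_total, 3), round(average_rate, 1), round(ratio))
```

The average rate of roughly 17 requests per second also puts the quoted peaks of slightly above 100 requests per second into perspective: they are about six times the daily mean.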

In this context, it should be noted that the multi-process architecture of Ashd makes it inherently parallel to some degree, despite the individual programs being single-threaded. It is probably to be expected that htparser will be the first bottleneck, particularly because of its single-threaded nature.

Author: Fredrik Tolf <fredrik@dolda2000.com>
Last changed: Tue Dec 13 19:22:42 2011