Ashd — A Sane HTTP Daemon

Ashd is a modular HTTP server based on a multi-program architecture. Whereas most other HTTP servers are monolithic programs with, perhaps, loadable modules, Ashd is composed of several different programs, each of which handles requests in different ways, passing requests to each other over a simple protocol (not unlike Unix pipelines). The design of Ashd brings it a number of nice properties, the following being the most noteworthy ones.

Sanity of design
The separation of concerns between different, independent programs is an example of standard Unix philosophy – each program does one thing only, but does it well (I hope). The clean delineation of functions allows each program to be very small and simple – currently, each of the programs in the collection (even the core HTTP parser program, htparser, as long as one does not count its entirely optional SSL implementation) is implemented in fewer than 1,000 lines of C code, and most are considerably smaller than that, so they can easily be studied and understood.
Security
Since each program runs in a process of its own, it can be assigned proper permissions. Most noteworthy of all, the userplex program ensures that user home directories (/~user/ URLs, if you will) are only served by code that is actually logged in as the user in question; and the htparser program, being the only program which speaks directly with the clients, can run perfectly well as an unprivileged user (such as nobody) and be chroot'ed into an empty directory.
Configuration sanity
Again, since each program only handles a simple task, its configuration can be made quite simple. There is no need for the dirplex program, which only handles service from physical directories, to care about virtual directories, virtual hosts, HTTP protocol parameters or authentication; just as there is no need for the patplex pattern matcher to know about file types or directory hierarchies. Each program's configuration file format can be kept as simple as possible, and indeed most programs lack configuration files entirely and are configured simply with command-line options.
Persistence
Though Ashd is a multi-process program, it is not multi-process in the same sense as, e.g., Apache. Each request handler continues to run indefinitely and does not spawn multiple copies of itself, meaning that all process state persists between requests – session data can be kept in memory, connections to back-end services can be kept open, and so on.

Current Status

Ashd can be said to be rather mature by now. In testing on moderately busy sites (see the Performance section below for an example), no crashes or other signs of instability have been observed over months of continuous operation, and it has not shown problems with any particular user-agent. It does lack a few features present in other HTTP servers, but nothing that I, for one, have experienced as a problem; and it also supports a few features not always present in other servers (such as chunked request-bodies).

Design Overview

Though the server as a whole is called "Ashd", there is no actual program by that name. The htparser program of Ashd implements a minimal HTTP server. It speaks HTTP (1.0 and 1.1) with clients, but it does not know the first thing about actually handling the requests it receives. Rather, it starts the handler program specified on its command line and passes each request it receives, packaged up, to that handler (using Unix socket file-descriptor passing). That handler program may choose to look at only part of the URL and pass the request on to other handler programs based on what it sees. In that way, the handler programs form a tree-like structure, corresponding roughly to the URL space of the server. To make that possible, the packaged request passed between the handler programs contains the part of the URL which remains to be parsed, referred to as the "rest string" or the "point" (in deference to Emacs parlance).
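The hand-off described above rests on the standard SCM_RIGHTS mechanism for passing a descriptor over a Unix-domain socket. The following is a minimal sketch of that mechanism only, not code from Ashd: the byte buffer merely stands in for the packaged request (whose actual layout is described in ashd(7) and not reproduced here), and send_with_fd and recv_with_fd are hypothetical names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send len bytes from buf plus one descriptor fd over the Unix-domain
 * socket sock. Returns 0 on success, -1 on failure. */
int send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    union {
        struct cmsghdr align;                /* forces correct alignment */
        char space[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct cmsghdr *cmsg;

    assert(len > 0);  /* some payload must accompany the descriptor */
    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_base = (void *)buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.space;
    msg.msg_controllen = sizeof(ctl.space);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            /* "pass these descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive up to len bytes into buf; *fd receives the passed descriptor,
 * or -1 if none arrived. Returns the payload length, or -1 on failure. */
ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd)
{
    struct msghdr msg;
    struct iovec iov;
    union {
        struct cmsghdr align;
        char space[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.space;
    msg.msg_controllen = sizeof(ctl.space);
    *fd = -1;

    if ((n = recvmsg(sock, &msg, 0)) < 0)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
            memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return n;
}
```

The receiver ends up with its own descriptor for the same socket, so it can write its response on it directly instead of relaying data back through every intermediate handler (compare step 5 of the example below, where sendfile answers htparser over the passed socket).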

For an actual, technical description of the architecture and protocols, see the ashd(7) manpage.

Example

As a concrete example, here is how the request to /~fredrik/ashd/index is handled by this particular server.

  1. The request is received over HTTP by htparser. It sets the rest string to ~fredrik/ashd/index and passes the request to the patplex process that its command-line arguments instructed it to start.
  2. The patplex program, instructed by its configuration file, recognizes the initial tilde of the rest string, strips it off, and passes the request to the userplex program. If userplex is not already running, it starts it, passing the control socket (over which requests are passed) on its standard input, with command-line arguments as specified in the patplex configuration file.
  3. The rest string at this point being fredrik/ashd/index, the userplex program strips off the part of the rest string up to the first slash, treating the stripped-off part as a username, fredrik. Having done some tests (configurable with command-line options) to determine that the username is valid, it checks whether it already has a request handler running for that user. If not, it forks off, logs in as the user in question and starts a request handler. The request handler can be explicitly provided by the user by creating an executable file named ~/.ashd/handler, but is otherwise started as specified on userplex's command line; normally, and in this case, an instance of the dirplex program.
  4. The dirplex program receives the request with the rest string set to ashd/index. Having been instructed (by way of command-line arguments) to handle the physical directory ~/htpub, it starts chipping off slash-separated elements of the rest string. Starting with the ashd element, it finds a directory under htpub with that name, and interprets the next element, index, relative to it. Finding no entry by that exact name, it looks more thoroughly and finds index.html instead. Having found the physical file ~/htpub/ashd/index.html, it does pattern matching on that physical filename according to its configuration, finding that it should fork out the sendfile program to handle the request.
  5. The sendfile program handles the request by sending the file contents exactly as they are back to htparser over the socket passed between the various handler programs, and then, not being a persistent program, exits. The only thing sendfile does with the rest string is to check that it is now empty. htparser itself takes care of any chunking or other transfer encoding that might be necessary for HTTP keep-alive.
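The rest-string handling in steps 2 to 5 can be sketched as follows. This is a minimal illustration, not code from Ashd: pop_element and resolve_entry are hypothetical names, and resolve_entry's fallback (accept exactly one entry named "<name>.<ext>" when no exact match exists) is an assumed stand-in for whatever dirplex actually does when it "looks more thoroughly".

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Copy the leading element of *rest (everything before the first '/')
 * into out and advance *rest past that element and its slash.
 * Returns 0 on success, -1 if the element does not fit in out. */
int pop_element(const char **rest, char *out, size_t outsz)
{
    size_t len = strcspn(*rest, "/");

    assert(outsz > 0);
    if (len >= outsz)
        return -1;
    memcpy(out, *rest, len);
    out[len] = '\0';
    *rest += len;
    if (**rest == '/')
        (*rest)++;
    return 0;
}

/* Hypothetical dirplex-like lookup of one element inside dir: an exact
 * name wins; otherwise a single entry named "<name>.<ext>" is accepted.
 * Returns 0 and the chosen entry in out, -1 if nothing matched, or -2 if
 * the fallback is ambiguous or a name does not fit. */
int resolve_entry(const char *dir, const char *name, char *out, size_t outsz)
{
    char path[4096];
    struct stat sb;
    struct dirent *de;
    DIR *d;
    size_t nlen = strlen(name);
    int n, found = 0;

    assert(outsz > 0);
    if (nlen == 0)
        return -1;
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -2;
    if (stat(path, &sb) == 0) {          /* exact match */
        if (nlen >= outsz)
            return -2;
        memcpy(out, name, nlen + 1);
        return 0;
    }
    if ((d = opendir(dir)) == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {  /* look for "<name>.<ext>" */
        if (strncmp(de->d_name, name, nlen) != 0 || de->d_name[nlen] != '.' ||
            de->d_name[nlen + 1] == '\0')
            continue;
        if (found++ > 0 || strlen(de->d_name) >= outsz) {
            closedir(d);
            return -2;
        }
        strcpy(out, de->d_name);
    }
    closedir(d);
    return found ? 0 : -1;
}
```

Each program only consumes what it understands and hands the remainder on, which is why the final handler in the chain can simply check that nothing is left.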

"Screenshot"

The closest thing to a screenshot, the following text dump is an example of how an Ashd process tree might look.
$ ps -AH lS
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
1 65534  2216     1  20   0  24628   908 ?      Ss   ?          1:54   /usr/local/bin/htparser -Sf -p /var/run/ashd.pid -u nobody -r /var/tmp plain -- errlogger -n ashd patplex /usr/local/etc/ashd/rootpat
0     0  2215     1  20   0   3904   512 ?      Ss   ?          0:00   errlogger -n ashd patplex /usr/local/etc/ashd/rootpat
0     0  2225  2215  20   0   4012   552 ?      S    ?          0:03     patplex /usr/local/etc/ashd/rootpat
4     0  2495  2225  20   0 129380   680 ?      S    ?          0:00       sudo -u www-data accesslog /var/log/http/access.log dirplex /srv/www/htdocs
4    33  2496  2495  20   0   3928   412 ?      S    ?          0:03         accesslog /var/log/http/access.log dirplex /srv/www/htdocs
0    33  2497  2496  20   0   3944   644 ?      S    ?         57:35           dirplex /srv/www/htdocs
0    33  2518  2497  20   0 266024 17404 ?      S    ?          2:10             /usr/bin/python /usr/local/bin/ashd-wsgi ashd.wsgidir
0    33  4032  2497  20   0   4140   620 ?      S    ?          0:00             callfcgi multifscgi 5 php-cgi
0    33  4033  4032  20   0   3900   364 ?      S    ?          0:00               multifscgi 5 php-cgi
0    33  4034  4033  20   0 247204  2332 ?      S    ?          0:01                 php-cgi
0    33  4035  4033  20   0 247204  2400 ?      S    ?          0:01                 php-cgi
0    33  4036  4033  20   0 248508   568 ?      S    ?          0:01                 php-cgi
0    33  4037  4033  20   0 247204  2340 ?      S    ?          0:01                 php-cgi
0    33  4038  4033  20   0 248240  3084 ?      S    ?          0:01                 php-cgi
0    33  1080  2497  20   0   3932   488 ?      S    ?          0:00             callcgi GET /gitweb/?p=ashd.git;a=blame;f=src/htparser.c;hb=HEAD 
0    33  1081  1080  20   0 143944 11136 ?      S    ?          0:00               /usr/bin/perl gitweb/index.cgi gitweb/index.cgi
0    33  1088  1081  20   0   9780  1344 ?      D    ?          0:00                 /usr/bin/git --git-dir=/srv/git/r/ashd.git blame -p HEAD -- src/htparser.c
0     0  3297  2225  20   0  12344   584 ?      S    ?          0:00       userplex -g users -d public_html dirplex -c apache-compat public_html
4   504  3298  3297  20   0   3944   636 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
4   500  3344  3297  20   0   3928   552 -      Ss   ?          0:01         accesslog -a /home/fredrik/.ashd/log/access dirplex htpub
0   500  3419  3344  20   0   3944   664 -      S    ?          0:01           dirplex htpub
0   500  3420  3419  20   0 238960  5252 -      Sl   ?          2:08             /usr/bin/python3 /usr/local/bin/ashd-wsgi3 -m /home/fredrik/.ashd/sockets/pdm3 ashd.wsgidir
0   500  4044  3419  20   0 119412  1672 -      S    ?          0:14             psendfile
0   500  2159  3419  20   0   3932   464 -      S    ?          0:00             htextauth -s ./auth -- dirplex -c ./sub.cf /home/pub
0   500  2160  2159  20   0   3944   524 -      S    ?          0:03               dirplex -c ./sub.cf /home/pub
0   500 31056  3419  20   0   4140   496 -      S    ?          0:00             callfcgi php-cgi
0   500 31057 31056  20   0 247456   732 -      S    ?          0:00               php-cgi
4   506  3586  3297  20   0   3944   664 ?      Ss   ?          0:03         dirplex -c apache-compat public_html
0   506  3830  3586  20   0   7728  1732 ?      S    ?          0:00           callfcgi php-cgi
0   506 15184  3830  20   0 247464  5772 ?      S    ?          0:00             php-cgi
4   505  4045  3297  20   0   3944   600 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
4   507  6376  3297  20   0   3944   496 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
4   510  9476  3297  20   0   3944   632 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
4  1000 12610  3297  20   0   3944   480 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
4   513 24954  3297  20   0   3944   524 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
0   513 24955 24954  20   0   4140   520 ?      S    ?          0:00           callfcgi php-cgi
0   513 24956 24955  20   0 249788   800 ?      S    ?          0:00             php-cgi
4   515 27761  3297  20   0   3944   472 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
4   502 18758  3297  20   0   3944   524 ?      Ss   ?          0:00         dirplex -c apache-compat public_html
$ 

The Cast

The Ashd programs of primary interest are the following:

htparser
The "actual" HTTP server. htparser is the program that listens to TCP connections and speaks HTTP with the clients.
dirplex
dirplex is the program used for serving files from actual directories, in a manner akin to how most other HTTP servers work. In order to do that, dirplex maps URLs onto existing physical files, and then performs various kinds of pattern-matching against the names of those physical files to determine the program to call to actually serve them.
patplex
Performs pattern matching against logical request parameters such as the rest string, URL or various headers to determine a program to pass the request to. As such, patplex can be used to implement such things as virtual directories or virtual hosts.
sendfile
A simple handler program for sending literal file contents, normally called by dirplex for serving ordinary files. It handles caching using the Last-Modified and related headers. It also handles MIME-type detection if a specific MIME-type was not specified.
callcgi
Translates an Ashd request into a CGI environment, and runs either the requested file directly as a CGI script, or an external CGI handler. Thus, it can be used to serve, for example, PHP pages.
userplex
Handles "user directories", to use Apache parlance; you may know them otherwise as /~user/ URLs. When a request is made for the directory of a specific user, it makes sure that the request handler runs as the user in question. This functionality was actually what prompted me to begin writing Ashd as a whole, since I was severely annoyed by the fact that Apache serves user directories as the www-data (or similar) user. Serving a user directory properly as its owner ensures that all dynamic content can access the files it needs, that any files it creates or modifies are owned by the right user, that no other users need access to one's home directory, and that one user cannot violate the "web space" of other users simply by running PHP scripts. It also relieves the web server of various odd security considerations that come from trusting users with running code as another user.
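The two things userplex does in step 3 of the example, validating the username and starting a handler as that user, can be sketched as below. Both functions are hypothetical, and the group-membership test is only an assumed example of the "tests" mentioned there (the process listing above shows userplex started with -g users); become_user_and_exec must be called as root, in a freshly forked child.

```c
#define _DEFAULT_SOURCE  /* for initgroups() on glibc */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the passwd entry for name if it is an existing user whose
 * primary or supplementary groups include gid, else NULL. Names that
 * are empty or contain '/' are rejected outright. */
struct passwd *check_user(const char *name, gid_t gid)
{
    struct passwd *pw;
    struct group *gr;
    char **m;

    if (name[0] == '\0' || strchr(name, '/') != NULL)
        return NULL;
    if ((pw = getpwnam(name)) == NULL)
        return NULL;
    if (pw->pw_gid == gid)
        return pw;
    if ((gr = getgrgid(gid)) == NULL)
        return NULL;
    for (m = gr->gr_mem; *m != NULL; m++) {
        if (strcmp(*m, name) == 0)
            return pw;
    }
    return NULL;
}

/* In a forked child running as root: become pw's user, move to the home
 * directory and exec the handler in argv. Only returns on error (-1). */
int become_user_and_exec(const struct passwd *pw, char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    if (initgroups(pw->pw_name, pw->pw_gid) != 0 ||  /* groups first, */
        setgid(pw->pw_gid) != 0 ||                   /* then the gid, */
        setuid(pw->pw_uid) != 0 ||                   /* uid last */
        chdir(pw->pw_dir) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

The order of calls matters: supplementary groups and the group ID must be set while the process is still root, since setuid() gives up the privilege needed to change them.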

Outside the main cast, there are also the htls, accesslog, htextauth, callscgi, callfcgi, httimed, httrcall, errlogger, psendfile and multifscgi programs.
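Returning to patplex from the list above: its job of choosing a handler by pattern-matching request parameters can be sketched with POSIX regular expressions. This is a hypothetical illustration that only matches the rest string; the rule structure is invented here and does not reflect patplex's real configuration format, which can also match the URL and headers.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* A hypothetical dispatch rule: a rest string matching pattern
 * (POSIX extended regex) is handed to handler. */
struct rule {
    const char *pattern;
    const char *handler;
};

/* Return the handler of the first rule whose pattern matches rest, or
 * fallback if none matches. Rules whose pattern fails to compile are
 * skipped. Patterns are compiled on every call purely for brevity; a
 * real dispatcher would compile them once when reading its configuration. */
const char *match_handler(const struct rule *rules, size_t n,
                          const char *rest, const char *fallback)
{
    size_t i;

    assert(rules != NULL || n == 0);
    for (i = 0; i < n; i++) {
        regex_t re;
        int hit;

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        hit = regexec(&re, rest, 0, NULL, 0) == 0;
        regfree(&re);
        if (hit)
            return rules[i].handler;
    }
    return fallback;
}
```

With rules such as {"^~", "userplex"} followed by a fallback of "dirplex", a request for ~fredrik/ashd/index would be routed to userplex, mirroring step 2 of the example.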

There is also a Python module, which comes with the ashd-wsgi and scgi-wsgi programs for serving WSGI scripts and an undocumented program for serving files with server-side includes. It also contains rather general (documented) modules for writing custom Ashd handlers very conveniently. There are versions of the Python module and programs for both Python 2 and Python 3. The Python 2 module has been verified to work with Jython.

Documentation

Ashd is primarily documented in the same manual pages that this page links to. For a practical introduction, read the accompanying INSTALL file and/or see the simple configuration examples that are included in the examples directory of the source tree.

Download

The latest release of Ashd is 0.12. Download it here.

The latest release of the Python module is 0.5. Download the Python 2 version here, or the Python 3 version here.

The latest source code is available through Git at <git://git.dolda2000.com/ashd>, also viewable through Gitweb.

Performance

Ashd has, at least to my knowledge, not been extensively benchmarked, so its performance characteristics are not well known. It should also be noted that optimization has not been a priority when writing it, with precedence given to brevity and clarity. (Which, on the other hand, means that if optimization should at some point be necessary, there should be much low-hanging fruit to pick.)

The closest thing I have done to benchmarking on Ashd is running it to serve the moderately busy site havenandhearth.com, where most of the traffic consists of static files. There is dynamic content as well, but it receives far less traffic. On this site, Ashd serves on average about 1.5 million requests per day on about 100 simultaneous HTTP connections, with temporary peaks of slightly above 100 requests per second on 1000-1500 simultaneous connections. A good portion of the traffic (I would estimate about 20%) is served over HTTPS. Under these circumstances, the programs involved in the most common requests consume CPU time as follows.

Program      Average CPU usage
htparser     0.53%
patplex      0.041%
dirplex      0.12%
sendfile     1.3%
accesslog    0.036%
Total        2.0%

The above measurements are calculated from the cumulative CPU time used by the respective programs after having run for several weeks. By comparison, the PHP engine running the site's discussion forum, which receives about 100,000 requests per day, uses 5.4% CPU. The CPU is an Intel Core i7-920.
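For a rough sense of scale, the figures above can be turned into per-second and per-request numbers. This is back-of-the-envelope arithmetic resting on two assumptions not stated in the measurements: that both CPU percentages are relative to the same baseline, and that Ashd's 2.0% can be spread evenly over all 1.5 million daily requests (even though part of that traffic is dynamic content handled by other programs).

```latex
\begin{aligned}
\text{average rate} &\approx \frac{1.5\times10^{6}\ \text{requests}}{86\,400\ \text{s}} \approx 17\ \text{requests/s}\\
\text{Ashd CPU time per request} &\approx \frac{0.020 \times 86\,400\ \text{s}}{1.5\times10^{6}} \approx 1.2\ \text{ms}\\
\text{PHP CPU time per request} &\approx \frac{0.054 \times 86\,400\ \text{s}}{1.0\times10^{5}} \approx 47\ \text{ms}
\end{aligned}
```

On these assumptions, the peaks of slightly above 100 requests per second are roughly six times the average load, and a forum request costs on the order of forty times as much CPU time in PHP as an average request costs in Ashd.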

In this context, it should be noted that the multi-process architecture of Ashd makes it inherently parallel to some degree, despite the individual programs being single-threaded. It is probably to be expected that htparser will be the first bottleneck, particularly because of its single-threaded nature.

Author: Fredrik Tolf <fredrik@dolda2000.com>
Last changed: Thu Feb 13 03:39:28 2014