This document describes the architecture and protocol of ashd technically. If you want a brief overview, please see the homepage at http://www.dolda2000.com/~fredrik/ashd/.
The basic premise of ashd is that of standard Unix philosophy; it consists of a number of different programs, each specialized to one precise task, passing HTTP requests around to each other in a manner akin to standard Unix pipelines. This document describes the set of protocols and conventions used between such programs that allows them to interoperate.
All requests within ashd are created by htparser(1), which speaks HTTP with its clients, translates the requests it receives into ashd format, and passes them to a specified handler program. The handler program may choose to respond to the request itself, or pass it on to another handler for further processing.
A request in ashd format consists of 4 structural parts:
/a/b/c?d=e
will be given the initial
rest string a/b/c
.
X-Ash-
,
and to safeguard any potentially sensitive such headers from
forgery, htparser(1) strips any headers with that prefix
from the incoming request.
Handler programs are started either by htparser(1) itself, or in turn by other handler programs, and certain conventions are observed in that process.
There are two basic types of handler programs, persistent and transient, which determines the convention used in starting them. A persistent program will continue running indefinitely, and handle any amount of requests during its lifetime, while a transient program will handle one request only and then exit. The convention of transient programs was created mainly for convenience, since it is easier to write such programs. The htparser(1) program will only start a persistent program as the root handler.
A persistent handler program, when started, is passed a Unix socket of SEQPACKET type on standard input (while standard output and error are inherited from the parent process). Its parent program will then pass one datagram per request on that socket, containing the above listed parts of the request using the datagram format described below. By convention, the handler program should exit normally if it receives end-of-file on the socket.
A transient program, when started, has the response socket connected
to its standard input and output (standard error is inherited from the
parent process). It may be provided arbitrary arguments as supplied by
the program starting it, but the last three arguments are the HTTP
method, the raw URL and the rest string, in that order. The HTTP
headers are converted into environment variables by turning them into
uppercase, replacing any dashs with underscores, and prefixing them
with REQ_
. For example, the HTTP Host
header will be passed as
REQ_HOST
. The HTTP protocol version is passed in an environment
variable called HTTP_VERSION
. It is passed in full; i.e. as
HTTP/1.1
, rather than just 1.1
.
The response socket, as mentioned above, is also used for reading the request-body if the client provides one. For such purposes, htparser(1) ensures that the reader sees end-of-file at the end of the request-body, allowing the reader (unlike in, for example, CGI) to not have to worry about the Content-Length header and counting bytes when reading, and also to handle chunked request-bodies in a natural fashion.
To respond, the handler program needs to write an ordinary HTTP response to the response socket. That is, one line containing the HTTP version, status code and status text, followed by any number of lines with headers, followed by an empty line, followed by the response-body. Basic familiarity with HTTP should relieve this document of detailing the exact format of such a response, but the following points are noteworthy:
The datagram format used for persistent handler programs is simply a sequence of NUL-terminated strings. The datagram starts with the HTTP method, the URL, the HTTP version and the rest string, in that order. They are followed by an arbitrary number of string pairs, one for each header; the first string in a pair being the header name, and the second being the value. The headers are terminated by one instance of the empty string.
Along with the datagram, the response socket is passed using the SCM_RIGHTS ancillary message for Unix sockets. See unix(7), recvmsg(2) and sendmsg(2) for more information. Each datagram will have exactly one associated socket passed with it.
Fredrik Tolf <fredrik@dolda2000.com>