| 1 | ashd(7) |
| 2 | ======= |
| 3 | |
| 4 | NAME |
| 5 | ---- |
| 6 | ashd - A Sane HTTP Daemon |
| 7 | |
| 8 | DESCRIPTION |
| 9 | ----------- |
| 10 | |
| 11 | This document describes the architecture and protocol of ashd in |
| 12 | technical detail. If you want a brief overview, please see the homepage at |
| 13 | <http://www.dolda2000.com/~fredrik/ashd/>. |
| 14 | |
| 15 | The basic premise of ashd is that of the standard Unix philosophy: it |
| 16 | consists of a number of different programs, each specialized to one |
| 17 | precise task, passing HTTP requests around to each other in a manner |
| 18 | akin to standard Unix pipelines. This document describes the set of |
| 19 | protocols and conventions used between such programs that allows them |
| 20 | to interoperate. |
| 21 | |
| 22 | REQUESTS |
| 23 | -------- |
| 24 | |
| 25 | All requests within ashd are created by *htparser*(1), which speaks |
| 26 | HTTP with its clients, translates the requests it receives into ashd |
| 27 | format, and passes them to a specified handler program. The handler |
| 28 | program may choose to respond to the request itself, or pass it on to |
| 29 | another handler for further processing. |
| 30 | |
| 31 | A request in ashd format consists of four structural parts: |
| 32 | |
| 33 | HTTP method, URL and version:: |
| 34 | |
| 35 | The HTTP header line information, exactly as specified by the |
| 36 | client. That is, any escape sequences in the URL are passed |
| 37 | around in non-processed form, since each program needs to |
| 38 | handle escape processing in its own way. |
| 39 | |
| 40 | The rest string:: |
| 41 | |
| 42 | The 'rest string' (sometimes referred to as the 'point', in |
| 43 | deference to Emacs parlance) is the part of the URL which |
| 44 | remains to be processed. Each handler program is free to |
| 45 | modify the rest string (usually, but not necessarily, by |
| 46 | removing leading parts of it) before passing the request on to |
| 47 | another handler. When *htparser*(1) initially constructs a |
| 48 | request, it forms the rest string from the URL by stripping |
| 49 | off the initial slash and the query parameters. In other |
| 50 | words, a request to `/a/b/c?d=e` will be given the initial |
| 51 | rest string `a/b/c`. |
| 52 | |
| 53 | The HTTP headers:: |
| 54 | |
| 55 | The HTTP headers are parsed and passed along with the request |
| 56 | as they are. However, *htparser*(1) itself, and some handler |
| 57 | programs, add further headers, prefixed with `X-Ash-`. To |
| 58 | safeguard any potentially sensitive such headers from forgery, |
| 59 | *htparser*(1) strips any headers with that prefix from the |
| 60 | incoming request. |
| 61 | |
| 62 | The response socket:: |
| 63 | |
| 64 | Along with the request information, a socket for responding is |
| 65 | passed. A handler program that wishes to actually respond to a |
| 66 | request needs only output a well-formed HTTP response on this |
| 67 | socket and then close it. The details are covered below, but |
| 68 | note that the socket is connected to *htparser*(1) rather than |
| 69 | the client itself, and that *htparser*(1) will do any transfer |
| 70 | encoding that may be required for HTTP keep-alive. The |
| 71 | response socket is also used for reading the request-body, if |
| 72 | the client provides one. |
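
The derivation of the initial rest string and the stripping of `X-Ash-` headers described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this document, not the code used by *htparser*(1); the function names `initial_rest` and `strip_ash_headers` are hypothetical, and the case-insensitive prefix comparison is an assumption.

```python
def initial_rest(url):
    """Derive the initial rest string from a raw request URL.

    The query parameters and the single initial slash are removed;
    escape sequences are left untouched, as described above.
    """
    path = url.split("?", 1)[0]
    return path[1:] if path.startswith("/") else path


def strip_ash_headers(headers):
    """Drop client-supplied headers whose name starts with X-Ash-.

    `headers` is a list of (name, value) pairs. Comparing the prefix
    case-insensitively is an assumption of this sketch.
    """
    return [(name, value) for name, value in headers
            if not name.lower().startswith("x-ash-")]
```

With these definitions, `initial_rest("/a/b/c?d=e")` yields `a/b/c`, matching the example given for the rest string.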
| 73 | |
| 74 | HANDLERS |
| 75 | -------- |
| 76 | |
| 77 | Handler programs are started either by *htparser*(1) itself, or in |
| 78 | turn by other handler programs, and certain conventions are observed |
| 79 | in that process. |
| 80 | |
| 81 | There are two basic types of handler programs, persistent and |
| 82 | transient; the type determines the convention used in starting a program. A |
| 83 | persistent program will continue running indefinitely, and handle any |
| 84 | number of requests during its lifetime, while a transient program will |
| 85 | handle one request only and then exit. The convention of transient |
| 86 | programs was created mainly for convenience, since it is easier to |
| 87 | write such programs. The *htparser*(1) program will only start a |
| 88 | persistent program as the root handler. |
| 89 | |
| 90 | A persistent handler program, when started, is passed a Unix socket of |
| 91 | SEQPACKET type on standard input (while standard output and error are |
| 92 | inherited from the parent process). Its parent program will then pass |
| 93 | one datagram per request on that socket, containing the above listed |
| 94 | parts of the request using the datagram format described below. By |
| 95 | convention, the handler program should exit normally if it receives |
| 96 | end-of-file on the socket. |
| 97 | |
| 98 | A transient program, when started, has the response socket connected |
| 99 | to its standard input and output (standard error is inherited from the |
| 100 | parent process). It may be provided arbitrary arguments as supplied by |
| 101 | the program starting it, but the last three arguments are the HTTP |
| 102 | method, the raw URL and the rest string, in that order. The HTTP |
| 103 | headers are converted into environment variables by turning them into |
| 104 | uppercase, replacing any dashes with underscores, and prefixing them |
| 105 | with `REQ_`. For example, the HTTP `Host` header will be passed as |
| 106 | `REQ_HOST`. The HTTP protocol version is passed in an environment |
| 107 | variable called `HTTP_VERSION`. It is passed in full; i.e. as |
| 108 | `HTTP/1.1`, rather than just `1.1`. |
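
As a rough illustration of the transient convention, the following Python sketch shows how a handler might collect the request from its arguments and environment. The names `header_env_name` and `read_transient_request` are hypothetical and chosen for this example only; a real handler would pass `sys.argv` and `os.environ`.

```python
def header_env_name(header):
    """Map an HTTP header name to its environment variable name."""
    return "REQ_" + header.upper().replace("-", "_")


def read_transient_request(argv, environ):
    """Collect the request as seen by a transient handler.

    The last three arguments are the method, the raw URL and the rest
    string; any arguments before them belong to the starting program.
    """
    method, url, rest = argv[-3:]
    # Header names come back in their converted form (HOST, USER_AGENT),
    # since the original case and dashes cannot be recovered.
    headers = {key[len("REQ_"):]: value
               for key, value in environ.items()
               if key.startswith("REQ_")}
    return {
        "method": method,
        "url": url,
        "rest": rest,
        "version": environ.get("HTTP_VERSION"),
        "headers": headers,
    }
```

For instance, `header_env_name("Host")` gives `REQ_HOST`, as in the example above.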
| 109 | |
| 110 | The response socket, as mentioned above, is also used for reading the |
| 111 | request-body if the client provides one. For such purposes, |
| 112 | *htparser*(1) ensures that the reader sees end-of-file at the end of |
| 113 | the request-body, so that the reader (unlike in, for example, CGI) |
| 114 | need not consult the Content-Length header or count bytes |
| 115 | when reading, and can also handle chunked request-bodies in a natural |
| 116 | fashion. |
| 117 | |
| 118 | To respond, the handler program needs to write an ordinary HTTP |
| 119 | response to the response socket. That is, one line containing the HTTP |
| 120 | version, status code and status text, followed by any number of lines |
| 121 | with headers, followed by an empty line, followed by the |
| 122 | response-body. Basic familiarity with HTTP should relieve this |
| 123 | document of detailing the exact format of such a response, but the |
| 124 | following points are noteworthy: |
| 125 | |
| 126 | * The HTTP version is actually ignored; it must simply be there for |
| 127 | completeness. For the sake of forward compatibility, however, |
| 128 | handlers should output "HTTP/1.1". |
| 129 | |
| 130 | * In the header, Unix line endings are accepted; *htparser*(1) will |
| 131 | still use CRLF line endings when passing the response to the |
| 132 | client. |
| 133 | |
| 134 | * The response socket should be closed when the entire body has been |
| 135 | written. *htparser*(1) itself will take care of anything needed for |
| 136 | HTTP keep-alive, such as chunking. It is recommended, however, that |
| 137 | the handler program provide the Content-Length header if it can be |
| 138 | calculated in advance, since *htparser*(1) will not need to add |
| 139 | chunking in such cases. |
| 140 | |
| 141 | * *htparser*(1) will not provide an error message to the client in |
| 142 | case the response socket is closed before a complete response has |
| 143 | been written to it, so a handler program should always provide an |
| 144 | error message by itself if a request cannot be handled for some |
| 145 | reason. |
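
The points above can be illustrated with a minimal Python sketch that assembles a complete response. The function name `build_response` and its parameters are hypothetical; the sketch uses Unix line endings in the header and supplies Content-Length, as recommended above.

```python
def build_response(code, reason, body, content_type="text/plain"):
    """Assemble a complete response as bytes.

    `body` must be a bytes object. Unix line endings are used in the
    header, and Content-Length is given since the body is known in full.
    """
    head = [
        "HTTP/1.1 %d %s" % (code, reason),
        "Content-Type: %s" % content_type,
        "Content-Length: %d" % len(body),
    ]
    # An empty line separates the header from the response-body.
    return ("\n".join(head) + "\n\n").encode("ascii") + body
```

A transient handler could, for example, read the request-body with `sys.stdin.buffer.read()`, write the result of `build_response(200, "OK", b"hello\n")` to `sys.stdout.buffer`, and then exit, which closes the response socket.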
| 146 | |
| 147 | PROTOCOL |
| 148 | -------- |
| 149 | |
| 150 | The datagram format used for persistent handler programs is simply a |
| 151 | sequence of NUL-terminated strings. The datagram starts with the HTTP |
| 152 | method, the URL, the HTTP version and the rest string, in that |
| 153 | order. They are followed by an arbitrary number of string pairs, one |
| 154 | for each header; the first string in a pair being the header name, and |
| 155 | the second being the value. The headers are terminated by one instance |
| 156 | of the empty string. |
| 157 | |
| 158 | Along with the datagram, the response socket is passed using the |
| 159 | SCM_RIGHTS ancillary message for Unix sockets. See *unix*(7), |
| 160 | *recvmsg*(2) and *sendmsg*(2) for more information. Each datagram will |
| 161 | have exactly one associated socket passed with it. |
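
As an illustration only, a persistent handler written in Python might receive and decode requests roughly as follows. The names `parse_request` and `recv_request` are hypothetical, the buffer size and the Latin-1 decoding are assumptions of this sketch, and `socket.recv_fds` requires Python 3.9 or later.

```python
import socket


def parse_request(data):
    """Split one request datagram into its parts.

    Returns (method, url, version, rest, headers), where headers is a
    list of (name, value) pairs. Latin-1 decoding is an assumption.
    """
    parts = data.split(b"\0")
    if len(parts) < 5:
        raise ValueError("truncated request datagram")
    method, url, version, rest = (p.decode("latin-1") for p in parts[:4])
    headers = []
    i = 4
    # Header pairs follow until an empty string is found in name position.
    while i < len(parts) and parts[i] != b"":
        if i + 1 >= len(parts):
            raise ValueError("header name without a value")
        headers.append((parts[i].decode("latin-1"),
                        parts[i + 1].decode("latin-1")))
        i += 2
    return method, url, version, rest, headers


def recv_request(sock, bufsize=65536):
    """Receive one request and its response socket descriptor.

    Returns None on end-of-file, at which point the handler should
    exit normally.
    """
    data, fds, _flags, _addr = socket.recv_fds(sock, bufsize, 1)
    if not data and not fds:
        return None
    if len(fds) != 1:
        raise ValueError("expected exactly one response socket")
    return parse_request(data), fds[0]
```

The listening socket could be obtained from standard input with `socket.socket(fileno=0)`, and the returned descriptor wrapped with `os.fdopen` or `socket.socket(fileno=fd)` before writing the response.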
| 162 | |
| 163 | AUTHOR |
| 164 | ------ |
| 165 | Fredrik Tolf <fredrik@dolda2000.com> |
| 166 | |
| 167 | SEE ALSO |
| 168 | -------- |
| 169 | *htparser*(1), RFC 2616 |