ashd(7)
=======

NAME
----
ashd - A Sane HTTP Daemon

DESCRIPTION
-----------

This document describes the architecture and protocol of ashd in
technical detail. If you want a brief overview, please see the
homepage at <http://www.dolda2000.com/~fredrik/ashd/>.

The basic premise of ashd is that of standard Unix philosophy; it
consists of a number of different programs, each specialized to one
precise task, passing HTTP requests around to each other in a manner
akin to standard Unix pipelines. This document describes the set of
protocols and conventions used between such programs that allows them
to interoperate.

REQUESTS
--------

All requests within ashd are created by *htparser*(1), which speaks
HTTP with its clients, translates the requests it receives into ashd
format, and passes them to a specified handler program. The handler
program may choose to respond to the request itself, or pass it on to
another handler for further processing.

A request in ashd format consists of four structural parts:

HTTP method, URL and version::

    The HTTP header line information, exactly as specified by the
    client. That is, any escape sequences in the URL are passed
    around in non-processed form, since each program needs to
    handle escape processing in its own way.

The rest string::

    The 'rest string' (sometimes referred to as the 'point', in
    deference to Emacs parlance) is the part of the URL which
    remains to be processed. Each handler program is free to
    modify the rest string (usually, but not necessarily, by
    removing leading parts of it) before passing the request on to
    another handler. When *htparser*(1) initially constructs a
    request, it forms the rest string from the URL by stripping
    off the initial slash and the query parameters. In other
    words, a request to `/a/b/c?d=e` will be given the initial
    rest string `a/b/c`.

The HTTP headers::

    The HTTP headers are parsed and passed along with the request
    as they are. In addition, *htparser*(1) itself and some handler
    programs add further headers, prefixed with `X-Ash-`. To
    safeguard any potentially sensitive such headers from forgery,
    *htparser*(1) strips any headers with that prefix from the
    incoming request.

The response socket::

    Along with the request information, a socket for responding is
    passed. A handler program that wishes to actually respond to a
    request needs only output a well-formed HTTP response on this
    socket and then close it. The details are covered below, but
    note that the socket is connected to *htparser*(1) rather than
    the client itself, and that *htparser*(1) will do any transfer
    encoding that may be required for HTTP keep-alive. The
    response socket is also used for reading the request-body, if
    the client provides one.
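
As an illustration of the rest string and header rules above, the
following Python sketch derives the initial rest string from a URL and
drops client-supplied `X-Ash-` headers. It is not taken from
*htparser*(1); the function names are hypothetical, and comparing the
prefix case-insensitively is an assumption of this sketch.

```python
def initial_rest_string(url):
    """Derive the initial rest string from a raw request URL.

    Strips the query parameters and the initial slash, and leaves any
    escape sequences untouched, as described for the rest string.
    """
    path = url.split("?", 1)[0]      # drop the query parameters
    if path.startswith("/"):
        path = path[1:]              # drop only the initial slash
    return path


def strip_reserved_headers(headers):
    """Remove headers whose name starts with the X-Ash- prefix.

    `headers` is a list of (name, value) pairs. The case-insensitive
    prefix match is an assumption, not documented behavior.
    """
    return [(name, value) for name, value in headers
            if not name.lower().startswith("x-ash-")]
```

For instance, `initial_rest_string("/a/b/c?d=e")` yields `a/b/c`,
matching the example given for the rest string.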

HANDLERS
--------

Handler programs are started either by *htparser*(1) itself, or in
turn by other handler programs, and certain conventions are observed
in that process.

There are two basic types of handler programs, persistent and
transient, and the type determines the convention used in starting
them. A persistent program will continue running indefinitely, and
handle any number of requests during its lifetime, while a transient
program will handle one request only and then exit. The convention of
transient programs was created mainly for convenience, since it is
easier to write such programs. The *htparser*(1) program will only
start a persistent program as the root handler.

A persistent handler program, when started, is passed a Unix socket of
SEQPACKET type on standard input (while standard output and error are
inherited from the parent process). Its parent program will then pass
one datagram per request on that socket, containing the parts of the
request listed above in the datagram format described below. By
convention, the handler program should exit normally if it receives
end-of-file on the socket.

A transient program, when started, has the response socket connected
to its standard input and output (standard error is inherited from the
parent process). It may be provided arbitrary arguments as supplied by
the program starting it, but the last three arguments are the HTTP
method, the raw URL and the rest string, in that order. The HTTP
headers are converted into environment variables by turning them into
uppercase, replacing any dashes with underscores, and prefixing them
with `REQ_`. For example, the HTTP `Host` header will be passed as
`REQ_HOST`. The HTTP protocol version is passed in an environment
variable called `HTTP_VERSION`. It is passed in full; i.e. as
`HTTP/1.1`, rather than just `1.1`.
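
The transient calling convention can be sketched in Python as follows.
This is only an illustration under the rules stated above; the
function names and the shape of the returned dictionary are
hypothetical.

```python
def header_to_env_name(name):
    """Map an HTTP header name to its environment variable name."""
    return "REQ_" + name.upper().replace("-", "_")


def parse_transient_invocation(argv, environ):
    """Collect the request as seen by a transient handler.

    `argv` is the full argument vector including the program name;
    the last three entries are the method, the raw URL and the rest
    string. Header names come back in their environment form (for
    example CONTENT_LENGTH), since the original spelling is not
    recoverable from the environment.
    """
    if len(argv) < 4:
        raise ValueError("expected method, URL and rest string arguments")
    method, url, rest = argv[-3:]
    headers = {key[len("REQ_"):]: value
               for key, value in environ.items()
               if key.startswith("REQ_")}
    return {
        "method": method,
        "url": url,
        "rest": rest,
        "version": environ.get("HTTP_VERSION"),
        "headers": headers,
    }
```

A real handler would typically call
`parse_transient_invocation(sys.argv, os.environ)` at startup.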

The response socket, as mentioned above, is also used for reading the
request-body if the client provides one. For such purposes,
*htparser*(1) ensures that the reader sees end-of-file at the end of
the request-body. Unlike in, for example, CGI, the reader therefore
does not have to worry about the Content-Length header or count bytes
when reading, and chunked request-bodies are handled in a natural
fashion.
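
Because end-of-file marks the end of the request-body, reading it can
be as simple as the following Python sketch. The function name and the
buffer size are arbitrary choices for illustration.

```python
import socket


def read_request_body(sock, bufsize=65536):
    """Read the request-body from the response socket until end-of-file.

    No Content-Length bookkeeping is done here; the loop relies solely
    on recv() returning an empty result at the end of the body.
    """
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:                 # end-of-file: body is complete
            break
        chunks.append(data)
    return b"".join(chunks)
```

A handler expecting large uploads would process each chunk as it
arrives instead of collecting everything in memory.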

To respond, the handler program needs to write an ordinary HTTP
response to the response socket. That is, one line containing the HTTP
version, status code and status text, followed by any number of lines
with headers, followed by an empty line, followed by the
response-body. Basic familiarity with HTTP should relieve this
document of detailing the exact format of such a response, but the
following points are noteworthy:

* The HTTP version is actually ignored; it must simply be there for
  completeness. For the sake of forward compatibility, however,
  handlers should output "HTTP/1.1".

* In the header, Unix line endings are accepted; *htparser*(1) will
  still use CRLF line endings when passing the response to the
  client.

* The response socket should be closed when the entire body has been
  written. *htparser*(1) itself will take care of anything needed for
  HTTP keep-alive, such as chunking. It is recommended, however, that
  the handler program provides the Content-Length header if it can be
  calculated in advance, since *htparser*(1) will not need to add
  chunking in such cases.

* *htparser*(1) will not provide an error message to the client in
  case the response socket is closed before a complete response has
  been written to it, so a handler program should always provide an
  error message by itself if a request cannot be handled for some
  reason.
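
A minimal Python sketch of a response that follows the points above:
it writes "HTTP/1.1" as the version, uses Unix line endings in the
header, supplies Content-Length, and closes the socket afterwards. The
function names and the default content type are hypothetical choices
made for this example.

```python
def build_response(status, reason, body,
                   content_type="text/plain; charset=utf-8"):
    """Build a complete response as bytes, given the body as bytes."""
    head = [
        "HTTP/1.1 %d %s" % (status, reason),
        "Content-Type: %s" % content_type,
        # Known in advance here, so chunking should not be needed.
        "Content-Length: %d" % len(body),
    ]
    # Header lines use plain "\n"; an empty line separates the body.
    return ("\n".join(head) + "\n\n").encode("ascii") + body


def respond(sock, status, reason, body):
    """Write a response to the response socket, then close it."""
    try:
        sock.sendall(build_response(status, reason, body))
    finally:
        sock.close()
```

An error can be reported the same way, for example
`respond(sock, 404, "Not Found", b"No such resource.\n")`.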

PROTOCOL
--------

The datagram format used for persistent handler programs is simply a
sequence of NUL-terminated strings. The datagram starts with the HTTP
method, the URL, the HTTP version and the rest string, in that
order. They are followed by an arbitrary number of string pairs, one
for each header; the first string in a pair being the header name, and
the second being the value. The headers are terminated by one instance
of the empty string.
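
The datagram layout described above can be decoded with a few lines of
Python. This is a sketch, not the parser used by ashd itself; the
function name is hypothetical, and decoding the strings as Latin-1 is
an assumption made so that arbitrary bytes survive unchanged.

```python
def parse_request_datagram(data):
    """Split a request datagram into its structural parts.

    Returns (method, url, version, rest, headers), where headers is a
    list of (name, value) pairs in their original order.
    """
    fields = data.split(b"\0")
    if len(fields) < 5:
        raise ValueError("truncated request datagram")
    method, url, version, rest = (f.decode("latin-1") for f in fields[:4])
    headers = []
    i = 4
    # Header pairs follow until an empty string where a name is expected.
    while i < len(fields) and fields[i] != b"":
        if i + 1 >= len(fields):
            raise ValueError("header name without a value")
        headers.append((fields[i].decode("latin-1"),
                        fields[i + 1].decode("latin-1")))
        i += 2
    return method, url, version, rest, headers
```

Note that an empty header value is legal and does not end the header
list; only an empty string in the name position does.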

Along with the datagram, the response socket is passed using the
SCM_RIGHTS ancillary message for Unix sockets. See *unix*(7),
*recvmsg*(2) and *sendmsg*(2) for more information. Each datagram will
have exactly one associated socket passed with it.
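
The following Python sketch shows one way to send and receive a
request datagram together with its response socket, using the standard
`socket.sendmsg` and `socket.recvmsg` calls. The function names and
the 64 KiB receive buffer are assumptions of this example; a datagram
larger than the buffer would be truncated.

```python
import array
import os
import socket


def send_request(sock, datagram, response_fd):
    """Send one request datagram with its response socket attached."""
    fds = array.array("i", [response_fd])
    sock.sendmsg([datagram],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_request(sock, bufsize=65536):
    """Receive one datagram and the descriptor passed along with it.

    Returns (data, fd), or None on end-of-file.
    """
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, kind, cdata in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % fds.itemsize)])
    if not data or len(fds) != 1:
        for fd in fds:
            os.close(fd)             # do not leak unexpected descriptors
        if not data:
            return None
        raise ValueError("expected exactly one response socket")
    return data, fds[0]
```

A persistent handler could wrap its standard input with
`socket.socket(fileno=0)` and call `recv_request` in a loop until it
returns None.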

AUTHOR
------
Fredrik Tolf <fredrik@dolda2000.com>

SEE ALSO
--------
*htparser*(1), RFC 2616