ashd(7)
=======

NAME
----
ashd - A Sane HTTP Daemon

DESCRIPTION
-----------

This document describes the architecture and protocols of ashd in
technical detail. If you want a brief overview, please see the
homepage at <http://www.dolda2000.com/~fredrik/ashd/>.

The basic premise of ashd is that of standard Unix philosophy; it
consists of a number of different programs, each specialized to one
precise task, passing HTTP requests around to each other in a manner
akin to standard Unix pipelines. This document describes the set of
protocols and conventions used between such programs that allows them
to interoperate.

REQUESTS
--------

All requests within ashd are created by *htparser*(1), which speaks
HTTP with its clients, translates the requests it receives into ashd
format, and passes them to a specified handler program. The handler
program may choose to respond to the request itself, or pass it on to
another handler for further processing.

A request in ashd format consists of four structural parts:

HTTP method, URL and version::

 The HTTP header line information, exactly as specified by the
 client. That is, any escape sequences in the URL are passed
 around in non-processed form, since each program needs to
 handle escape processing in its own way.

The rest string::

 The 'rest string' (sometimes referred to as the 'point', in
 deference to Emacs parlance) is the part of the URL which
 remains to be processed. Each handler program is free to
 modify the rest string (usually, but not necessarily, by
 removing leading parts of it) before passing the request on to
 another handler. When *htparser*(1) initially constructs a
 request, it forms the rest string from the URL by stripping
 off the initial slash and the query parameters. In other
 words, a request to `/a/b/c?d=e` will be given the initial
 rest string `a/b/c`.

The HTTP headers::

 The HTTP headers are parsed and passed along with the request
 as they are. However, *htparser*(1) itself, and some handler
 programs, add further headers prefixed with `X-Ash-`. To
 safeguard any potentially sensitive such headers from
 forgery, *htparser*(1) strips any headers with that prefix
 from the incoming request.

The response socket::

 Along with the request information, a socket for responding is
 passed. A handler program that wishes to actually respond to a
 request needs only output a well-formed HTTP response on this
 socket and then close it. The details are covered below, but
 note that the socket is connected to *htparser*(1) rather than
 the client itself, and that *htparser*(1) will do any transfer
 encoding that may be required for HTTP keep-alive. The
 response socket is also used for reading the request-body, if
 the client provides one.
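As a rough illustration of the rest string and header rules above, the
following Python sketch derives an initial rest string from a raw URL and
drops `X-Ash-` headers from a client request. It is not taken from
*htparser*(1); the function names are hypothetical, and the
case-insensitive prefix match is an assumption made here because HTTP
header names are case-insensitive.

```python
def initial_rest_string(url):
    """Form a rest string by stripping the initial slash and the query.

    Escape sequences are deliberately left untouched, since each
    handler is expected to process them in its own way.
    """
    path = url.split("?", 1)[0]   # drop the query parameters, if any
    if path.startswith("/"):
        path = path[1:]           # drop only the initial slash
    return path


def strip_ash_headers(headers):
    """Return the (name, value) pairs whose name lacks the X-Ash- prefix.

    Assumption: the prefix is matched case-insensitively.
    """
    return [(name, value) for name, value in headers
            if not name.lower().startswith("x-ash-")]
```

With these definitions, `initial_rest_string("/a/b/c?d=e")` yields
`a/b/c`, matching the example given for the rest string.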

HANDLERS
--------

Handler programs are started either by *htparser*(1) itself, or in
turn by other handler programs, and certain conventions are observed
in that process.

There are two basic types of handler programs, persistent and
transient, and the type determines the convention used to start them.
A persistent program will continue running indefinitely, and handle
any number of requests during its lifetime, while a transient program
will handle one request only and then exit. The convention of
transient programs was created mainly for convenience, since it is
easier to write such programs. The *htparser*(1) program will only
start a persistent program as the root handler.

A persistent handler program, when started, is passed a Unix socket of
SEQPACKET type on standard input (while standard output and error are
inherited from the parent process). Its parent program will then pass
one datagram per request on that socket, containing the above listed
parts of the request using the datagram format described below. By
convention, the handler program should exit normally if it receives
end-of-file on the socket.

A transient program, when started, has the response socket connected
to its standard input and output (standard error is inherited from the
parent process). It may be provided arbitrary arguments as supplied by
the program starting it, but the last three arguments are the HTTP
method, the raw URL and the rest string, in that order. The HTTP
headers are converted into environment variables by turning them into
uppercase, replacing any dashes with underscores, and prefixing them
with `REQ_`. For example, the HTTP `Host` header will be passed as
`REQ_HOST`. The HTTP protocol version is passed in an environment
variable called `HTTP_VERSION`. It is passed in full; i.e. as
`HTTP/1.1`, rather than just `1.1`.
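The naming convention for transient programs can be sketched in Python as
follows. This is only an illustration of the mapping described above, not
code from ashd; the function names are hypothetical, and letting a later
header overwrite an earlier one with the same name is an assumption, since
this document does not say how repeated headers are handled.

```python
def header_to_env(name):
    """Map an HTTP header name to its environment variable name."""
    return "REQ_" + name.upper().replace("-", "_")


def transient_environment(version, headers):
    """Build the request-related environment for a transient program.

    `version` is the full protocol version, such as "HTTP/1.1".
    Assumption: for repeated headers, the last value wins.
    """
    env = {"HTTP_VERSION": version}
    for name, value in headers:
        env[header_to_env(name)] = value
    return env


def transient_argv(program_args, method, url, rest):
    """Append the method, raw URL and rest string as the last arguments."""
    return list(program_args) + [method, url, rest]
```

For example, `header_to_env("Host")` gives `REQ_HOST`, as in the text
above.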

The response socket, as mentioned above, is also used for reading the
request-body if the client provides one. For such purposes,
*htparser*(1) ensures that the reader sees end-of-file at the end of
the request-body, so that the reader (unlike in, for example, CGI)
does not have to consult the Content-Length header or count bytes
when reading, and can handle chunked request-bodies in a natural
fashion.

To respond, the handler program needs to write an ordinary HTTP
response to the response socket. That is, one line containing the HTTP
version, status code and status text, followed by any number of lines
with headers, followed by an empty line, followed by the
response-body. Basic familiarity with HTTP should relieve this
document of detailing the exact format of such a response, but the
following points are noteworthy:

 * The HTTP version is actually ignored; it must simply be there for
 completeness. For the sake of forward compatibility, however,
 handlers should output "HTTP/1.1".

 * In the header, Unix line endings are accepted; *htparser*(1) will
 still use CRLF line endings when passing the response to the
 client.

 * The response socket should be closed when the entire body has been
 written. *htparser*(1) itself will take care of anything needed for
 HTTP keep-alive, such as chunking. It is recommended, however, that
 the handler program provides the Content-Length header if it can be
 calculated in advance, since *htparser*(1) will not need to add
 chunking in such cases.

 * *htparser*(1) will not provide an error message to the client in
 case the response socket is closed before a complete response has
 been written to it, so a handler program should always provide an
 error message by itself if a request cannot be handled for some
 reason.
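The points above can be put together in a minimal sketch of a transient
handler, written here in Python. It is an illustration under stated
assumptions rather than a reference implementation: the names
`build_response` and `handle` are hypothetical, the set of accepted
methods is arbitrary, and reading the request-body only for `POST` is a
cautious choice made for this example.

```python
def build_response(status, reason, body, content_type="text/plain"):
    """Format a complete HTTP response as bytes.

    Unix line endings are used in the header, and Content-Length is
    given since the body length is known in advance.
    """
    head = (
        "HTTP/1.1 %d %s\n"
        "Content-Type: %s\n"
        "Content-Length: %d\n"
        "\n" % (status, reason, content_type, len(body))
    )
    return head.encode("ascii") + body


def handle(argv, stdin, stdout):
    """Handle one request; stdin and stdout are binary streams."""
    # The last three arguments are the method, raw URL and rest string.
    method, url, rest = argv[-3:]
    if method not in ("GET", "POST"):
        # Always send an error response rather than just closing.
        reply = build_response(405, "Method Not Allowed",
                               b"Method not allowed\n")
    else:
        # The request-body, if any, ends at end-of-file.
        request_body = stdin.read() if method == "POST" else b""
        text = "method=%s rest=%s body_bytes=%d\n" % (
            method, rest, len(request_body))
        reply = build_response(200, "OK", text.encode("utf-8"))
    stdout.write(reply)
    stdout.flush()
```

A real transient program would call something like
`handle(sys.argv, sys.stdin.buffer, sys.stdout.buffer)` and then exit,
which closes the response socket.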

PROTOCOL
--------

The datagram format used for persistent handler programs is simply a
sequence of NUL-terminated strings. The datagram starts with the HTTP
method, the URL, the HTTP version and the rest string, in that
order. They are followed by an arbitrary number of string pairs, one
for each header; the first string in a pair being the header name, and
the second being the value. The headers are terminated by one instance
of the empty string.
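A parser for this datagram format might look like the following Python
sketch. The function name and the returned dictionary layout are
hypothetical, and decoding the strings as Latin-1 is an assumption chosen
here only because it maps every byte value losslessly; the format itself
does not prescribe an encoding.

```python
def parse_request(data):
    """Parse one request datagram into its structural parts."""
    if not data.endswith(b"\0"):
        raise ValueError("datagram does not end with a NUL terminator")
    # Dropping the final NUL makes split() yield exactly one field
    # per NUL-terminated string.
    fields = [f.decode("latin-1") for f in data[:-1].split(b"\0")]
    if len(fields) < 5:
        raise ValueError("datagram is too short")
    method, url, version, rest = fields[:4]
    headers = []
    i = 4
    while True:
        if i >= len(fields):
            raise ValueError("header list is not terminated")
        if fields[i] == "":
            break                 # the empty string ends the headers
        if i + 1 >= len(fields):
            raise ValueError("header name without a value")
        headers.append((fields[i], fields[i + 1]))
        i += 2
    return {"method": method, "url": url, "version": version,
            "rest": rest, "headers": headers}
```

Headers are kept as a list of pairs rather than a dictionary, so that
repeated header names survive parsing.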

Along with the datagram, the response socket is passed using the
SCM_RIGHTS ancillary message for Unix sockets. See *unix*(7),
*recvmsg*(2) and *sendmsg*(2) for more information. Each datagram will
have exactly one associated socket passed with it.
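On the receiving side, a persistent handler could pick up one datagram
and its descriptor roughly as in the Python sketch below. It relies on
`socket.recv_fds`, which is available on Unix from Python 3.9; the
function name `recv_request` and the 64 KiB buffer size are assumptions
made for this example and are not specified by ashd.

```python
import os
import socket


def recv_request(sock, bufsize=65536):
    """Receive one datagram and the descriptor passed along with it.

    Returns (data, fd), or None at end-of-file, where the handler is
    expected to exit normally.
    """
    # Ask for room for two descriptors so that a surplus one is
    # noticed instead of being silently discarded.
    data, fds, _flags, _addr = socket.recv_fds(sock, bufsize, 2)
    if not data and not fds:
        return None
    if len(fds) != 1:
        for fd in fds:
            os.close(fd)          # do not leak unexpected descriptors
        raise ValueError("expected exactly one passed descriptor")
    return data, fds[0]
```

The returned descriptor is the response socket; the handler owns it and
should close it once the response has been written or the request has
been passed on.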

AUTHOR
------
Fredrik Tolf <fredrik@dolda2000.com>

SEE ALSO
--------
*htparser*(1), RFC 2616