dirplex(1)
==========

NAME
----
dirplex - Physical directory handler for ashd(7)

SYNOPSIS
--------
*dirplex* [*-hN*] [*-c* 'CONFIG'] 'DIR'

DESCRIPTION
-----------

The *dirplex* handler maps URLs into physical files or directories,
and, having found a matching file or directory, it performs various
kinds of pattern-matching against its physical name to determine what
handler to call in order to serve the request. The mapping procedure
and pattern matching are described below.

Having found a handler to serve a file or directory with, *dirplex*
adds the `X-Ash-File` header to the request with a path to the
physical file, before passing the request on to the handler.

*dirplex* is a persistent handler, as defined in *ashd*(7).

OPTIONS
-------

*-h*::

 Print a brief help message to standard output and exit.

*-N*::

 Do not read the global configuration file `dirplex.rc`.

*-c* 'CONFIG'::

 Read an extra configuration file. If 'CONFIG' contains any
 slashes, it is opened by that exact name. Otherwise, it is
 searched for in the same way as the global configuration file
 (see CONFIGURATION below).

URL-TO-FILE MAPPING
-------------------

Mapping URLs into physical files is an iterative procedure, each step
looking in one single physical directory, starting with 'DIR'. For
each step, a path element is stripped off the beginning of the rest
string and examined, the path element being either the leading part of
the rest string up until (but not including) the first slash, or the
entire rest string if it contains no slashes. If the rest string is
empty, the directory being examined is considered the result of the
mapping. Otherwise, any escape sequences in the path element under
consideration are unescaped before examining it.

If the path element names a directory in the current directory, the
procedure continues in that directory, unless there is nothing left of
the rest string, in which case *dirplex* responds with an HTTP 301
redirect to the same URL, but ending with a slash. Otherwise, the
remaining rest string begins with a slash, which is stripped off
before continuing. If the path element names a file, that file is
considered the result of the mapping (even if the rest string has not
been exhausted yet).

If the path element does not name anything in the directory under
consideration, but contains no dots, then the directory is searched
for a file whose name before the first dot matches the path
element. If there is such a file, it is considered the result of the
mapping.

If the result of the mapping procedure is a directory, it is checked
for the presence of a file named by the *index-file* configuration
directive (see CONFIGURATION below). If there is such a file, it is
considered the final result instead of the directory itself. If the
index file name contains no dots and there is no exact match, then,
again, the directory is searched for a file whose name before the
first dot matches the index file name.

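As an illustration of the index file lookup, a configuration along the
following lines could be used. This is only a sketch: the file names
`index.html` and `index` are assumptions chosen for the example. Going
by the rules above, the dot-less name `index` would also match a file
such as `index.php` when no file is named exactly `index`.

--------
index-file index.html index
--------
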
See also 404 RESPONSES below.

CONFIGURATION
-------------

Configuration in *dirplex* comes from several sources. When *dirplex*
starts, unless the *-N* option is given, it tries to find a global
configuration file named `dirplex.rc`. It looks in `$HOME/.ashd/etc`,
and then in all directories named by the *PATH* environment variable,
appended with `../etc/ashd`. For example, then, if *PATH* is
`/usr/local/bin:/bin:/usr/bin`, the directories `$HOME/.ashd/etc`,
`/usr/local/etc/ashd`, `/etc/ashd` and `/usr/etc/ashd` are searched
for `dirplex.rc`, in that order. Only the first file found is used,
should there exist several.

If the *-c* option is given to *dirplex*, it too specifies a
configuration file to load. If the name given contains any slashes, it
is opened by that exact name. Otherwise, it is searched for in the
same manner as the global configuration file.

In addition, all directories traversed by *dirplex* when mapping a URL
into a physical file may contain a file called `.htrc`, which may
specify extra configuration options for all files in and beneath that
directory.

`.htrc` files are checked periodically and reread if changed. The
global configuration file and any file named by the *-c* option,
however, are never reexamined.

When using the configuration files for deciding what to do with a
found file, they are examined in order of their "distance" from that
file. `.htrc` files found in the directory or directories containing
the file are considered "closest" to the file under consideration,
followed by any configuration file named by the *-c* option, followed
by the global configuration file.

Each configuration file is a sequence of configuration stanzas, each
stanza being an unindented starting line, followed by zero or more
indented follow-up lines adding options to the stanza. The starting
line of a stanza is referred to as a "configuration directive"
below. Each line is a sequence of whitespace-separated words. A word
may contain whitespace if such whitespace is escaped, either by
enclosing the word in double quotes, or by escaping individual
whitespace characters with a preceding backslash. Backslash quoting
may also be used to treat double quotes or another backslash literally
as part of the word. Empty lines are ignored, and lines whose first
character after leading whitespace is a hash character (`#`) are
treated as comments and ignored.

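The quoting rules can be illustrated with the following sketch of a
stanza. The header names and values are made up for the example, and
`send` is assumed to be a handler declared elsewhere (as in EXAMPLES
below). The first *set* line uses double quotes to keep whitespace
inside a single word, and the second escapes an individual space with
a backslash.

--------
# This line is a comment and is ignored, as is the empty line below.

match
 filename *.txt
 set X-Example-Title "Plain text file"
 set X-Example-Owner Some\ Body
 handler send
--------
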
The following configuration directives are recognized:

*include* ['FILENAME'...]::

 Read the named files and act as if their contents stood in
 place of the *include* stanza. A 'FILENAME' may be a glob
 pattern, in which case all matching files are used, sorted by
 their filenames. If a 'FILENAME' is a relative path, it is
 treated relative to the directory containing the file from
 which the *include* stanza was read, even if the inclusion has
 been nested. Inclusions may be nested to any level.

*index-file* ['FILENAME'...]::

 The given 'FILENAMEs' are used for finding index files (see
 URL-TO-FILE MAPPING above). Specifying *index-file* overrides
 entirely any previous specification in a more distant
 configuration file, rather than adding to it. Zero 'FILENAMEs'
 may be given to turn off index file searching completely. The
 *index-file* directive accepts no follow-up lines.

*child* 'NAME'::

 Declares a named, persistent request handler (see *ashd*(7)
 for a more detailed description of persistent handlers). It
 must contain exactly one follow-up line, *exec* 'PROGRAM'
 ['ARGS'...], specifying the program to execute and the
 arguments to pass it. If given in a `.htrc` file, the program
 will be started in the same directory as the `.htrc` file
 itself. The *child* stanza itself serves as the identity of
 the forked process -- only one child process will be forked
 per stanza, and if that child process exits, it will be
 restarted the next time the stanza would be used. If a `.htrc`
 file containing *child* stanzas is reloaded, any currently
 running children are reused for *child* stanzas in the new
 file with matching names (even if the *exec* line has
 changed).

*fchild* 'NAME'::

 Declares a named, transient request handler (see *ashd*(7) for
 a more detailed description of transient handlers). It must
 contain exactly one follow-up line, *exec* 'PROGRAM'
 ['ARGS'...], specifying the program to execute and the
 arguments to pass it. In addition to the specified arguments,
 the HTTP method, raw URL and the rest string will be appended
 as described in *ashd*(7). If given in a `.htrc` file, the
 program will be started in the same directory as the `.htrc`
 file itself.

*match* [*directory*]::

 Specifies a filename pattern-matching rule. The
 pattern-matching procedure and the follow-up lines accepted by
 this stanza are described below, under MATCHING.

*capture* 'HANDLER' ['FLAGS']::

 Only meaningful in `.htrc` files. If a *capture* directive is
 specified, then the URL-to-file mapping procedure as described
 above is aborted as soon as the directory containing the
 `.htrc` file is encountered. The request is passed, with any
 remaining rest string, to the specified 'HANDLER', which must
 be a named request handler specified either in the same
 `.htrc` file or elsewhere. The *capture* directive accepts no
 follow-up lines. Note that the `X-Ash-File` header is not
 added to requests passed via *capture* directives. If 'FLAGS'
 contain the character `R`, this *capture* directive will be
 ignored if it is in the root directory that *dirplex* serves.

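A sketch of how some of these directives might be combined in a
`.htrc` file follows. The glob `conf.d/*.rc` and the handler name
`foo` are hypothetical, and the *exec* line is simply borrowed from
EXAMPLES below. Because of the `R` flag, the *capture* directive would
be ignored if this `.htrc` file sat in the root directory being
served.

--------
# Read every matching file, sorted by filename. The relative path is
# resolved against the directory containing this file.
include conf.d/*.rc

child foo
 exec callscgi scgi-wsgi -p . foo

capture foo R
--------
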
MATCHING
--------

When a file or directory has been found by the mapping procedure (see
URL-TO-FILE MAPPING above), the name of the physical file is examined
to determine a request handler to pass the request to. Note that only
the physical file name is ever considered; any logical request
parameters such as the request URL or the rest string are entirely
ignored.

To match a file, any *match* stanzas specified by any `.htrc` file or
in the global configuration files are searched in order of their
"distance" (see CONFIGURATION above) from the actual file. If it is a
directory which is being considered, only *match* stanzas with the
*directory* parameter are considered; otherwise, if it is a file, only
*match* stanzas without the *directory* parameter are considered.

A *match* stanza must contain at least one follow-up line specifying
match rules. All rules must match for the stanza as a whole to match.
The following rules are recognized:

*filename* 'PATTERN'...::

 Matches if the name of the file under consideration matches
 any of the 'PATTERNs'. A 'PATTERN' is an ordinary glob
 pattern, such as `*.php`. See *fnmatch*(3) for more
 information.

*pathname* 'PATTERN'...::

 Matches if the entire path of the file under consideration
 matches any of the 'PATTERNs'. A 'PATTERN' is an ordinary glob
 pattern, except that slashes are not matched by wildcards. See
 *fnmatch*(3) for more information. If a *pathname* rule is
 specified in a `.htrc` file, the path will be examined as
 relative to the directory containing the `.htrc` file, rather
 than to the root directory being served.

*default*::

 Matches if and only if no *match* stanza without a *default*
 rule matches (in any configuration file).

*local*::

 Valid only in `.htrc` files, *local* matches if and only if
 the file under consideration resides in the same directory as
 the containing `.htrc` file.

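The following sketch, meant for a `.htrc` file, combines some of these
rules. The directory name `static` and the handler name `send` are
assumptions made for the example, as is the guess that the relative
path is written without a leading slash. The *xset* and *handler*
lines are described further down.

--------
# Intended to match, e.g., static/site.css but not static/old/site.css,
# since wildcards do not match slashes in pathname patterns.
match
 pathname static/*.css
 xset content-type text/css
 handler send

# Both rules must match: only *.txt files residing in the same
# directory as this .htrc file are considered.
match
 local
 filename *.txt
 xset content-type text/plain
 handler send
--------
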
In addition to the rules, a *match* stanza must contain exactly one
follow-up line specifying the action to take if it matches. The
following actions are recognized:

*handler* 'HANDLER'::

 'HANDLER' must be a named handler (see CONFIGURATION
 above). The named handler is searched for not only in the same
 configuration file as the *match* stanza, but in all
 configuration files that are valid for the file under
 consideration, in order of distance. As such, a more deeply
 nested `.htrc` file may override the specified handler without
 having to specify any new *match* stanzas.

*fork* 'PROGRAM' ['ARGS'...]::

 Run a transient handler for this file, as if it were specified
 by a *fchild* stanza. This action exists mostly for
 convenience.

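The handler lookup by distance might be used as in the following
sketch, which shows the contents of two separate files in one listing
for brevity. The *exec* lines are taken from EXAMPLES below; the
assumption here is that a `.htrc` file in a subdirectory can redeclare
the name `php`, so that PHP files beneath it are served by the closer
declaration while the *match* stanza stays in the global file.

--------
# In the global configuration file:
child php
 exec callfcgi multifscgi 5 php-cgi

match
 filename *.php
 handler php

# In a .htrc file in some subdirectory:
fchild php
 exec callcgi -p php-cgi
--------
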
A *match* stanza may also contain any number of the following,
optional directives:

*set* 'HEADER' 'VALUE'::

 If the *match* stanza is selected as the match for a file, the
 named HTTP 'HEADER' in the request is set to 'VALUE' before
 passing the request on to the specified handler.

*xset* 'HEADER' 'VALUE'::

 *xset* does exactly the same thing as *set*, except that
 'HEADER' is automatically prepended with the `X-Ash-`
 prefix. The intention is only to make configuration files
 look nicer in this very common case.

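Going by the description above, the two stanzas in the following
sketch should be equivalent, assuming the `X-Ash-` prefix is simply
put in front of the given header name. The handler name `send` is
hypothetical and would have to be declared elsewhere.

--------
match
 filename *.html
 xset content-type text/html
 handler send

match
 filename *.html
 set X-Ash-content-type text/html
 handler send
--------
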
404 RESPONSES
-------------

An HTTP 404 response is sent to the client if:

 * The mapping procedure fails to find a matching physical file;
 * A path element is encountered during mapping which, after URL
 unescaping, either begins with a dot or contains slashes;
 * The mapping procedure finds a file which is neither a directory nor
 a regular file (or a symbolic link to any of the same);
 * An empty, non-final path element is encountered during mapping; or
 * The mapping procedure results in a file which is not matched by any
 *match* stanza.

By default, *dirplex* will send a built-in 404 response, but any
`.htrc` file or global configuration may define a request handler
named `.notfound` to customize the behavior. Note that, unlike
successful requests, such a handler will not be passed the
`X-Ash-File` header.

The built-in `.notfound` handler can also be used in *match* or
*capture* stanzas (for example, to restrict access to certain files or
directories).

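Both uses of `.notfound` can be sketched as follows. The pattern
`*.inc` is an arbitrary choice for the example, and `my-notfound` is a
hypothetical program standing in for whatever should produce the
custom 404 response. Without the *fchild* stanza, the *match* stanza
would fall back on the built-in 404 response.

--------
# Optional: replace the built-in 404 response.
fchild .notfound
 exec my-notfound

# Refuse to serve include files by answering with a 404 response.
match
 filename *.inc
 handler .notfound
--------
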
EXAMPLES
--------

The *sendfile*(1) program can be used to serve HTML files as follows.

--------
fchild send
 exec sendfile

match
 filename *.html *.htm
 xset content-type text/html
 handler send
--------

Assuming the PHP CGI interpreter is installed on the system, PHP
scripts can be used with the following configuration, using the
*callcgi*(1) program.

--------
# To use plain CGI, which uses more resources per handled request,
# but fewer static resources:
fchild php
 exec callcgi -p php-cgi

# Or, to use FastCGI, which keeps PHP running at all times, but uses
# fewer resources per handled request:
child php
 exec callfcgi multifscgi 5 php-cgi

match
 filename *.php
 handler php
--------

If there is a directory without an index file, a file listing can be
automatically generated by the *htls*(1) program as follows.

--------
match directory
 default
 fork htls
--------

The following configuration can be placed in a `.htrc` file in order
to dedicate the directory containing that file to some external SCGI
script engine. Note that *callscgi*, and therefore the script engine
itself, is started in the same directory, so that arbitrary code
modules or data files can be put directly in that directory and be
easily found.

--------
child foo
 exec callscgi scgi-wsgi -p . foo

capture foo
--------

AUTHOR
------
Fredrik Tolf <fredrik@dolda2000.com>

SEE ALSO
--------
*ashd*(7)