OpenBSD httpd Rewrite and Redirects with Examples
Draft!
This page is a draft and may be incomplete, incorrect, or just a
stub or outline. I've decided to allow myself to put draft pages on
my website as an experiment. I'm hoping they will:
Like all OpenBSD software, httpd is a new, cruft-free distillation of the minimal set of good parts that make up a solid Web server.
I’m so happy that OpenBSD httpd
includes URL rewriting. I believe this is a
foundational feature for a general-purpose Web server because it’s pretty
much required to make "Cool URIs" that don’t expose all the messy heterogeneous
technical details on your server. See:
Cool URIs don’t change (w3.org)
The official docs are:
- https://man.openbsd.org/httpd.conf (top-level description of httpd configuration)
- https://man.openbsd.org/patterns.7 (the Lua-style patterns used by httpd)
It’s all there and there are a couple relevant examples, but you’re really on your own to test stuff out to see how all these separate facts compose together for your specific URL rewrite needs. I’m gonna figure it out and make this page.
What is a rewrite?
It’s really important to understand that a URL rewrite is entirely internal to the server. Unlike a redirect, a rewrite is never visible to the person/browser making the request.
The purpose of a rewrite is to handle an incoming URL in some special way that is completely transparent to the end user. The reasons for needing to do this are often technical in nature: to internally translate a simple, logical external-facing URL into whatever your backend horror show needs to see in order to get the correct content. I’m not judging; we’ve all been there.
For example, you could use rewrites to hide the implementation details of these URLs:

http://example.com/blog-engine.rb?article=hello
http://example.com/blog-engine.rb?article=goat-herding

…behind these clean, semantic URLs:

http://example.com/blog/hello
http://example.com/blog/goat-herding
Or simply allow a request for "foo" to retrieve "foo.html" without the file extension.
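A rough sketch of how both ideas might look inside a server block in httpd.conf. The blog-engine.rb script name comes from the example URLs; the patterns themselves are my assumption and untested, and whatever FastCGI setup the script needs is not shown:

```
# Serve /blog/hello from /blog-engine.rb?article=hello.
# The browser only ever sees /blog/hello.
location match "^/blog/([^/]+)$" {
    request rewrite "/blog-engine.rb?article=%1"
}

# Let a request for /foo retrieve /foo.html, but only when
# nothing was found at /foo and the path contains no dot.
location not found match "^/([^.]+)$" {
    request rewrite "/%1.html"
}
```

The %1 in each rewrite target is the text captured by the parentheses in the pattern.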
Contrasting with redirects
As mentioned above, redirects are a similar concept. The difference is that a redirect is an instruction to the web browser making the request.
The types of redirect you can specify in httpd.conf are HTTP status codes, which are returned to the browser.
The two types of redirect I use most often are:
- 302 Found - which will cause the browser to go to the new location.
- 301 Moved Permanently - which will cause the browser to go to the new location and remember it.
But evidently 307 Temporary Redirect and 308 Permanent Redirect were added at some point to avoid some weirdo behavior depending on request method. Feel free to use those. I just don’t have any experience with them. I guess I’m old now?
Anyway, here’s some httpd.conf
redirects:
TODO
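In the meantime, here is a minimal sketch of what a permanent and a temporary redirect might look like. The server name and all of the paths are hypothetical placeholders, not taken from a real configuration:

```
server "example.com" {
    listen on * port 80

    # 301: this page has moved for good; browsers may remember it.
    location "/old-page.html" {
        block return 301 "/new-page.html"
    }

    # 302: send the browser somewhere else for now.
    location "/sale" {
        block return 302 "/specials/winter.html"
    }
}
```

Unlike a rewrite, both of these change what ends up in the browser's address bar.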
By location
You can rewrite all URLs coming into your server. But it’s more likely that you’ll want to match something more specific.
Here’s a simple single-page rewrite.
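A minimal sketch of what such a single-page rewrite might look like, assuming a hypothetical /about URL backed by an about.html file in the document root:

```
# Inside a server block: serve about.html when /about is requested.
# The browser's address bar keeps showing /about.
location "/about" {
    request rewrite "/about.html"
}
```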
"location" vs "location match"
The httpd.conf(5) manpage (linked above) explains the difference like so:
location path {…}
Specify server configuration rules for a specific location. The path argument will be matched against the request path with shell globbing rules.
location match path {…}
Like the location option, but match the path using pattern matching instead of shell globbing rules, see patterns(7). The pattern may contain captures that can be used in an enclosed block return or request rewrite option.
(Please note that I elided the optional "found/not found" syntax from both location examples, which also lets you only match a rule when the document would otherwise have been found or not. That’s also really nice and useful.)
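To make the contrast concrete, here is a small sketch of the two forms side by side. The paths and the user.php script are hypothetical, and I have not tested these exact rules:

```
# Shell globbing: * matches any run of characters.
# There are no captures to reuse in the target.
location "/downloads/*" {
    block return 302 "/moved.html"
}

# patterns(7): %d+ matches one or more digits, and the
# parenthesized capture is available as %1.
location match "^/user/(%d+)$" {
    request rewrite "/user.php?id=%1"
}
```

The glob form is enough when you only need to select a set of paths; the match form is what you want once part of the path has to be carried over into the result.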
Macros
You can think of these macros as pre-defined variables that can be placed anywhere in your final URL.
$DOCUMENT_URI      The request path.
$QUERY_STRING      The query string of the request.
$QUERY_STRING_ENC  The URL-encoded query string of the request.
$REMOTE_ADDR       The IP address of the connected client.
$REMOTE_PORT       The TCP source port of the connected client.
$REMOTE_USER       The remote user for HTTP authentication.
$REQUEST_SCHEME    The request scheme (http or https).
$REQUEST_URI       The request path and optional query string.
$SERVER_ADDR       The configured IP address of the server.
$SERVER_PORT       The configured TCP port of the server.
$SERVER_NAME       The name of the server.
$HTTP_HOST         The host from the HTTP Host header.
%n                 The capture index n from a location match.
(These can be used in both rewrites and redirects.)
To test what these do, I created the following entry in httpd.conf:

server "willard.ratfactor.com" {
    listen on * port 80
    location match "ma(cro)(test)" {
        block return 302 "mt.html?\
DOCUMENT_URI=$DOCUMENT_URI\
&QUERY_STRING=$QUERY_STRING\
&QUERY_STRING_ENC=$QUERY_STRING_ENC\
&REMOTE_ADDR=$REMOTE_ADDR\
&REMOTE_PORT=$REMOTE_PORT\
&REMOTE_USER=$REMOTE_USER\
&REQUEST_SCHEME=$REQUEST_SCHEME\
&REQUEST_URI=$REQUEST_URI\
&SERVER_ADDR=$SERVER_ADDR\
&SERVER_PORT=$SERVER_PORT\
&SERVER_NAME=$SERVER_NAME\
&HTTP_HOST=$HTTP_HOST\
&m1=%1\
&m2=%2"
    }
}
Which redirects from the URL "/macrotest" to "/mt.html" followed by a long query string containing examples of each macro.
I created the mt.html macro test page destination for the redirect with the following snippet of JavaScript to print out the query string params:
<html><body>
<script>
// Print every query string parameter as "$NAME = value".
var query_params = new URLSearchParams(window.location.search);
for (const p of query_params) {
    document.write("$" + p[0] + " = " + p[1] + "<br>");
}
</script>
</body></html>
Then I made the request in my browser and…
Requested URL: http://willard.ratfactor.com/macrotest?foo=bar&fiz=buz

Redirects to mt.html with:

$DOCUMENT_URI = /macrotest
$QUERY_STRING = foo=bar (followed by &fiz=buz)
$QUERY_STRING_ENC = foo=bar%26fiz=buz
$REMOTE_ADDR = 99.59.251.76
$REMOTE_PORT = 52210
$REMOTE_USER =
$REQUEST_SCHEME = http
$REQUEST_URI = /macrotest
$SERVER_ADDR = 0.0.0.0
$SERVER_PORT = 80
$SERVER_NAME = willard.ratfactor.com
$HTTP_HOST = willard.ratfactor.com
$m1 = cro
$m2 = test
I think most of these are pretty self-evident. But a couple may not be so obvious.
For example, REMOTE_USER is only set if you are using HTTP authentication.
For "$SERVER_NAME vs $HTTP_HOST" and "$QUERY_STRING vs $QUERY_STRING_ENC", keep reading…
$SERVER_NAME vs $HTTP_HOST
The difference between these two is that HTTP_HOST is what was requested by the browser. SERVER_NAME is what is matched by httpd.
This is best illustrated by an example. Given the following httpd.conf:

server "*.ratfactor.com" {
    listen on * port 80
    location match "testname" {
        block return 302 "test?$s=$SERVER_NAME&h=$HTTP_HOST"
    }
}
The result of this example is a redirect to:
/test?$s=*.ratfactor.com&h=willard.ratfactor.com
As you can see, the $SERVER_NAME is exactly what we have in the server section of httpd.conf, including the wildcard for the subdomain.
The $HTTP_HOST is what I requested with my browser.
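One place this difference seems likely to matter is a redirect that has to keep the requested hostname, such as sending plain HTTP traffic to HTTPS. This is a sketch under my own assumptions (a hypothetical wildcard server for example.com with an HTTPS listener configured elsewhere), not something tested above:

```
server "*.example.com" {
    listen on * port 80

    # $HTTP_HOST keeps whatever host the browser asked for.
    # $SERVER_NAME here would expand to the literal "*.example.com".
    block return 301 "https://$HTTP_HOST$REQUEST_URI"
}
```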
$QUERY_STRING vs $QUERY_STRING_ENC
Here’s exactly what showed up in the browser address bar for the two versions in the redirect above (separated into lines for clarity):
&QUERY_STRING=foo=bar&fiz=buz
&QUERY_STRING_ENC=foo=bar%26fiz=buz
The visual difference is that the ampersand in the original query string sent to "macrotest" has been turned into the encoded byte sequence %26 in the "_ENC" version.
The effect of this is that when passed without this encoding, the ampersand separating the foo and fiz params is interpreted like so:

QUERY_STRING=foo=bar
fiz=buz
Which is just silly.
It’s perfectly fine to pass the raw query string to a redirect or rewrite so long as it’s the only query string:
location match "test1" {
    block return 302 "t?$QUERY_STRING"
}
# Request:  test1?foo=bar
# Redirect: t?foo=bar
Or appended to the end of another query string:
location match "test2" {
    block return 302 "t?pg=2&$QUERY_STRING"
}
# Request:  test2?foo=bar
# Redirect: t?pg=2&foo=bar
But here we need to encode the query string to avoid weird behavior with the ampersand in the "inner" query assigned to the q param:

location match "test3" {
    block return 302 "t?pg=3&q=$QUERY_STRING_ENC&more=true"
}
# Request:  test3?x=3&y=7
# Redirect: t?pg=3&q=x=3%26y=7&more=true
Note: Whatever application needs to interpret the q param is going to need to URL-decode the original query.
My actual…
# Relative to the chroot of /var/www/
root "/htdocs/famsite"
directory index "index.php"

# all PHP is run through php-fpm
location "*.php" {
    fastcgi socket "/run/php-fpm.sock"
}

# two things commented out because I opted to do without for ascetic reasons:

# Special handling for login links
#location match "/login/(.*)" {
#    request rewrite "/login.php?login=%1"
#}

# Add .php to any url that doesn't match a file and
# doesn't have a dot in the last part.
#location not found match "/[^.]+" {
#    request rewrite "$REQUEST_URI.php"
#}
Looks useful? (TODO)
I haven’t tested this yet on OpenBSD 7.6, but here’s a great example, full of the sort of real-world rewriting I’ve personally done with Apache. It comes from Tim Baumgard’s OpenBSD httpd rewrite patches (github.com) repo.
server "www.example.com" {
    listen on egress port 80
    root "/www.example.com"

    # Let httpd handle assets like images, CSS, etc. directly.
    location "/assets/*" {
        pass
    }

    # Rewrite the old path for images directory.
    location match "^/images/(.*)" {
        request rewrite "/assets/images/%1"
    }

    # The setting above could be accomplished without rewrites like so:
    #location "/images/*" {
    #    root "/www.example.com/assets"
    #    pass
    #}

    # URL routing with query string modifications for a legacy script.
    # All predefined macros are enumerated and described in the httpd.conf(5)
    # man page, including the $QUERY_STRING_ENC added by the patch.
    location match "^/legacy/(.*)" {
        fastcgi socket "/run/php-fpm.sock"
        request rewrite "/legacy.php?target=%1&query=$QUERY_STRING_ENC"
    }

    # Default URL routing for everything else. The query string is untouched.
    location "*" {
        fastcgi socket "/run/php-fpm.sock"
        request rewrite "/default.php"
    }
}
Note: This page began as a tiny part of this blog entry that got out of hand. :-)