OpenBSD httpd Rewrite and Redirects with Examples

Page created: 2024-11-16

Updated: 2025-04-10

Draft!

This page is a draft and may be incomplete, incorrect, or just a stub or outline. I've decided to allow myself to put draft pages on my website as an experiment. I'm hoping they will:

Help me address my backlog of article ideas.
Serve as a "living" TODO list of things to work on.
Be useful to myself or others in their incomplete forms.

As always, I'm happy to accept feedback on anything I publish including draft content.

Like all OpenBSD software, httpd is a new, cruft-free distillation of the minimal set of good parts that make up a solid Web server.

I’m so happy that OpenBSD httpd includes URL rewriting. I believe this is a foundational feature for a general-purpose Web server because it’s pretty much required to make "Cool URIs" that don’t expose all the messy heterogeneous technical details on your server. See: Cool URIs don’t change (w3.org)

The official docs are:

https://man.openbsd.org/httpd.conf (the top-level description of httpd configuration)
https://man.openbsd.org/patterns.7 (the Lua-style patterns used by httpd)

It’s all there and there are a couple relevant examples, but you’re really on your own to test stuff out to see how all these separate facts compose together for your specific URL rewrite needs. I’m gonna figure it out and make this page.

What is a rewrite?

It’s really important to understand that a URL rewrite is entirely internal to the server. Unlike a redirect, a rewrite is never visible to the person/browser making the request.

The purpose of a rewrite is to handle an incoming URL in some special way that is completely transparent to the end user. The reasons for needing to do this are often technical in nature: to internally map a simple, logical external-facing URL onto whatever your backend horror show needs to see in order to get the correct content. I’m not judging; we’ve all been there.

For example, you could use rewrites to hide the implementation details behind these clean, semantic URLs:

http://example.com/blog/hello
http://example.com/blog/goat-herding

…by internally turning them into these implementation-specific URLs:

http://example.com/blog-engine.rb?article=hello
http://example.com/blog-engine.rb?article=goat-herding

Or simply allow a request for "foo" to retrieve "foo.html" without the file extension.
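
As a minimal, untested sketch of the blog example above: the server name, the `/blog/` prefix, and `blog-engine.rb` are just the placeholder names from the example URLs, and something (such as a FastCGI handler, not shown) would still have to be configured to actually run the script.

```
server "example.com" {
        listen on * port 80

        # /blog/hello is handled internally as /blog-engine.rb?article=hello.
        # The browser still sees /blog/hello in its address bar.
        location match "^/blog/([^/]+)$" {
                request rewrite "/blog-engine.rb?article=%1"
        }
}
```

The `([^/]+)` capture grabs the article name and `%1` drops it into the rewritten URL.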

Contrasting with redirects

As mentioned above, redirects are a similar concept. The difference is that a redirect is an instruction to the web browser making the request.

The types of redirect you can specify in httpd.conf are HTTP status codes, which are returned to the browser. The two I use most often are:

302 Found - which will cause the browser to go to the new location.
301 Moved Permanently - which will cause the browser to go to the new location and remember it.

But evidently 307 Temporary Redirect and 308 Permanent Redirect were added at some point to guarantee that the request method (POST, for example) is preserved across the redirect, which browsers don’t reliably do for 301 and 302. Feel free to use those. I just don’t have any experience with them. I guess I’m old now?

Anyway, here’s some httpd.conf redirects:

TODO
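
In the meantime, here is a minimal, untested sketch of what such redirects could look like, using the `block return` option that appears in the examples further down this page. Every path and hostname here is a made-up placeholder.

```
server "example.com" {
        listen on * port 80

        # Temporary redirect for a single page.
        location "/sale" {
                block return 302 "/spring-sale.html"
        }

        # Permanent redirect for a renamed directory; %1 keeps the rest of the path.
        location match "^/old-docs/(.*)$" {
                block return 301 "/docs/%1"
        }

        # Permanent redirect to a different site.
        location "/forum*" {
                block return 301 "https://forum.example.net/"
        }
}
```

When several location blocks could match the same request, httpd.conf(5) says the first matching one wins, so the order of the blocks matters.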

By location

You can rewrite all URLs coming into your server. But it’s more likely that you’ll want to match something more specific.

Here’s a simple single-page rewrite.
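
A minimal, untested sketch of such a single-page rewrite might look like the following. The `/about` path and the `about.html` file are hypothetical, and the block would live inside a `server` section.

```
# Serve the (hypothetical) file /about.html when /about is requested.
location "/about" {
        request rewrite "/about.html"
}
```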

"location" vs "location match"

The httpd.conf(5) manpage (linked above) explains the difference like so:

location path {…} Specify server configuration rules for a specific location. The path argument will be matched against the request path with shell globbing rules.

location match path {…} Like the location option, but match the path using pattern matching instead of shell globbing rules, see patterns(7). The pattern may contain captures that can be used in an enclosed block return or request rewrite option.

(Please note that I elided the optional "found/not found" syntax from both location examples, which also lets you only match a rule when the document would otherwise have been found or not. That’s also really nice and useful.)
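
To make the three forms concrete, here is an untested sketch with all three side by side. All of the paths are hypothetical, and the blocks would live inside a `server` section.

```
# Shell globbing: "*" matches any run of characters in the path.
location "/downloads/*" {
        directory auto index
}

# Pattern matching: the part captured by (%d+) is available as %1.
location match "^/post/(%d+)$" {
        block return 302 "/archive/post-%1.html"
}

# "not found" variant: only applies when the request doesn't match a file.
location not found match "^/notes/([^./]+)$" {
        request rewrite "/notes/%1.html"
}
```

In patterns(7), `%d` is the digit class, so the second block only matches paths like /post/42.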

Macros

You can think of these macros as pre-defined variables that can be placed anywhere in your final URL.

$DOCUMENT_URI     The request path.
$QUERY_STRING     The query string of the request.
$QUERY_STRING_ENC The URL-encoded query string of the request.
$REMOTE_ADDR      The IP address of the connected client.
$REMOTE_PORT      The TCP source port of the connected client.
$REMOTE_USER      The remote user for HTTP authentication.
$REQUEST_SCHEME   The request scheme (http or https).
$REQUEST_URI      The request path and optional query string.
$SERVER_ADDR      The configured IP address of the server.
$SERVER_PORT      The configured TCP port of the server.
$SERVER_NAME      The name of the server.
$HTTP_HOST        The host from the HTTP Host header.
%n                The capture index n from a location match.

(These can be used in both rewrites and redirects.)

To test what these do, I created the following entry in httpd.conf:

server "willard.ratfactor.com" {
        listen on * port 80

        location match "ma(cro)(test)" {
                block return 302 "mt.html?\
DOCUMENT_URI=$DOCUMENT_URI\
&QUERY_STRING=$QUERY_STRING\
&QUERY_STRING_ENC=$QUERY_STRING_ENC\
&REMOTE_ADDR=$REMOTE_ADDR\
&REMOTE_PORT=$REMOTE_PORT\
&REMOTE_USER=$REMOTE_USER\
&REQUEST_SCHEME=$REQUEST_SCHEME\
&REQUEST_URI=$REQUEST_URI\
&SERVER_ADDR=$SERVER_ADDR\
&SERVER_PORT=$SERVER_PORT\
&SERVER_NAME=$SERVER_NAME\
&HTTP_HOST=$HTTP_HOST\
&m1=%1\
&m2=%2"
        }

}

Which redirects from the URL "/macrotest" to "/mt.html" followed by a long query string containing examples of each macro.

I created the mt.html macro test page destination for the redirect with the following snippet of JavaScript to print out the query string params:

<html><body>
<script>
var query_params = new URLSearchParams(window.location.search);
for (const [name, value] of query_params) {
    document.write("$" + name + " = " + value + "<br>");
}
</script>
</body></html>

Then I made the request in my browser and…

Requested URL:

http://willard.ratfactor.com/macrotest?foo=bar&fiz=buz

Redirects to mt.html with:

    $DOCUMENT_URI     = /macrotest
    $QUERY_STRING     = foo=bar (followed by &fiz=buz)
    $QUERY_STRING_ENC = foo=bar%26fiz=buz
    $REMOTE_ADDR      = 99.59.251.76
    $REMOTE_PORT      = 52210
    $REMOTE_USER      =
    $REQUEST_SCHEME   = http
    $REQUEST_URI      = /macrotest
    $SERVER_ADDR      = 0.0.0.0
    $SERVER_PORT      = 80
    $SERVER_NAME      = willard.ratfactor.com
    $HTTP_HOST        = willard.ratfactor.com
    $m1               = cro
    $m2               = test

I think most of these are pretty self-evident. But a couple may not be so obvious.

For example, REMOTE_USER is only set if you are using HTTP authentication.

For "$SERVER_NAME vs $HTTP_HOST" and "$QUERY_STRING vs $QUERY_STRING_ENC", keep reading…

`$SERVER_NAME` vs `$HTTP_HOST`

The difference between these two is that HTTP_HOST is what was requested by the browser. SERVER_NAME is what is matched by httpd.

This is best illustrated by an example. Given the following httpd.conf:

server "*.ratfactor.com" {
        listen on * port 80
        location match "testname" {
                block return 302 "test?$s=$SERVER_NAME&h=$HTTP_HOST"
        }
}

The result of this example is a redirect to:

/test?$s=*.ratfactor.com&h=willard.ratfactor.com

As you can see, the $SERVER_NAME is exactly what we have in the server section of httpd.conf, including the wildcard for the subdomain.

The $HTTP_HOST is what I requested with my browser.
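
One practical consequence, as an untested sketch with example.com as a placeholder domain: when the server name is a wildcard, a redirect built from $SERVER_NAME would presumably contain the literal "*", so $HTTP_HOST looks like the safer macro for sending visitors to the HTTPS version of whatever host they asked for.

```
server "*.example.com" {
        listen on * port 80

        # Send every request to the same host and path over HTTPS.
        block return 301 "https://$HTTP_HOST$REQUEST_URI"
}
```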

`$QUERY_STRING` vs `$QUERY_STRING_ENC`

Here’s exactly what showed up in the browser address bar for the two versions in the redirect above (separated into lines for clarity):

&QUERY_STRING=foo=bar&fiz=buz
&QUERY_STRING_ENC=foo=bar%26fiz=buz

The visual difference is that the ampersand in the original query string sent to "macrotest" has been turned into the encoded byte sequence %26 in the "_ENC" version.

Without this encoding, the ampersand separating the foo and fiz params is treated as a separator in the outer query string, which splits it into two params like so:

QUERY_STRING=foo=bar
fiz=buz

Which is just silly.

It’s perfectly fine to pass the raw query string to a redirect or rewrite so long as it’s the only query string:

location match "test1" {
        block return 302 "t?$QUERY_STRING"
}
# Request:  test1?foo=bar
# Redirect: t?foo=bar

Or appended to the end of another query string:

location match "test2" {
        block return 302 "t?pg=2&$QUERY_STRING"
}
# Request:  test2?foo=bar
# Redirect: t?pg=2&foo=bar

But here we need to encode the query string to avoid weird behavior with the ampersand in the "inner" query assigned to the q param:

location match "test3" {
        block return 302 "t?pg=3&q=$QUERY_STRING_ENC&more=true"
}
# Request:  test3?x=3&y=7
# Redirect: t?pg=3&q=x=3%26y=7&more=true

Note: Whatever application needs to interpret the `q` param is going to need to URL-decode the original query.

My actual…

        # Relative to the chroot of /var/www/
        root "/htdocs/famsite"
        directory index "index.php"

        # all PHP is run through php-fpm
        location "*.php" {
                fastcgi socket "/run/php-fpm.sock"
        }

# two things commented out because I opted to do without for ascetic reasons:

        # Special handling for login links
        #location match "/login/(.*)" {
        #        request rewrite "/login.php?login=%1"
        #}

        # Add .php to any url that doesn't match a file and
        # doesn't have a dot in the last part.
        #location not found match "/[^.]+" {
        #    request rewrite "$REQUEST_URI.php"
        #}
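
Two things may be worth watching in the commented-out ".php" rule above. Per the macro list earlier on this page, $REQUEST_URI includes the query string, so a request for /page?x=1 would presumably be rewritten to /page?x=1.php. Also, patterns are unanchored unless they start with ^ (which is why "ma(cro)(test)" matched /macrotest), so "/[^.]+" can match paths that do contain a dot. The following is an untested variant, offered as an assumption rather than a verified fix:

```
# Untested variant: anchor on the last path segment and keep the query string.
location not found match "/[^./]+$" {
        request rewrite "$DOCUMENT_URI.php?$QUERY_STRING"
}
```

With an empty query string this would leave a trailing "?" on the rewritten path, which should be harmless.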

Looks useful? (TODO)

I haven’t tested this yet on OpenBSD 7.6, but here’s a great example from Tim Baumgard’s OpenBSD httpd rewrite patches (github.com) repo. It shows a lot of real-world usage of the sort I’ve personally done with Apache.

server "www.example.com" {
	listen on egress port 80
	root "/www.example.com"

	# Let httpd handle assets like images, CSS, etc. directly.
	location "/assets/*" {
		pass
	}

	# Rewrite the old path for images directory.
	location match "^/images/(.*)" {
		request rewrite "/assets/images/%1"
	}

	# The setting above could be accomplished without rewrites like so:
	#location "/images/*" {
	#	root "/www.example.com/assets"
	#	pass
	#}

	# URL routing with query string modifications for a legacy script.
	# All predefined macros are enumerated and described in the httpd.conf(5)
	# man page, including the $QUERY_STRING_ENC added by the patch.
	location match "^/legacy/(.*)" {
		fastcgi socket "/run/php-fpm.sock"
		request rewrite "/legacy.php?target=%1&query=$QUERY_STRING_ENC"
	}

	# Default URL routing for everything else. The query string is untouched.
	location "*" {
		fastcgi socket "/run/php-fpm.sock"
		request rewrite "/default.php"
	}
}

Note: This page began as a tiny part of this blog entry that got out of hand. :-)