I have taken the habit of using a single domain as an umbrella for multiple websites. In short, this means we can benefit from different software stacks hosted on different platforms.
As a bonus, we can use a custom domain on GitHub Pages with our own SSL certificate.
ProxyPass and ProxyPassReverse are our best friends when it comes to mounting an external URL (and its descendants) onto a folder of our very own domain.
Why use a reverse proxy?
We are in 2016, and mentioning reverse proxies in a conversation sounds odd, as if it dated from a different century. But hey, who cares?
I use reverse proxies for various reasons:
- to isolate components: instead of putting everything in an ever-growing monolithic website, we can manage components as separate git repositories, each with its own build process (like /talks on this website).
- to manage different stacks: for a conference with a yearly edition, for instance, we can iterate on the software stack and adapt it to our needs, without having to upgrade legacy editions or keep maintaining them out of a sense of obligation.
- to provide a transparent experience to our users: we can host content in different places and still provide a coherent experience, without users noticing how spread out our infrastructure is.
- to upgrade individual components: one by one rather than the entire stack, which makes life easier in terms of QA scope. Obviously, we fall into the microservices trap: the more individual projects we have, the more scattered our attention and efforts can be.
- to run Docker containers or web applications: we can avoid exposing them directly on port 80 or 443, although this is not the point of this article, as we are focused on proxying external content.
It is a good way to hide complex, purpose-built components under a single, seemingly unified domain. This is, for example, how websites like the BBC's feel like one website, whereas they are actually composed of dozens and dozens of different websites developed by independent teams.
How does it work?
An HTTP request directed to our hosting provider will usually look like the following examples:
By default, we assume the folders /doc are contained in the same directory as the root of the website.
Let's say we have decided to opt for a white-labelled content provider for one part of the website, and have moved another part of it to a static website hosted on GitHub Pages. The above example would evolve into:
It should be clear enough, so let's dive a bit deeper into how to achieve this.
The configuration of a reverse proxy mainly relies on declaring an Apache ProxyPass directive for each path (and its descendants) we would like to host elsewhere:
And that's pretty much it!
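To make the mapping concrete, here is a minimal Python sketch (a toy model, not Apache's own code) of what a list of ProxyPass declarations does to an incoming path: rules are checked in declaration order, the first matching prefix wins, and a target of ! excludes that prefix from proxying. The rule list, the hostnames example-user.github.io and provider.example.com, and the map_request helper are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical rules in declaration order, mirroring Apache lines such as
#   ProxyPass "/talks/" "https://example-user.github.io/talks/"
# A target of "!" mirrors ProxyPass "/docs/internal/" "!" (do not proxy).
RULES: Sequence[Tuple[str, str]] = (
    ("/docs/internal/", "!"),
    ("/docs/", "https://provider.example.com/"),
    ("/talks/", "https://example-user.github.io/talks/"),
)


def map_request(path: str, rules: Sequence[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the backend URL for `path`, or None when it is served locally."""
    for prefix, target in rules:
        if path.startswith(prefix):
            if target == "!":
                return None  # excluded: served from our own DocumentRoot
            # Swap the matched prefix for the target; keep the remainder as-is.
            return target + path[len(prefix):]
    return None  # no rule matched: served locally
```

With these rules, /talks/2016/index.html goes to https://example-user.github.io/talks/2016/index.html, while /docs/internal/notes and /about stay on our own server. Because the first match wins, the exclusion has to be declared before the broader /docs/ rule, which is also what the Apache documentation recommends.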
Configuring Proxy directives
I actually found the Apache ProxyPass documentation quite clear (or maybe I spent too much time reading it). We can exclude folders from proxying, or match only specific patterns with ProxyPassMatch. I guess all we need is a use case before starting to use them 😊.
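ProxyPassMatch works like ProxyPass but takes a regular expression, and the target can reuse captured groups as $1, $2 and so on. The sketch below models that substitution with Python's re module (Apache uses PCRE, which behaves the same for patterns this simple); the patterns, the hostnames and the map_match helper are hypothetical, for instance one GitHub Pages repository per yearly edition.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical ProxyPassMatch-style rules: (regex, target using $N backreferences).
MATCH_RULES: Sequence[Tuple[str, str]] = (
    (r"^/talks/(\d{4})/(.*)$", "https://example-user.github.io/talks-$1/$2"),
    (r"^/(.*\.pdf)$", "https://files.example.com/$1"),
)


def _expand(target: str, match: "re.Match[str]") -> str:
    """Replace $0..$9 in `target` with the corresponding captured groups."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", target)


def map_match(path: str, rules: Sequence[Tuple[str, str]] = MATCH_RULES) -> Optional[str]:
    """Return the backend URL for the first regex matching `path`, else None."""
    for pattern, target in rules:
        match = re.match(pattern, path)
        if match:
            return _expand(target, match)
    return None
```

Here /talks/2016/slides.html would be fetched from the talks-2016 repository, while /talks/latest matches neither pattern and stays local.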
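ProxyPreserveHost controls which Host HTTP header our VirtualHost proxy server sends to the backend: with On, the backend receives the Host header of the incoming request (our own domain); with Off, it receives the hostname of the ProxyPass target.
So in general we will want it set to Off (its default), especially with backends like GitHub Pages, which rely on the Host header to decide which website to serve.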
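Assuming the setting above is Apache's ProxyPreserveHost, its effect boils down to a single choice, modelled in the simplified Python sketch below. The backend_host_header helper and the hostnames are hypothetical; only the On/Off behaviour comes from the directive.

```python
from urllib.parse import urlsplit


def backend_host_header(client_host: str, backend_url: str, preserve_host: bool) -> str:
    """Host header sent to the backend in this simplified model.

    preserve_host=True  models ProxyPreserveHost On: forward the client's Host.
    preserve_host=False models ProxyPreserveHost Off: use the ProxyPass target's host.
    """
    if preserve_host:
        return client_host
    # netloc is the target's host, plus ":port" when the URL spells one out.
    return urlsplit(backend_url).netloc
```

With Off, a request for www.example.com/talks/ reaches GitHub with Host: example-user.github.io, which GitHub knows how to serve; with On, GitHub would receive Host: www.example.com instead.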
ProxyPassReverse tells Apache how to treat the location headers emitted by the proxied backend.
In other words, if the backend emits headers like Location or Content-Location (in a redirect, for instance), our proxy will rewrite them to match our VirtualHost.
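The rewrite is the ProxyPass mapping run backwards: a header value starting with the backend URL gets that URL swapped for our public origin plus the local path. The Python sketch below models it for the Location, Content-Location and URI headers, the ones the Apache documentation lists for ProxyPassReverse; the rule, the public origin https://www.example.com and the reverse_headers helper are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical rule mirroring:
#   ProxyPassReverse "/talks/" "https://example-user.github.io/talks/"
REVERSE_RULES: Sequence[Tuple[str, str]] = (
    ("/talks/", "https://example-user.github.io/talks/"),
)
PUBLIC_ORIGIN = "https://www.example.com"  # hypothetical public scheme and host
REWRITTEN_HEADERS = ("Location", "Content-Location", "URI")


def reverse_headers(
    headers: Dict[str, str],
    rules: Sequence[Tuple[str, str]] = REVERSE_RULES,
    origin: str = PUBLIC_ORIGIN,
) -> Dict[str, str]:
    """Return a copy of `headers` with backend URLs mapped back to our domain.

    Header names are assumed to be in canonical case to keep the sketch short.
    """
    result = dict(headers)
    for name in REWRITTEN_HEADERS:
        value = result.get(name)
        if value is None:
            continue
        for local_prefix, backend_prefix in rules:
            if value.startswith(backend_prefix):
                result[name] = origin + local_prefix + value[len(backend_prefix):]
                break
    return result
```

So a redirect from GitHub to https://example-user.github.io/talks/2016/ reaches the browser as https://www.example.com/talks/2016/, and the visitor never leaves our domain. Other headers, and Location values pointing elsewhere, pass through untouched.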
ProxyPassReverseCookieDomain follows exactly the same principle as ProxyPassReverse, but rewrites the domain contained in any Set-Cookie header emitted by the backend.
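ProxyPassReverseCookieDomain takes the backend's domain and ours, as in ProxyPassReverseCookieDomain "backend.example.net" "www.example.com", and adjusts the Domain attribute of Set-Cookie headers. The Python sketch below models that rewrite on a single header value; the domains and the rewrite_cookie_domain helper are hypothetical, and the handling of a leading dot is a choice of this sketch.

```python
import re


def rewrite_cookie_domain(set_cookie: str, internal: str, public: str) -> str:
    """Rewrite the Domain attribute of one Set-Cookie value (simplified model)."""
    # Match "; Domain=<internal>" case-insensitively, with an optional leading dot,
    # and only when the attribute ends there (so "<internal>.evil" is left alone).
    pattern = re.compile(
        r"(;\s*domain=)(\.?)" + re.escape(internal) + r"(?=\s*(;|$))",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(1) + m.group(2) + public, set_cookie)
```

Without this rewrite, the browser would drop a cookie scoped to backend.example.net, since the response came from www.example.com.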
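SSLProxyEngine enables the proxy module to use SSL/TLS on its connection to the backend, i.e. to deal with encrypted requests; it is required as soon as a ProxyPass target starts with https://. We could definitely proxy HTTP to HTTPS or, better, HTTPS to HTTP, to secure insecure parts of our website. Or proxy HTTPS to HTTPS to serve them… with a different SSL certificate.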
And that's precisely one advantage of putting a reverse proxy in front of GitHub Pages: we can use our custom domain with our own certificate.
Reverse Proxy over HTTPS
GitHub serves every GitHub Pages website created after June 15th 2016 over HTTPS. So we will have to make sure our server can talk over SSL with GitHub by enabling
As an alternative, we can also run the following to globally enable
By doing so, we do not need to write the
If we cannot enable mod_ssl, well, we are screwed, so the best option is to open a support ticket with our hosting provider and ask whether there is a way to enable it.
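Since a ProxyPass target starting with https:// needs SSLProxyEngine On (a mod_ssl directive) or the proxied requests fail, a quick sanity check on our VirtualHost can catch a forgotten line before reloading Apache. The Python helper below is a hypothetical, naive text scan rather than a real configuration parser: it ignores Include files and conditional sections, and the sample VirtualHost uses placeholder hostnames.

```python
import re
from typing import List

# A ProxyPass or ProxyPassMatch line whose target is an https:// URL.
# ProxyPassReverse lines do not match, because "Reverse" is not whitespace.
_HTTPS_PROXY = re.compile(
    r'^\s*ProxyPass(?:Match)?\s+\S+\s+"?(https://[^"\s]+)',
    re.IGNORECASE | re.MULTILINE,
)
_SSL_PROXY_ON = re.compile(r"^\s*SSLProxyEngine\s+on\b", re.IGNORECASE | re.MULTILINE)


def https_backends_missing_ssl(config: str) -> List[str]:
    """List https:// proxy targets in `config` when SSLProxyEngine On is absent."""
    if _SSL_PROXY_ON.search(config):
        return []
    return _HTTPS_PROXY.findall(config)


# Hypothetical VirtualHost with placeholder hostnames.
SAMPLE_VHOST = """
<VirtualHost *:443>
    ServerName www.example.com
    ProxyPreserveHost Off
    ProxyPass "/talks/" "https://example-user.github.io/talks/"
    ProxyPassReverse "/talks/" "https://example-user.github.io/talks/"
</VirtualHost>
"""
```

On SAMPLE_VHOST the helper reports the GitHub Pages target, because SSLProxyEngine On is missing; adding that line to the VirtualHost makes the list empty.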
Is the reverse proxy technique limited to Apache httpd? Of course not:
- Nginx has proxy_pass and a good reverse proxy tutorial;
- Varnish is more powerful, but its documentation is slightly harder to dive into;
- the Node.js express-http-proxy package helps mount proxy routes in an Express application. I personally use it to proxy authentication at the app level and to query internal APIs.