Curator 101: Reverse Proxies and Load Balancers

Matthew Orr
Curator Engineer
September 28, 2023


What Does It All Mean?

It is fairly common for Curator to interact with reverse proxies and/or load balancers, so I thought it would be a good idea to explain what those terms mean and how they affect your Curator setup. Having one of those network devices in the mix can complicate troubleshooting, has implications for security and can change the performance of your Curator portal. As a result, it’s worth gaining some understanding of them, if for no other reason than to know which technology to point fingers at when something unexpected happens. After all, who doesn’t want to be the winner of the blame game?

What Is a Reverse Proxy?

In a nutshell, a reverse proxy is a middleman that sits between two places on the internet. It allows one part of the internet to talk to another part of the internet, without allowing those two parts to connect to each other directly. It’s a way of protecting one of them from the other while still allowing them to pass messages to each other.

Think back to school, when you wanted to tell your friend something during class, but they sat too far away. You would write a note on a piece of paper, fold it up, write their name on the outside and pass it to your neighbor, who we'll call Marvin, who then delivers it to your friend. Reverse proxies are like Marvin, but with an important distinction: Instead of Marvin passing the note directly, they get out a new piece of paper and copy the message, then send that new piece of paper on to your friend. When your friend responds, Meddling Marvin opens the response and again copies it onto a new piece of paper which they hand to you. Your friend never sees the piece of paper you wrote on, and you never see the piece of paper they wrote on.

It sounds nosy and intrusive, and it definitely is in some ways, but that’s how a reverse proxy works. They are there to only allow safe messages to reach the other end. Normally, the notes you would pass back and forth to your friend are innocuous, so Meddling Marvin just faithfully recreates the conversation verbatim. However, maybe your nemesis wanted to stir up trouble by saying something offensive to your friend and making it appear like it came from you. Meddling Marvin has an opportunity to intercept this bogus message and refuse to pass it along. Additionally, maybe your friend has a cold and is getting their germs all over the papers they write on. Because Meddling Marvin is copying the message instead of passing it on, this can also act as a barrier to spreading disease.

Furthermore, Meddling Marvin can protect more than just the communication between you and your friend. To abuse the analogy a little, let’s imagine that Meddling Marvin is the single gateway between the people in your classroom and everyone else in the school or even beyond. Anyone from the outside who wants to pass notes to someone in your classroom must go through Meddling Marvin. They are now protecting everyone in the classroom from nefarious notes. This does have the potential to create a bottleneck, but they are also able to protect a lot of people relatively easily, and it makes it easy for outsiders to know how to get a message to someone in the class.

For a real-life example, a lot of Curator portals sit behind a reverse proxy, so users don’t connect to Curator directly. Instead, they connect to the reverse proxy without knowing it. When a user requests Curator’s homepage, the request actually goes to the reverse proxy. The reverse proxy then makes a corresponding request to Curator to ask for the homepage. Curator responds with the homepage data, and the reverse proxy creates a response back to the user with the same data.

Normally, there will be more than just Curator sitting behind the reverse proxy. The company’s main website and other enterprise systems will likely all be sitting behind that same reverse proxy. The reverse proxy sees the requests coming in from all over the internet, checks them to see whether they are safe and if so, recreates the request to the appropriate system and relays the response back to the original requestor. Since Curator and the rest of these systems don’t have direct access to the internet, it makes it more difficult for a hacker to mess with them. They would first have to make it through the reverse proxy in order to do so.
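The relay described above can be sketched in a few lines of Python. This is a toy model, not how any particular reverse proxy product is implemented: the Request class, the BACKENDS table, the private addresses and the looks_safe check are all made up for illustration, and the send argument stands in for the real network call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Request:
    host: str        # the public name the user typed
    path: str
    body: str = ""


# Hypothetical mapping of public host names to private backend addresses.
BACKENDS = {
    "curator.yourcompany.com": "http://10.0.0.5",
    "www.yourcompany.com": "http://10.0.0.6",
}


def looks_safe(request: Request) -> bool:
    """Very crude stand-in for whatever safety checks a real proxy applies."""
    return ".." not in request.path and "<script" not in request.body.lower()


def relay(request: Request,
          send: Callable[[str, str], Tuple[int, str]]) -> Tuple[int, str]:
    """Recreate the user's request against the right backend, or refuse it."""
    backend = BACKENDS.get(request.host)
    if backend is None:
        return 404, "unknown site"
    if not looks_safe(request):
        return 403, "blocked by proxy"
    # The proxy makes its own new request and copies the answer back to the user.
    return send(backend + request.path, request.body)
```

The user only ever talks to relay, and the backend only ever sees the request that relay builds, which is the "new piece of paper" from the Marvin analogy.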

What Is a Load Balancer?

A load balancer is a reverse proxy, but with some added intelligence. As the name suggests, it can balance the load in addition to acting as a middleman.

Let’s say that you are part of a group of people who are all stuffing letters to support a worthy cause, like increasing awareness of Taco Tuesdays. As the printer dumps out the next letter to be stuffed, one person, Delilah, is in charge of handing the letter to whomever is available so that no one person winds up with too much work. Normally, Delegating Delilah can just distribute the letters to the next person in line to spread the load. However, maybe some of the letter stuffers are slower than others. Delegating Delilah can skip them if they are still working on the previous letter they got and give it to someone who is ready for their next letter. Moreover, let’s say that it’s all just too much for one of the letter stuffers and they pass out from exhaustion. Delegating Delilah can see that they are broken and automatically take that person out of the rotation so that none of the letters that come out of the printer end up falling through the cracks while waiting for their recovery.

This is essentially how a load balancer works when there are two or more servers hosting the same site. It balances the load between all of those servers and automatically skips servers that aren’t behaving well. There are many different approaches to balancing the load, such as round robin, blue/green, primary and failover, etc., but the end goal is that users are oblivious to which server they are hitting at any given time, and that whichever one they are hitting is performing well.
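As a rough illustration of the round robin approach with unhealthy servers skipped, here is a minimal Python sketch. The class and method names are invented for this example, and a real load balancer would detect failures through health checks rather than being told by hand.

```python
class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the server whose turn is next

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from whoever is next in line.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

In the letter-stuffing analogy, pick is Delilah choosing who gets the next letter, and mark_down is her noticing that someone has passed out.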

What Are the Implications for Curator?

There are three main configurations that can affect your Curator environment.

When Curator Is Behind a Reverse Proxy

There are several things to keep in mind when placing Curator behind a reverse proxy. Typically, these are only an issue when first setting it up, though it’s definitely possible they could cause confusion when troubleshooting issues that arise later.

One of the issues is around SSL/TLS (i.e. encrypting the connection to Curator). A common setup is to encrypt the connections between users and the reverse proxy, but to leave the connection between the reverse proxy and Curator unencrypted since it’s on a private network. If the reverse proxy doesn’t recreate enough information when it relays requests to Curator, Curator may not know that the connection is encrypted on the other side. If this happens, any links it builds to other pages, images, etc. would use the unencrypted scheme (i.e. http://) instead of the encrypted scheme (i.e. https://). This can cause browsers to flip between encrypted and unencrypted connections when loading pages, which can cause a number of issues, not least of which is a potential security leak. To prevent this, it’s best to configure the reverse proxy to pass along the information that the user’s connection is encrypted, even if the connection between the reverse proxy and Curator isn’t. There are also a few configuration options on the Curator side to tell it that the connection is encrypted regardless of what it is being told by the reverse proxy, but these should be thought of as a workaround, not a desired solution.
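A common way for a reverse proxy to pass this information along is the X-Forwarded-Proto request header. The Python sketch below shows how an application behind a proxy might use it when building links. It is an illustration only and says nothing about how Curator itself does this; the function names are made up, and if several proxies each added a value, the sketch assumes the first one describes the user’s connection.

```python
from typing import Mapping


def effective_scheme(headers: Mapping[str, str], connection_scheme: str) -> str:
    """Return the scheme the user used, falling back to the direct connection's."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded = normalized.get("x-forwarded-proto", "")
    first = forwarded.split(",")[0].strip().lower()
    return first if first in ("http", "https") else connection_scheme


def build_link(headers: Mapping[str, str], connection_scheme: str,
               host: str, path: str) -> str:
    """Build an absolute link using the scheme the user actually sees."""
    return f"{effective_scheme(headers, connection_scheme)}://{host}{path}"
```

Without the header, build_link falls back to the scheme of the proxy-to-application connection, which is exactly how the http:// links described above end up on an https:// page.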

A similar issue you may face is that Curator generates links based on the address the reverse proxy uses to communicate with Curator instead of the address the users use to communicate with the reverse proxy. For example, users might reach Curator (via the reverse proxy) by going to https://curator.yourcompany.com, but the reverse proxy uses Curator’s IP address (something like https://192.168.10.20) to connect to Curator. In this case, if the reverse proxy doesn’t tell Curator that the request was made to https://curator.yourcompany.com, then Curator may generate links to https://192.168.10.20 when building the menus instead. At best, users would be confused by links that go to a different URL. At worst, all of the links would be broken, because a user’s browser can rarely connect to Curator using that IP address. Keeping that address out of reach is the main selling point of using a reverse proxy in the first place.
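The usual remedy is for the reverse proxy to forward the original host name, commonly in the X-Forwarded-Host header, so the application can build links from it. The Python sketch below illustrates the idea. It is not Curator’s code, the function names are hypothetical, and the private address is a made-up example.

```python
from typing import Mapping


def public_base_url(headers: Mapping[str, str],
                    connection_scheme: str, connection_host: str) -> str:
    """Prefer what the proxy says the user asked for over what we see directly."""
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme = normalized.get("x-forwarded-proto", connection_scheme)
    host = normalized.get("x-forwarded-host", connection_host)
    return f"{scheme}://{host}"


def menu_link(headers: Mapping[str, str], connection_scheme: str,
              connection_host: str, path: str) -> str:
    """Build a menu link that a user's browser can actually reach."""
    return public_base_url(headers, connection_scheme, connection_host) + path
```

When the proxy sends no such headers, menu_link has nothing to go on except the private address, which produces the broken links described above.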

Another thing to watch out for is the reverse proxy responding to users differently than Curator is responding to the reverse proxy. For example, Curator is typically set up to time out after a few minutes in case there’s a process that gets stuck. However, reverse proxies are often configured to time out after only a minute by default. This means that if there’s some process that takes a little longer than a minute, then Curator will happily keep working on it, but the reverse proxy will give up and tell the user that it couldn’t respond to their request. The user sees an error while Curator doesn’t think anything is wrong. Both views are technically true from their respective points of view. Because the disconnect is introduced by the reverse proxy rather than by Curator, it can be difficult to troubleshoot.
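The mismatch is easy to see in a small Python model. The 60-second and 180-second values below are illustrative assumptions, not the defaults of any particular product, and the names are invented for this example.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    user_sees: str
    curator_sees: str


def simulate(work_seconds: float,
             proxy_timeout: float = 60.0,
             curator_timeout: float = 180.0) -> Outcome:
    """Model one request that needs work_seconds of processing."""
    if work_seconds > curator_timeout:
        curator_sees = "timed out"
    else:
        curator_sees = "finished normally"
    # The user gets an error as soon as either side gives up.
    if work_seconds > min(proxy_timeout, curator_timeout):
        user_sees = "error"
    else:
        user_sees = "page"
    return Outcome(user_sees, curator_sees)
```

A request that takes 90 seconds gives the user an error while Curator reports that it finished normally, which is the disconnect described above. Proxies commonly report this situation to the user as a 504 Gateway Timeout.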

Similarly, if there is an actual issue with Curator, Curator will give an error message that helps explain what happened and provides insight into how to fix it. However, some reverse proxies are configured to respond with a generic error message whenever the downstream system (i.e. Curator) reports any errors. This also makes troubleshooting difficult because it’s impossible to know what the underlying issue is unless there’s a way to bypass the reverse proxy to look at Curator directly.

For more information on that last part, check out this supplemental blog we put out.

When Curator Is Behind a Load Balancer

In addition to the issues above, there are a few unique issues that arise when Curator is behind a load balancer. Load balancers imply that there are multiple Curator servers (a.k.a. nodes) acting as a single Curator portal. When running in this configuration, Curator needs a few things set up in specific ways. For example, the various nodes need to share a common database, encryption key and portion of the file system. Additionally, when you upgrade, clear the cache or make other site-wide adjustments, those adjustments need to be made on all of the nodes at the same time, or Curator can get very confused very quickly. The built-in Distributed enterprise plugin automates these processes, but that only works if each node is properly registered with the Distributed plugin.

Because load balancers delegate requests to the various nodes automatically, a single node that isn’t working correctly while the others are can lead to weird, unpredictable behavior from the user’s point of view. For instance, a user could click a link and see an error message or a strange-looking page, then click that exact same link again and have it work perfectly. It could cause the user to question their own sanity. Typically, load balancers will see these errors and remove the sick node from the rotation, but that doesn’t always happen.

For documentation on setting up Curator behind a reverse proxy or load balancer, review this page.

When a Load Balancer Is Between Curator and Tableau Server

One very important issue when Curator and Tableau Server communicate through a reverse proxy or load balancer is how to set up trusted ticket authentication between the two. Trusted ticket authentication is a way for Curator to tell Tableau Server that a user has already authenticated themselves, so Tableau doesn’t need to do it itself. The important part of trusted ticket authentication is setting up the trust. The way to tell Tableau Server to trust an application is by telling it which web address the application will use when making requests. This address is known as a trusted host. Any request originating from that address is automatically trusted and is allowed to request an authentication token (i.e. trusted ticket) on behalf of any user on Tableau Server.

The trouble is that a reverse proxy or load balancer can mask where the request originates from, making it look like the request is coming from the reverse proxy itself. This has led to Tableau admins incorrectly using the reverse proxy’s address as the trusted host instead of Curator’s. This is very bad because then any system on the other side of the reverse proxy could make a request for a trusted ticket and Tableau wouldn’t know that it shouldn’t be trusted.
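To make the risk concrete, here is a deliberately simplified Python model of an address-based trust check. It is not Tableau Server’s actual implementation. The TicketRequest class, the function names and all of the addresses are hypothetical, and the only point is that the server can judge a request solely by the address it appears to come from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketRequest:
    true_origin: str      # where the request really started
    apparent_source: str  # the address the receiving server sees


def through_proxy(origin: str, proxy_address: str) -> TicketRequest:
    """A proxied request appears to come from the proxy, whoever sent it."""
    return TicketRequest(true_origin=origin, apparent_source=proxy_address)


def is_trusted(request: TicketRequest, trusted_hosts: set) -> bool:
    # The server cannot see true_origin; it only knows the apparent source.
    return request.apparent_source in trusted_hosts


PROXY = "10.0.0.1"       # hypothetical reverse proxy address
CURATOR = "10.0.0.5"     # hypothetical Curator address
ROGUE = "203.0.113.9"    # hypothetical untrusted system beyond the proxy
```

With the proxy listed as the trusted host, is_trusted(through_proxy(ROGUE, PROXY), {PROXY}) returns True, so the rogue system is trusted just as Curator would be. With only Curator listed, the rogue request is refused, but so is Curator’s own request once it passes through the proxy.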

If you avoid adding the reverse proxy as the trusted host, Curator may still run into issues when requesting trusted tickets. Sometimes Tableau will detect that the request from Curator is being proxied and deny it for that reason. Tableau provides a way to register the reverse proxy as a trusted gateway. This lets Tableau know that valid trusted ticket requests will be coming through the reverse proxy, without allowing the reverse proxy itself (or other untrusted systems on the other side of it) to make trusted ticket requests.

Curator provides an alternative configuration that can allow you to bypass reverse proxy issues around trusted ticket authentication. Curator’s connection provides a field to specify an alternative Tableau Server URL. This allows you to specify one address for Curator to use when it communicates directly with Tableau Server and an alternative URL for when the user is interacting with Tableau Server. In this case, Curator would use a URL that goes directly to Tableau Server (i.e. not through the reverse proxy or load balancer), while the alternative URL would route users through the reverse proxy or load balancer so they still get its benefits.
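Conceptually, the two-address setup looks something like the Python sketch below. The class, field and method names are invented for illustration and do not correspond to Curator’s actual settings, and both URLs are made-up examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableauConnection:
    server_url: str                         # direct address, used server to server
    alternative_url: Optional[str] = None   # public address, used by browsers

    def url_for_curator(self) -> str:
        """Address Curator itself uses, e.g. to request trusted tickets."""
        return self.server_url

    def url_for_users(self) -> str:
        """Address handed to the user's browser; falls back to the direct one."""
        return self.alternative_url or self.server_url


connection = TableauConnection(
    server_url="http://10.0.0.8",                       # bypasses the proxy
    alternative_url="https://tableau.yourcompany.com",  # goes through the proxy
)
```

Because Curator’s own requests never pass through the reverse proxy in this setup, Tableau Server sees them coming from Curator’s real address, and Curator can be registered as the trusted host in the normal way.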

For more information on trusted ticket authentication in combination with reverse proxies/load balancers, check out this blog.