Most Azure Platform-as-a-Service (PaaS) offerings support private endpoints. A private endpoint is a special type of Network Interface (NIC) that is deployed into one of your Azure Virtual Networks, which means it is assigned a private IP address from that virtual network. Depending on the type of PaaS service, a private endpoint might even receive multiple IP addresses from your virtual network. You use these private endpoints to access the PaaS service privately. See below for a simple example showing a couple of PaaS services and their private endpoints.
Configuring DNS resolution so that it works properly with private endpoints is not trivial. Daniel Mauser, a cloud networking specialist at Microsoft, put together a great guide called Private Endpoint DNS Integration Scenarios. I suggest you read that first, because I will not be discussing the finer details of DNS integration in this post.
Before I can get into the main point of this post, let’s discuss some important facts.
First, each private endpoint is deployed to a specific Azure region. For example, if you deploy a private endpoint in the Central US region, it must be connected to a virtual network that is also deployed in the Central US region. If you have a small Azure footprint deployed to a single Azure region, you can skip the rest of this article. This post is specifically about Azure implementations that span two or more Azure regions.
Second, not all PaaS services offer multi-region options, and those that do don't all implement them the same way. Azure Batch and Azure Kubernetes Service (AKS), for example, do not have built-in multi-region capabilities. Services such as Storage Accounts and Key Vaults, on the other hand, do. Finally, some services do their own unique thing. An example is PostgreSQL Flexible Server, which supports geo-replication with an additional layer of virtual endpoints on top.
The last point I want to cover is private endpoints and DNS. It is very common for all private endpoints to store their DNS records in shared Private DNS Zones. In a typical enterprise-scale landing zone architecture, a shared Connectivity subscription holds the global Private DNS Zones used by all landing zones. Commonly, each PaaS service type has a single, dedicated Private DNS Zone (for example, privatelink.blob.core.windows.net for Blob Storage), as outlined here. This single zone is then linked to all hubs in your hub-and-spoke architecture. I show a high-level example of this below.
The Problem
The problem arises when all three of these conditions are true: you are using a PaaS service that supports built-in multi-region capabilities, AND you rely on a single private endpoint for that service in a disaster recovery scenario, AND you use a single Private DNS Zone for that PaaS service.
In short, the private endpoint exists in only one Azure region. Specifically, it uses an IP address from that region's virtual network. What happens if that region goes down? At that point, nobody can reach the private endpoint's IP address, because that region's virtual network is down. The private endpoint is essentially dead, even if the PaaS resource itself has failed over to another region. I show an example of this below.
The next logical step is to create a second private endpoint for the PaaS service and place it in the second region's virtual network. I show this example below. This is a perfectly valid scenario: a single PaaS service supports multiple private endpoints spread across different regions and virtual networks. However, you will run into a problem when using a single, shared Private DNS Zone. As stated here, “two private endpoints can't use the same Private DNS Zone for the same endpoint.” In other words, with a single Private DNS Zone, the DNS record(s) can point to only one region (one private endpoint) at a time.
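To make the single-zone limitation concrete, here is a toy Python model of one shared Private DNS Zone. It is not an Azure API, just a sketch of the behavior described above. It assumes that registering a second private endpoint for the same resource replaces the existing A record instead of adding to it. The storage account name `mystorage` and the address ranges 10.1.0.0/16 (Region A) and 10.2.0.0/16 (Region B) are hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network

# Hypothetical address plan: each region's VNets live in one range.
REGION_RANGES = {
    "region-a": ip_network("10.1.0.0/16"),
    "region-b": ip_network("10.2.0.0/16"),
}


@dataclass
class SharedPrivateDnsZone:
    """Toy model of ONE Private DNS Zone linked to every hub."""
    name: str
    records: dict = field(default_factory=dict)  # host -> private endpoint IP

    def register_endpoint(self, host: str, ip: str) -> None:
        # Assumption: a second private endpoint for the same resource
        # overwrites the record, so only one IP survives.
        self.records[host] = ip

    def resolve(self, host: str) -> str:
        # Every linked hub, in every region, gets the same answer.
        return self.records[host]


def region_of(ip: str) -> str:
    addr = ip_address(ip)
    for region, net in REGION_RANGES.items():
        if addr in net:
            return region
    raise ValueError(f"{ip} is not in any known region")


def is_reachable(ip: str, failed_regions: set) -> bool:
    # A private endpoint's IP is only reachable while its region's VNet is up.
    return region_of(ip) not in failed_regions


zone = SharedPrivateDnsZone("privatelink.blob.core.windows.net")
zone.register_endpoint("mystorage", "10.1.0.4")  # endpoint in Region A

# Region A goes down: DNS still hands out Region A's endpoint IP.
print(is_reachable(zone.resolve("mystorage"), {"region-a"}))  # False

zone.register_endpoint("mystorage", "10.2.0.4")  # 2nd endpoint in Region B
print(zone.resolve("mystorage"))  # 10.2.0.4 (Region A's record is gone)
```

Whichever endpoint registered last wins, so a single zone can only steer every region to one endpoint at a time.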
The Solution
So, what is the solution to this problem? Actually, there are two:
Solution 1
Solution 1 keeps the single Private DNS Zone. In this scenario, DNS can point to only one private endpoint at a time. This means that during a disaster, after the PaaS resource fails over to the other region, you must manually update the DNS records to point at the second private endpoint. You should consider pre-creating the second private endpoint before a disaster happens. Otherwise, when a region goes down, you will be crossing your fingers and hoping the PaaS resource is healthy enough to allow the creation of the second private endpoint.
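The manual DNS update in Solution 1 could be scripted ahead of time so it is ready during an outage. The sketch below only builds Azure CLI commands (`az network private-dns record-set a add-record` and `remove-record`) and prints them for review; it does not run them. It assumes the A record set is maintained by hand, as this solution implies, and the resource group, zone, record name, and IPs are hypothetical.

```python
import shlex


def failover_commands(resource_group: str, zone: str, record: str,
                      old_ip: str, new_ip: str) -> list:
    """Build the Azure CLI calls that repoint one A record set.

    Adds the surviving region's endpoint IP first, then removes the
    failed region's IP, so the record set is never left empty.
    """
    base = ["az", "network", "private-dns", "record-set", "a"]
    common = ["--resource-group", resource_group,
              "--zone-name", zone,
              "--record-set-name", record]
    return [
        base + ["add-record"] + common + ["--ipv4-address", new_ip],
        base + ["remove-record"] + common + ["--ipv4-address", old_ip],
    ]


# Hypothetical names for the shared zone in the Connectivity subscription.
cmds = failover_commands(
    resource_group="rg-connectivity-dns",
    zone="privatelink.blob.core.windows.net",
    record="mystorage",
    old_ip="10.1.0.4",   # private endpoint in the failed Region A
    new_ip="10.2.0.4",   # pre-created private endpoint in Region B
)
for cmd in cmds:
    # Print as shell-quoted strings so an operator can review them first.
    print(shlex.join(cmd))
```

Adding before removing matters because removing the last IP from an A record set deletes the record set by default.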
Another thing to note about this solution is that it results in suboptimal routing between regions. When workloads in Region B need to talk to the private endpoint in Region A, their traffic traverses whatever layer 3 solution you use for inter-region routing, such as ExpressRoute, VPN gateways, or another solution.
Solution 2
Solution 2 is to create two Private DNS Zones with exactly the same name. This is only possible if you create the zones in separate resource groups. Private DNS Zones are global resources, but you can treat one zone as dedicated to Region A and the other as dedicated to Region B. In a hub-and-spoke network, the hub for Region A is linked to the first DNS zone, and the hub for Region B is linked to the second DNS zone.
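Here is a toy Python model of this setup (again, not an Azure API). Two zones share the name privatelink.blob.core.windows.net but live in different resource groups, and each is linked only to its own region's hub, so the answer depends on which hub asks. The resource group names, hub VNet names, and IPs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateDnsZone:
    """Toy model of a Private DNS Zone and the VNets linked to it."""
    name: str
    resource_group: str
    records: dict = field(default_factory=dict)      # host -> IP
    linked_vnets: set = field(default_factory=set)


def resolve(zones: list, vnet: str, zone_name: str, host: str) -> str:
    """Answer a query as a VNet sees it: only zones linked to it count."""
    linked = [z for z in zones
              if z.name == zone_name and vnet in z.linked_vnets]
    # This sketch treats zero or several same-name zones on one VNet as an error.
    if len(linked) != 1:
        raise LookupError(f"{vnet} has {len(linked)} linked zones named {zone_name}")
    return linked[0].records[host]


ZONE = "privatelink.blob.core.windows.net"
zones = [
    # Hypothetical resource groups, hub VNets and private endpoint IPs.
    PrivateDnsZone(ZONE, "rg-dns-region-a", {"mystorage": "10.1.0.4"}, {"vnet-hub-region-a"}),
    PrivateDnsZone(ZONE, "rg-dns-region-b", {"mystorage": "10.2.0.4"}, {"vnet-hub-region-b"}),
]

# Same-name zones are only allowed in different resource groups.
assert len({(z.name, z.resource_group) for z in zones}) == len(zones)

print(resolve(zones, "vnet-hub-region-a", ZONE, "mystorage"))  # 10.1.0.4
print(resolve(zones, "vnet-hub-region-b", ZONE, "mystorage"))  # 10.2.0.4
```

Each hub now resolves the storage account to its local private endpoint, which is what lets each region keep working if the other region's VNet goes down.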
Be careful if you use Azure Policy to automate the creation of the DNS records. As stated here, “this scenario requires manual maintenance/updates of the Private Link DNS record set in every region as there is currently no automated lifecycle management for these.”
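Because nothing keeps the regional zones in sync automatically, a periodic drift check can help catch records that were added to one zone but not the other. The sketch below assumes you export each regional zone with `az network private-dns record-set a list --resource-group <rg> --zone-name <zone> -o json`, and it relies only on the `name` field of each record set in that JSON. It compares record names only, since the IPs are expected to differ per region. The region keys and record names are hypothetical.

```python
import json


def a_record_names(cli_json: str) -> set:
    """Record-set names from an `az ... record-set a list -o json` export."""
    return {rs["name"] for rs in json.loads(cli_json)}


def find_drift(zone_exports: dict) -> dict:
    """Map each host to the regional zones that are missing it."""
    names = {region: a_record_names(raw) for region, raw in zone_exports.items()}
    all_hosts = set().union(*names.values())
    return {
        host: sorted(r for r, hosts in names.items() if host not in hosts)
        for host in sorted(all_hosts)
        if any(host not in hosts for hosts in names.values())
    }


# Hypothetical exports: "archivestorage" was registered only in Region A's zone.
exports = {
    "region-a": '[{"name": "mystorage"}, {"name": "archivestorage"}]',
    "region-b": '[{"name": "mystorage"}]',
}
print(find_drift(exports))  # {'archivestorage': ['region-b']}
```

An empty result means every regional zone has a record for every host, though it says nothing about whether each IP points at the right regional endpoint.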
Solution 2 allows workloads in Region A to talk directly to the private endpoint in Region A. Likewise, workloads in Region B talk directly to the private endpoint in Region B. This scenario is pictured above. Solution 2 also results in optimal routing between regions: traffic from Region B immediately enters the Microsoft backbone network and does not need to traverse your layer 3 routing.
Wrap Up
Private networking for Azure PaaS resources is an incredibly complex subject, and even more so when you add multi-region disaster recovery to the mix. When I sat down to research and tackle this problem for my day job, I realized just how little information is out there. So, I decided to document my findings in a new blog post.
With that being said, I have to give a special shoutout to Adam Stuart from Microsoft for his excellent GitHub repos and YouTube videos on this exact subject. You can find links below.