CI/CD with Azure Data Factory - Part 2
Updated: Apr 1
If you haven't already, please read Part 1 of this series before continuing.
After completing part 1, you'll have some ARM Templates stored in a Git repo, in a branch named "adf_publish". To be more specific, the branch will actually contain 2 different versions of the ARM templates:
A standard, all-in-one template, which includes:
An ARM Template named ARMTemplateForFactory.json. This single template includes every ADF entity
A parameter file named ARMTemplateParametersForFactory.json
A Linked ARM Template version. The files, which are placed under a subfolder named linkedTemplates, include the following:
An ARM Template named ARMTemplate_master.json. This template doesn't actually contain any ADF entities; it simply calls one or more child templates
One or more child ARM Templates. The first one is named ARMTemplate_0.json, and each subsequent file increments the number
A parameter file named ARMTemplateParameters_master.json
Now, it is your responsibility to pick one of those versions and deploy the ARM templates to your other ADF environments, such as test and production. Unfortunately, it's not as easy as doing a simple deployment. There are 3 key things that you need to take into account before deploying to the other environments: (1) the pre/post deployment script, (2) override parameters, and (3) linked ARM templates.
The pre/post deployment script
Microsoft provides a PowerShell script that you should run in your deployment pipeline. You should run the script before the ARM templates are deployed (pre), and you should also run the script after the ARM templates are deployed (post).
The script will stop ADF triggers before deployment, and restart ADF triggers after deployment. It will also clean up any deleted ADF entities. Here is a link to the Microsoft article that goes over the script in detail. And, here is a link to the latest version of the script.
The script uses the Azure PowerShell modules, so make sure you have those installed. The script also requires PowerShell Core (6 or 7). If you are using Azure DevOps Pipelines, then the Azure PowerShell task is perfect for this, as it includes the Azure PowerShell modules by default, and you can specify PowerShell Core with a simple parameter.
Note: Make sure to read the Microsoft article in detail, as you must use different script parameters when running the pre-script versus the post-script.
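To make the pre/post pattern concrete, here is a minimal sketch of the two script steps in an Azure DevOps YAML pipeline. The service connection name, resource group, factory names, and file paths are all placeholders I made up for illustration, and the script arguments reflect my reading of the Microsoft article, so verify them against the current version of the script before relying on them.

```yaml
steps:
  # Pre-deployment run: stops the ADF triggers before the ARM deployment.
  - task: AzurePowerShell@5
    displayName: 'ADF pre-deployment script'
    inputs:
      azureSubscription: 'my-service-connection'   # hypothetical service connection
      ScriptType: 'FilePath'
      # Assumed location of Microsoft's script inside the checked-out repo
      ScriptPath: '$(Build.SourcesDirectory)/scripts/PrePostDeploymentScript.ps1'
      ScriptArguments: >-
        -armTemplate "$(Build.SourcesDirectory)/adf-dev/ARMTemplateForFactory.json"
        -ResourceGroupName "rg-adf-test"
        -DataFactoryName "adf-test"
        -predeployment $true
        -deleteDeployment $false
      azurePowerShellVersion: 'LatestVersion'
      pwsh: true   # run under PowerShell Core

  # ... the ARM template deployment step goes here ...

  # Post-deployment run: restarts triggers and cleans up deleted entities.
  - task: AzurePowerShell@5
    displayName: 'ADF post-deployment script'
    inputs:
      azureSubscription: 'my-service-connection'
      ScriptType: 'FilePath'
      ScriptPath: '$(Build.SourcesDirectory)/scripts/PrePostDeploymentScript.ps1'
      ScriptArguments: >-
        -armTemplate "$(Build.SourcesDirectory)/adf-dev/ARMTemplateForFactory.json"
        -ResourceGroupName "rg-adf-test"
        -DataFactoryName "adf-test"
        -predeployment $false
        -deleteDeployment $true
      azurePowerShellVersion: 'LatestVersion'
      pwsh: true
```

The only difference between the two steps is the pair of switches at the end of the arguments, which is exactly the part that is easy to get wrong.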
Override parameters
This is a huge subject and I won't spend a ton of time on it. The high-level summary is as follows. This CI/CD process is generating ARM Templates based on your Dev ADF instance. When you deploy these ARM Templates to other environments they will automatically have all of the same settings as the Dev environment.
You may not want that for certain resources. For example, say you had a Linked Service in Dev that pointed to a Dev Database. But, when you deploy that Linked Service to Production, you need the Linked Service to be changed so that it points to a Prod Database. How would you go about that? Enter override parameters.
ADF will automatically parameterize certain things for you. To see all of the items that ADF parameterizes for you, look at the file ARMTemplateParametersForFactory.json. One of the most obvious ones is the name of the ADF instance itself, as you can't have the Prod ADF instance named the same as the Dev ADF instance.
When deploying the ARM Templates, you can pass override parameters that include custom values that are specific to each of your environments. This way you can still use 1 set of ARM Templates for all of your environments. But, when deploying to Test you'll pass in overrides that are specific to Test, when deploying to Prod you'll pass in overrides that are specific to Prod, etc.
You can also create your own custom parameters, but I won't go into detail on those.
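As an illustration of override parameters, here is a rough sketch of a deployment step for the all-in-one template using the ARM template deployment task. Only factoryName is a parameter I'd expect to find in every generated parameter file. The linked service parameter name, the pipeline variable holding the connection string, and all of the resource names are hypothetical, so check your own ARMTemplateParametersForFactory.json for the real names.

```yaml
steps:
  - task: AzureResourceManagerTemplateDeployment@3
    displayName: 'Deploy ADF ARM template to Prod'
    inputs:
      deploymentScope: 'Resource Group'
      azureResourceManagerConnection: 'my-service-connection'   # hypothetical
      subscriptionId: '00000000-0000-0000-0000-000000000000'    # placeholder
      action: 'Create Or Update Resource Group'
      resourceGroupName: 'rg-adf-prod'
      location: 'East US'
      templateLocation: 'Linked artifact'
      # Assumed paths inside the checked-out adf_publish branch
      csmFile: '$(Build.SourcesDirectory)/adf-dev/ARMTemplateForFactory.json'
      csmParametersFile: '$(Build.SourcesDirectory)/adf-dev/ARMTemplateParametersForFactory.json'
      # Values here replace the Dev values from the parameter file.
      overrideParameters: >-
        -factoryName "adf-prod"
        -AzureSqlDatabase1_connectionString "$(ProdSqlConnectionString)"
      deploymentMode: 'Incremental'
```

A Test stage would use the same templates and the same step, with only the overrideParameters values (and the target resource group) swapped out.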
Linked ARM Templates
At the beginning of this post, I mentioned that the ARM Template will come in 2 different forms. My recommendation is to keep it simple and use the standard, all-in-one template ... until you can't anymore. You see, ARM Templates have a limit where they can only contain a maximum of 800 resources per template. So, once your ADF grows in size, you may not be able to use the all-in-one template anymore.
Microsoft has thought about this, and that's why they also give you the Linked ARM Templates to use in scenarios where you have too many resources for the standard templates.
Using Linked ARM Templates in your ADF CI/CD pipeline is not exactly straightforward and there are a couple of gotchas to watch out for. I'll explain everything below.
Obviously, the master linked template file will need to call the child templates. However, what may not be so obvious is that these child templates need to be stored in a publicly accessible location that Azure Resource Manager has access to.
Per Microsoft: "When referencing a linked template, the value of uri can't be a local file or a file that is only available on your local network. Azure Resource Manager must be able to access the template. Provide a URI value that is downloadable as HTTP or HTTPS."
A public Azure Storage Account is a great option for this. You can lock it down with proper RBAC roles. However, the deployment still needs to access the storage account by way of a SAS token, and that SAS token is logged in the ARM Template deployment history! Furthermore, you can NOT enable the resource firewall on the Storage Account.
Per Microsoft: "Although the linked template must be externally available, it doesn't need to be generally available to the public. You can add your template to a private storage account that is accessible to only the storage account owner. Then, you create a shared access signature (SAS) token to enable access during deployment. You add that SAS token to the URI for the linked template. Even though the token is passed in as a secure string, the URI of the linked template, including the SAS token, is logged in the deployment operations. To limit exposure, set an expiration for the token ... Currently, you can't link to a template in a storage account that is behind an Azure Storage firewall."
So, to summarize what we have learned so far, we need to:
Create an Azure Storage Account with the firewall set to "Allow public access from all networks"
Copy the child ARM Templates from the "adf_publish" branch into a new Container in the new Storage Account
Create a short-lived SAS Token on the Storage Account. (it must be short-lived because this SAS Token will be logged in your deployment history)
Deploy the master linked template and pass it the location of the storage account holding the child templates, as well as the short-lived SAS Token
How are you supposed to do all of that in a CI/CD pipeline in a secure and repeatable fashion? Well, if you use Azure DevOps Pipelines, one option is the AzureFileCopy task. This task is absolutely perfect for our scenario because:
It allows us to copy local files to an Azure Storage Account. (our pipeline will have the "adf_publish" branch checked out locally, so we can just copy those local files to the storage account)
Under the hood, this task will automatically create a short-lived SAS token, and use it to copy the files to the Azure Storage Account.
Wouldn't it be great if we could actually access that short-lived SAS token and use it for our Linked ARM Template deployment? Guess what, we can!
This task has 2 output variables:
StorageContainerUri: this is the Uri to the storage container where the files were copied to
StorageContainerSasToken: this is the short-lived SAS token that was automatically created by the task
This task will even let us specify how long the automatically generated SAS token will be valid for. The default is 240 minutes (4 hours), but you can set it lower if you want to.
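Here is a minimal sketch of what the copy step might look like, assuming version 4 of the task. The step name copyLinkedTemplates, the service connection, the storage account, the container name, and the source path are all placeholders of mine. Giving the step a name is what lets later steps read its output variables.

```yaml
steps:
  - task: AzureFileCopy@4
    name: copyLinkedTemplates   # hypothetical step name, used to reference outputs
    displayName: 'Copy linked templates to storage'
    inputs:
      # Assumed path to the linkedTemplates folder in the adf_publish checkout
      SourcePath: '$(Build.SourcesDirectory)/adf-dev/linkedTemplates/*'
      azureSubscription: 'my-service-connection'   # hypothetical
      Destination: 'AzureBlob'
      storage: 'mystorageaccount'                  # placeholder storage account
      ContainerName: 'adf-linked-templates'        # placeholder container
      # Shorten the SAS token lifetime from the 240 minute default
      sasTokenTimeOutInMinutes: '30'
```

After this step runs, the two outputs should be readable as $(copyLinkedTemplates.StorageContainerUri) and $(copyLinkedTemplates.StorageContainerSasToken).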
This is great! This task will copy our child templates to the storage account, as well as automatically output the Uri and the short-lived SAS token. Then, we can easily pass both of those output variables to our Linked ARM Template deployment.
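The hand-off to the deployment could look roughly like the sketch below. It assumes an earlier AzureFileCopy step named copyLinkedTemplates (a name I picked for illustration), and it assumes the master template exposes parameters named containerUri and containerSasToken, which is what I'd expect ADF to generate; confirm this in your own ARMTemplateParameters_master.json. All resource names and paths are placeholders.

```yaml
steps:
  - task: AzureResourceManagerTemplateDeployment@3
    displayName: 'Deploy linked ADF ARM templates'
    inputs:
      deploymentScope: 'Resource Group'
      azureResourceManagerConnection: 'my-service-connection'   # hypothetical
      subscriptionId: '00000000-0000-0000-0000-000000000000'    # placeholder
      action: 'Create Or Update Resource Group'
      resourceGroupName: 'rg-adf-prod'
      location: 'East US'
      # The master template is read from the local checkout; only the
      # child templates have to be reachable over HTTPS.
      templateLocation: 'Linked artifact'
      csmFile: '$(Build.SourcesDirectory)/adf-dev/linkedTemplates/ARMTemplate_master.json'
      csmParametersFile: '$(Build.SourcesDirectory)/adf-dev/linkedTemplates/ARMTemplateParameters_master.json'
      overrideParameters: >-
        -factoryName "adf-prod"
        -containerUri "$(copyLinkedTemplates.StorageContainerUri)"
        -containerSasToken "$(copyLinkedTemplates.StorageContainerSasToken)"
      deploymentMode: 'Incremental'
```

If the deployment fails to download a child template, it is worth checking how the container Uri and the SAS token get concatenated inside the master template, since a stray or missing slash or question mark is an easy mistake to make.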
The AzureFileCopy task has a couple of quirks worth mentioning. One, it must run on a Windows agent; it's not supported on Linux agents. Two, the Service Connection that runs the task must have the "Storage Blob Data Contributor" role on the Storage Account. Unfortunately, inherited roles like Owner or Contributor are not enough; this role must be assigned explicitly.
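For the Windows agent requirement, a pipeline skeleton along these lines is one way to set things up. It pins the job to a Microsoft-hosted Windows image and checks out the "adf_publish" branch so the templates are available as local files. The repository alias, project name, and repo name are placeholders.

```yaml
resources:
  repositories:
    - repository: adfRepo               # hypothetical alias
      type: git
      name: 'MyProject/my-adf-repo'     # placeholder project/repo
      ref: 'refs/heads/adf_publish'

pool:
  vmImage: 'windows-latest'   # AzureFileCopy needs a Windows agent

steps:
  # Puts the adf_publish branch contents under $(Build.SourcesDirectory)
  - checkout: adfRepo
```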
Azure DevOps Pipeline Example
I have created an Azure DevOps YAML Pipeline which you can use to deploy from one ADF instance to another. It supports both the standard, all-in-one ARM Templates as well as the Linked ARM Templates. It also supports running a "What-if" deployment so that you can preview the changes first. You can find the full pipeline in my GitHub repo. Note: you'll need to update the variables and the repositories within the pipeline to match your own environment.
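If you just want a feel for what a "What-if" step can look like, here is one possible sketch using the Azure PowerShell task with an inline script. This is not necessarily how the pipeline in my repo does it, and the resource group, factory name, service connection, and paths are placeholders.

```yaml
steps:
  - task: AzurePowerShell@5
    displayName: 'What-if preview of ADF deployment'
    inputs:
      azureSubscription: 'my-service-connection'   # hypothetical
      ScriptType: 'InlineScript'
      Inline: |
        # -WhatIf previews the changes without deploying anything.
        # Inline parameters such as -factoryName act as overrides on top
        # of the values in the parameter file.
        New-AzResourceGroupDeployment `
          -ResourceGroupName 'rg-adf-test' `
          -TemplateFile '$(Build.SourcesDirectory)/adf-dev/ARMTemplateForFactory.json' `
          -TemplateParameterFile '$(Build.SourcesDirectory)/adf-dev/ARMTemplateParametersForFactory.json' `
          -factoryName 'adf-test' `
          -WhatIf
      azurePowerShellVersion: 'LatestVersion'
      pwsh: true
```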
Here are some diagrams explaining exactly what the pipeline is doing:
For a standard ARM Template deployment:
For a Linked ARM Template deployment: