CI/CD with Azure Data Factory - Part 1
This series of blog posts will discuss Microsoft's recommended method for doing CI/CD within their Azure Data Factory (ADF) service.
For the purposes of this series, let's assume we have 3 different ADF environments: development, test, and production.
The first step would be to deploy all the necessary resources into all 3 environments. This includes the ADF instance itself, as well as any child resources used by ADF, such as managed virtual networks, integration runtimes, and more. You'll also want to deploy any extra resources used by ADF, such as Key Vaults, Storage Accounts, etc. You can use whatever method you wish to deploy these resources: Bicep, Terraform, or anything else. Deployment of these resources is not the point of this series.
The second step would be to set up the CI/CD workflow for the entities that you create within ADF. These entities include things like pipelines, datasets, data flows, and more. The 1,000-foot view of this process is as follows:
Connect your development ADF instance to a Git repo
Make changes in your development ADF instance
The development ADF will create ARM Templates, based on its own entities, and place the templates into the Git repo
The ARM Templates can be used to deploy the ADF entities to the other ADF instances, such as test and production.
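To make the last point a little more concrete, here is a minimal Python sketch of why one set of templates can serve several environments: the template stays the same, and only the parameter values change per factory. The factory names ("adf-demo-dev", "adf-demo-test") and the helper parameters_for are hypothetical, and the dictionary is a stand-in for the parameter file ADF generates, which will contain whatever parameters your own factory exposes. The actual deployment is covered later in the series.

```python
import copy
import json

# Hypothetical stand-in for the generated parameter file. A real file will
# list many more parameters (connection strings, Key Vault URLs, and so on).
dev_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {"value": "adf-demo-dev"},
    },
}


def parameters_for(target_factory: str, base: dict) -> dict:
    """Return a copy of the parameter file that targets another factory."""
    result = copy.deepcopy(base)  # never mutate the development values
    result["parameters"]["factoryName"]["value"] = target_factory
    return result


test_parameters = parameters_for("adf-demo-test", dev_parameters)
print(json.dumps(test_parameters["parameters"], indent=2))
```

The deep copy matters here: the development values stay untouched, so the same base file can be reused to produce parameters for production as well.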
You can, of course, do this manually. You can go into the development ADF Studio > Manage > ARM Template > Export ARM Template. Then, you can go into one of the other environment's ADF Studio > Manage > ARM Template > Import ARM Template. But what is the fun in that? Nobody wants to do this process manually every time they need to promote changes to different environments.
Above, you will see a diagram and some screenshots that show the first part of the process: creating ARM Templates from the entities in your development ADF instance. We will cover the rest of the process later in the series.
As you can see, only the development ADF instance is connected to a Git repo. You do not want to connect any other ADF instances to a Git repo. The whole idea behind this workflow is that you make changes ONLY to your development ADF instance, those changes get captured as code in your Git repo, and then you deploy that code to the other environments. By doing it this way, you should never be making manual changes to the test or production ADF instances, as the code will be doing that for you.
You'll need to do some initial setup and configuration when you connect the development ADF instance to a Git repo:
Choose a "Collaboration Branch". This is the branch that is used as the source to build your ARM templates.
Choose a "Publish Branch". This is the branch that holds the ARM Templates that ADF will generate for you.
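If you would rather see these two settings as data than as form fields, the sketch below builds them as plain Python dictionaries. It assumes a GitHub-hosted repo; the property names follow the repoConfiguration block of the Data Factory resource and the optional publish_config.json file as I understand them, so verify them against Microsoft's documentation before relying on them. The account name, repository name, and both helper functions are hypothetical.

```python
import json


def github_repo_configuration(account: str, repository: str,
                              collaboration_branch: str = "main",
                              root_folder: str = "/") -> dict:
    """Git settings for the development factory (GitHub flavour)."""
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account,
        "repositoryName": repository,
        # Branch that Publish reads from when it builds the ARM Templates.
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }


def publish_config(publish_branch: str = "adf_publish") -> dict:
    """Content of publish_config.json, which names the Publish Branch."""
    return {"publishBranch": publish_branch}


# Hypothetical account and repository names, for illustration only.
repo_config = github_repo_configuration("my-org", "adf-demo")
print(json.dumps(repo_config, indent=2))
print(json.dumps(publish_config(), indent=2))
```

Only the development factory would carry settings like these. The test and production factories have no Git configuration at all.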
Let's dig into the first part of the process in further detail. These numbers line up with the screenshot found above.
1. While using ADF Studio on the development instance of ADF, create a new feature branch. Using branches is a standard best practice. You typically do not want to make changes directly to your main branch.
2. Once you have created your new branch, you can begin to make changes inside ADF Studio. Do whatever you need to do: create a new linked service, create a new pipeline, etc. In the background, ADF will automatically save all of your changes as JSON files in your branch.
3. Debug and test your new resources in ADF Studio. Do your pipelines run correctly? Do your linked services connect properly?
4. Once you feel confident that your new changes are working correctly, you can use ADF Studio to create a pull request. In this example we are using GitHub, so ADF will automatically open the GitHub web interface and initiate the pull request process for you. You will need to complete the form to open your pull request.
5. Using GitHub, work with your team to get the necessary approvals on your pull request. Once you have met all the requirements, you can merge the pull request. This will update the collaboration branch with the latest changes that you made in your feature branch.
6. Back in ADF Studio, you can click on the "Publish" button. This will automatically read the latest code from the Collaboration Branch and use it to build ARM Templates, which it will store in the Publish Branch.
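To get a feel for what ADF saves into your branch as you work, you can inspect a local clone of the repo. The Python sketch below counts the JSON files per top-level folder, assuming the layout I have seen ADF use (one folder per entity type, such as pipeline, dataset, and linkedService, with one JSON file per entity). The entity names and the throwaway directory are hypothetical. Point count_entities at your own clone to try it on real data.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_entities(repo_root: Path) -> Counter:
    """Count JSON files per top-level folder (one folder per entity type)."""
    counts = Counter()
    for path in repo_root.glob("*/*.json"):
        counts[path.parent.name] += 1
    return counts


# Build a throwaway directory that mimics a small feature branch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sample = [("pipeline", "CopySales"),
              ("dataset", "SalesCsv"),
              ("linkedService", "SalesStorage")]
    for folder, name in sample:
        (root / folder).mkdir()
        (root / folder / f"{name}.json").write_text(json.dumps({"name": name}))
    demo_counts = count_entities(root)

print(dict(demo_counts))
```

A check like this can be handy in a pull request review, since it shows at a glance which kinds of entities a feature branch contains.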
Step 6 is manual. Every time your PR is merged, you must go to ADF and hit the "Publish" button in order to generate the ARM Templates. You can automate this if you want: Microsoft has an npm package called ADFUtilities (published as @microsoft/azure-data-factory-utilities), which can be used to automate Step 6. See this link for more information.
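As a rough idea of what that automation could look like, the Python sketch below only assembles the command a CI job might run. It does not execute anything. It assumes your repo contains a package.json whose "build" script points at the Microsoft package, and that the package accepts "export", the root folder, the factory resource ID, and an output folder, in that order. That is my understanding of Microsoft's documentation, so check the link above before relying on it. The subscription ID, resource group, factory name, and both helper functions are hypothetical.

```python
def factory_resource_id(subscription_id: str, resource_group: str,
                        factory_name: str) -> str:
    """Build the Azure resource ID of a Data Factory instance."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )


def export_command(root_folder: str, factory_id: str,
                   output_folder: str = "ArmTemplate") -> list[str]:
    """Assumed shape of the export call: generates ARM Templates from the repo."""
    return ["npm", "run", "build", "export",
            root_folder, factory_id, output_folder]


# Hypothetical identifiers, for illustration only.
dev_factory_id = factory_resource_id(
    "00000000-0000-0000-0000-000000000000", "rg-adf-dev", "adf-demo-dev")
command = export_command(".", dev_factory_id)
print(" ".join(command))
```

In a real CI job you would hand this list to something like subprocess.run(command, check=True) after running npm install, so that a failed export fails the build.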
So, that's part 1 of this process. As you can probably guess, part 2 involves using the ARM Templates from the Publish Branch in order to deploy changes to the remaining ADF environments. Stay tuned!