In my head, I consider Synapse Analytics to be somewhat of a "version 2" of the Azure Data Factory (ADF) resource, although I don't think Microsoft officially calls it that. Synapse includes nearly all of the data integration features from ADF and adds its own Synapse-specific features on top.
CI/CD with Synapse is very similar to CI/CD with ADF. I don't intend for this blog post to be a full "how-to" for Synapse CI/CD. Instead, I will use it to point out where the process differs from the ADF CI/CD process, and to provide the notes you will need to ensure success. To fully understand this post, you will first need to read my CI/CD with Azure Data Factory series.
The Publishing Process
At a high level, the overall publishing process for Synapse is exactly the same as ADF:
1. Deploy all of your Synapse resources (Synapse, storage account, Spark pool, etc.) into all of your environments (dev, test, stage, etc.)
2. Connect your Dev instance, and only your Dev instance, to a Git repository
3. Create a new branch, make some changes in Synapse Studio, PR your changes back to the main branch, and then click "Publish" in Synapse Studio
This is where the Synapse CI/CD process starts to diverge from the ADF CI/CD process:
- The Publish Branch for Synapse has a default name of "workspace_publish" instead of "adf_publish"
- In ADF, the publish step can be automated by using the ADF Utilities NPM package. To my knowledge, no such package exists for Synapse, so there is no automated publishing for Synapse
- The template that is generated in your Publish Branch is a mix of standard ARM resources and non-standard ARM resources. This means you can NOT deploy this JSON file with standard ARM deployment methods, because ARM won't know what to do with some of the resources it contains
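If you want to see for yourself which resource types end up in the generated template, one option is to tally the `type` field of every entry in its `resources` array. The script below is only a minimal sketch: the file name `TemplateForWorkspace.json` and the resource types in the sample are assumptions for illustration, so compare them with what is actually in your own Publish Branch.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(template: dict) -> Counter:
    """Tally the "type" of every top-level resource in an ARM-style template."""
    resources = template.get("resources", [])
    return Counter(res.get("type", "<missing type>") for res in resources)


def load_template(path: str) -> dict:
    """Read a template JSON file from disk (path is supplied by the caller)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical, heavily trimmed sample; a real template will be much larger.
sample_template = {
    "resources": [
        {"type": "Microsoft.Synapse/workspaces/pipelines", "name": "example-pipeline-1"},
        {"type": "Microsoft.Synapse/workspaces/pipelines", "name": "example-pipeline-2"},
        {"type": "Microsoft.Synapse/workspaces/notebooks", "name": "example-notebook"},
        {"type": "Microsoft.Synapse/workspaces/triggers", "name": "example-trigger"},
    ]
}

if __name__ == "__main__":
    # Swap in load_template("TemplateForWorkspace.json") to inspect a real file
    # (assumed file name; adjust to match your Publish Branch).
    for resource_type, count in sorted(count_resource_types(sample_template).items()):
        print(f"{count:3d}  {resource_type}")
```

The script does not decide which types are "standard" and which are not. It only gives you the list, so you can see what the deployment tooling will have to handle.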
Deployment
The last bullet point above is important. If you can't use standard deployment methods, then what are you supposed to do? Microsoft provides two options:
- For Azure DevOps Pipelines, Microsoft provides a Synapse extension. You need to install it into your Azure DevOps Organization. Once it is installed, you can use the included tasks to perform your CI/CD deployments.
- For GitHub Workflows, Microsoft provides a Synapse GitHub Action that you can use to perform CI/CD deployments.
The pre/post deployment script
For ADF, Microsoft provides a special PowerShell script that you should run before and after the deployment. One of the things it does is stop the ADF Triggers before the deployment and start them again after the deployment is finished.
There is no such script for Synapse.
- If you're using the Synapse extension for Azure DevOps, then luckily it includes a task that can start & stop Synapse Triggers. The task is named "toggle-triggers-dev@2", and you'll see examples of how to use it in my repository.
- If you're using GitHub Actions, then you're out of luck. You must create a custom script to stop & start the Synapse Triggers and incorporate it into your workflow. Note: there is an old GitHub Issue where Microsoft claims they are working on an Action that can toggle Triggers, but to my knowledge it was never released.
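For the GitHub Actions case, a custom script could be as small as a wrapper around the Azure CLI's `az synapse trigger` commands. The following is a minimal sketch, not a tested deployment script: it assumes the Azure CLI is installed on the runner and already logged in, that the workflow runs on a Linux runner, and the workspace and trigger names are hypothetical placeholders.

```python
import json
import subprocess


def build_trigger_command(action: str, workspace: str, trigger: str) -> list[str]:
    """Build the Azure CLI command that starts or stops one Synapse trigger."""
    if action not in ("start", "stop"):
        raise ValueError(f"action must be 'start' or 'stop', got {action!r}")
    return ["az", "synapse", "trigger", action,
            "--workspace-name", workspace, "--name", trigger]


def list_trigger_names(workspace: str) -> list[str]:
    """Ask the Azure CLI for the names of all triggers in the workspace."""
    result = subprocess.run(
        ["az", "synapse", "trigger", "list",
         "--workspace-name", workspace, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return [trigger["name"] for trigger in json.loads(result.stdout)]


def toggle_triggers(action: str, workspace: str, triggers: list[str],
                    dry_run: bool = True) -> list[list[str]]:
    """Start or stop each named trigger; with dry_run=True only print the commands."""
    commands = [build_trigger_command(action, workspace, name) for name in triggers]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical names; dry_run=True means nothing is executed here.
    toggle_triggers("stop", "my-synapse-test", ["DailyLoadTrigger"], dry_run=True)
```

In a workflow you would call it once with "stop" before the deployment step and once with "start" afterwards. You will probably want to restart only the triggers that were running before the deployment, rather than every trigger in the workspace.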
Azure DevOps Pipeline Example
I have created an Azure DevOps YAML Pipeline which you can use to deploy from one Synapse instance to another. You can find the full pipeline in my GitHub repo. Note: you'll need to update the variables and the repositories within the pipeline to match your own environment.
Here is a diagram explaining exactly what the pipeline is doing:
Feature Comparison Table
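| | Azure Data Factory | Azure Synapse Analytics |
|---|---|---|
| Automated Publish | ADF Utilities NPM package | None |
| CI/CD Templates | Standard ARM | NON-standard |
| CI/CD Deployment | Any ARM deploy method | Azure DevOps: Synapse extension; GitHub: Synapse GitHub Action |
| Toggle Triggers | Pre/post deployment PowerShell script | Azure DevOps: toggle-triggers-dev@2 task; GitHub Action: none/custom |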