How Interchangeable Are Integration Pipelines Between Azure Data Factory and Azure Synapse Analytics?


Inspired by an earlier blog post where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse’, I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.

As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure-based data platform solution, so to answer it directly:

Question: How Interchangeable Are Integration Pipelines Between Azure Data Factory and Azure Synapse Analytics?

Answer: Very interchangeable! 

Or, to ask the question another way:

Question: Can we use the same integration components in Azure Data Factory and Azure Synapse Analytics at the same time?

Answer: Yes!

The only caveat to both answers is that, in the source control configuration for each resource, you must set the ‘root folder’ to the same location. In my case this was just the root of the repository itself, because I created the test case from scratch. Link below if you want to view the contents.

https://github.com/mrpaulandrew/AzureIntegrationPipelines
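To make the caveat concrete, what both resources actually read from that shared root folder are plain JSON artifact files, and nothing in them is tied to a particular Data Factory or Synapse workspace. A minimal sketch below shows the shape of such a pipeline definition; the pipeline name, activity, and file content here are illustrative examples, not taken from the linked repository:

```python
import json

# Simplified example of a pipeline artifact as it is stored in Git
# (e.g. pipeline/WaitTest.json); the name and the single Wait activity
# are illustrative, not from the actual test repo.
pipeline_json = """
{
    "name": "WaitTest",
    "properties": {
        "activities": [
            {
                "name": "Wait1",
                "type": "Wait",
                "typeProperties": { "waitTimeInSeconds": 5 }
            }
        ]
    }
}
"""

pipeline = json.loads(pipeline_json)

# Nothing in the definition references a specific service instance:
# no factory name, no workspace name, just the pipeline itself.
print(pipeline["name"])                                 # WaitTest
print(pipeline["properties"]["activities"][0]["type"])  # Wait
```

Because the artifact carries no reference to the resource that created it, either service pointed at the same Git root can pick it up and run it.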


Not convinced? Watch this…

 


With the above in mind, things now get very interesting for me, as an architect, when designing a data platform solution.

  • Delta tables are interchangeable as an open-source standard when working with Apache Spark as the compute.
  • Data Lake storage is interchangeable and accessible by lots of different resources, by the very nature of the underlying distributed file system.
  • Orchestration components are interchangeable between integration resources when accessing the same Git repository and using the same pipeline artifacts.
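On the orchestration point, the sharing works because both services follow the same folder convention under the configured Git root. A small sketch of that layout, with a hypothetical helper to resolve where a named artifact lives (the repo root and artifact names are illustrative assumptions):

```python
from pathlib import Path

# Both Data Factory and Synapse integration pipelines write their Git
# artifacts into the same per-type folders under the configured root.
ARTIFACT_FOLDERS = ["pipeline", "dataset", "dataflow", "linkedService", "trigger"]

def artifact_path(repo_root: str, folder: str, name: str) -> Path:
    """Resolve where a named artifact lives in the shared repository."""
    if folder not in ARTIFACT_FOLDERS:
        raise ValueError(f"unknown artifact folder: {folder}")
    return Path(repo_root) / folder / f"{name}.json"

# Both resources resolve the same file, which is what makes the
# pipelines interchangeable.
print(artifact_path(".", "pipeline", "WaitTest").as_posix())  # pipeline/WaitTest.json
```

Point two Git configurations at the same root and they are, in effect, two front ends over one set of files.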

Therefore, in a given data platform architecture that (before Synapse arrived) used the common set of core resources listed below, there is now no reason why (in most cases) we can’t switch things over to Azure Synapse Analytics, if we wanted to.

Pre-Synapse Core Resources

  1. Data Lake
  2. Databricks
  3. Data Factory

Post-Synapse Core Resources

  1. Data Lake
  2. Synapse – Spark Pools
  3. Synapse – Integration Pipelines

The other great thing is that, as data engineers, we wouldn’t need to do much work for these resources in our solution to become almost plug and play. We could even run solutions in parallel with some creative code branching!

Now, trolls, I fully appreciate my initial test in the video was very, very simple, mainly due to a lack of time. So, I will continue this work and test all the integration components, including debugging in both resources at the same time, to see if we uncover any side effects of this repo sharing. Stay tuned.

For now, I wanted to plant the seed of architecture interchangeability, so you can consider trying out the same and maybe unlock Synapse in a future data platform solution. It’s fairly easy to do, I think you’ll agree 🙂


Many thanks for reading.

Group Manager & Analytics Architect specialising in big data solutions on the Microsoft Azure cloud platform. Data engineering competencies include Azure Synapse Analytics, Data Factory, Data Lake, Databricks, Stream Analytics, Event Hub, IoT Hub, Functions, Automation, Logic Apps and of course the complete SQL Server business intelligence stack. Many years’ experience working within healthcare, retail and gaming verticals delivering analytics using industry leading methods and technical design patterns. STEM ambassador and very active member of the data platform community delivering training and technical sessions at conferences both nationally and internationally. Father, husband, swimmer, cyclist, runner, blood donor, geek, Lego and Star Wars fan!
