Associative Grouping using Spark (Part 3)

This is part of series of posts about associative grouping: Part 1 – Associative Grouping using tSQL Recursive CTE’s Part 2 – Associative Grouping using tSQL Graph In the first two parts of this series we looked at how we could use recursive CTE’s and SQL Server’s graph functionality to find overlapping groups in two columns in a table, in order to put them into a new super group of associated groups. Since then I’ve been doing most of my Read more…

Introducing AzureDataPipelineTools

A few months ago my friend Richard Swinbank posted a blog, More Get Metadata in ADF, about the limitations of using the Get Metadata activity in ADF to get information about files in a data lake. This to a twitter conversation as a bunch of other data engineers had been building the same tools for different companies. I suggested we get a community project going, so we can stop re-inventing the wheel and maybe add more things to it as Read more…

Azure Data Factory: Dev Mode vs Published Code

I’ve worked with quite a few people new to Azure Data Factory, and one thing that seems to confuse new users is the difference between the developer sandbox where we build pipelines, and the published/deployed code. Understanding this is key to working with Git and using CI/CD pipelines to deploy your code, and getting other Azure services to integrate nicely to call your pipelines. Connecting to ADF A good first place to start is to understand the different ways we Read more…

Azure DevOps: SqlPackage Deployment Timeouts

I recently looked at an Azure DevOps pipeline for a client that was timing out whenever an index change or other long running task was deployed using a DacPac. All deployments that ran within a few minutes successfully completed, but those taking longer than 10 minutes were failing. A quick google for the issue shows a few helpful pages, a SqlPackage.exe bug from 2016 & a blog post from 2018. These suggested using either a couple of parameters when running Read more…

Associative Grouping using tSQL (Part 2)

This is part of series of posts about associative grouping: Part 1 – Associative Grouping using tSQL Recursive CTE’s Part 3 – Associative Grouping using Spark In part one of this series we looked at how we could use recursive CTE’s to find overlapping groups in two columns in a table, in order to put them into a new super group of associated groups. Since I wrote that post, SQL Server 2019 CTP 3.1 has been released, and with it Read more…

Associative Grouping using tSQL (Part 1)

This is part of series of posts about associative grouping: Part 2 – Associative Grouping using tSQL Graph Part 3 – Associative Grouping using Spark Recently I was asked by a friend to have a look at an interesting query problem he had been looking at. He was trying to find overlapping groups in two columns, in order to put them into a new super group of associated groups. The simplest way to describe the problem is with an demo, Read more…

Azure Data Factory Lookup: First Row Only & Empty Result Sets

When using the lookup activity in Azure Data Factory V2 (ADFv2), we have the option to retrieve either a multiple rows into an array, or just the first row of the result set by ticking a box in the UI. This allows us to either use the lookup as a source when using the foreach activity, or to lookup some static or configuration data. Recently when connecting to a source system to retrieve configuration data, I discovered that the development Read more…

Signing Stored Procedures for Server Level Permissions

What’s the Problem Stored procedures are often used to allow users to access data in tables that they do not have explicit permissions to use. The mechanism that makes this possible is ownership chaining, whereby in the context of the stored procedure, the permissions of its owner are used (often dbo). This means users only have a limited set of explicit permissions (i.e. EXEC on the stored procedure), but are able to execute CRUD statements using the stored procedures. However, Read more…