Monday, June 7th, 2021 4:35 PM

Connecting Collibra Cloud to Azure Blob storage

I’m building an MLOps framework that the Collibra catalog will point to for metadata on artifacts, including data sets, model output, inference results, and logs. These will live in Azure Blob storage and possibly ADLS Gen2. Has anyone done something similar? I can connect to e.g. Synapse or SQL, or to files such as JSON and CSV, so I assume I’ll need my blob storage to fit into one of those formats.

3 years ago

The closest thing I can think of is the “Amazon S3” system integration, based on AWS Glue, which Collibra offers out of the box (OOTB).

As far as I know, there is no equivalent service on Azure, but creating an abstraction layer on top of Blob storage with Databricks, Snowflake, Presto (and its derivatives like Okera, Starburst…) or Synapse Serverless might do the trick.
Also, it looks like Azure Purview does a good job of inventorying files, so it is probably a good idea to use Purview to crawl the technical metadata.
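
If neither an abstraction layer nor a crawler is available, one lightweight fallback along the lines of the original question is to flatten a blob listing into a CSV manifest, since CSV is a format that can already be connected. The following is only a rough sketch under several assumptions: `BlobRecord`, `artifact_type`, and `build_manifest` are hypothetical names, the sample records are illustrative, and the folder layout (first path segment = artifact category) is assumed. In practice the records might come from something like `ContainerClient.list_blobs()` in the `azure-storage-blob` package, whose items expose `name`, `size`, and `last_modified`; that call is deliberately left out so the sketch runs with the standard library alone.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass
class BlobRecord:
    """Hypothetical stand-in for one entry of a blob listing."""
    name: str
    size: int
    last_modified: datetime


def artifact_type(blob_name: str) -> str:
    """Use the first path segment as the artifact category (assumed layout)."""
    parts = PurePosixPath(blob_name).parts
    return parts[0] if len(parts) > 1 else "uncategorized"


def build_manifest(records) -> str:
    """Return CSV text with one row of technical metadata per blob."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(
        ["artifact_type", "blob_name", "format", "size_bytes", "last_modified"]
    )
    for rec in sorted(records, key=lambda r: r.name):
        # File extension without the dot, or "unknown" when there is none.
        fmt = PurePosixPath(rec.name).suffix.lstrip(".").lower() or "unknown"
        writer.writerow(
            [artifact_type(rec.name), rec.name, fmt, rec.size,
             rec.last_modified.isoformat()]
        )
    return buffer.getvalue()


# Illustrative records only; real ones would come from the storage account.
sample = [
    BlobRecord("inference/batch-01.json", 2048,
               datetime(2021, 6, 2, tzinfo=timezone.utc)),
    BlobRecord("datasets/train.csv", 1024,
               datetime(2021, 6, 1, tzinfo=timezone.utc)),
]
manifest = build_manifest(sample)
print(manifest)
```

The resulting manifest would still have to be uploaded somewhere the catalog can read it and refreshed on a schedule, so it is a stopgap rather than a replacement for proper technical metadata crawling.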

No other ideas so far.