Wednesday, April 7th, 2021 12:48 PM

Collibra Data Lineage

Please post your remarks, questions, or feedback regarding the Marketplace listing Collibra Data Lineage here.

1.2K Messages

3 years ago

Hello,
My use case is to connect to Snowflake, get the technical lineage information, and publish it in Collibra.
I’ve configured the lineage harvester tool as per the documentation at https://productresources.collibra.com/docs/cloud-user/2020.12/Content/CollibraDataLineage/TechnicalLineage/InstallationAndConfiguration/ta_prepare-config-file.htm

Now when I run the harvester tool, I see the parsing error below in the asset’s Show Lineage tab.
"[msg-P00452]: Nothing to process here (input was null) A common cause is that the Harvester’s SQL user doesn’t have the right permissions"

The Snowflake user has full read permissions on the database.

Has anyone faced this issue? I’d be happy to know how to resolve it.

9 Messages

Snowflake requires elevated (admin) user permissions to fully access INFORMATION_SCHEMA. Please elevate the permissions of the user you specified in the Lineage Harvester for the Snowflake data source.

If you cannot elevate the permissions, you can use the admin user to create a view on top of INFORMATION_SCHEMA and grant the non-elevated user access to that view. This should overcome the permission issues. In that case you also have to modify the SQL statements in the Lineage Harvester’s /sql/snowflake/ directory so that they read from the new view instead.
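
The view-based workaround described above could look roughly like the following sketch. It only builds the SQL text an admin would run; it does not connect to Snowflake. The database, schema, view, and role names are hypothetical placeholders, and wrapping INFORMATION_SCHEMA.VIEWS (rather than some other INFORMATION_SCHEMA object) is an assumption about which metadata the harvester queries need.

```python
def build_view_workaround_sql(database, schema, view_name, role):
    """Return statements that wrap INFORMATION_SCHEMA.VIEWS in a regular view
    and grant a non-elevated role read access to that view.

    All identifiers are hypothetical placeholders supplied by the caller.
    """
    qualified_view = f"{database}.{schema}.{view_name}"
    return [
        # Wrap the INFORMATION_SCHEMA object in a regular view created by the admin.
        f"CREATE OR REPLACE VIEW {qualified_view} AS "
        f"SELECT * FROM {database}.INFORMATION_SCHEMA.VIEWS;",
        # The harvester's role needs usage on the containers and select on the view.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT SELECT ON VIEW {qualified_view} TO ROLE {role};",
    ]


if __name__ == "__main__":
    for statement in build_view_workaround_sql(
        "MY_DB", "LINEAGE_META", "VIEWS_FOR_HARVESTER", "HARVESTER_ROLE"
    ):
        print(statement)
```

The harvester’s own SQL under /sql/snowflake/ would then need to select from the new view (here MY_DB.LINEAGE_META.VIEWS_FOR_HARVESTER) instead of INFORMATION_SCHEMA directly.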

262 Messages

Hi Martin,

Can you please explain all the prerequisites for Snowflake metadata ingestion? The article below talks about running the snowcd command from the jobserver and whitelisting the Snowflake hosts.

https://support.collibra.com/hc/en-us/articles/360014046437-Snowflake-connectivity-issue

Do we always need to do this? Do we need to whitelist only the main endpoint of type “DEPLOYMENT”, or also other endpoints of type “STAGE”?

Also, do any certificates need to be imported into the jobserver?

Br,
Noor.

9 Messages

Hi Noor, I recommend you reach out to Collibra support with this question.

1.2K Messages

3 years ago

Hello @martin.masarik.collibra.com, does the lineage harvester support NTLM login (Windows login) for SQL Server?

I’ve tried everything to connect to SQL Server from the harvester tool, but no luck so far. Interestingly, NTLM is supported by the Collibra-provided JDBC driver for SQL Server.

I’ve been in contact with Collibra support, but they also told me that NTLM is NOT supported with the harvester. Any idea when this will be supported? Since NTLM is supported by the driver, we can’t step back on this now.

9 Messages

Hi @karanpreet.singh, the lineage harvester officially supports only user/pass authentication. However, since LH uses the JDBC driver from Microsoft, you can try to make it work by passing custom properties. This is possible by modifying lineage-harvester.conf as outlined below. However, I cannot offer further guidance on this.

"customConnectionProperties" : "instanceName=my-instance;databaseName=my-database;integratedSecurity=true;domain=my-domain;authenticationscheme=NTLM;user=my-user;password=my-password"

We plan to add support for NTLM later this year once Lineage Harvester is ported to a new component called Edge.
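
For anyone assembling a customConnectionProperties value like the one quoted above, here is a small sketch that builds the semicolon-separated string from a dictionary so individual properties are easier to review and change. The helper name is hypothetical; the property names are the ones quoted in this post and the values are placeholders. It does not check whether the driver accepts the combination.

```python
def build_custom_connection_properties(properties):
    """Join key/value pairs into the 'key=value;key=value' form quoted above."""
    for key, value in properties.items():
        # A ';' in a key or value, or '=' in a key, would corrupt the joined string.
        if ";" in key or "=" in key or ";" in str(value):
            raise ValueError(f"unsupported character in property {key!r}")
    return ";".join(f"{key}={value}" for key, value in properties.items())


if __name__ == "__main__":
    # Placeholder values copied from the example in this thread.
    example = {
        "instanceName": "my-instance",
        "databaseName": "my-database",
        "integratedSecurity": "true",
        "domain": "my-domain",
        "authenticationscheme": "NTLM",
        "user": "my-user",
        "password": "my-password",
    }
    print(build_custom_connection_properties(example))
```

The resulting string would go on the right-hand side of "customConnectionProperties" in lineage-harvester.conf.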

262 Messages

3 years ago

How do we handle technical lineage for data sources where there are no out of the box connectors? Can you please elaborate?

1.2K Messages

Hi Noor. You can create technical lineage manually by creating a custom JSON file. Instructions to do so are in our documentation here:

https://productresources.collibra.com/docs/cloud-user/latest/#CollibraDataLineage/TechnicalLineage/InstallationAndConfiguration/ta_prepare-a-json-file-for-technical-lineage.htm

7 Messages

3 years ago

@noor.shaik - If you find out, let me know?

1.2K Messages

3 years ago

Hi,
Is there a way to set a default view when a user opens a diagram? That is, we want the first display to be LOB / Application / Data Domain, with NO Data Models open unless the user chooses to click on the “+” symbol.

157 Messages

Hey mate,

I think you’re referring to a traceability diagram here and not tech lineage (slight differences in product offerings).

Answering your question quickly, you can add the relation to the Data Model as a Boxed/Boxing Node and there’s an option to set the Node as collapsed by default (see below).

Shout if you have any issues

image

1.2K Messages

Thanks, Alvin,

That certainly helped clear up the ‘mess’. It’s still not quite what my management wants to see, but I’m new to Collibra so I will do some more digging.

1.2K Messages

3 years ago

Hi @paulo.taylor, is it possible to feed technical lineage with transformations from external systems, let’s say through the Collibra system APIs, without using the lineage harvester?

9 Messages

Assuming we are talking about importing transformation details into Collibra Data Lineage, this has to be done via a JSON file ingested through the Lineage Harvester, not via the Catalog APIs.

3 years ago

Hi there @paulo.taylor, @martin.masarik.collibra.com,

For more than a month now, we have been unable to run the lineage harvester successfully to sync the technical lineage inside the DGC (it fails 9 out of 10 times on average). I’m talking about the “master batch”. The error seems to be a communication failure between DGC and Techlin.

Full error message: Received '410 Gone' response code to GET https://techlin-aws-us.collibra.com/api/batch/dd9716a7-eea3-4351-a17a-431f8cbae488/status/ request. Response body:
{"slug": "dd9716a7-eea3-4351-a17a-431f8cbae488", "status": "failed", "message": "PROCESSING ERROR: "HttpServerError: Received '500 Internal Server Error' response code to POST https://XXXX.collibra.com/rest/catalog/1.0/internal/technicalLineage/relations/importRelations request. Server error after 5 tries. Response body:\n{"statusCode":500}""}

We raised support cases with Collibra Support, but no luck. Sometimes the technical lineage job even ran indefinitely and we killed it after 40 hours or so. It should usually run in approx. 3 hours. This all started when we added Power BI to the mix.
Back then the lineage harvester completed successfully in about 3 hours, but now it either runs for more than 40 hours until we stop it, fails with 500/410/424 errors, or completes successfully 1 time out of 10.

This is very unpredictable for us and we cannot use it like this.

Here are some of the cases raised: #67173, #69188, #68439.

One other note is that on DGC it seems to be completing successfully as I see “Synchronization of batch for id: techlin COMPLETED”

How can we get additional help in fixing this issue?
It seems like a communication issue between DGC (hosted in the cloud) and the Techlin server.

Thanks,
Alex

2 Messages

We have been struggling with the same issues, and I’m curious whether this remains unresolved for you.

Harvester is unable to complete the master batch update and can run for 24 hours or more before we must manually kill the process. Support for Data Lineage has been anemic for the past 1.5 months which makes it incredibly challenging to successfully deploy this product. Our efforts have been full-stopped because we can no longer use Harvester to update the TechLin servers.

Since this morning, we have begun receiving the ‘410 Gone’ response code that terminates the master batch process. Is there anyone from Collibra paying attention to this thread that can address the issues of two customers who are having the exact same problem?

@martin.masarik.collibra.com
@paulo.taylor

My open case #72103

9 Messages

Hi Peyton,

First of all, I am sorry to hear that you are having trouble with the Lineage Harvester. We have indeed worked with Adobe and resolved the reported issue. I had a quick peek at your ticket and it is not related to the same problem.

I am not sure whether you raised more Support tickets, but the one you linked was reported 6 working days ago. Engineering already provided a partial fix earlier today which, as you reported back, does not solve the issue in full. I am going to work with the team to understand the issue fully and provide an update before the end of this week via the Support ticket.

Martin

637 Messages

1.2K Points

2 years ago

Can anyone help Shruti with this ask re ‘Collibra Edge lineage harvest capability’? @martin.masarik.collibra.com, is there any documentation we can direct Shruti to?

9 Messages

@kristen.freer please reach out to Chandra internally and point him towards that post. Thank you.

262 Messages

2 years ago

@martin.masarik.collibra.com @paulo.taylor
Hi Martin
Can you please shed some light on the technical lineage harvester’s support for SparkSQL?

Can it simply read from a Databricks Hive metastore, parse the table structures, and link them across several databases (say, 3) just by name matching (assume table names and column names are the same)? Or does it need SQL files as input so it can understand that the tables across those 3 databases are really linked via the SQL? In my case, I have metadata-driven ETL load processes (fully parameterized), and I don’t think there is anything meaningful there that can be fed to the lineage harvester (unless the harvester can parse parameterization).

9 Messages

Hi @noor.shaik, SparkSQL statements are supported. You can upload plain SQL statements as files via the Lineage Harvester or use the JDBC driver that ships with the Lineage Harvester. For SparkSQL, that driver allows connections only to an AWS host.

262 Messages

@martin.masarik.collibra.com
May I know what is expected in those SparkSQL statements/SQL files?

Currently, I have seen a colleague feed a Snowflake .SQL file to the Lineage Harvester. The SQL file contains a SELECT query on INFORMATION_SCHEMA that gets the view names and their source code (just 2 columns in the select clause). I believe the lineage harvester uses the OUTPUT of this SQL to link views with their underlying tables (?). The same Snowflake schema has a stored procedure that moves data to another schema. I guess the data movement between these 2 schemas then cannot be established, as the lineage harvester cannot yet parse stored procedures. So, can the .SQL file in this case contain the linking (middleman) information from the stored procedure, so that lineage is established between the tables in these 2 schemas?

9 Messages

2 years ago

@noor.shaik Indeed, the output of that SELECT query is used to extract view definitions from the database. We then scan for lineage and capture and display the lineage from the underlying tables to that view. What you can upload as a file is any plain DML operation such as INSERT, UPDATE, etc. Stored procedures are not currently supported for SparkSQL.
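
To illustrate the kind of plain DML statement mentioned above, the sketch below renders a single INSERT ... SELECT statement and writes it to a .sql file. The table names, column names, and file name are hypothetical, and how the file is then registered with the Lineage Harvester is not shown here.

```python
from pathlib import Path


def render_insert_select(target_table, source_table, columns):
    """Render a plain INSERT ... SELECT statement copying columns between tables."""
    column_list = ", ".join(columns)
    return (
        f"INSERT INTO {target_table} ({column_list})\n"
        f"SELECT {column_list}\n"
        f"FROM {source_table};\n"
    )


def write_sql_file(path, statements):
    """Write the statements to one .sql file, separated by blank lines."""
    path = Path(path)
    path.write_text("\n".join(statements), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical tables: data moves from a staging schema to a reporting schema.
    statement = render_insert_select(
        "reporting.orders", "staging.orders", ["order_id", "amount"]
    )
    print(write_sql_file("orders_load.sql", [statement]).read_text(encoding="utf-8"))
```

A statement written out this way states the source and target tables explicitly, which is the link that a stored procedure would otherwise keep hidden from the parser.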

2 years ago

@martin.masarik.collibra.com Hi Martin,

I modified the suggested customConnectionProperties for Azure SQL Server:

"customConnectionProperties" : "instanceName=my-instance;databaseName=my-database;integratedSecurity=true;domain=my-domain;authenticationscheme=ActiveDirectoryPassword;user=my-user;password=my-password"

I took the following as reference : https://docs.microsoft.com/en-us/sql/connect/jdbc/setting-the-connection-properties?view=sql-server-ver16&viewFallbackFrom=sql-server-ver17

But I was not successful. It threw the following error:

Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open server "******.com" requested by the login. The login failed. ClientConnectionId:c1699240-8a35-4a27-bf97-2e6af14c6517

9 Messages

Hi @balasubrahmanyam.estamsetty, unfortunately this is not something I can help with. I recommend you raise a Support ticket and see if you can get help with this custom properties option. Thank you.
