guillaume_le_galiard

Friday, September 22nd, 2023 8:13 AM

Data Citizens Circles: How to leverage more value with Collibra Data Quality

Hi!

Thank you again for your active participation in our Circles event yesterday.
Stay tuned: we are planning another Data Quality Circles session in Q4 2023.

The recording is available here

The presentation deck can be downloaded here:
2023 09 21 NAMER DC Circles - Data Quality in Action.pdf (6.7 MB)

To access content from past Data Citizens Circles, please visit [this page](https://datacitizens.collibra.com/forum/c/data-citizens-circles/l/top).

Questions Answered Live During the Session

Q1. Is this for the Collibra DQ SaaS option or the Collibra on-prem solution?
A1. The basic integration of information works for both DQ Cloud (SaaS) and the on-prem solution.

Q2. Does this mean we don’t need Edge installed?
A2. Correct. The DIC/DQ API integration does not use Edge (unless DQ Cloud is involved).

Q3. Can customers who use Job Server for DIC still use this integration?
A3. Yes. This integration is API-based, so whether or not you use Job Server has no impact on its functionality.

Q4. Does this mean that we will be able to trigger catalog ingestion from DQ module?
A4. Eventually, yes. This is part of the target feature set. Specifics are always subject to our backlog prioritization, but we intend to ultimately have a fully bidirectional DQ/DIC integration.

Q5. Can we control which sets of DQ rules come across? For example, can we prevent adaptive rules/results from coming over to DIC?
A5. Currently, all of the DQ results and rules come over, but they can be filtered in views. Upcoming enhancements should give you more control over what comes over and how it is displayed.

Q6. Can we profile data without DQ module?
A6. No.

Q7. In prior releases there were limitations on certain types of DQ checks / metrics that could be computed with Pushdown versus Spark (perhaps in Data Profiling). Are any limitations remaining?
A7. Yes. We are an Agile-Scrum development shop, so you can expect feature parity to arrive over time in phased releases. For example, Patterns (FPGrowth) for Snowflake is currently scheduled for DQ 2023.11. Likewise, for each DQ Pushdown platform we anticipate an agile release schedule, with features rolled out over multiple DQ releases.

Q8. What are the resource requirements for running this on an on-prem server (e.g. Oracle)? For Snowflake, what warehouse size did you use for the number of records in your example?
A8. It depends on your data size. Details are available online, and we are happy to discuss more:
Performance Settings

Q9. Can I view these detailed DQ dimension scores in DIC with the integration that was presented earlier?
A9. This isn't currently on the roadmap for the built-in integration, but because the metastore provides access to all of these metrics, that functionality isn't prevented.

Q10. Is there a way to visualize a report (showing pass/break record counts) for a specific DQ rule over time?
A10. Yes, there is a report called "Rules Passing Fraction Roll-Up" that provides this functionality.

Q11. Will you consider a future session on Databricks pushdown?
A11. Yes, of course.

Q12. I am interested to see the template of the Quality Breaks data to be stored in the source database. Is it the same as the current structure of the records in the Owl metastore?
A12. It's a different schema. It does not force you to replicate the metastore schema; rather, it just leverages the results lists.

Q13. Are there any plans to build some type of transform/mapping capability in this interface?
A13. Yes, as George demonstrated, but we are happy to discuss more!

Q14. Snowflake recently released its DQ capability, such as the Data Metric Function, in private preview. During a demonstration of that feature, Snowflake indicated that they are working with Collibra. How does Collibra envision integrating those Snowflake features into Collibra DQ?
A14. We have a strong partnership with Snowflake. Just as many tools exist in tech but there is ultimately one ticketing system (typically Salesforce), we see Collibra DQ working with Snowflake DQ, ultimately rolling both sets of results into DIC as the catalog of catalogs.

Q15. "Rules Passing Fraction Roll-Up": it seems this functionality shows a summary aggregated by dimension but not by specific rule. Please correct me if I have missed where to find the feature.
A15. You are correct. Reporting on each specific rule may require a custom pull from the metastore.
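To illustrate what such a custom pull might do once the data is out of the metastore, here is a minimal sketch that turns per-run pass/break counts into a per-rule pass-fraction series over time. The `RuleRun` record layout and its field names are hypothetical stand-ins for whatever your own metastore query returns; this is not the Collibra DQ metastore schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class RuleRun:
    """One rule evaluation (hypothetical layout, not the real metastore schema)."""
    rule_name: str
    run_date: date
    passing: int   # records that passed the rule in this run
    breaking: int  # records that broke the rule in this run


def pass_fraction_by_rule(
    runs: Iterable[RuleRun],
) -> Dict[str, List[Tuple[date, float]]]:
    """Return {rule_name: [(run_date, pass_fraction), ...]} sorted by date."""
    series: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for run in runs:
        total = run.passing + run.breaking
        # Skip runs that evaluated no records, to avoid dividing by zero.
        if total == 0:
            continue
        series[run.rule_name].append((run.run_date, run.passing / total))
    # Sort each rule's points chronologically so they can be plotted directly.
    return {rule: sorted(points) for rule, points in series.items()}
```

For example, two runs of a rule with 95/5 and 90/10 pass/break counts yield the series 0.95 then 0.9, which is the per-rule trend the roll-up report does not show on its own.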

Q16. What technology behind the scenes enables SQL generation from the Rule Workbench? Is it generative AI?
A16. Yes, generative AI.

Q17. Will there be further work on expanding the UI along exception handling and marking items as resolved?
A17. There is an aspect of exception handling in the Status column in rule breaks. A user can choose between Validate, Invalidate, and Resolved. Resolved in this context means the DQ break has been fixed in the data upstream.

Q18. Are there any additional reports being planned to cover dimensions like Validity, Accuracy, Timeliness, etc.?
A18. Yes, we are constantly considering new reports. We are happy to chat should you have something that is scaling!

Q19. When will the Dataset Dimension Report - Filters be available?
A19. This is currently in beta. Please ask your CSM to enable it for you.

Q20. What is the technology behind the pushdown mechanism where you are executing things in parallel? I understand you have moved away from Spark (in pull-up).
A20. It's built in the native SQL of the application and runs on its compute (i.e. the warehouse, for Snowflake).

Q21. I meant about the SQL generation feature in Rule workbench - when would this be coming out?
A21. This model uses Vertex, which is basically Google's version of ChatGPT.

Q22. Very excited about the enhancements to the DQ tool. Will there be supporting step-by-step guides provided to customers?
A22. Definitely. We will get the training out as soon as possible.

Q23. Can you assess data quality for same “thing” stored in different databases?
A23. Yes, our Source feature can support disparate data sources…

Q24. Will you be able to see outliers in DIC? Will all of them be shown, or will there be a limit?
A24. Yes, you will be able to see outliers. Showing all of them could become redundant, so we would consider linking to our archive feature to avoid showing all the data in DIC, the same way the tool generally shows previews rather than all the data.

Q25. Is the DQ to DIC integration only available if DQ is in the cloud?
A25. No. DQ on-prem or DQ Cloud. Both.

Q26. So, you can’t correct data on systems?
A26. No, because we think that Data Observability should be separated from Data Cleansing.
Please read the interesting content.

Q27. You can’t write to any system?
A27. We can write out a Link-ID report of the primary keys of records that broke rules.
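The idea behind a report like this is simple: rather than correcting source data, it lists which records broke which rule, keyed by primary key, so the fix can happen upstream. The sketch below shows that idea on in-memory records; the function name `link_id_report`, the rule dictionary, and the `id` key field are hypothetical illustrations, not the Collibra DQ Link-ID implementation or output format.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping

Record = Mapping[str, Any]
Rule = Callable[[Record], bool]  # returns True when the record passes


def link_id_report(
    records: Iterable[Record], rules: Dict[str, Rule], key_field: str
) -> Dict[str, List[Any]]:
    """Map each rule name to the primary keys of records that broke it.

    Hypothetical illustration only: the source records are read, never modified.
    """
    report: Dict[str, List[Any]] = {name: [] for name in rules}
    for record in records:
        for name, passes in rules.items():
            if not passes(record):
                # Record only the primary key, not the full row.
                report[name].append(record[key_field])
    return report
```

A record that breaks several rules appears under each of them, e.g. a row with a null email and an out-of-range age is listed under both an email rule and an age rule.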
