
Monday, March 27th, 2023 2:49 PM

Are you using ChatGPT in the Data Intelligence space?

Are you using ChatGPT to generate descriptions or automate content creation? Whatever your use case may be, we want to hear about it! Sharing your experience helps other data citizens get inspired and discover the untapped potential of AI to come up with creative solutions that solve real-world problems and drive business growth.

We are excited to hear about the amazing ways you are using ChatGPT (or similar) to revolutionize your businesses and achieve new levels of success. Please share your stories below.

9 Messages

1 year ago

Yes! In fact, we had a discussion about it a couple of months ago here: https://datacitizens.collibra.com/forum/t/chatgpt-for-collibra/2781
It looks like the Collibra team had a very similar idea as the one I had. I enjoyed reading your recent blog article:
Mastering the art of data intelligence: empowering Collibra with ChatGPT | Collibra
It mentions it’s not planned as an official feature, but imagine what integrating this type of technology throughout Collibra would do for your customers. Any activity involving English (or any other language, for that matter) has traditionally been the realm of the data stewards. Whether it is help writing data dictionary fields, business terms, policies, or standards, anything that offloads manual work would be a big value add. We’ve already employed ChatGPT in each of these scenarios, and it works great!

19 Messages

100% agree - thanks for the comment! I personally really like the efficiency + human oversight

1.2K Messages

1 year ago

I saw a nice integration in a data catalog with something called “Trident.AI”, that provides three interesting features:

  • Proposing definitions (with an interactive chat, which allows the steward to refine and tweak the proposal)
  • Writing and debugging SQL queries from natural language, based on the data dictionary
  • Explaining SQL queries

I know DataShift has been toying around with a workflow that queries ChatGPT for business term proposals.

In the end, user experience is key for this type of interaction. A good engine with a poor UX would not yield impactful features.

19 Messages

Interesting find on Trident.AI, I just watched it. What we do differently in the blog is add business context about where the column sits (which table, schema, …). I also prefer to let the user change the suggested description by default. That way we keep human oversight over what goes into the catalog.

@arthur.burkhardt what are you up to with the GPT buzz?

1 year ago

Hey Arthur, Alexandre,

Indeed, it’s also on our radar! The ChatGPT integration we thought of is very similar to the one you propose above. One specific but interesting use case we’ve also worked on: integration with DeepL, for companies with multiple main languages (as we often have in Belgium :slight_smile: ). You write in your own language, and DeepL returns a proposed translation inside the workflow. @sam.datashift.eu @maarten.rahier

19 Messages

I love DeepL @martijn.vanhauwaert!
Didn’t know they had an API, great, I should look into it

I’m even wondering, would DeepL or gpt-3.5-turbo be cheaper to use? Any findings?

5 Messages

Hi @alexandre.tkint.collibra.com ,

DeepL has a free tier that allows up to 500k characters per month. gpt-3.5-turbo costs $0.002 per 1k tokens (1k tokens is roughly 750 words).
Furthermore, the client we implemented the DeepL solution for was already working with DeepL and had some licenses available, so that made the choice easier :wink:
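For anyone who wants to run the numbers, here is a rough sketch of that comparison. It only uses the figures quoted above (which may have changed since), and the workload in the example is hypothetical, so treat the output as a ballpark rather than a quote.

```python
# Figures quoted in the post above; treat them as assumptions, not current pricing.
GPT35_USD_PER_1K_TOKENS = 0.002
WORDS_PER_1K_TOKENS = 750            # rough rule of thumb from the post
DEEPL_FREE_CHARS_PER_MONTH = 500_000


def gpt35_cost_usd(word_count: int) -> float:
    """Estimate the gpt-3.5-turbo cost for a total number of words processed."""
    tokens = word_count / WORDS_PER_1K_TOKENS * 1000
    return tokens / 1000 * GPT35_USD_PER_1K_TOKENS


def fits_deepl_free_tier(char_count: int) -> bool:
    """Check whether a monthly character volume stays within the free tier."""
    return char_count <= DEEPL_FREE_CHARS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical monthly workload: 1,000 texts of about 75 words each,
    # assumed here to total about 450,000 characters.
    words = 1000 * 75
    chars = 450_000
    print(f"gpt-3.5-turbo estimate: ${gpt35_cost_usd(words):.2f}")
    print(f"Fits DeepL free tier: {fits_deepl_free_tier(chars)}")
```

Note that the token estimate covers whatever text you count in `word_count`; prompts and responses both consume tokens, so include both when sizing a real workload.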

We also tested ChatGPT for proposing business term definitions, but so far the results have been a little disappointing, especially for organization-specific terms. We are looking into how we can improve the results by giving a more specific request.

9 Messages

I’ve found some success in getting ChatGPT to produce good definitions. Here are some things to try:

  1. Provide it with your definition rules/standards.
  2. Provide it with an example or two of a well-formed definition.
  3. Provide it with an example or two of a poorly-formed definition - you can adjust these based on the types of errors you see it return.
  4. Instruct it to return 3 competing definitions for a user to choose from.
  5. Provide it with a reasonable amount of context.
  6. Allow the user to provide up to 5 keywords.
  7. Allow the user to explain in natural language why the returned text doesn’t hit the mark.

Organization-specific terms can certainly be challenging for an LLM, as it won’t have access to the company’s internal materials, but by incorporating some of the ideas above I believe you’ll at least have a good starting definition for many of them.
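The tips above could be combined into a single prompt along these lines. This is only a minimal sketch: the function name, parameters, and wording are illustrative assumptions, and it just builds the prompt text, leaving the actual call to ChatGPT to whatever client you use.

```python
from typing import Sequence


def build_definition_prompt(
    term: str,
    rules: str,
    good_examples: Sequence[str],
    bad_examples: Sequence[str],
    context: str,
    keywords: Sequence[str] = (),
    feedback: str = "",
    num_candidates: int = 3,
    max_keywords: int = 5,
) -> str:
    """Assemble a definition-request prompt (hypothetical helper, not a Collibra API)."""
    if len(keywords) > max_keywords:
        raise ValueError(f"Provide at most {max_keywords} keywords")

    parts = [
        # Tip 4: ask for competing candidates the user can choose from.
        f"Write {num_candidates} competing definitions for the business term '{term}'.",
        # Tip 1: definition rules/standards.
        f"Follow these definition standards:\n{rules}",
        # Tips 2 and 3: well-formed and poorly-formed examples.
        "Well-formed examples:\n" + "\n".join(f"- {e}" for e in good_examples),
        "Poorly-formed examples to avoid:\n" + "\n".join(f"- {e}" for e in bad_examples),
        # Tip 5: a reasonable amount of context.
        f"Context:\n{context}",
    ]
    if keywords:
        # Tip 6: up to five user-supplied keywords.
        parts.append("Keywords to reflect: " + ", ".join(keywords))
    if feedback:
        # Tip 7: the user's explanation of why the last attempt fell short.
        parts.append("The previous attempt missed the mark because: " + feedback)
    return "\n\n".join(parts)
```

Passing the user's natural-language feedback back in on a second call gives the refine-and-retry loop from the last tip without any extra machinery.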

19 Messages

I like these suggestions @wes.arnquist.solidigm.com!! Especially providing examples of good / bad definitions

Thank you!! Hope this helps for all data citizens here

1 year ago

Hello, it would be great if we could also take advantage of ChatGPT for a better search experience in Collibra. Its NLP would return much better-targeted results for users than a plain keyword search. A lot of business users sometimes get lost in huge search results.

Maybe it could automatically mark checkboxes to filter search results?

19 Messages

That sounds like a really good suggestion @vshahinyan.questrade.com. Which kinds of assets/relations would you focus on for a first trial?

Thanks @alexandre.tkint.collibra.com, I think we can target anything, e.g. ‘Customer report’ could trigger filtering the results to reports only… actually any improvement that lightens the processing of search output for consumers would be great.
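To make the 'Customer report' example concrete, here is a very small sketch of the idea. A plain keyword lookup stands in for the NLP step, and the asset type names and the `suggest_filters` helper are hypothetical, not part of any Collibra API.

```python
from typing import List

# Hypothetical mapping from words in a query to asset-type filter checkboxes.
ASSET_TYPE_HINTS = {
    "report": "Report",
    "reports": "Report",
    "table": "Table",
    "tables": "Table",
    "term": "Business Term",
    "terms": "Business Term",
}


def suggest_filters(query: str) -> List[str]:
    """Return asset-type filters hinted at by the words in a search query."""
    filters: List[str] = []
    for word in query.lower().split():
        asset_type = ASSET_TYPE_HINTS.get(word.strip(".,?!'\""))
        if asset_type and asset_type not in filters:
            filters.append(asset_type)
    return filters


if __name__ == "__main__":
    print(suggest_filters("Customer report"))
```

In a real integration a language model would presumably replace the lookup table, but the output would have the same shape: a list of filters to pre-tick for the user, who can still untick them.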

8 Messages

1 year ago

This is very cool!!!

16 Messages

1 year ago

Hello!

We haven’t broached the subject yet in our enterprise, but I have had conversations with my team about exactly this: the potential to use AI to back-fill descriptive information for reports when it’s not feasible to have a person do it. Some of our sources have hundreds of reports with no descriptions or very poor ones.

19 Messages

How is that going so far? Any progress? I’m very curious!
