77 Messages

 • 

255 Points

Monday, July 26th, 2021 2:40 PM

Asset usage statistics

Rather than using Collibra’s free text “Tag”, we have created our own Asset Type of Tag so we can control the values. Now I want to see which tags are being used and which are not. Is there a method to count the number of occurrences of relationships between the individual tags and the items to which they are connected?

For instance, the tag “security” could be connected to multiple instances of data set or policy. I want to know how many of each kind.

Thanks

1.2K Messages

3 years ago

I’m surprised by the first comment. What do you mean by “control the values”?
Tags have specific permissions that let you decide who can create new tags, so you can manage tag creation appropriately. Other people can add existing tags to assets without having the permission to create new ones. And it’s definitely not free text.

That aside, if you have created a specific relation, you can just query the relation API for this specific relation type and it will retrieve the information you are looking for.
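As a rough sketch of the relation API route: the code below assumes the REST 2.0 `/relations` endpoint accepts a `relationTypeId` filter with `offset`/`limit` paging, and that each returned relation carries `source` and `target` objects with a `name`. Those details are assumptions to check against your instance's API documentation. `fetch_page` is a hypothetical callable standing in for your authenticated HTTP call, and which end of the relation holds the tag depends on how the relation type was defined.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator


def iter_relations(fetch_page: Callable[[int, int], dict], page_size: int = 1000) -> Iterator[dict]:
    """Yield every relation, following offset/limit paging until a short page.

    fetch_page(offset, limit) is hypothetical: it should return one page shaped
    like {"results": [{"source": {...}, "target": {...}}, ...]}. In practice it
    might wrap something like
        session.get(base_url + "/relations",
                    params={"relationTypeId": ..., "offset": offset, "limit": limit}).json()
    (endpoint and parameter names are assumptions).
    """
    offset = 0
    while True:
        results = fetch_page(offset, page_size).get("results", [])
        yield from results
        if len(results) < page_size:
            break
        offset += page_size


def count_tag_usage(relations: Iterable[dict], tag_side: str = "target") -> Counter:
    """Count relations per tag name; tag_side says which end holds the tag asset."""
    return Counter(rel[tag_side]["name"] for rel in relations)
```

Any tag asset whose name is missing from the resulting counter has no relations of that type, which is the "not being used" list.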

Alternatively, you can do a backup of the database, restore it into an external postgres server and perform SQL queries. This last option is especially useful to explore obsolete relations or attributes (i.e. characteristics that do not belong to the metamodel anymore).

77 Messages

 • 

255 Points

3 years ago

Arthur,

I am trying to set the permissions so that the SysAdmin can create a list of tags for others to use, but not allow others to create or change tags.

Thanks for the API idea. I also discovered that under the Stewardship icon there is a count of tags in use, which is exactly what I needed, and without using the API.

Scott

262 Messages

But that does not show which asset types a tag is assigned to, does it? Only when you click on a tag from the ‘Tags’ page do you see the asset types, in the left-hand navigator of the search results page.

262 Messages

3 years ago

I think there should be a robust analytics capability in Collibra. The OOTB one looks half-baked (at least for my requirements), for example the reporting data layer and the new asset grid feature. Of course, it seems anything and everything can be done through the APIs.

157 Messages

3 years ago

Agree with the sentiments shared.

I think more granular and robust statistics specifically around Tags would be a prime enhancement idea for the OMRE.

I’m not sure if @lugovyi.dmytro is still working on that piece, but it would be a fantastic addition to an already heavily utilised offering.

43 Messages

OMRE doesn’t extract assets as assets, since that would simply duplicate the data. Regarding the request about tags, I’m not 100% sure I understand the purpose of the modification. Are you looking for a way to show all the assets that have been tagged with a specific tag? In that case there are a few options: search filters and global views. Or is there a need to add something on top? Could you please elaborate a bit on this? What would the request from the end user look like?

157 Messages

That’s fair, I think the platform would benefit greatly from an uplift in the current way of managing tags. Sounds like the OMRE isn’t the answer to that need.

A few things I’d love to see:

  • Tag count by Asset Types
  • Tag Count by Community
  • What Tags have been merged from other Tags?
  • Who is creating New Tags?
  • Who is adding existing Tags? (Different to Created By and Last Modified By)
  • Last date at which an instance of a Tag was added (different to last modified)

77 Messages

 • 

255 Points

Dmytro,
I am trying to see which tags are used often and which are not. Those that are not used often will be combined or eliminated. The creation of tags is controlled by the DGC to minimize duplicates and semi-duplicates (e.g. car vs. cars). Using the new Asset Type gets around Collibra’s inability to accept blank spaces in tags and the lack of migration.

1.2K Messages

To get a result like (pivot table)

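| Tag        | Data Set | Column | Business Term | etc. |
|------------|----------|--------|---------------|------|
| production | 234      | 23     | 865           | 3    |
| use.case.1 | 643      | 34     | 2             | 4    |
| use.case.2 | 754      | 54     | 4             | 1    |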

You can do the following:

Use the search API

  1. Get all the tags from the /tags endpoint.
  2. Iterate over all the tags and query the search API, one request per tag. The search API is very efficient, so I guess you could send something like 100,000 requests per hour.

In this example, I’m asking the search API: “How many assets per asset type for the tag some.tag?” It limits the output to 0 results (I don’t care about the raw results) and to the top 1000 asset types (I hope you have fewer asset types than that). 😂

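# Request body: count assets per asset type for the tag "some.tag"
data = {
  "keywords": "*",
  "filters": [{"field": "tags", "values": ["some.tag"]}],
  "aggregations": [{"field": "assetType", "limit": 1000}],
  "limit": 0
}
# c is assumed to be an already-authenticated HTTP session pointed at the REST API
r = c.post('search', json=data)

# Response:
{'total': 164,
 'results': [],
 'aggregations': [{'field': 'assetType',
   'values': [{'id': '00000000-0000-0000-0000-000000031302', 'count': 164}]}]}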
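The two steps above could be wired together roughly as follows. This is only a sketch: it reuses the request body and response shape shown in the example, and `search` is a hypothetical callable that sends one body to the search API and returns the parsed response.

```python
from typing import Callable, Dict, Iterable


def search_payload(tag: str) -> dict:
    """Request body from the example above, parameterised by tag name."""
    return {
        "keywords": "*",
        "filters": [{"field": "tags", "values": [tag]}],
        "aggregations": [{"field": "assetType", "limit": 1000}],
        "limit": 0,
    }


def asset_type_counts(response: dict) -> Dict[str, int]:
    """Extract {asset type id: count} from the assetType aggregation."""
    for agg in response.get("aggregations", []):
        if agg.get("field") == "assetType":
            return {value["id"]: value["count"] for value in agg["values"]}
    return {}


def build_pivot(tags: Iterable[str], search: Callable[[dict], dict]) -> Dict[str, Dict[str, int]]:
    """Send one search request per tag; return {tag: {asset type id: count}}."""
    return {tag: asset_type_counts(search(search_payload(tag))) for tag in tags}
```

With a requests-style session this might be called as `build_pivot(tag_names, lambda body: c.post('search', json=body).json())`, assuming `c.post` returns a response object with a `.json()` method. Tags that come back with an empty dict are not used on any asset.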

Use a workflow

You can iterate over all your assets with at least one tag and stream aggregate their tags. Probably less efficient than the method above, except if you have a lot of tags (think >100,000) and a lower number of assets.
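The aggregation logic itself is small. A real workflow would run inside Collibra with its own scripting API, so the Python below only illustrates the streaming count; the input shape, (asset type name, tags) pairs, is an assumption made for the sketch.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate_tags(assets: Iterable[Tuple[str, Iterable[str]]]) -> Counter:
    """Consume (asset type name, tags) pairs one at a time and count (tag, asset type) pairs."""
    counts: Counter = Counter()
    for asset_type, tags in assets:
        for tag in set(tags):  # guard against a tag listed twice on one asset
            counts[(tag, asset_type)] += 1
    return counts
```

Because the assets are consumed one by one, memory use grows with the number of distinct (tag, asset type) pairs rather than with the number of assets.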

Restore a database backup

You could also restore a database backup and execute a query like the one below. This would be a bit cumbersome, but it is the most efficient option for a very large volume of both tags and assets.

select t."name", at2."name", count(*)
from terms_tags tt
join tags t on tt.tag_id = t.id
join representations r on tt.term_id = r.id
join asset_types at2 on r.asset_type = at2.id
group by t."name", at2."name";
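The query returns one row per (tag, asset type) pair. To turn those rows into the pivot layout, something like the following sketch could work; it assumes the rows have already been fetched as (tag, asset type, count) tuples, and the function name is made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def pivot_rows(rows: Iterable[Tuple[str, str, int]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Turn (tag, asset type, count) rows into sorted column names plus one count list per tag."""
    rows = list(rows)
    columns = sorted({asset_type for _, asset_type, _ in rows})
    index = {name: i for i, name in enumerate(columns)}
    table: Dict[str, List[int]] = {}
    for tag, asset_type, count in rows:
        # Start each tag with zeros so missing combinations show as 0.
        table.setdefault(tag, [0] * len(columns))[index[asset_type]] = count
    return columns, table
```

Note that the query uses inner joins, so tags that are not attached to any asset do not appear in its result at all. Listing unused tags would need a query that starts from the tags table with left joins.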

77 Messages

 • 

255 Points

3 years ago

I have given up and created my own Asset Type “TAG”. This will allow me to get around Collibra’s “no blank space” and “doesn’t migrate” issues.
