
Extending Data Lineage Beyond the Data Catalog

Haley Teeples March 30th, 2022

A Post-Webinar Q&A with Moa

Moa Passador, Director of Solutions Engineering at Zaloni, hosted a webinar on data observability. The webinar, “Data Observability: Extend data lineage beyond the data catalog,” takes a deep dive into data catalogs and governance platforms. Throughout, he explains how these tools help organizations track data lineage and improve overall observability. With improved observability, data stewards and data citizens get value from end-to-end data lineage in the form of better data quality and data they can trust for their everyday business decisions.

This blog post summarizes the viewers’ questions from the live Q&A portion of the webinar. Below, you will find all of the questions that were addressed during the webinar and a couple that Moa was able to answer afterward:

 

Question #1: When the tool scans the database, is the scanning done from metadata tables? Does this tool also scan stored procedures to find the lineage? If some tables do not have the proper reference keys, does the lineage need to be supported by code?

Moa’s Response:  When scanning a database, we’re basically crawling the database. In this case, we’re not searching for stored procedures; we’re searching for everything that is a table or a view. Now, I understand that if you have stored procedures, you may want to expose them. One of the ways that we can do it is by using our lineage API to add the exposed stored procedure inside the lineage. That way, we can show all of the transformations within the data life cycle. Additionally, we can catalog those stored procedures as a digital asset to make it easier for users to find the latest code version. 
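
To make the idea concrete, here is a minimal sketch of a stored procedure being registered by hand as a transformation step between two crawled tables. It is a toy in-memory model written for this post, not Arena's lineage API; the `LineageGraph` class, its methods, and the table and procedure names are all assumptions made for illustration.

```python
from collections import defaultdict


class LineageGraph:
    """Toy in-memory lineage graph (hypothetical; not Arena's data model)."""

    def __init__(self):
        self.node_types = {}                 # node name -> kind of asset
        self.downstream = defaultdict(set)   # node name -> direct consumers

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, target):
        # Refuse edges that point at assets nobody registered.
        for name in (source, target):
            if name not in self.node_types:
                raise KeyError(f"unknown node: {name}")
        self.downstream[source].add(target)

    def trace_downstream(self, start):
        """Return every node reachable from start, in breadth-first order."""
        seen, order, queue = {start}, [], [start]
        while queue:
            current = queue.pop(0)
            for nxt in sorted(self.downstream[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


graph = LineageGraph()
# Tables and views are what a crawler would discover on its own.
graph.add_node("raw.orders", "table")
graph.add_node("reporting.daily_sales", "table")
# The stored procedure is added explicitly (hypothetical name).
graph.add_node("sp_build_daily_sales", "stored_procedure")
graph.add_edge("raw.orders", "sp_build_daily_sales")
graph.add_edge("sp_build_daily_sales", "reporting.daily_sales")

lineage = graph.trace_downstream("raw.orders")
print(lineage)
```

Once the procedure is a node in its own right, tracing from the source table passes through it, so the transformation is visible instead of being an unexplained jump between two tables.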

 

Question #2: What’s the difference between custom attributes and metadata? Or are they the same?

Moa’s Response: Custom attributes reside inside Arena’s metadata. They are customizable and can be used to add business information to entities and to filter search results. Those attributes can be updated manually via Arena’s Workflow and API.

 

Question #3: Are data rules already predefined in the tool? Do we need to create our own data rules from scratch?

Moa’s Response: When we install and ship the product, it comes with about 50 or so data rules that our customers can use. For example, social security numbers are a field that won’t change from company to company, so this kind of data quality rule comes out of the box. Customers always have the option to create and customize any type of data quality rule within the platform. 
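
As an illustration of what such a rule might look like, the sketch below checks the format of a social security number and reports a pass rate for a column. It is not one of the rules that ships with the product; the function names and sample values are assumptions, and the check covers only the `NNN-NN-NNNN` shape, not whether a number was ever issued.

```python
import re

# Format-only pattern: three digits, two digits, four digits, dash-separated.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def is_valid_ssn_format(value):
    """Return True when value is a string shaped like NNN-NN-NNNN."""
    return isinstance(value, str) and SSN_PATTERN.fullmatch(value) is not None


def pass_rate(values, rule):
    """Fraction of values that satisfy the rule (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(1 for v in values if rule(v)) / len(values)


# Made-up sample column with one good value and three bad ones.
sample_column = ["123-45-6789", "123456789", "12-345-6789", None]
rate = pass_rate(sample_column, is_valid_ssn_format)
print(f"pass rate: {rate:.2f}")
```

A custom rule would follow the same pattern: a function that accepts one value and returns true or false, which can then be scored across a whole column.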

 

Question #4: You mentioned bad data, but how do you improve it? Do you clean it through tools like SQL or do you have to set up reliable sources for cleaner data?

Moa’s Response: In a production environment, a data engineer would set up the environment so that the bad data entity proactively sends messages to the engineers when there is a mismatch in the data. That allows the data engineer to review the bad records and then either update the data quality rule to account for those issues or find out why bad data is being ingested from the source.
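
The flow described above can be sketched roughly as follows: records that fail a rule are set aside for review, and a message is produced for the engineers. This is a simplified stand-in rather than how Arena implements it; the rule, the record layout, and the `build_alert` helper are assumptions, and a real setup would send the message through email or chat instead of printing it.

```python
def split_records(records, rule):
    """Separate records into those that pass the rule and those that fail."""
    good, bad = [], []
    for record in records:
        (good if rule(record) else bad).append(record)
    return good, bad


def amount_is_valid(record):
    """Hypothetical rule: amount must be a non-negative number."""
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


def build_alert(entity, bad, total):
    """Return a message for the engineers, or None when nothing failed."""
    if not bad:
        return None
    return f"{entity}: {len(bad)} of {total} records failed the data quality rule"


# Made-up batch with one clean record and two problem records.
batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": 3, "amount": None},
]
good_records, bad_records = split_records(batch, amount_is_valid)
alert = build_alert("orders", bad_records, len(batch))
print(alert)
```

Keeping the failed records, rather than dropping them, is what lets the engineer decide whether the rule is too strict or the source is sending bad data.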

 

Question #5: What is the difference between data lineage being obtained using log scraping vs. plugins like Apache Atlas uses?

Moa’s Response: Arena’s Data Lineage is purpose-built to automatically run jobs using our workflow capability. If we use a workflow to scrape logs, then the lineage will show up automatically. Zaloni Arena can also pull information from Apache Atlas and use that in our metadata.
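
For readers unfamiliar with log scraping, the sketch below shows the general idea: job logs are parsed for what each job read and wrote, and those pairs become lineage edges. The log format, field names, and job names here are invented for the example and do not reflect the logs of Arena, Apache Atlas, or any particular engine.

```python
import re

# Hypothetical log format: "... job=<name> read=<source> write=<target>"
LOG_LINE = re.compile(
    r"job=(?P<job>\S+)\s+read=(?P<source>\S+)\s+write=(?P<target>\S+)"
)


def edges_from_logs(lines):
    """Extract (source, job, target) lineage edges from matching log lines."""
    edges = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:  # lines without job/read/write fields are skipped
            edges.append((match["source"], match["job"], match["target"]))
    return edges


# Made-up log lines; the second one carries no lineage information.
sample_logs = [
    "2022-03-01 10:00:00 INFO job=load_orders read=raw.orders write=staging.orders",
    "2022-03-01 10:00:05 WARN slow query detected",
    "2022-03-01 10:05:00 INFO job=build_sales read=staging.orders write=reporting.daily_sales",
]
edges = edges_from_logs(sample_logs)
for source, job, target in edges:
    print(f"{source} -> [{job}] -> {target}")
```

Log scraping works after the fact and depends on the logs containing enough detail, whereas a plugin or hook reports lineage as the job runs.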

 

Question #6: Can you connect and show lineage to any BI Tool?

Moa’s Response: Yes, assuming the BI tool you are using has a set of APIs that we can use to capture their metadata. If the BI tool doesn’t have APIs, we can still build Business Glossary objects that would encapsulate and show lineage from source to a BI report.

 

If you haven’t had the opportunity to watch Moa’s webinar yet, you can view it here. While you are visiting our website, take a look around to learn more about the Zaloni Arena platform, get a customized demo from our data experts, or sign up for a 14-day free trial.

 


about the author

Haley Teeples is a recent graduate from North Carolina State University and has been working at Zaloni for over a year. She initially joined Team Z as a Marketing Intern and then worked as a Technical Documentation Intern on the Engineering team. With a clear passion for content creation and learning in the data management space, today, Haley serves as an associate on Zaloni's Product Marketing team.
