Active Data Hubs and the Data-Driven Organization

Team Zaloni, April 30th, 2019

Business users need data, lots of it, and faster than ever. Until recently, waiting weeks or months for a programmer to code up a report was the norm. That practice is no longer efficient or acceptable.

Business users have also become more technically savvy. They have a better idea than previous generations did of which decisions require data and which data they want. Many are used to writing mini-programs such as Microsoft Excel macros to derive insights from their data. They now want to derive insight and value from the many sources of big data their organizations collect.

Self-service access to data

Giving each team and user control over queries can speed up decisions by orders of magnitude, a speed that every organization needs in order to respond to changes in its environment today. As just one example, continual changes in US tariff regulations over the past year have strained the planning capacities of many companies. You can wake up in the morning and find yourself in a totally different business environment.

Self-service requires work at many levels as well as new architectures for data storage:

  • Potential users need to be able to find data. It’s not enough to collect the data; the organization must tag and categorize it and create a data catalog that allows searches and queries. It’s also important to have a comprehensive online taxonomy, which is a list of terms and their relationships. In retail sales, for instance, grills might be a subset of appliances and gas grills are a subset of grills. Formalizing relationships like that can help people search for useful datasets. In another retail company, grills might be classified as lawn furniture instead of appliances; that will affect searches.
  • A process must be in place for giving the users access to the data they want. This might include having a data owner vet the access, copying the data to a new repository, anonymizing or masking sensitive parts of the data, and checking later to make sure the user adheres to the contract provided with the data.
  • Tools must support access by people with modest technical skills. A web interface might allow some queries to be generated by filling out a form, which is then translated into source code and run by the underlying system. Customizable dashboards can also provide crucial information quickly.
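The taxonomy point in the first item above can be made concrete with a small sketch. It is only an illustration, not how any particular catalog product works: the term hierarchy, the dataset names, and the helper functions descendants and search are all hypothetical. A search for a broad term such as "appliances" also returns datasets tagged with narrower terms beneath it, and reclassifying grills under lawn furniture changes what each search finds.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: each term maps to its parent term (None for a root).
TAXONOMY = {
    "appliances": None,
    "grills": "appliances",
    "gas grills": "grills",
    "lawn furniture": None,
}


@dataclass
class Dataset:
    """A catalog entry: a dataset name plus the taxonomy terms it is tagged with."""
    name: str
    tags: set = field(default_factory=set)


# Hypothetical catalog entries, invented for this example.
CATALOG = [
    Dataset("gas_grill_sales_2018", {"gas grills"}),
    Dataset("patio_chair_returns", {"lawn furniture"}),
    Dataset("dishwasher_warranty_claims", {"appliances"}),
]


def descendants(term, taxonomy):
    """Return the term itself plus every term below it in the taxonomy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in taxonomy.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(catalog, term, taxonomy):
    """Return names of datasets tagged with the term or any narrower term."""
    wanted = descendants(term, taxonomy)
    return [d.name for d in catalog if d.tags & wanted]


# In this taxonomy, grills are appliances.
print(search(CATALOG, "appliances", TAXONOMY))

# Another retailer might file grills under lawn furniture instead.
ALT_TAXONOMY = {**TAXONOMY, "grills": "lawn furniture"}
print(search(CATALOG, "appliances", ALT_TAXONOMY))
print(search(CATALOG, "lawn furniture", ALT_TAXONOMY))
```

With the first taxonomy, the "appliances" search includes the gas grill dataset. With the alternative taxonomy, the same dataset shows up under "lawn furniture" instead, which is why the classification needs to be agreed on and published along with the catalog.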

The organizational impact of being data-driven

Only when self-service becomes universal, and everyone in the organization is trained to consult the data before making a decision, does the organization become truly data-driven. This is a new way of running a business.

This is where active data hubs play a huge part in helping organizations make the shift to being data-driven. With active data hubs, such as those implemented through the Zaloni Arena DataOps platform, users can interact with, enrich, and provision data from a curated, managed catalog that hosts data from any source. Because the platform centrally manages and automates relevant governance over the data, business units get self-service access to trusted data that matters to them.

Organizations can start by delivering a data hub to a single line of business, and then use Arena to develop data hubs for other lines of business, growing over time into an enterprise-wide data-driven culture. As a result, being data-driven changes the conversation altogether, from:

  • “What do you think?” to “What does the data tell us?”
  • “Have you seen a situation like this before?” to “What data backs up your assertion?”

By changing the conversation, organizations stay focused on the best interests of the company, clients, and stakeholders when proposing or opposing various courses of action for progress, innovation, and growth.

Excerpt from The Data Lake Maturity Model, eBook by Scott Gidley, VP of Product at Zaloni.

about the author

This team of authors from Team Zaloni provides expertise, best practices, tips and tricks, and use cases across varied topics including data governance, data catalog, DataOps, observability, and much more.