November 9th, 2017
If data lakes enable us to incorporate more data into our analyses while reducing cost and speeding up time to insight, why do only 20% of Hadoop data lakes ever make it into production? What we see as the number one problem for most companies is that they abandon their data lake POC because delivering a return on investment takes way too long.
Yes, building a data lake stack is a complex undertaking (although our solution can cut your time to production by 75%). You must consider various options for storage and computation, including on-premises, cloud, hybrid, even dual and multi-cloud environments. In addition, you must make many decisions concerning data management and governance. Compounding the complexity is the fact that the big data ecosystem is fragmented as well as constantly evolving.
To make sense of it all, we find it is helpful to approach the undertaking of building a data lake stack as a long-term journey – and a worthwhile one.
Let’s say you’re thinking about incorporating a data lake into your data ecosystem, or you were an early adopter and already have an (under-leveraged) data lake that you want to clean up. What could your big data journey look like? We like to frame the journey in terms of data “maturity.” These are stages that an organization goes through as it implements a next-generation data architecture with a data lake at its core.
We have developed a Big Data Maturity Model to show the stages that organizations move through when modernizing data environments with data lakes. Understanding what stage your company is in allows you to (ideally) avoid typical mistakes that are often made when moving to the next stage and, potentially, to identify ways to “leapfrog” more quickly through the maturity process.
Zaloni’s Big Data Maturity Model
Many organizations live today in Stage 1, which we call “ignore.” They are leveraging the limited but reliable data warehouse technologies that they have built over decades. Some organizations have begun experimenting with a data lake — primarily for inexpensive storage, or siloed pockets of analytical insight — without a good handle on what’s in there. This results in a dark and murky “data swamp.” We call this second stage “store.”
Stage 3 is the goal for most organizations today. A governed data lake allows you to have full visibility into your data and its lineage, regardless of location (i.e., on-premises or in the cloud) or data type. You can apply security and privacy and can implement rules and workflows to address data quality, where data is stored, and how it is accessed. Ultimately, a governed data lake enables you to eliminate data silos, streamline analytics and confidently combine disparate data types to find new business insights.
A governed data lake sets you up for a more advanced data environment that leverages automation and machine learning to create an intelligent data lake. You can build advanced capabilities such as text mining, forecast modeling, data mining, statistical model building, and predictive analytics. The data lake becomes “responsive,” with an automated data lifecycle process and self-service ingestion and provisioning. Business users have access to and insight into the data they need (for instance, 360-degree views of customer profiles), and they don’t need IT assistance to extract the data they want.
Big data is not just about implementing the right technology. It also requires the right people, processes and data. As a result, you might find yourself at different stages simultaneously. For example, from a technology perspective you may be at “Stage 2: store,” yet from a process perspective you may be at “Stage 1: ignore.” When assessing your data maturity, be sure to evaluate where you stand across all four components.
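To make that kind of assessment concrete, here is a minimal Python sketch of a per-component scorecard. It is only an illustration, not part of the maturity model itself. The four components (technology, people, processes, data) and the names of the first two stages come from this post; the labels used for the third and fourth stages, the `assess` function, and the rule that overall maturity equals the lowest-scoring component are all assumptions made for the example.

```python
from enum import IntEnum


class Stage(IntEnum):
    IGNORE = 1       # "Stage 1: ignore" (named in the post)
    STORE = 2        # "Stage 2: store" (named in the post)
    GOVERNED = 3     # assumed label for the governed data lake stage
    INTELLIGENT = 4  # assumed label and number for the intelligent data lake


# The four components the post says must all be evaluated.
COMPONENTS = ("technology", "people", "processes", "data")


def assess(scores: dict[str, Stage]) -> dict:
    """Summarize a per-component maturity assessment.

    Assumption: overall maturity is taken to be the lowest-scoring
    component, since a lagging component holds back the rest.
    """
    missing = [c for c in COMPONENTS if c not in scores]
    if missing:
        raise ValueError(f"missing components: {missing}")

    lowest = min(scores[c] for c in COMPONENTS)
    highest = max(scores[c] for c in COMPONENTS)
    return {
        "overall": lowest,
        # Components sitting at the lowest stage, in alphabetical order.
        "lagging": sorted(c for c in COMPONENTS if scores[c] == lowest),
        # How many stages separate the strongest and weakest components.
        "spread": int(highest) - int(lowest),
    }


# The example from the text: technology at "store", processes at "ignore".
result = assess({
    "technology": Stage.STORE,
    "people": Stage.STORE,
    "processes": Stage.IGNORE,
    "data": Stage.STORE,
})
print(result["overall"].name, result["lagging"], result["spread"])
```

A large spread is a hint that investment is running ahead in one area (often technology) while the other components lag behind.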
A classic mistake we see is companies investing in technology without considering whether data best practices are in place and whether people are trained to derive the most value from the data. For more details on how to balance your big data planning efforts, watch this webinar I recently did with Matt Aslett, Research Director at 451 Research.
Our mission at Zaloni is to help customers move through these maturity stages as seamlessly as possible. Our data lake management platform and professional services experts help early adopters clean up and operationalize their data lake implementations, as well as help those who are just starting on their journey “leapfrog” to success. To learn more, contact us.