Why Smart Companies are Complementing Their Data Warehouses with Data Lakes

Team Zaloni, May 3rd, 2017

As the amount of data they collect continues to grow at an exponential rate, companies are quickly realizing that their current data warehouses aren’t prepared to handle the quantity, diversity and speed of today’s data landscape. Many are turning to managed data lakes to reduce costs, store more data, improve business intelligence and discover new revenue streams.

When we say “data lake,” we’re referring to a centralized repository, in Hadoop or in the cloud, for large volumes of raw data of any type from multiple sources. It’s an environment where data can be transformed, cleaned and manipulated by data scientists and business users. A data lake requires a data lake management platform to manage ingestion, apply metadata and enable data governance so that you know what’s in the lake and can use the data with confidence.

A data lake differs from a traditional data warehouse in many ways. The chart below gives a more detailed view of those differences and of the benefits of each.


[Chart: Differences between a data lake and a data warehouse]

Modernizing your data environment

Modernizing a data environment is challenging, and very few companies today want to throw away their data warehouse for a data lake. But complementing an existing enterprise data warehouse with a data lake, creating a hybrid architecture, can be a smart first step for many companies. It provides more flexibility and speed in data processing and in capturing unstructured, semi-structured and streaming data, and it frees up bandwidth in the data warehouse for well-defined, repeatable business intelligence activities. It’s also a use case that typically produces a measurable return on investment. The benefits of a data warehouse augmentation include:

Data Warehouse Augmentation: Cut Costs, Increase Power

Save millions in storage costs. Scale-out architectures (e.g., Hadoop, AWS S3, Azure) can store raw data in any format at a fraction of the cost of the data warehouse. Providing higher capacity in a smaller footprint can save an organization millions. In fact, Zaloni helped one client achieve 20 times the storage capacity of their data warehouse at 50% of the cost of a previously planned data warehouse upgrade. Another client achieved a 100x cost reduction per terabyte of stored data.

Significantly speed up processing. A data lake’s flexible architecture enables faster loading of data and parallel processing, resulting in faster time to insight. For example, one Zaloni client quadrupled the throughput of its system after migrating processing to a data lake. The data lake is also much more effective than the data warehouse for processing the increasing amount of unstructured and semi-structured data that’s important for analytics today.

Maximize DW for BI. Costly data warehouse resources shouldn’t be wasted on low-value activities such as data transformation. One Zaloni client realized that 90% of their data warehouse platform was being used for ETL processes, leaving little processing power available for high-value analytics and business intelligence activities. A data warehouse augmentation with a data lake made it possible for the enterprise to employ more strategic use of its assets.

Extract more value, more quickly, from more data. Lower cost means enterprises can store more data in an accessible format, in an “active archive” rather than on tape. Extending data retention periods for historical data and eliminating time-consuming backup processes support more in-depth trend analyses that can lead to further business insights and more effective business strategies. In addition, data lakes enable organizations to immediately start up sandboxes for data scientists.

In practice: Data warehouse augmentation case studies 

Zaloni has years of experience helping organizations augment their data warehouses with managed data lakes. Below are a couple of case studies in which a data warehouse augmentation was successfully implemented:

Verizon: Data warehouse augmentation

Verizon, a leading wireless network provider that designs, builds and operates networks, information systems and mobile communication technologies, wanted to speed up and enhance its data analytics capability and improve archiving and recovery using a scalable big data platform. Nearly 90% of Verizon’s enterprise data warehouse (EDW) platform was used for extract, load and transform (ELT) processes. The company needed to free up CPU so that the EDW could be used for true data warehousing, as well as business intelligence practices. In addition, existing archival and restore processes were manual and high risk. Verizon needed disk-based and system-driven backups, as well as a solution to support longer data retention periods for historical data and enable ad-hoc analytics and data mining.

DW Augmentation Solution: Verizon’s new hybrid Hadoop and Teradata environment dramatically increased storage capacity and processing speed and reduced costs. This was done without impacting the division’s upstream/downstream business systems and ultimately the business end user experience. With timesaving, local data processing and all raw data – above and beyond what was in the EDW – stored on the Hadoop platform, data could easily be accessed for analytics if it became important in the future.

Results: The solution reduced CapEx by $33 million over five years, increased storage capacity 20x and achieved a 100x cost reduction per terabyte ($200K/TB for Teradata, $2K/TB for Hadoop).
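As a quick check on the arithmetic behind the 100x figure, the short Python sketch below divides the two per-terabyte costs quoted above. The function and constant names are illustrative only and do not come from Verizon or Zaloni.

```python
def cost_reduction_factor(old_cost_per_tb: float, new_cost_per_tb: float) -> float:
    """Return how many times cheaper the new platform is per terabyte."""
    if new_cost_per_tb <= 0:
        raise ValueError("new_cost_per_tb must be positive")
    return old_cost_per_tb / new_cost_per_tb


# Per-terabyte costs quoted in the case study results (US dollars).
TERADATA_COST_PER_TB = 200_000
HADOOP_COST_PER_TB = 2_000

factor = cost_reduction_factor(TERADATA_COST_PER_TB, HADOOP_COST_PER_TB)
print(f"Cost reduction per terabyte: {factor:.0f}x")  # prints "Cost reduction per terabyte: 100x"
```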

Pechanga Resort and Casino: Data warehouse offload 

Pechanga Resort and Casino, California’s largest resort and casino, manages a comprehensive customer loyalty program. The company wanted to be able to use existing and future data to optimize and grow its loyalty program by leveraging data science and advanced analytics.

DW Augmentation solution: Pechanga’s automated and configurable ingestion framework ingested data into the data lake from approximately 5,000 tables in Oracle and SQL Server within weeks, compared to the months it would have taken to do it manually. Zaloni’s data lake management platform tracked lineage and metrics of every ingestion, allowing for a wider range of analytics, faster. Data could be accessed for self-service analytics in near real-time, or stored in raw format to be used if it became important in the future.
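To make the idea of configuration-driven ingestion with lineage tracking more concrete, here is a minimal Python sketch. It is not Zaloni’s platform or Pechanga’s framework: the function names, the record fields, and the use of an in-memory SQLite database and CSV files as stand-ins for the source systems and the lake are all assumptions made for illustration.

```python
import csv
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_tables(conn, tables, lake_dir):
    """Copy each configured table to a raw CSV file and return one lineage record per table."""
    lake_dir = Path(lake_dir)
    lake_dir.mkdir(parents=True, exist_ok=True)
    lineage = []
    for table in tables:
        # Table names come from trusted configuration in this sketch, not from user input.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        target = lake_dir / f"{table}.csv"
        row_count = 0
        with target.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)
            for row in cursor:
                writer.writerow(row)
                row_count += 1
        # Hypothetical lineage record: where the data came from, where it landed, how much, and when.
        lineage.append({
            "source_table": table,
            "target_path": str(target),
            "row_count": row_count,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        })
    return lineage


def run_demo():
    """Build a tiny stand-in source database and ingest it into a temporary 'lake' directory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (id INTEGER, tier TEXT)")
    conn.executemany("INSERT INTO members VALUES (?, ?)", [(1, "gold"), (2, "silver")])
    conn.execute("CREATE TABLE visits (member_id INTEGER, visit_date TEXT)")
    conn.execute("INSERT INTO visits VALUES (1, '2017-05-03')")
    # The temporary directory is removed on exit, so target_path is only valid during the demo.
    with tempfile.TemporaryDirectory() as tmp:
        records = ingest_tables(conn, ["members", "visits"], tmp)
    conn.close()
    return records


if __name__ == "__main__":
    for record in run_demo():
        print(record["source_table"], record["row_count"])
```

Onboarding another source table in this sketch means adding its name to the list passed to ingest_tables, which is the sense in which a configurable framework avoids per-table development.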

Results: Pechanga was able to prove the potential value of specific use cases, convincing its board of directors to move forward into production, with the end goal of optimizing the customer loyalty program and improving the customer experience. In addition, the solution provided a framework for the future onboarding of other sources without needing additional development.

Getting Started

If you’re interested in learning more about how to modernize your data architecture with a data lake, Zaloni has several resources that may be of interest to you, including:

Webinar: Data Warehouse Augmentation – Cut Costs, Increase Power with Verizon 

Pradeep Varadan, Verizon’s Wireline OSS Data Science Lead, and Scott Gidley, Zaloni’s VP, Product Management, discuss the benefits of augmenting your DW with a data lake. They also address how migrating to a data lake allows you to efficiently exploit original raw data of all types for data exploration and new use cases. Watch now

eBook: Architecting Data Lakes – Data Management Architecture for Advanced Business Use Cases

In this eBook, Ben Sharma, Zaloni’s CEO, and author Alice LaPlante discuss best practices associated with building, maintaining and deriving value from a data lake in production environments. They address architectural considerations and required capabilities. Download now

You can also contact us at any time to learn more about Zaloni’s products and services and set up a live demo.

About the author

This team of authors from Team Zaloni provides expertise, best practices, tips and tricks, and use cases across varied topics, including data governance, data catalog, DataOps, observability and much more.