This article is an excerpt from “Architecting Data Lakes: Second Edition” by Ben Sharma. Get the full ebook today!
In the past, most data lakes resided on-premises. That has shifted dramatically in recent years, with most companies now looking to the cloud to replace or augment their implementations.
Whether to use on-premises or cloud storage and processing is a complicated and important decision for any organization. The pros and cons of each could fill a book and depend heavily on the individual implementation. Generally speaking, on-premises storage and processing offer tighter control over data security and data privacy. Public cloud systems, by contrast, offer highly scalable and elastic storage and computing resources that meet enterprises' need for large-scale processing and data storage without the overhead of provisioning and maintaining expensive infrastructure.
Because the tools and technologies in the ecosystem change so rapidly, we have also seen many cloud-based data lakes used as incubators for dev/test environments, where new tools can be evaluated quickly before the right one is brought into production, whether in the cloud or on-premises.
If you put a robust data management structure in place, one that provides complete metadata management, you can readily support any combination of on-premises, cloud, and multi-cloud storage.
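To make the idea concrete, here is a minimal sketch of how a metadata catalog might decouple a dataset's logical name from where its copies physically live. It is an illustration only, not a description of any particular product: the class names (`StorageLocation`, `DatasetEntry`, `MetadataCatalog`), the environment labels, and the example URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StorageLocation:
    """One physical copy of a dataset (hypothetical model)."""
    environment: str  # illustrative labels such as "on_prem" or "cloud_a"
    uri: str


@dataclass
class DatasetEntry:
    """Catalog metadata for one logical dataset (hypothetical model)."""
    name: str
    owner: str
    locations: List[StorageLocation] = field(default_factory=list)


class MetadataCatalog:
    """Maps logical dataset names to their physical storage locations."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def resolve(self, name: str, prefer: Optional[List[str]] = None) -> str:
        """Return a URI for the dataset, honoring preferred environments in order.

        Falls back to the first registered location if no preference matches.
        """
        entry = self._entries[name]  # raises KeyError for unknown datasets
        if not entry.locations:
            raise LookupError(f"dataset {name!r} has no registered locations")
        for environment in prefer or []:
            for location in entry.locations:
                if location.environment == environment:
                    return location.uri
        return entry.locations[0].uri


# Example data: the URIs below are made up for illustration.
catalog = MetadataCatalog()
catalog.register(
    DatasetEntry(
        name="sales_orders",
        owner="finance",
        locations=[
            StorageLocation("on_prem", "hdfs://namenode.example/data/sales_orders"),
            StorageLocation("cloud_a", "s3://example-bucket/sales_orders"),
        ],
    )
)

print(catalog.resolve("sales_orders", prefer=["cloud_a"]))
print(catalog.resolve("sales_orders"))
```

Consumers in this sketch ask for `sales_orders` by name and let the catalog decide which copy to use, so moving or adding a copy in another environment changes only the catalog entry, not the code that reads the data.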