
Architecting Data Lakes – O’Reilly eBook

March 16th, 2020

We are all familiar with data warehouses that capture data from enterprise systems such as CRM, inventory, and sales transaction systems. New technologies, including mobile, social platforms, and IoT, are now driving much greater data volumes, higher user expectations, and rapid globalization of economies. To meet new business needs, organizations are turning away from data warehouses to scale-out architectures such as data lakes, built on Apache Hadoop and other big data technologies. However, despite growing investments, very few enterprises report ultimately moving their big data proof-of-concept (PoC) projects into production. This is mainly because these organizations fall short in designing the data lake properly and in managing the data within it effectively.

By contrast, when organizations architect data lakes according to best practices, the data lakes have proven highly beneficial for advanced business use cases.

In this eBook, you will learn best practices for building, maintaining, and deriving maximum value from data lakes in production environments, and how data lakes compare with data warehouses. Most importantly, the book includes a detailed checklist to help you construct a data lake in a controlled yet flexible way.

You will examine: 

  • A reference architecture for a data lake
  • The differences between a data lake and a data warehouse
  • How data lakes overcome the data integration challenges of a traditional data warehouse
  • Key data lake attributes, such as ingestion, storage, processing, and access
  • Why implementing data management and governance is crucial for the success of your data lake architecture
  • How to curate the data lake through data governance, acquisition, organization, preparation, and provisioning
  • Methods for providing secure self-service access for users across the enterprise
  • How to build a future-proof data lake tech stack that includes storage, processing, and data management
  • Emerging trends that will shape the future of architecting data lakes

Organizations are designing and deploying data lakes for scale on robust, metadata-driven data management platforms that give them the transparency and control needed to benefit from a scalable, modern data architecture.

If you are building a data lake architecture today, this book is a must-read: it will serve you now and help you scale in the future.

About the Author
Ben Sharma, CEO and co-founder of Zaloni, is a passionate technologist with experience in solutions architecture and service delivery of big data, analytics, and enterprise infrastructure solutions. Previously with NetApp, Fujitsu, and others, Ben has expertise ranging from business development to production deployment across a wide array of technologies, including Hadoop, HBase, databases, virtualization, and storage. Ben is the co-author of Java in Telecommunications and holds two patents.
