Microservices Data Lake: 6 Advantages of this Approach

Gaurav Chakravarti, March 12th, 2019


A machine is a sum of its parts, which need to be maintained or periodically upgraded. If you can make changes without replacing everything, shutting down production or affecting other components, you come out ahead. The same holds true for your data analytics “machine.” With traditional monolithic applications, which are built as a single unit, this isn’t possible. In contrast, microservices are an approach to building software out of independent components that are individually developed, deployed and maintained, and connected to each other via APIs. Each microservice can be thought of as an independent application for solving a specific business problem.

This approach makes a lot of sense for the data lake. For example, you might build one microservice focused on ingesting data from AWS, and another to inventory unstructured data for the data catalog. Enabling these processes to work independently brings a number of advantages that can help you derive more value from your team and the data lake.
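To make the ingestion and catalog example concrete, here is a minimal in-process sketch in Python. The class names, method names and event fields are all hypothetical, and a real deployment would run each service as its own process behind a network API. The point is only that the catalog depends on the shape of the event it receives, not on how ingestion stores the data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IngestionService:
    """Hypothetical service that lands raw objects in the lake."""
    _landed: dict = field(default_factory=dict)

    def ingest(self, key: str, payload: bytes) -> dict:
        # Store the object, then return a small event describing it.
        self._landed[key] = payload
        return {"key": key, "size_bytes": len(payload)}


@dataclass
class CatalogService:
    """Hypothetical service that inventories objects for the data catalog."""
    _entries: dict = field(default_factory=dict)

    def register(self, event: dict) -> None:
        # Relies only on the event fields, never on ingestion internals.
        self._entries[event["key"]] = {"size_bytes": event["size_bytes"]}

    def lookup(self, key: str) -> Optional[dict]:
        return self._entries.get(key)


ingestion = IngestionService()
catalog = CatalogService()

# The event dictionary is the only thing the two services share.
event = ingestion.ingest("sales/2019-03-12.json", b'{"total": 42}')
catalog.register(event)
print(catalog.lookup("sales/2019-03-12.json"))
```

Because the two classes share nothing but the event, either one could be rewritten or redeployed without touching the other, as long as the event fields stay the same.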

1. Easier deployment and maintenance with microservices

Deployment and maintenance become more flexible, because modifying one component doesn’t affect the overall platform. This dovetails nicely with a DevOps approach, where teams continuously build, deploy and manage applications. A microservices approach means teams can deploy the components of a complex application separately, using their preferred technologies and tools.

2. More flexibility in solving use cases

Additionally, microservices give you flexibility in how you solve a business problem – with a single end-to-end component or several independent components, enabling you to adapt solutions to various factors or team skill sets.

3. High availability

Microservices make your platform more fault tolerant: if one component goes down, it need not take the rest of the platform with it. You can run nodes or services on different servers so that they remain available even when a server fails. When something does go wrong, microservices make it easier to pinpoint the faulty component, isolate it and fix it.
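The idea of running the same service on several servers can be sketched as a simple failover loop. This is an illustration under assumptions, not a production pattern: the replicas here are plain Python functions standing in for network calls, and `call_with_failover`, `failed_replica` and `healthy_replica` are hypothetical names.

```python
from typing import Callable, Optional, Sequence

# A replica is modeled as a function that takes a request and returns a reply.
Replica = Callable[[str], str]


def call_with_failover(replicas: Sequence[Replica], request: str) -> str:
    """Try each replica in order and return the first successful reply."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # This replica is unreachable; remember why and try the next one.
            last_error = exc
    raise RuntimeError("all replicas unavailable") from last_error


def failed_replica(request: str) -> str:
    # Stands in for a server that is down.
    raise ConnectionError("server down")


def healthy_replica(request: str) -> str:
    return f"handled: {request}"


result = call_with_failover([failed_replica, healthy_replica], "list datasets")
print(result)
```

The caller never learns that the first server failed. In practice this job usually belongs to a load balancer or service mesh rather than to application code.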

4. Future-proofed architecture

A microservices approach helps you avoid being locked into a particular vendor, so you can more easily and cost-effectively swap out technologies as needed or incorporate new ones. This is key, as the big data technology landscape is constantly evolving.

5. Ability to play to your strengths

The independent nature of microservices lets a company play to each team’s strengths without necessarily hiring for new skills, because different microservices can be built using the programming languages and technologies that teams already know. This makes more and more sense as applications grow more complex and come to depend on multiple teams, technologies, and systems.

6. Easier testing

With microservices, testing becomes more modular and streamlined. Even when different teams own individual microservices, you are testing the communication or output of the microservices through APIs, not the inner workings of each microservice itself. This supports faster development and continuous delivery to meet business requirements.
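Testing through the API rather than the internals can be sketched as a contract check. The example below assumes a hypothetical catalog lookup that returns a dictionary with `key`, `format` and `size_bytes` fields. The field names and both implementations are invented for illustration.

```python
def check_catalog_contract(lookup) -> bool:
    """Black-box check: inspect only what the API returns, not how."""
    response = lookup("sales/2019-03-12.json")
    assert set(response) == {"key", "format", "size_bytes"}
    assert response["key"] == "sales/2019-03-12.json"
    assert isinstance(response["size_bytes"], int)
    assert response["size_bytes"] >= 0
    return True


def lookup_v1(key: str) -> dict:
    # First implementation: answers from a hard-coded table.
    table = {"sales/2019-03-12.json": {"format": "json", "size_bytes": 13}}
    return {"key": key, **table[key]}


def lookup_v2(key: str) -> dict:
    # Second implementation: derives the format from the file extension.
    return {"key": key, "format": key.rsplit(".", 1)[-1], "size_bytes": 13}


# Both implementations satisfy the same contract despite different internals.
print(check_catalog_contract(lookup_v1), check_catalog_contract(lookup_v2))
```

A team can replace `lookup_v1` with `lookup_v2` and rerun the same check unchanged, which is what keeps testing modular when different teams own different services.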

Data lakes and microservices: A smart approach

Data management across a large enterprise in a hybrid environment can be very complex, with increasing volumes and types of unstructured and semi-structured data, and increased demand for data from different business teams for a variety of use cases. Microservices help make development, deployment, and maintenance of the data lake more flexible and agile – not to mention more resilient to problems or disasters.


My colleague, Sabby Gupta, goes into more detail in this on-demand webinar: How to Use Microservices to Build a Data Lake on AWS.

About the author

Gaurav was an architect and then director of big data at Zaloni before becoming Principal Data & Analytics Platform Engineer at Blue Cross NC.