
Arena Backstage Pass: How Spark is Utilized in the Zaloni Arena Platform

Jatin Hansoty February 2nd, 2022

The uniqueness and importance of Spark

For the next addition to our Arena Backstage Pass blog series, we are talking about all things Spark. Zaloni’s marketing team sat down with Director of Engineering, Jatin Hansoty, to discuss why Spark matters, what makes it a unique offering, and how it has enhanced and broadened various features of the Zaloni Arena platform.

 

Introduction

Jatin started with the basics, describing how Spark began as an open-source research project and gained a lot of popularity in the developer community. It was donated to the Apache Software Foundation in 2013 and became a top-level Apache project in 2014, the same year version 1.0 was released, and it has continued to grow and be adopted by countless developers and organizations. Today, Spark is heavily used across the industry. Given Spark’s rich set of APIs, its advantages over MapReduce, and its rapid adoption, Zaloni Engineering recognized it as the technology that would take the Zaloni Arena platform to the next level for performance and adaptability.

What is Spark, and why is it beneficial?

Spark is an open-source engine for large-scale data processing. At Zaloni, we were heavily invested in the Hadoop platform for building out data lakes and data platforms, and were keenly aware of the engineering effort needed to tune and optimize traditional MapReduce jobs. Spark’s DAG-based architecture raises the level of abstraction for developers, with the promise of faster and more efficient execution plans. However, Jatin believes the strategy that made Spark such a winner was that it shipped with modules for SQL, streaming, ML, and more alongside the core, and by doing so conveyed the vision that it was a platform for all big data developer personas.
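To make the DAG idea a little more concrete, here is a toy sketch in plain Python. It is not Spark's API: `LazyDataset` and its methods are hypothetical names invented for illustration. Transformations only record steps in a plan, and nothing runs until an action such as `collect()` is called, which is what gives an engine the chance to look at the whole plan before executing it.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""
    source: Tuple[Any, ...]
    plan: Tuple[Tuple[str, Callable], ...] = ()

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: record the step, do no work yet.
        return LazyDataset(self.source, self.plan + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: record the step, do no work yet.
        return LazyDataset(self.source, self.plan + (("filter", pred),))

    def explain(self) -> List[str]:
        # Show the recorded plan without running it.
        return [kind for kind, _ in self.plan]

    def collect(self) -> list:
        # Action: run every recorded step in a single pass over the data.
        out = []
        for row in self.source:
            keep = True
            for kind, fn in self.plan:
                if kind == "map":
                    row = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                out.append(row)
        return out


calls = []  # records when the map function actually runs


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset((1, 2, 3, 4)).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has executed
result = ds.collect()              # the plan runs only now
```

Because the steps are held as data until the action, both of them run in one pass over the input rather than as separate jobs with intermediate results, which is the kind of saving the paragraph above alludes to.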

Tell me about Spark in Arena

As mentioned earlier, Spark provided a replacement for MapReduce with improved developer friendliness and better performance. That alone would have been enough of an incentive, but at the time it was becoming popular, Zaloni was thinking about ways to evolve its support for streaming workloads. Ultimately, Spark’s support for streaming was the most intriguing factor for Zaloni.

Zaloni’s overall approach was not Spark-centric. While Spark was an appealing technology for Zaloni to adopt, the team wanted Zaloni product users to be able to continue working at a higher level of abstraction, so that they cared about capabilities like “data quality” and logical processing units like “filter” and “join” rather than writing and running their own Spark jobs. The first product feature Zaloni released that used Spark was called Transformations, a visual drag-and-drop designer for defining data processing jobs for batch and streaming workloads. Zaloni’s platform, Arena, handled translating the visual recipe into Spark and executing it on a cluster. Arena users worked with logical abstractions of the data and its processing, while Arena dealt with files, tables, SQL, Scala, Parquet, clusters, and scheduling: all the nuts and bolts, handled so that things stayed simple for the user.
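As a rough illustration of what translating a visual recipe into Spark could look like, the sketch below compiles a list of logical steps into a SQL string of the kind that could be handed to Spark SQL. This is not Arena's implementation: the recipe format, the step names, and the `compile_recipe` function are all hypothetical, and a real translator would also need to validate and escape its inputs.

```python
from typing import Dict, List


def compile_recipe(source: str, steps: List[Dict[str, str]]) -> str:
    """Compile logical steps into one SQL string (hypothetical recipe format)."""
    query = f"SELECT * FROM {source}"
    for i, step in enumerate(steps):
        alias = f"step_{i}"  # each earlier stage becomes an aliased subquery
        if step["op"] == "filter":
            query = f"SELECT * FROM ({query}) AS {alias} WHERE {step['condition']}"
        elif step["op"] == "join":
            query = (
                f"SELECT * FROM ({query}) AS {alias} "
                f"JOIN {step['table']} USING ({step['key']})"
            )
        else:
            raise ValueError(f"unsupported step: {step['op']}")
    return query


# A two-step recipe, as a user might assemble it in a visual designer.
recipe = [
    {"op": "filter", "condition": "amount > 100"},
    {"op": "join", "table": "customers", "key": "customer_id"},
]
sql = compile_recipe("orders", recipe)
```

The user-facing side only ever mentions “filter” and “join”; the generated SQL, and everything needed to run it on a cluster, stays behind the platform boundary.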

After that, Zaloni started providing Spark versions of existing capabilities and building new features natively on Spark. In the coming year, Zaloni is making another big push on its “sparkification” initiative as part of a product transformation into a SaaS offering with a more streamlined architecture.

To learn more about Zaloni, the Zaloni Arena Platform, or Spark, don’t be afraid to reach out to our data experts for a customized demo or solution that best fits your data needs. 

 


About the author

Jatin Hansoty is Zaloni’s Director of Solutions Architecture, leading a team of Zaloni’s top engineers. His experience spans companies ranging from start-ups to Fortune 500, including Sensus, Fidelity, and Fujitsu.
