For the next installment in our Arena Backstage Pass blog series, we are talking about all things Spark. Zaloni’s marketing team sat down with Director of Engineering, Jatin Hansoty, to discuss why Spark matters, what makes it a unique offering, and how it has enhanced and broadened various features of the Zaloni Arena platform.
Jatin started with the basics, describing how Spark began as an open-source research project and gained a lot of popularity in the developer community. It was donated to the Apache Software Foundation in 2013 and became a top-level Apache project in 2014, and it has continued to grow and be adopted by countless developers and organizations ever since. Today, Spark is heavily used across the industry. With its rich set of APIs, its advantages over MapReduce, and its rapid adoption, Zaloni Engineering recognized it as the technology that would take the Zaloni Arena platform to the next level for performance and adaptability.
Spark is an open-source engine for large-scale data processing. At Zaloni we were heavily invested in the Hadoop platform for building out data lakes and data platforms, and we were keenly aware of the engineering effort required to tune and optimize traditional MapReduce jobs. Spark's DAG-based architecture raises the level of abstraction for developers, with the promise of faster and more efficient execution plans. However, Jatin believes the strategy that made Spark such a winner was that it shipped with modules for SQL, streaming, ML, and more alongside the core, and by doing so, it conveyed the vision that it was a platform for all big data developer personas.
As mentioned earlier, Spark provided a replacement for MapReduce with improved developer friendliness and better performance. That alone would have been enough of an incentive, but at the time Spark was becoming popular, Zaloni was thinking about ways to evolve its support for streaming workloads. Ultimately, Spark’s support for streaming was the most intriguing factor for Zaloni.
Zaloni’s overall approach was not Spark-centric. While Spark was an appealing technology for Zaloni to adopt, the team wanted Zaloni product users to be able to continue to work at a higher level of abstraction, so that they could think in terms of capabilities like “data quality” and logical processing units like “filter” and “join” rather than writing and running their own Spark jobs. So the first product feature Zaloni released that used Spark was called Transformations, a visual drag-and-drop designer for defining data processing jobs for batch and streaming workloads. Zaloni’s platform, Arena, handled translating the visual recipe into Spark and executing it on a cluster. Arena users worked with logical abstractions of the data and its processing, while Arena dealt with files, tables, SQL, Scala, Parquet, clusters, and scheduling, handling all the nuts and bolts to keep things simple for the user.
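To make the idea of translating a recipe of logical steps into something Spark can run a little more concrete, here is a minimal sketch in plain Python. It is not Arena's implementation: the `Filter` and `Join` classes, the `recipe_to_sql` function, and the table and column names are all hypothetical, invented for illustration. The sketch simply folds a list of logical steps into one SQL string.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Filter:
    """Hypothetical logical step: keep rows matching a SQL predicate."""
    condition: str


@dataclass
class Join:
    """Hypothetical logical step: inner-join another table on a shared key."""
    table: str
    key: str


Step = Union[Filter, Join]


def recipe_to_sql(source: str, steps: List[Step]) -> str:
    """Fold a list of logical steps into one nested SQL query string."""
    query = f"SELECT * FROM {source}"
    for i, step in enumerate(steps):
        alias = f"s{i}"  # alias for the query built so far
        if isinstance(step, Filter):
            query = f"SELECT * FROM ({query}) {alias} WHERE {step.condition}"
        elif isinstance(step, Join):
            query = (
                f"SELECT * FROM ({query}) {alias} "
                f"JOIN {step.table} USING ({step.key})"
            )
        else:
            raise TypeError(f"unsupported step: {step!r}")
    return query


# Hypothetical recipe: filter the orders, then join them to customers.
recipe = [Filter("amount > 100"), Join("customers", "customer_id")]
sql = recipe_to_sql("orders", recipe)
print(sql)
```

In a real Spark application, a string like this could be handed to `SparkSession.sql`, assuming tables with those names are registered. The user only ever describes the "filter" and "join" steps, and the translation layer owns the SQL.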
After that, Zaloni began providing Spark versions of existing capabilities and building new features natively on Spark. In the coming year, Zaloni is making another big push on its “sparkification” initiative as part of a product transformation into a SaaS offering with a more streamlined architecture.
Blogs By: Haley Teeples