April 25th, 2018
Ready to start architecting the cloud? Get your custom demo of Zaloni Arena today!
Read the transcript here:
[Eric Cavanaugh] Ladies and gentlemen, hello and welcome back once again to The New Possible. My name is Eric Cavanaugh, and I'll be your host for today's deep dive webcast, in which we talk all about the cloud. That's right, the topic for today is the cloud. The rise of Kubernetes is one of the topics we'll be talking about, and we have one of the experts on that on the call with us today, and we have several presentations to get through, so I'm going to dive right in. Hail to the power of open source. I'm a big fan of open source. Some of you may know I've been tracking it very closely for about 13 or 14 years now. Right after Katrina, as a matter of fact, we did a bunch of research on open source and realized that the future was going to be driven largely by this really fascinating movement. First Linus Torvalds helped us out with Linux way back when, then we saw a whole open source movement around data, spun out of Yahoo and Google with Hadoop, and now we're seeing, of course, this rise of open source in the space of web architecture. You have OpenStack out there, obviously, but you also have Kubernetes as a management environment for using containers. And I really view containers as the nexus of open source and SOA, service-oriented architecture, with virtualization technology, because what you're seeing now is really the razor's edge of innovation focused on compartmentalizing and miniaturizing the delivery of applications. These tiny slices of applications, that's basically what containers are. And I think that now we're really seeing the cloud take over as the primary domain for innovation and for application design and delivery. You know, there's been a bit of a mobile-first thing going on for a while now, but I think in general, even enterprise software is taking a very cloud-first approach.
If you look at some of the newer vendors that are coming out, they're doing things like analytics or performance management, even human resources, for example with Workday and some of these other folks. All of this is very much a cloud-first motif that has taken over, and I think with good reason. Because the cloud, let's face it, is an excellent marshaling area, not just for data but also for functionality. And if you look at some of the statistics, platform as a service is growing rapidly, and it's going to continue to grow rapidly. I think that's because many organizations have now realized that the cloud is the dominant place to store your data and deliver your functionality, with the durability and the viability that you need. And I think a lot of this on-prem stuff is going to slowly start to fade away, but let's be honest, legacy systems usually don't die, right? They just fade away until the vendor stops supporting them, and then usually some other vendor comes in, buys up those licenses, and continues to support them. So legacy has a very, very long tail to it. It's going to be around for a long time. In fact, one of my favorite quotes is from Gilbert Van Cutsem. He once told me, elephants go to a special place to die, but there is no software graveyard. It all just goes to the cloud. And of course, he ran a company where that's what they did all day long: they took applications that were run on premises, deployed them in the cloud, and then continued to run those applications for their clients. So the cloud has a very, very long tail to it. And with that, I'll hand the mic over to Roger to give his take on architecting the cloud, what's happening and why it's happening. Roger?
[Roger] Thanks a lot for having me. Yeah, I think one thing we have to think about with cloud computing is the fact that when we're talking about legacy IT, we're not talking about something that didn't work. People have been making payroll on legacy systems for many, many years. Actually, I remember a time when payroll was first automated and people didn't trust it. But there are certain core applications that work just fine, and they may be on legacy systems for another generation. However, data is growing at the rate of 24% a year, according to Cisco, and I tend to believe it, which means it doubles roughly every three years. We're already processing more than a zettabyte, which is a million petabytes, of data on the internet per year. And within another generation that's going to increase to a yottabyte, which is 1,000 zettabytes, which would cover the entire planet Earth in storage if we used today's technology. So it's data growth, which every organization is facing in one way or another, that I think is driving a lot of this innovation. Another statistic I just saw is that maybe only 10 to 15% of enterprise IT budgets are actually going to cloud technology per se, because a lot of that is operational rather than capital expenditure. But 60% of all enterprise IT budget for new initiatives is pure cloud, and I think the remaining 40% is related to cloud in one way or another. So my contention is that the data growth every organization experiences in one way or another is forcing innovation, whether they want it or not.
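Roger's growth arithmetic is easy to sanity-check. A quick back-of-the-envelope calculation in plain Python, using only the figures he cites, shows that a 24% annual growth rate does indeed double data roughly every three years, and takes a zettabyte to a yottabyte (a 1,000x increase) in about a generation:

```python
import math

GROWTH_RATE = 0.24  # 24% annual data growth, per the Cisco figure cited above

# Doubling time: solve (1 + r)^t = 2 for t
doubling_years = math.log(2) / math.log(1 + GROWTH_RATE)

# Zettabyte to yottabyte is a factor of 1,000: solve (1 + r)^t = 1000 for t
zb_to_yb_years = math.log(1000) / math.log(1 + GROWTH_RATE)

print(f"Doubling time: {doubling_years:.1f} years")           # ~3.2 years
print(f"Zettabyte to yottabyte: {zb_to_yb_years:.1f} years")  # ~32 years
```

So "doubles every three years" and "within another generation" both hold up against the stated 24% rate.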
[Eric] Yeah, that's a really good point, and I'm going to use that as a nice segue to bring in our first presenter today. We've got Parth Patel from Zaloni, who is going to talk about architecting the cloud and multi-cloud data lake management with Zaloni's data platform. So with that, Parth, I'm actually going to hand the keys to the WebEx over to you. Take it away, and tell us what you're doing out there.
[Parth Patel] Yeah, so folks, thank you for having us. We at Zaloni understand the need and the new positioning that lots and lots of enterprises are taking with architecting the cloud and a cloud-first strategy, and with that we also face the challenge of enabling a data platform that can scale across different environments, from on-prem to hybrid cloud and also multi-cloud strategies. So, how do I go through the slides here? You can just use the right arrow at the top of the screen, or I can move it for you if you want. Oh, I got it. Great. So, just to give a very quick overview of who we are: we are a big data software company with a data lake management platform that provides governance and self-service capabilities for your enterprise data lake, coupled with our years of experience in designing and implementing data platforms across various different environments for large enterprises.
So, with that, very quickly: we all understand that building a complete data lake stack is really complex, especially in this new era of cloud and hybrid and multi-cloud environments. It comprises multiple different options that customers can pick from, and it does get very complex very quickly. There are a lot of options to choose from when it comes to storage: there are on-prem storage data centers, the storage offered through the different public clouds, as well as new specialized cloud-based storage offered by vendors like Wasabi. These not only provide multiple options but also increase complexity. On top of that is the compute layer, where you have multiple options to run your workloads across the different processing environments that are available. So, to be able to actually manage and derive insights from that data, and before you even start building applications on top of your data in the cloud, or for that matter in multiple cloud environments, you need a very effective and robust data management layer that can provide a good governance layer across the different environments and the entire platform, and also the ability to do a lot of data discovery in a self-service manner. That's where we fit. As you see here, Zaloni fits right in the sweet spot, where we provide a data management layer built on three main pillars. We enable you to build out your data platform and bring data into your data lake. From there, we also provide you the ability to govern: bring your data validation processes, data lifecycle management capabilities, data provenance, data security, as well as data wrangling and data mastering capabilities into the data lake itself.
Finally, once you have built this governed data lake or data platform across different environments, how do you enable your end users, your consumers, or the applications you've built on top to discover the data sets that are being managed through our platform and do a lot of self-service: self-service data prep, provisioning, and even self-service ingestion? With those capabilities in mind, we developed our Zaloni data management platform, which lets you leverage all of these different capabilities. Now, there are multiple options that we work through with our customers when it comes to deploying a managed and governed data platform in a cloud environment such as AWS. There are different components available natively that customers like to use, and you can group them into a storage layer, a processing layer, a serving layer, and finally a consumption layer; across all these layers you want robust data management capabilities provided through a unified platform. For architecting the cloud, in a typical architecture the storage layer can be S3, and customers can also choose to interchangeably replace S3 with the storage offered by Wasabi. You land your data in your landing zone, and with a data management platform like ours, utilizing the zone-based architecture that we use as our reference architecture, you can now manage everything from ingesting data, be it pulling data from relational databases, files, or even streams. You can land that data in your storage and capture all the metadata, not just business and technical, but also the operational metadata. That's the key.
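The zone-based flow Parth describes, landing data and capturing operational metadata as it is promoted through zones, can be sketched roughly like this. The zone names follow the common landing/raw/trusted/refined convention; the class and method names are illustrative, not Zaloni's actual API:

```python
from dataclasses import dataclass, field

# Zone sequence in a typical zone-based data lake reference architecture
ZONES = ["landing", "raw", "trusted", "refined"]

@dataclass
class DatasetRecord:
    """A dataset tracked by a (hypothetical) data management layer."""
    name: str
    zone: str = "landing"
    # Operational metadata: every promotion is recorded, giving provenance
    lineage: list = field(default_factory=list)

    def promote(self, note: str) -> None:
        """Move the dataset to the next zone, capturing lineage as we go."""
        idx = ZONES.index(self.zone)
        if idx == len(ZONES) - 1:
            raise ValueError("already in the refined zone")
        self.lineage.append((self.zone, note))
        self.zone = ZONES[idx + 1]

orders = DatasetRecord("orders")
orders.promote("ingested from RDBMS into landing storage")
orders.promote("schema validated, PII masked")
orders.promote("joined with customer master, quality checks passed")
print(orders.zone)           # refined
print(len(orders.lineage))   # 3 provenance entries
```

The point of the sketch is that the lineage list is exactly the "operational metadata" Parth calls the key: the record of what happened at each zone transition.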
What we provide in the management layer and the metadata management capabilities is capturing and keeping up to date all the metadata related to the data as it goes through the data pipeline, from landing to raw to trusted to refined, recapturing that metadata at each step. It also provides end-to-end data provenance that we can build on, so your end consumers can discover the data sets they want to work with from the data catalog and directly run ad hoc analytics, leveraging tools like Athena, or provision the data to any other data warehousing application. Here we are showing Redshift, but it can just as easily be provisioned to Snowflake or any other data warehouse, where you can then build your BI and analytics platform on top. So this is one of the typical deployments that we see: with S3 at the center is our Zaloni data platform, which allows you to ingest, capture metadata, provide a rich data catalog, and then do a lot of data quality, data validation, and data security and access control in a single unified platform. You can deploy a similar architecture using the Azure components, and the same can be said of the Google Cloud components as well. Here is another reference architecture, a different view, building out the same pipeline leveraging the different components available from Azure Data Lake. So these are platforms where you have the flexibility to manage your workloads depending on where you want your data to reside, and you can move from one platform to the other very quickly and easily, because all the data pipelines you're building out are metadata-driven.
So you're not writing custom code; it's all metadata-driven. You're actually leveraging the metadata that we capture upon ingestion, so the pipelines can easily be migrated over to any other platform, and we can easily support a multi-cloud environment in that way.
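The portability argument here, that a metadata-driven pipeline definition migrates across clouds while custom code does not, can be illustrated with a small sketch. The spec format, storage URIs, and function names below are hypothetical, purely for illustration:

```python
# A pipeline expressed as declarative metadata rather than custom code.
# Nothing in the spec names a particular cloud.
PIPELINE_SPEC = {
    "source": {"type": "rdbms", "table": "orders"},
    "target_zone": "raw/orders",
    "steps": ["ingest", "capture_metadata", "validate"],
}

# Per-platform storage schemes (illustrative account/bucket names)
STORAGE_SCHEMES = {
    "aws": "s3://datalake/",
    "azure": "abfss://datalake@account.dfs.core.windows.net/",
    "gcp": "gs://datalake/",
}

def bind(spec: dict, platform: str) -> dict:
    """Resolve the platform-neutral spec against a concrete cloud's storage."""
    bound = dict(spec)  # leave the original spec untouched
    bound["target_uri"] = STORAGE_SCHEMES[platform] + spec["target_zone"]
    return bound

aws_run = bind(PIPELINE_SPEC, "aws")
gcp_run = bind(PIPELINE_SPEC, "gcp")
print(aws_run["target_uri"])  # s3://datalake/raw/orders
print(gcp_run["target_uri"])  # gs://datalake/raw/orders
```

Moving the same pipeline to another cloud is a re-bind of the metadata, not a rewrite, which is the multi-cloud claim in a nutshell.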
Now, we have additional capabilities built on top of that. Once you have hydrated your data lake, brought in all the data, and created new curated data sets that you can provision to different applications, another thing we've brought into the data lake environment is the ability to do your data mastering inside your data lake. You do not need to provision any specialized or beefy servers to run your data mastering, or subscribe to expensive licensing; you can actually run your data mastering inside your data environment. We leverage Spark ML to do probabilistic data matching and linking, and then data mastering and survivorship rules and policies are applied to generate your golden record. This extension can now be used to enable your customer 360 or product 360; all those different applications can be supported through this extension that we provide on top of our management platform. And we are continuously adding more machine learning capabilities within our platform to build a self-organizing data environment, and to enable the different use cases we constantly see and get requests for from our customers. So with that, you can learn more about our product, platform, and services on our website, where we also have a white paper on the data lake reference architecture and a guide on how to future-proof your big data ecosystem leveraging all the new open source as well as cloud-based technologies available to you.
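The survivorship step Parth mentions, applying rules to matched source records to produce a golden record, might look roughly like this in plain Python. The rule here, newest non-null value wins, is one common illustrative choice; real deployments configure their own policies:

```python
# Two source records already matched to the same customer (illustrative data)
matched = [
    {"source": "crm", "updated": "2018-01-10",
     "name": "J. Smith", "email": None, "phone": "555-0101"},
    {"source": "web", "updated": "2018-03-02",
     "name": "John Smith", "email": "js@example.com", "phone": None},
]

def golden_record(records):
    """Survivorship: for each field, take the newest non-null value."""
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    fields = [k for k in records[0] if k not in ("source", "updated")]
    out = {}
    for f in fields:
        # Walk from newest to oldest until a populated value survives
        out[f] = next((r[f] for r in ordered if r[f] is not None), None)
    return out

print(golden_record(matched))
# {'name': 'John Smith', 'email': 'js@example.com', 'phone': '555-0101'}
```

Note how the surviving record combines fields from both sources: the newer name and email win, while the phone number survives from the older CRM record because the newer one lacks it.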
[Eric] So, question number one, and it's a sharp two-part question. You were talking about data mastering in that last slide there. You know, I've been longing for quite some time now for an alternative to traditional master data management architectures. Is that possible? Is that what we're seeing here, finally a way to deal with master data in one clean pass, if you will? What do you think about that?
[Parth Patel] So, it's a supervised process, where we reuse the existing data that you have ingested into your data lake, for which we have the metadata captured. We leverage a random sample of that data to train the model to do the data matching and linking. Because you are ingesting data for a single customer from multiple different sources, we take all those sources, combine, match, and link them into a single record for each customer, and apply survivorship rules and policies to generate the golden record. This is a continuous tuning process, because we use probabilistic matching algorithms that are available in Spark ML. We allow customers to either use the ones that we make available with our extension, or use their own algorithms to do the model training. So, we have built the ability to do data mastering in your data lake, leveraging your existing infrastructure, and it's both scalable as well as all metadata-driven.
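As a rough illustration of the probabilistic matching and linking Parth describes, here is a plain-Python stand-in. A production system would use Spark ML at scale, as he notes; the field weights, similarity measure, and function names here are illustrative assumptions:

```python
import difflib

def field_similarity(a, b):
    """Similarity in [0, 1] between two field values; missing values score 0."""
    if a is None or b is None:
        return 0.0
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a, rec_b, weights):
    """Weighted similarity across fields; higher means more likely the same entity."""
    total = sum(weights.values())
    return sum(w * field_similarity(rec_a.get(f), rec_b.get(f))
               for f, w in weights.items()) / total

# Two customer records from different sources, with minor variations
a = {"name": "John Smith", "city": "Raleigh"}
b = {"name": "Jon Smith", "city": "raleigh"}

score = match_score(a, b, weights={"name": 0.7, "city": 0.3})
print(round(score, 2))  # high score, so we would link these records
```

In the probabilistic setting, records scoring above a tuned threshold are linked into one cluster, and the survivorship rules are then applied to that cluster to produce the golden record.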