February 13th, 2019
Analysts need timely access to enterprise data in order to stay competitive in today’s rapidly changing environment. Typically, business users need to request access through the IT department, which can be a waiting game, either because of technological roadblocks, governance restrictions or both. This adds more work, more process, and more frustration on both sides. Having the ability to find data sets, examine, update, and provision the data themselves allows business users to move quickly and frees IT to work on higher priority items.
A modern data platform should provide a self-service data marketplace that gives right-sized governed access to data. The security permissions allow IT to define who needs access to the correct data at the appropriate stage of the data pipeline. This becomes quite complicated in regulated environments. Users should be able to search for data they have access to, explore and potentially update the metadata associated, and provision it into a sandbox when ready.
Join us as Aashish Majethia, a Senior Solutions Engineer, dives into the self-service data marketplace and what is required to make it successful. He will cover topics including:
– Self-service data preparation
– Governance considerations and how they can enable a more agile data-driven enterprise
Read the webinar transcription:
[Brett Carpenter] Hello everyone, and thank you for joining. Today’s webinar is on Empowering Your Enterprise with a Self-Service Data Marketplace. My name is Brett Carpenter, marketing strategist at Zaloni, and I’ll be your MC for this webcast. Our speaker today will be Aashish Majethia, Zaloni’s Senior Solutions Engineer. We’ll have time to answer your questions at the end of the presentation, so don’t hesitate to ask them at any point using the Ask a Question box located just below the player window. You’ll also notice a poll given during the webcast; please use the Vote tab, also located just below the player window, to participate. Now, before we dive in, I’d like to briefly introduce you to who we are.
(00:54 Introduction to Zaloni)
Zaloni simplifies big data. We help customers modernize their data architecture and operationalize their data lakes to incorporate data into everyday business practices. We supply the Zaloni Data Platform, which provides comprehensive data management, governance, and self-service analytics capabilities. We also provide professional services and solutions that help you get your big data projects up and running fast. Now, with that, I’ll turn it over to Aashish as he discusses how a data marketplace can help your organization increase the adoption of a data-driven culture. Aashish.
(01:34: Agenda for the webinar)
[Aashish] Thank you for a great introduction. Today we want to talk about a self-service data marketplace. A lot of terms get thrown around in this space, so the first thing we want to do is talk about what a self-service data marketplace is: define what it means and how it might be a little bit different from what you may be doing today. Then we want to talk about the benefits. How does your organization benefit? How do you benefit as a user, or as somebody in IT? Depending on where you are in the organization, those benefits might change, but overall we think it’ll be a big fit for a lot of organizations.
And then we want to talk about what sort of considerations come into play as you’re building one. If you really do want a self-service data marketplace and you see those benefits within your organization, what sort of things do you need to think about in terms of building it and making sure that it’s going to be usable, and not just something else that gets built or bought and doesn’t get used?
(02:30: what is self-service data marketplace)
So first, what is the self-service data marketplace? Some background: a lot of organizations in the past have had IT in charge of the data itself. IT controls the data. They’re controlling governance. They’re making sure people have the right access. In very complex environments there are a lot of questions in terms of regulations, who should have access, and who should be seeing what data, so IT tends to be that data governance body that makes sure the rules are adhered to. What ends up happening is that business users have to come to IT and request that data. A business user will say, “Hey, this is the sort of analysis I’m trying to do,” and go to IT. IT will go in, find the data they’re interested in, perhaps make some changes to it, and then hand it off to the business users. This paradigm ends up being a little bit cumbersome. There’s a lot of overhead for IT to service the business. For every request that comes in, they have to spend some time figuring out what that request is. They might go back to the business and request additional information to figure out what sort of data they’re looking for. And if there are additional considerations to take into account, the business has to spend a lot of time working with IT to find the data they’re interested in. In terms of how this came about, it makes a lot of sense. But now what we’re seeing is that this is becoming a bottleneck for a lot of organizations trying to get into this data-driven mindset.
(04:25 A Paradigm shift)
So the new paradigm looks a little bit different. Business users still want access to the data, and IT is still part of the process, but now IT has implemented a product or platform to help facilitate the governance aspects they put in place. They’ve stepped away from the day-to-day interactions, and within that platform business users are able to access the data they’re interested in. They’re able to search for that data. They’re able to go in and service themselves. This is what we mean by a self-service data marketplace: business users can go in and do it themselves without having to bother IT, and IT can concentrate on higher-priority issues and building new things for the organization, rather than servicing business users whose data needs, at this point, are not slowing down but increasing day to day.
(08:30 Characteristics of the Self Service Data Marketplace)
Search, again: you can search not only the data but the metadata around it, all that contextual information. There’s also the ability to search for data assets. What sort of reports are generated? What sort of Tableau reports, other types of BI reports, or just daily reports are being generated? How do I search through those? How do I find them? What sort of information is there? Who’s using this data? Who’s using these reports? All of that information needs to be searchable. We want to be able to understand what’s happening and what other information is there. Then we want to be able to search across different infrastructures. Say you buy another company, or buy another database or Hadoop system or cloud, whatever it is that your company has and may purchase. All of those underlying infrastructures have data. You don’t necessarily want to move it, but you want to be able to search across them and find that information. You might even have Excel files and things like that. How do you deal with that?
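Editor’s illustration (not part of the webinar): the kind of search described above, one query that covers data sets, reports, and files across several underlying systems and also matches on tags and users, could be sketched roughly as follows. The `CatalogEntry` model, the `search` function, and the sample entries are all hypothetical and do not represent Zaloni’s API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One searchable item: a data set, a report, or a file (hypothetical model)."""
    name: str
    kind: str      # e.g. "dataset", "report", "file"
    source: str    # underlying system, e.g. "hadoop", "tableau", "excel"
    description: str = ""
    tags: list[str] = field(default_factory=list)
    used_by: list[str] = field(default_factory=list)


def search(entries, term, kind=None):
    """Return entries whose name, description, tags, or users mention `term`."""
    needle = term.lower()
    hits = []
    for entry in entries:
        if kind is not None and entry.kind != kind:
            continue
        haystack = [entry.name, entry.description, *entry.tags, *entry.used_by]
        if any(needle in text.lower() for text in haystack):
            hits.append(entry)
    return hits


# Entries live in different systems but are searched through one catalog.
catalog = [
    CatalogEntry("customer_accounts", "dataset", "hadoop",
                 "Core customer records", ["customer", "pii"], ["alice"]),
    CatalogEntry("weekly_churn", "report", "tableau",
                 "Churn by customer segment", ["churn"], ["bob"]),
    CatalogEntry("branch_targets.xlsx", "file", "excel",
                 "Sales targets per branch", ["sales"], ["alice"]),
]

customer_hits = search(catalog, "customer")
report_hits = search(catalog, "customer", kind="report")
print([e.name for e in customer_hits])
print([e.name for e in report_hits])
```

The data itself never moves in this sketch; only descriptive metadata is indexed, which is the point made above about not relocating data from each infrastructure.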
The next step, and this is what separates a data marketplace from a catalog, is that the results can be checked out and used in your tool of choice. The catalog is very helpful for finding that information, but what do I do with it? How can I then take it and do something new with it? Those results can be checked out, and you can use them in a database or make a Hive table. Wherever you want to pull that data to, you want to be able to do it right from that catalog view. Additionally, you want it to be interactive. If you have data that you found, you can update it. You can go in and say, “Hey, I found some additional metadata. I know who this is. I found some additional information from outside, or I know who the data owner is.” I want to be able to add that information so that other folks can go in and see what I’ve added. I want to add star ratings. Whatever information you want to put in, the self-service data marketplace needs to accommodate it. You also want to be able to add new data. Say I’ve found a new data set online. It seems to be very much in line with what our business is, and it helps me with the work I’m doing. I want to be able to upload that to our data marketplace so that other users can use it, and I want to make sure that governance is enforced on it. All of that needs to be part of the process.
In terms of metadata, we’ve talked a little bit about it here, but we want to understand whether there’s data quality, whether there’s governance, whether there’s lineage. What has happened to this data? What sort of tags are in there? Is there a business glossary? Things like that. If you think of this Amazon world: is this data a bestseller? When you’re searching for something, you don’t just look at a picture of it and say, “Yeah, this looks great.” You do some research. You look at the reviews. You look at how Amazon rates this against others. Users look at different things that are similar. So we want to develop that sort of usable data marketplace, where users get some guidance on whether this data or this asset is important, and whether it’s something they might be interested in.
(Benefits of creating a Self-Service Data Marketplace)
So I think that gives a good idea of the characteristics of a self-service data marketplace. Now, let’s get into what the benefits are. We talked a little bit about this: we’re going from the traditional paradigm, where IT is kind of in the middle, to one where IT is overlooking. There’s an eye in the middle of the second view there. Information technology is paying attention to what’s happening from a higher-level view without having to get into the day-to-day. What that provides, first off, is reduced overhead and frustration for both IT and business users. IT is no longer overloaded with data requests. We’ve heard it sometimes takes two to six weeks to get the right data in the right format to business users. That’s a lot of time for IT to spend, and it’s also a lot of time for the business to spend. Data also has a life cycle. New data might have a higher value than older data; there’s a sort of half-life. After six weeks, that data might not be worth as much. We want self-service to improve insight and time to market. As users go in and add information and metadata to that data, that improves the data marketplace itself. Think about crowdsourcing as one of the things that people are doing today. Take Yelp: as more users use it, it becomes more and more valuable for seeing which restaurants are important and which products are important.
We want to be able to integrate across databases, management tools, and analytic applications. We want to be a single source where people can find the things they want. The paradigm has changed. Before, when you wanted to buy electronics, you’d go to Best Buy or wherever it was; when you wanted to buy shoes, you’d go to a shoe store. Now people start with Amazon. That’s the idea here: we want to be a one-stop shop where people can find the data they’re looking for, and if it’s not there, they can give that feedback to IT, and IT can help by adding the assets they need to find the information they want. And that dynamic catalog, again, improves quality and elicits trust: “Hey, there are a lot of people actually using this data set. I’m seeing it go into a lot of different places. People have made some changes to it, and they’ve uploaded those changes.” How do we see those changes? It gives you a lot of trust in terms of other folks in the organization and how they’re working with it.
So this view here gives you a little bit of an indication of the use cases. We want to be able to do data integration. We want to be able to do data prep, analytics, and data visualization. All of those can be fed by the self-service data marketplace and then, in turn, feed that marketplace. So it’s kind of got a life of its own.
Some of the benefits, in addition, are forward-looking and strategic. Data scientists want clean, recent data. They spend nearly 80 percent of their time cleaning their data. They want to be able to find recent data and do advanced analytics, and this is a place where they can do that while spending less time on finding and cleaning up data. They should be able to find the data very quickly, and if it’s been cleaned already, they can start getting into advanced analytics in a timely manner. Additionally, a lot of companies are developing this data-driven culture. A lot of decisions used to be made sort of ad hoc. We’re seeing more and more people using things like Excel, but now we’re getting into data practices where people have very specific APIs that can be developed as necessary and exposed. So now you have this data-driven culture where people are looking at things systematically, and having a data marketplace can facilitate that. The goal for a lot of organizations today is: how do we take the data that we have, all the stuff that we’ve been collecting for years, and monetize it? How do we take this user data that we’ve been generating for 30 years and make it into an asset, something usable internally? Or, if they’ve got some other information, how do they monetize it? That’s where the concept of Infonomics comes in. This is something that Gartner came up with a little while ago, and the idea is that your data has intrinsic value, so how do you monetize it?
(16:40: Audience Poll)
I talked a little bit about the benefits here, and I’d be interested to hear from the folks on the line. I listed just a few, but which of these benefits provides the biggest business impact for you and your company? Is it just being able to find the data? That’s the simplest piece here, and a lot of companies are struggling with it. Is it being able to reduce the overhead on IT, on the data producers and stewards? Is it improving the time to insight for data consumers, so that they can get to their data faster? Or is it collaborating with other users? Business and IT tend to be in different organizations. So how are they collaborating? How are they working with each other? Is it just emails? We’ve put up a poll here, and I’d love to get a view from everyone to see which of these might be the biggest impact for you. If you have additional ones that are not listed, feel free to send those in the chat. In terms of the companies we work with, the biggest ones are being able to find the data, where catalogs and things like that are the first step, and then the IT issues, of course.
Pause for a second. Let you guys take a look at that poll and give it an answer.
(18:38 Poll response!)
All right, we’re getting a couple of votes coming in. It looks like a lot of folks are saying that just finding the data on their own is one of the toughest things. The next one looks like improving time to insight for data consumers, and then reducing overhead on IT. Actually, “all of the above” is tied for the top as well, so it looks like some people are having issues with the whole process, and some people are having issues with finding the data and then going on from there.
(19:34 Considerations for building a self-service data marketplace)
If you’re building a self-service data marketplace, what are some of the things to keep in mind? I think one of the major things is to look at your end goal. Instead of looking only at the problem in front of you, ask, “Where do we need to go with this?” Sometimes people say, “I need five different tools to do this process.” That’s perfectly fine; it’s a legitimate way of going about this. One thing to keep in mind is that you do have to deal with integrations. Every time a tool releases a new version, that version needs to work with the versions of the other tools that you have. Anywhere an integration happens, you need to be able to maintain that integration. It’s really rare that a technology company will maintain those integrations with all versions of all the tools that you have, and those SLAs can vary. So that’s something to keep in mind. What ends up happening is that you end up having a development shop internally just to maintain the integrations and make sure that your platform stays up and running.
The other thing is to find tools or platforms that you can collaborate on. This collaboration is important. People talk about people, process, and product. What often ends up happening is that processes get developed within organizations, and those processes tend to be a bit manual: “Oh, you need to submit this sort of ticket. You need to do this sort of report. This is the way you need to communicate with other people so we can track all this stuff.” Those get developed based on needs. When those needs start to get replicated across multiple companies, you start to have products that can handle them. What you want to do is minimize the amount of process that you have between data scientists and IT, and that is where a platform can help: it helps with that collaboration and lets them communicate within the platform, so that they don’t need to have those discussions outside in emails and things like that.
I would take a metadata-first approach. The metadata is, again, the data about the data. When did this data come in? Who’s touched it? What sort of work has been done to it? What other things have occurred to it? What sort of lineage is there? Have people tagged specific attributes? What do the columns actually mean? There’s a name for the column, but what does it actually mean? How does the data flow in? Is there any additional information about that column, so that I not only understand what the column means but also know where it came from and what sort of processing has been done to it? Again, there’s that whole Amazon view: when I click on a product, everything that you see on that page about that product is metadata. All those reviews, all the star ratings, all of that information is exposed, and that’s all metadata. So that’s an important part of keeping your platform up and running. And then there’s making sure your solution is actionable.
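Editor’s illustration (not part of the webinar): the metadata-first idea above, where column meanings, lineage steps, and star ratings are kept alongside a data set, might be modeled like this. `DatasetMeta`, `ColumnMeta`, and every field name are assumptions made up for this sketch, not a description of any real product’s schema.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ColumnMeta:
    """Business context for one column (hypothetical model)."""
    name: str
    meaning: str = ""                      # what the column actually means
    tags: list[str] = field(default_factory=list)


@dataclass
class DatasetMeta:
    """Data about the data: ownership, column meanings, lineage, ratings."""
    name: str
    owner: str
    columns: dict[str, ColumnMeta] = field(default_factory=dict)
    lineage: list[str] = field(default_factory=list)
    ratings: list[int] = field(default_factory=list)

    def describe_column(self, column, meaning):
        """Attach a business definition to a column, creating it if needed."""
        self.columns.setdefault(column, ColumnMeta(column)).meaning = meaning

    def record_step(self, step):
        """Append one processing step to the lineage history."""
        self.lineage.append(step)

    def rate(self, stars):
        """Add a 1-5 star rating from a user."""
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.ratings.append(stars)

    def average_rating(self):
        """Mean star rating, or None when nobody has rated the data set."""
        return mean(self.ratings) if self.ratings else None


meta = DatasetMeta("transactions", owner="finance")
meta.describe_column("amt", "Transaction amount in USD")
meta.record_step("ingested from source system")
meta.record_step("masked account numbers")
meta.rate(4)
meta.rate(5)
print(meta.columns["amt"].meaning, meta.lineage, meta.average_rating())
```

Everything a user would see on the data set’s “product page” in the Amazon analogy comes from a record like this rather than from the data itself.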
Finding the data is great, but I think one important piece is being able to do something with it. How do I take the information that I found, all that metadata, and do something with it? Can I pull that data into a different location and start interacting with it? I want to extract information from that data. I come from a data science background. How do I run a regression on it, or whatever it is? How do I just do a simple trend analysis? Even that sort of basic analysis needs to be facilitated by the tool that’s going to help you find the data. And then you want to make sure that the platform facilitates automation and optimization for operationalization.
Once it’s set up, I don’t want to keep going back and redoing the same work. The platform needs to handle my day-to-day stuff. I want to spend my time and my people on higher-priority items rather than just supporting needs that users could handle themselves. I want to have a product that’s going to do that. So automating processes and operationalizing the work that needs to get done is an important part of what this sort of platform needs.
(24:25 Why ZDP (now Arena) for Self-Service Data Marketplace)
Some background about Zaloni: we do offer a self-service data marketplace, and it’s integrated. It covers everything from finding that data and shopping for that data, to handling multiple underlying infrastructures, whether they’re on-premises, cloud, or some mix of the two; looking at data sets across different infrastructures; making sure that governance is enforced, whether that’s on privacy or quality; and automating and orchestrating workflows for enriching that data, performing data quality checks, and automatically deleting data. We also want to improve the productivity of data producers and consumers: making sure that people are not spending time repeating analyses, and that folks aren’t just asking each other for information and waiting. That waiting time can be quite variable and can be time-consuming.
And then we want governance and the catalog infused throughout the process. As people are looking for data, we want to make sure that they’re working within that catalog, and that the catalog is adhering to the governance policies that have been set up, so that people are getting the information they need and not getting information they shouldn’t be getting. All of that is defined within the Zaloni Data Platform. If this is something you’re interested in, we’ll be sending out some information, and you’re more than welcome to read through it. If it’s a fit, feel free to reach out to one of us.
One example of a centralized data marketplace is a very large financial institution in the United States. They had over 40 million consumer and business customers and thousands of retail centers around the world. They wanted a centralized resource for their enormous volumes of data, so that it could be more easily accessed by IT teams and various lines of business for data and analytics. The complexity of the environment made it very difficult for them to get to that ultimate goal. They needed to collect and integrate metadata, and they had lots and lots of databases, systems, and files. How do you make that into a distributed process that’s centrally managed? That’s the issue they were running into; they wanted to hack the database like that.
Using Zaloni, we were able to implement a unified data platform to create a self-service data marketplace that gave them governed access to data for the business users while maintaining control for IT. The catalog had search and filtering to improve the findability of relevant data. They had high quality scores and secure data, reducing time to insight and analytics.
Results: the bank is leading the industry in the sophistication of its big data initiatives. They’re really at the forefront of what’s happening in the big data space. Their commitment to automation, standardization, and advanced data profiling are all pieces that we’ve been able to help them with. After the first phase of the project, they’ve applied an enterprise-wide, standardized, very robust metafile to all the data, which will make the data more understandable and usable for self-service analytics.
I wanted to take a break here and see if there are any questions from the audience.
And I’ll let Brett kind of manage those questions.
(Questions and Answers)
[Brett Carpenter] All right, perfect. Again, if you have any questions, use the Ask a Question tab or box that’s just below the player window. We’ve had a few come in over the course of the presentation, so we’ll just jump right in. The first one is:
What specific feature and functionality helps with managing the schema changes such that the long lineage already put in place by analysts and scientists remains backwards compatible without creating duplicate sets in the lake?
[Aashish Majethia] Yeah, that’s a very good question. So the question is about schema drift: I’m buying data, or somebody’s handing me some data, but they’ve added a new column or made some changes to it, and now what I was expecting is not what I’m getting. I would suggest that the platform you choose handle that schema drift, and Zaloni does. Basically, we’ve got this concept of metadata entities, which you can essentially look at as data sets. As the data shifts, those entities can have versions of the schema. As new versions of the schema come in, we want to be able to update, either automatically or manually, because otherwise later on down the line we’re going to have extra columns and we’re not going to know what to do with them. Being able to have versions of that schema, of that entity, allows me to maintain that lineage. I’ve done all this work on the previous version, and now I’ve got this new version. I want to make sure that I’m not losing all that information, all that metadata around that data. So that’s why schema drift is important to pay attention to, and it’s something that we see often with our customers.
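Editor’s illustration (not part of the webinar): the answer above describes an entity that keeps several versions of its schema so that metadata attached earlier survives when a new column shows up. A minimal sketch of that idea, assuming a made-up `Entity` class and an `observe` method that are not Zaloni’s actual implementation, could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A data set whose schema may drift over time (hypothetical model)."""
    name: str
    versions: list[tuple[str, ...]] = field(default_factory=list)
    column_notes: dict[str, str] = field(default_factory=dict)

    def current_schema(self):
        """Columns of the latest schema version, or () if none seen yet."""
        return self.versions[-1] if self.versions else ()

    def observe(self, columns):
        """Register incoming columns; add a new version only when they differ.

        Returns (added, removed) relative to the previous version, so a
        caller can decide whether to accept the change automatically or
        route it to a person for review.
        """
        incoming = tuple(columns)
        previous = self.current_schema()
        added = [c for c in incoming if c not in previous]
        removed = [c for c in previous if c not in incoming]
        if incoming != previous:
            self.versions.append(incoming)
        return added, removed


orders = Entity("orders")
orders.observe(["id", "amount"])
orders.column_notes["amount"] = "Order total in USD"

# A supplier adds a column: a second schema version is recorded,
# and the note written against version one is still there.
drift = orders.observe(["id", "amount", "currency"])
unchanged = orders.observe(["id", "amount", "currency"])
print(drift, unchanged, len(orders.versions), orders.column_notes)
```

Because the entity, not each schema version, owns the notes, no duplicate data set has to be created when the schema changes.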
[Brett Carpenter] Another one is, and I know you touched on this briefly: what’s the difference between a data marketplace and a data catalog?
[Aashish Majethia] Yeah, that’s a good question as well. I think I touched on it. The catalog is an important piece of the puzzle. Without the catalog, it’s very difficult to find the data that we’re interested in. The catalog is a searchable place. It allows me to find information about the data, or find the data sets that I’m interested in. All these data assets, whether they’re reports or workflows and things like that, need to be findable within that catalog. But the catalog then needs to be actionable. That whole action piece is, I think, what differentiates a data marketplace from a catalog. Beyond the catalog, what you need is the ability to take some action, such as provisioning that data, whether that’s pushing it out to a different location so that I can do additional analysis on it, adding information to the catalog, adding metadata to the data itself, or adding assets to the self-service data marketplace itself. How do I manage that whole process? That’s one piece that I think is important, because once you’ve found the data in the catalog, what do you do next? It’s like, “Okay, I found it. That’s great. Now I need to push it to Tableau. Now I want to download it to Excel.” You want to be able to facilitate that whole process, and the data marketplace allows you to do that from a one-stop shop.
[Brett Carpenter] All right. Now that lends itself to this next question. How does Zaloni provide self-service such that an end-user can bring his or her own data into the lake, not only providing a view of what exists in the lake
[Aashish Majethia] It does. Yeah, that’s a good question. Zaloni does allow a user to bring his or her own data in. Basically, there’s a wizard that’s geared towards business users who say, “Hey, I’ve got this data set. I want to be able to bring it into the lake.” They point to it, wherever it is, and give some information about what format it is in. Zaloni will pick it up and then ask, “Does this look correct in terms of the schema you were expecting?” If it doesn’t, you can make some changes. If it looks correct, you say, “OK, great, let me go ahead and pull this data in.” It’ll do a profile on it, and then the data will have its own entry in the self-service data marketplace. There are governance restrictions right off the bat in terms of where that user has access. Wherever they have access, other folks can get in based on their access permissions as well, so they can start sharing that information, as long as the permissions are set up properly. What ends up happening is that users will find data on the internet, or they’ll generate their own data, whatever it is. They can pull that data in and add it to the lake, making it a more trusted, more usable lake, or marketplace I guess, where they can start playing with that data and sharing it with other folks.
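Editor’s illustration (not part of the webinar): the ingestion flow described above (point at a file, get a proposed schema to confirm, then profile the data) can be approximated with the Python standard library. The functions `infer_type`, `propose_schema`, and `profile`, and the sample CSV, are hypothetical and only show the general shape of such a wizard, not how Zaloni implements it.

```python
import csv
import io


def _parses(cast, values):
    """True if every value can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Guess a column type from string samples, ignoring empty cells."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"
    if _parses(int, present):
        return "int"
    if _parses(float, present):
        return "float"
    return "string"


def propose_schema(csv_text):
    """Read CSV text and return (proposed schema, rows) for user review."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    schema = {c: infer_type([r[c] for r in rows]) for c in columns}
    return schema, rows


def profile(rows):
    """Basic profile per column: empty-cell count and distinct values."""
    summary = {}
    for column in (rows[0].keys() if rows else []):
        values = [r[column] for r in rows]
        summary[column] = {
            "nulls": sum(v == "" for v in values),
            "distinct": len(set(values)),
        }
    return summary


sample = "id,city,amount\n1,Boston,10.5\n2,,7\n3,Boston,3.25\n"
schema, rows = propose_schema(sample)   # shown to the user to confirm or edit
stats = profile(rows)                   # computed once the schema is accepted
print(schema)
print(stats)
```

In a real wizard the user would be able to override any inferred type before the data is pulled in; here the proposed schema is simply returned for inspection.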
[Brett Carpenter] All right. I think we’ve got one more so far, and it’s: how much ability as a business analyst will I have to get access to the data?
[Aashish Majethia] Yeah, that’s a good question. I think overall that’s what we’re looking to provide: giving that business analyst access to find their own data. We’re going to set them up and give them a username and password. We’re going to say, “Okay, great, now that you’re in the lake, we’ve already set up what sort of permissions you have.” Those can change, and IT can manage that process. But at the end of the day, whatever is in these buckets, you have access to start playing with and start seeing: what metadata there is about that data, what sort of lineage there is, how other users use it, things like that. All of that is part of the usability within the Zaloni data lake.
[Brett Carpenter] Yeah, I have another one that came in. Can I add a data integration plugin over the top of what comes out of the box? What’s the extensibility for new protocols and interfaces?
[Aashish Majethia] Yeah, that’s a good question. Lots of users have newer places to bring data in from. I might say, “Hey, I want data from an S3 bucket,” or tomorrow I want Teradata, and the next day I want MySQL because I bought a new database, whatever it is. How do I integrate with those? The way Zaloni works is that basically anything with a JDBC connection, we can set up an integration for. We don’t charge for those connectors. We can help, or you could do it yourself, but either way you set up a connection to another location, so it’s actually quite straightforward to access any database. I haven’t run into any source that we haven’t been able to at least bring into the lake. So in terms of the extensibility, it’s quite extensible. We can use any JDBC driver, and you can use REST calls. Wherever you need to pull that data in from, we’ve been able to find a way to do it and set that up in a process. Say tomorrow you said, “Okay, I’m pulling this data from the database, but I don’t want to just offload it all. I want to pull in a certain amount, and tomorrow I want to pull in another amount.” All of that can be built into the process and orchestrated within the Zaloni platform. Once it’s in, you can run some processes. You can have these different zones within the lake and then give folks access to the right zone. I could say, “Okay, this is raw data. I don’t want to give anyone raw data.” I might have some work that’s done to remove any sort of personal information, Social Security numbers and things like that. Then I’ll give people access to the right zone, and that zone allows the right users the right access at the right time. Good questions.
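Editor’s illustration (not part of the webinar): the zone idea at the end of this answer, where raw data stays restricted and a processed copy with personal information removed is opened to more users, can be sketched as below. The zone names, roles, `ZONE_ACCESS` table, and the simple Social Security number pattern are assumptions for this example only; real masking and access control would be considerably more thorough.

```python
import re

# Matches US Social Security numbers written as 123-45-6789 (simplified).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical policy: which roles may read which zone.
ZONE_ACCESS = {
    "raw": {"data_engineer"},
    "trusted": {"data_engineer", "analyst"},
}


def mask_ssn(record):
    """Return a copy of the record with SSN-like strings masked."""
    return {
        key: SSN_PATTERN.sub("***-**-****", value) if isinstance(value, str) else value
        for key, value in record.items()
    }


def promote_to_trusted(raw_records):
    """Build the trusted-zone copy by masking every raw record."""
    return [mask_ssn(record) for record in raw_records]


def read_zone(zones, zone, role):
    """Return a zone's records only if the role is allowed to read it."""
    if role not in ZONE_ACCESS.get(zone, set()):
        raise PermissionError(f"role {role!r} may not read zone {zone!r}")
    return zones[zone]


raw = [{"name": "Pat", "ssn": "123-45-6789", "balance": 100}]
zones = {"raw": raw, "trusted": promote_to_trusted(raw)}

analyst_view = read_zone(zones, "trusted", "analyst")
print(analyst_view)
```

An analyst asking for the raw zone would get a `PermissionError` in this sketch, while a data engineer could read both zones.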
[Brett Carpenter] All right, perfect. It looks like that’s all the questions that we have for now. I want to thank all of you for joining the presentation and sticking with us through some of the audio issues, and thank Aashish for taking the time to speak with us about the benefits and considerations of a self-service data marketplace. If you have any follow-up questions, definitely reach out to us using the Contact Us form located in the attachments. This presentation will be available on demand, both on Zaloni’s Resources hub and on BrightTALK, for future viewing, and the slides will also be available in the Attachments tab located below the player. Thanks again, and we will see you next time. Thank you.