March 7th, 2022
Webinar Transcript:
Well, good afternoon and good morning, everyone. Thanks for taking the time to attend this webinar. Hopefully, over the next 40 minutes or so, you will take away a few learnings: what is data governance, how is it going to help your business, and what tactical steps can you consider, from a planning standpoint, to get something going? That's the goal for today, and hopefully you'll find it useful. In terms of the agenda, we want to start with the definition of governance, because it can mean different things to different people, and we want to set the stage with a common definition. Then we'll get into how the industry has evolved around governance and what modern governance capabilities look like. When you roll out governance programs and technologies, how do you measure them, and what are the right working metrics of success? Then we'll move to automation: when you are doing very well, how do you scale to the next level, and what factors or parameters should you consider to scale your governance program? Finally, I'll share some budgetary parameters you can take into consideration to build a business case, so that if you haven't started a governance program yet, you know how to put things together to get approval from your stakeholders. That's the agenda we'll cover today.
Definition of Data Governance
Okay, so let's start with the definition. What is governance? It is really an integral part of your data management program; it has to be tightly integrated with your data management or data platform initiative and with what you're trying to accomplish for the business, with a strong focus on building trust in the data. Trust is the keyword, and we'll talk about how you measure and build it: making sure your data sets are available to your consumers on time, making them accessible so consumers can get to them quickly, and, finally, compliance, making sure that as you build the data ecosystem for your organization, compliance is part of the equation. That definition hasn't really changed over the last two decades or so, maybe slightly here and there. What has changed is how governance is perceived in the organization. We see this with a lot of our customers: historically, governance has carried a negative connotation when it comes to creating business value. In the overall data ecosystem, analytics, AI/ML, and engineering are front and center, but governance never is, until something breaks, until something goes wrong. That perception has created a lot of confusion, so governance hasn't been front and center of the core data management that supports the business. Some of the patterns we see: protection is treated as an afterthought, as in "let's bring the data into the database or the data lakehouse, and protection and security can happen later; somebody will come in and handle it." In reality, that doesn't happen, because of the lack of role clarity: there is no specific function defined to handle the job. Stewardship, for example, is often assigned to people who already have a day job and are asked to do governance when they can find the time, so there is a real lack of clarity on the role, as well as on incentives and skills. The third pattern is that governance is perceived as a hindrance to progress; when people or organizations don't have clarity on what it is or how to implement it, of course it becomes a hindrance. It is often run as a siloed business unit, not aligned with the business functions or lines of business that drive the initiatives, and it ends up being very painful to implement, because when the charter of governance, and how it enables the business, is not clear, it will be painful. So there is a lot of education required as part of this exercise, and hopefully you'll learn some of it today: how to get started educating your internal stakeholders with concrete data points, so that people are eager to listen to you and pay attention as you build a business case.
The journey of Data Governance
Now, what has changed over the last few years? Why is governance now treated as an important function? There are two aspects. From an industry trend standpoint, first, data breaches are happening all the time, more frequently than you can imagine. Second, there are compliance regulations like GDPR and CCPA, and every US state is coming up with its own privacy rules; it will get to a point where it is simply not practical to keep up with each of them individually, although there is a lot of commonality among those compliance rules. Third, as part of the COVID digital transformation, organizations are being asked to deliver new revenue streams, whether in the form of data products, services, or new business pipeline, while at the same time streamlining their existing operations. That's the industry side. From a technology standpoint, most companies already have a footprint on the cloud, and migrating to the cloud is actually creating more silos in terms of where data sets live. There has never been a better time to find a point solution than today; a lot of that is driven by open-source innovation, which is fantastic. But when you are building a program, a cohesive solution, if you think only about technology and point solutions, you end up with point solutions that don't work together and are used by one or two people, and after a while they aren't used at all. Point solutions work great for a POC or for learning, but the real discussion should be whether you go all in on a big platform or best of breed, and, just as important, what your start and end boundaries are. If you define that boundary well, you can figure out what the best solution or platform is for you.
Another observation is around leveraging data assets in machine learning and AI models. A lot of organizations are starting to pay attention to the responsible way to build those models. There was a period when, if you had the data sets available, you could use them however you wanted. But the reality is that the output of those models drives business decisions, whether to approve a loan or a particular admission, and attributes like ethnicity and gender have been used to drive those outputs, introducing bias. That has come under scrutiny, and the question now is what you do about it: how do you prevent some of those sensitive attributes from being used as a source of bias? We'll talk about the steps you need to take into consideration. And the last point: anything you want to do in a successful and efficient way requires subject matter expertise to drive and implement it, and in this age finding good technical resources is difficult; the Great Resignation is making things even more challenging for companies. So that is the shift that is happening, and with this shift, how you should start thinking about governance is our next topic.
The four areas of modern data governance
Now, how do we define modern data governance capabilities? What should they encapsulate? From our learning, we define four areas. The first is data discovery and visibility: what is the optimal way to make sure you can discover the data and applications your business is using, and to have visibility of the entire pipeline and ecosystem, where the data is coming from, how it is being used, and whether the actual business users are using it or not. What we find is that if you don't define and implement this well, your high-end resources, your analysts, data scientists, and data engineers, end up spending 60 to 70 percent of their time on cleansing. The second area is building trust and transparency. How do you really measure trust, and how do you quantify it? What we're finding, with our research partner DATAVERSITY, is that 80 percent of organizations still consider data quality their primary focus. Data quality itself, honestly speaking, hasn't changed since organizations started building databases and data marts a decade or two ago. What has changed is that those data quality checks have mostly been buried inside the pipeline process and were never visible to business users. The ask here is: define your quality rules once, make sure your pipeline tools leverage those rules, execute them at the end of the process to confirm the data is correct, and then surface something like a quality score to all your constituents, all your data citizens. There are still a lot of gaps there today. The third area is data access: once a data set is available, make sure it is used for the right purpose in the context of your business use case, and made available to your downstream applications. Traditionally, a lot of companies have been successful at building a nice ecosystem connected to analytics tools, but what we find is that the time it takes to make the data available in those tools is still weeks, if not months. And when consumers finally get the data set, they have questions: "that's not what I was asking for, I want to go back and ask for more information." So a lot of back and forth still happens.
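To make the "define quality rules once and surface a score" point above a little more concrete, here is a minimal Python sketch. The rule names, the sample records, and the trust_score function are illustrative assumptions only, not part of any specific product or pipeline tool.

```python
# Minimal sketch: define data quality rules once, reuse them in a pipeline,
# and surface a simple quality/trust score to data citizens.
from typing import Callable, Dict, List

Record = Dict[str, object]

# Rules live in one place so pipeline tools can reuse them instead of
# re-implementing the same checks in every ETL job.
QUALITY_RULES: Dict[str, Callable[[Record], bool]] = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "email_has_at_sign":   lambda r: "@" in str(r.get("email", "")),
    "age_in_valid_range":  lambda r: isinstance(r.get("age"), int) and 0 < r["age"] < 120,
}

def trust_score(records: List[Record]) -> Dict[str, float]:
    """Return the pass rate per rule plus an overall score (0-100)."""
    results: Dict[str, float] = {}
    for name, rule in QUALITY_RULES.items():
        passed = sum(1 for r in records if rule(r))
        results[name] = round(100.0 * passed / max(len(records), 1), 1)
    results["overall"] = round(sum(results.values()) / len(QUALITY_RULES), 1)
    return results

if __name__ == "__main__":
    sample = [
        {"customer_id": "C1", "email": "a@example.com", "age": 34},
        {"customer_id": "",   "email": "not-an-email",  "age": 210},
    ]
    print(trust_score(sample))  # each rule passes for 1 of 2 records -> 50.0
```

In practice, the same rule definitions would be federated to the pipeline tool rather than copied into each job, which is the point being made above.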
And the last area is the protection of your assets: making sure there is a systematic way for policies, whether privacy policies, access policies, or other types, to be defined once but implemented systematically in the software, in the tool, so that individuals are not managing them on a day-to-day basis. For us, this is a very simple way to define what you should look for in modern data governance tools and applications.
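As a hedged illustration of a policy being defined once and applied by software rather than by hand, here is a minimal Python sketch; the classification map, the policy itself, and the pseudonymize helper are assumptions for illustration, not any vendor's API.

```python
# Minimal sketch: define an access/privacy policy once and let software apply it
# whenever data is provisioned, instead of individuals masking data manually.
import hashlib
from typing import Dict

# Classification produced upstream (e.g. by the catalog): column -> sensitivity tag.
CLASSIFICATION = {"email": "PII", "ssn": "PII", "order_total": "NON_SENSITIVE"}

# One policy, stated once: PII columns are pseudonymized for the analyst role.
POLICY = {"role": "analyst", "action_on": "PII", "action": "pseudonymize"}

def pseudonymize(value: str) -> str:
    """Deterministic, irreversible token so joins still work on masked values."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def apply_policy(row: Dict[str, str], role: str) -> Dict[str, str]:
    """Apply the policy to a row before it is provisioned to a given role."""
    if role != POLICY["role"]:
        return dict(row)  # other roles would be handled by other policies (not shown)
    masked = {}
    for col, value in row.items():
        if CLASSIFICATION.get(col) == POLICY["action_on"]:
            masked[col] = pseudonymize(str(value))
        else:
            masked[col] = value
    return masked

if __name__ == "__main__":
    row = {"email": "jane@example.com", "ssn": "123-45-6789", "order_total": "42.50"}
    print(apply_policy(row, role="analyst"))
```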
Getting started with data governance
Okay. Now, how do you really get started? It depends on where you are on the maturity curve. If you already have programs in place and have invested in tools that are fairly mature, and you just want to fill gaps, that's absolutely fine. But if you are starting from scratch, this will hopefully help you. The first step is to define your working metrics.
Very often our customers ask what the best practices are. Best practices will certainly help, but they may not fit your own business or your own function; that's why working metrics, maybe three or four of them, are more important to define, and we'll talk about what those metrics are in a minute. The second step is establishing the right culture. Most companies want to be data-driven, which is fantastic, and they are putting a lot of effort into hiring roles like head of governance or chief data officer, which is a great start. But when you start to execute tactically, building a data culture is very important, and one of the important behaviors is surfacing data problems and sharing them with everyone. That should be considered a success, not a red flag, because without knowing what you have, there is no way to improve it.
The third step is recognizing that governance doesn't work in silos, which has been the problem we talked about. If governance is part of core data management, it has to work with the data engineering team, the data scientists, IT, and InfoSec. These five functions have to come together with a very clear understanding of the roles and responsibilities, captured in a RACI. Okay.
The fourth step, whether you are starting to build a program or already have one and want to improve it, is having the right execution framework. There are a lot of frameworks used by different companies across the industry, but at their core there are standard frameworks available, and we'll talk about what they look like. A framework helps you answer: now that my program has a structure, what do I do? What should the program chapters be? How do I go about executing one or two test use cases and building a project plan? And the last step, of course, is to start small with a few use cases, measure the success, and celebrate the success. Okay. So, let's start with defining these metrics for each of the governance capabilities we talked about. For data discovery and visibility, talk to your data citizens. We use "data citizens" as a generic term for the people who use your data, whether they are data scientists, analysts, or whatever the title may be. Ask how much time it is taking them to find a data set.
That's easy to find out if you talk to some of those people, and if the answer is weeks, if not months, that's a lot of time, and in weeks a lot of things can change. The second metric: when your business team asks for a new use case, how long does it take to bring that data set in and integrate it with your existing ecosystem? Different activities happen when such a use case is executed: learning what the source system has, deciding whether you want to bring in everything or only a subset, and understanding the state of the data set. So there is a lot of activity there, and we'll talk about that. Those are the two metrics in the first category. The second category is trust and transparency. Let's say your aspiration is a trust score of 80 percent. Should you apply 80 percent across all six dimensions outlined on the screen? That's up to you, but you can start very simply, with completeness or timeliness. Those are important: as long as you understand the data, it is updated on time, and it is unique, you have addressed a lot of data integrity problems right there. Start there and then expand into the other dimensions. The third category is data access metrics: how many days does it take today for your data citizens between requesting access to data and actually getting it, once the request is approved? A lot of times we find that citizens request access and are then told the data is available in a cloud bucket; now those consumers have to go into a tool that can read from those cloud buckets directly, or figure out another tool to do it, and that adds latency to consuming the data.
The important thing here is: what if you have a tool that allows those consumers to see what you have, request it, and then, once the request is fulfilled, get the data within the tool they are already using? That matters, because then they don't have to do any additional technical work; they get the approved data set and start doing the job they are supposed to do. The second metric in this category is how many active assets you have, and here is a caveat about the definition of "active": it should be measured at the consumption level. If you are bringing in and managing a lot of data assets and provisioning them to users to query, don't measure by the number of reports or models that are created; in our opinion that's the wrong metric. What you really want to know is whether those reports are being used every day, whether anyone is opening them. When somebody opens a report, you capture the underlying users who are really querying those sets of tables or running those models. If you have that level of consumption data available, that's what helps you understand the true meaning of active assets. The third metric is around the incidents you can mitigate. I'm hoping that those of you who have been in the industry for a while have come across this at some point in your career: you build beautiful data pipelines, data is processed, reports are being used, and one fine day business users start to scream at you, "hey, my report is blank."
Or the particular metric they were watching is not available, or is down by 10 percent, whatever it may be. That creates a firefight: what happened? You spend a few days fixing the issue, going through an RCA, and spending a lot of time on the post-mortem process. But what if you could avoid that? If you already have visibility of your entire data pipeline and process, and something changes in the source, then within your overall lineage or observability you detect it immediately and prevent the bad report from going out. You can save a lot of time and energy, and that incident can be avoided, if you have the right tools, process, and people to understand what happened. And the last metric: even if you build the process and the tools the right way, and make sure sensitive attributes are protected and pseudonymized, sometimes people forget and systems do fail. How do you prevent sensitive attributes from being consumed by an application? There has to be a systematic way to auto-detect that and correct it. And secondly, if you have implemented a policy engine, how many of your data sets are covered by those policies? That's another metric. So there are roughly eight metrics here, about two in each area. You don't have to start with all eight; it really depends on your organization's maturity and where you are, but start with a focus area and the problem you want to solve. Very recently, for a banking customer, data access was the focus area, and we started by capturing the number of days from request to access; and some customers that are just starting to build their data ecosystem start with the first metric, time to find a data set. Hopefully this gives you a tactical way to define your working metrics and build your own, whatever is relevant to you and to your business. Okay.
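If it helps to see how a couple of these working metrics might actually be computed, here is a small Python sketch under assumed log formats; the field names, sample dates, and asset lists are hypothetical.

```python
# Minimal sketch: compute two of the working metrics discussed above from
# hypothetical request and consumption logs.
from datetime import date

# Access-request log: when a data citizen asked, and when access was granted.
access_requests = [
    {"dataset": "orders",    "requested": date(2022, 2, 1), "granted": date(2022, 2, 15)},
    {"dataset": "customers", "requested": date(2022, 2, 3), "granted": date(2022, 2, 7)},
]

# Consumption log: which catalogued assets were actually queried recently.
catalogued_assets = {"orders", "customers", "legacy_sales", "campaign_clicks"}
recently_queried  = {"orders", "customers"}

def avg_days_to_access(requests):
    """Metric: average days between an access request and its fulfillment."""
    days = [(r["granted"] - r["requested"]).days for r in requests]
    return sum(days) / len(days)

def active_asset_ratio(assets, queried):
    """Metric: share of catalogued assets with real consumption, not just reports built."""
    return len(assets & queried) / len(assets)

if __name__ == "__main__":
    print(f"Avg days to access: {avg_days_to_access(access_requests):.1f}")    # 9.0
    print(f"Active asset ratio: {active_asset_ratio(catalogued_assets, recently_queried):.0%}")  # 50%
```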
The second step is building the culture we talked about, which means making sure there is mutual success across those functions. In governance, the data stewardship role is primarily the author and implementer of your governance rules, and that role has to work with your data engineering, data science, IT, and InfoSec teams. This gives you a RACI model of what the shared understanding should be. For example, IT can define your infrastructure footprint, perimeter policy, and security; InfoSec defines your data access policies; and the data steward actually implements them. One important point as you develop the RACI: don't build it with 100 line items, because nobody will look at it. Keep it between 10 and 15; that's really the right number to start with something simple. And make sure the people who are actually going to do the work have full clarity and are aligned to it, in addition to the executives from each department. The last point on culture: there is no such thing as 100 percent data quality. Start by getting a benchmark of what your trust score is and what your error percentage is, meaning what percentage of error you, or the business, can still live with to execute the use case. That's important, because it gives you the context for defining your trust. Okay.
So, let's move to the execution framework. Now that we've talked about what metrics to capture and what the culture should be, how do you get started with a proven framework? This is one we practice here at Zaloni, and we see it embraced by the industry as well; we call it the DataOps governance cycle. Good data governance covers both sides of it: the left side is about bringing the data in and making it available, and the right side is about consuming the data.
As the data comes in from the top, it goes through profiling, measuring the state of its quality, running classification, building lineage, and adding business metadata; if you want to describe your assets in terms of classification and sensitivity, a lot of business metadata has to be captured there. So the left side is about making the data available, so that on the right side, when data citizens come to your platform, they know what data is available and can request access in a marketplace-type fashion. Provisioning is all about getting them access at a granular level, and getting it within their own tool, which is important. And as they start to leverage, use, and consume the data, bring the metadata from those tools back to the left side, into the lineage, so that you have a true picture of consumption: whether the data is being used and how frequently. That gives you end-to-end observability of your data ecosystem.
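To illustrate the left side of that cycle, here is a minimal Python sketch of profiling and auto-classifying an incoming dataset before it is published to the marketplace; the regex patterns and sample rows are assumptions for illustration, not a description of any particular platform.

```python
# Minimal sketch of the "left side": profile an incoming dataset and suggest a
# sensitivity classification per column before a steward reviews and publishes it.
import re
from typing import Dict, List

PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "ssn":   re.compile(r"\d{3}-\d{2}-\d{4}"),
}

def profile_and_classify(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """Per column: completeness (profiling) and a suggested sensitivity tag (classification)."""
    result = {}
    for col in rows[0].keys():
        values = [r.get(col, "") for r in rows]
        non_empty = [v for v in values if v]
        tag = "NON_SENSITIVE"
        for label, pattern in PII_PATTERNS.items():
            if any(pattern.fullmatch(v) for v in non_empty):
                tag = f"PII:{label}"
                break
        result[col] = {
            "completeness": round(len(non_empty) / len(values), 2),
            "suggested_classification": tag,  # a data steward approves or overrides this
        }
    return result

if __name__ == "__main__":
    sample = [
        {"customer_email": "a@example.com", "city": "Raleigh"},
        {"customer_email": "",              "city": "Durham"},
    ]
    print(profile_and_classify(sample))
```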
Continuous integration, development, and testing follow your standard cycle: each of the functions we talked about is essentially code. As you build it in your lower environments and migrate to the higher environments, make sure each artifact can be generated as a code base in your repo and is part of your CI/CD process, so that changes move to the higher environments as part of your change management. That's the idea. Okay. Now, this is where the rubber meets the road: you are ready to develop a program structure or a project plan, so where and how do you want to start? Hopefully you can take this information and turn it into milestones. There are three functions that have to work in an interconnected way: your data stewards, engineers, and data scientists; we already talked about what they are supposed to do in the RACI. The first step really starts with the data steward. As you define the first round of use cases to bring in (we'll talk about the functional parameters for choosing use cases in a moment), don't spend all your time profiling every source; focus on a specific set of systems, catalog those source systems, and bring in the technical metadata so you understand what each one offers within a few minutes. That expedites a lot of the understanding of the state of the data you plan to bring in. The second step is running profiling and data quality, which we talked about on the previous slide. That output feeds the functional requirements for your engineering team, which takes it and builds a logical data model and the data pipelines. One important point: the quality rules the data stewards have defined have to be integrated with the data pipeline tool, so the engineers are not reinventing those rules; that has been the disconnect in the past. Building the pipelines, building the transformations, and optimizing the pipeline compute, a lot of work happens there; that's the transformation and ETL bucket. The third bucket is certifying the data, and we see that it rarely happens at all, except where organizations already have governance functions in place or are starting to build them.
Now, why is it important? When you certify the data, what it means is that you are defining a Business Glossary: how the data should be cataloged and organized so that business users can understand it, including the definition of each attribute. Having that glossary defined will save you a lot of time, because you are no longer doing the mapping exercise the traditional way, in a spreadsheet or a Google sheet, only to realize after one week that everything is obsolete. You are not building it one time; you are constantly adding new sources and updating the metadata.
So get that Business Glossary in place, make sure pseudonymization happens as part of your classification, and then approve the glossary: this is ready to go. That is the critical function that prevents a particular attribute, or set of attributes, from being used in its raw form in an analytic model. Moving to the next step, data provisioning: it is largely a system capability, and honestly there is little manual work needed here. Your consumers, the data citizens, come in and see what certified data is available, read the certified glossary, the taxonomy and definitions, where the source was, and how the data is derived. If you have all of those capabilities available, you are reducing a lot of back and forth downstream, because people already know what they are asking for and what to expect from the data, rather than getting the data and then discovering that what they were looking for is not available. So that's the provisioning aspect: they can look at it, request access, and the access request can be integrated with whatever approval workflow your organization already uses, and then they get the data they are looking for. Once they have the information available in their tool, they start to create the magic: working with the business users, making sure the models, dashboards, and KPIs they are creating really meet their goals, and then automating it. The last point is a systematic step once things are productionized: when a set of reports, dashboards, models, or data products is ready to go into production, you start to capture the consumption metadata as part of your observability.
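As a rough sketch of what certifying data through a Business Glossary could look like in code, here is a minimal Python example; the fields and the simple one-approver certification rule are illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch: a glossary term is defined once, tied to a classification and a
# masking flag, and only treated as certified once a steward approves it.
from dataclasses import dataclass, field
from typing import List

@dataclass
class GlossaryTerm:
    name: str
    definition: str
    domain: str                      # e.g. line of business / data domain
    classification: str              # e.g. "PII" or "NON_SENSITIVE"
    pseudonymize_on_provision: bool  # enforced downstream by the policy engine
    approved_by: List[str] = field(default_factory=list)

    @property
    def certified(self) -> bool:
        # Illustrative rule: certified once at least one steward has signed off.
        return len(self.approved_by) > 0

if __name__ == "__main__":
    term = GlossaryTerm(
        name="customer_email",
        definition="Primary contact email captured at account creation.",
        domain="Customer",
        classification="PII",
        pseudonymize_on_provision=True,
    )
    print(term.certified)   # False: not yet visible in the marketplace
    term.approved_by.append("data.steward@example.com")
    print(term.certified)   # True: ready to provision to data citizens
```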
This addresses the problem we talked about earlier: truly understanding your active assets. Why is that important? Because when you are asked in an audit where a data set is being used, you have the information at your fingertips; your application migrations go much faster because you understand what is truly being used; and there is a cost optimization aspect as well, which we'll come back to. Okay. So this gives you a milestone-driven approach: build your plan, create the milestones, and assign them to the resources responsible for executing them. The time split you see here, just for context, is the ideal amount of time each function should spend if you plan well and invest in the right tools. What we see is that if you don't do it the right way, one of the functions ends up spending a lot of time on work it was never designed to do; most often it is data scientists spending a lot of time on wrangling, because none of that upstream work was done before. We'd encourage you to use this as a reference point as you mature: are we really spending the time where we need to? If not, where is the bottleneck and how can we improve? Now that we have the plan, how do you define your use cases and set priorities? Think about the use cases on the right side: maybe they are data monetization, risk, or line-of-business functionality. Start there and define how you classify the requirements. Let's say data monetization is your number one because it is a new revenue stream; then use age of data, economic value, and timeline as your classification priorities, and link that to the left side, the functional requirements: bringing your data from different sources and building a data catalog. As you go through other use cases in the future, you will start to look at how to catalog your reports, machine learning models, images, and documents like PDFs; in a lot of cases, for our banking and pharmaceutical clients, you will be required to tag parts of the information from PDF and Word documents. So depending on the asset types, you start to define the data catalog. Then, once you have the catalog, how do you structure it? What is your domain model, what are your topics? Depending on your use case, you could start with the data domain within the line of business, and within the data domain define the functional areas, which we could call topics. Define that logical structure, and make sure that as you define the glossary, it aligns to that structure. What we're finding is that some of the customers we have worked with have all of this in a nice spreadsheet.
That is how they often start, which is fine while it is small. But as you scale, to support InfoSec, a new revenue stream, and all those use cases, there is no way you can manage it in a spreadsheet. You need a tool that can take in everything you have in the spreadsheet and build it into the logical structure we just talked about, and, from a functionality standpoint, make sure that as you develop it, everything is tagged correctly and searchable; you can add a lot of those functional requirements. Essentially, you take your use case from the right side, build the functional requirements on the left, and feed that into the plan we talked about on the last slide; a minimal sketch of that domain-to-glossary structure follows below. That should give you a good execution framework to get to the next level.
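Here is that sketch: a minimal Python example of organizing the kind of rows a team keeps in a spreadsheet into the domain-to-topic-to-glossary structure described above. The domains, topics, and terms are made up for illustration.

```python
# Minimal sketch: load flat spreadsheet-style rows into a logical catalog
# structure of data domain -> topic (functional area) -> glossary terms.
from collections import defaultdict

# Rows as they might exist in today's spreadsheet: (domain, topic, term, definition)
spreadsheet_rows = [
    ("Finance",  "Billing",  "invoice_amount", "Total amount billed per invoice."),
    ("Finance",  "Billing",  "invoice_date",   "Date the invoice was issued."),
    ("Finance",  "Payments", "payment_method", "How the customer paid."),
    ("Customer", "Profile",  "customer_email", "Primary contact email."),
]

def build_catalog(rows):
    """Organize flat rows into domain -> topic -> {term: definition}."""
    catalog = defaultdict(lambda: defaultdict(dict))
    for domain, topic, term, definition in rows:
        catalog[domain][topic][term] = definition
    return catalog

if __name__ == "__main__":
    catalog = build_catalog(spreadsheet_rows)
    # A simple, searchable structure the glossary can align to:
    for domain, topics in catalog.items():
        for topic, terms in topics.items():
            print(domain, "/", topic, "->", sorted(terms))
```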
Now let's talk about scale. For the first couple of use cases, you are fine doing this manually if you have one or two resources. But if you are bringing in four or five use cases where you are dealing with thousands of tables, entities, and files, there is no human or practical way to do all of this by hand. Should you add more people, or should you start to think about tooling that can really help you scale? That is what we want to share: the different areas where automation and machine learning will play a key role. The first is IT: your infrastructure is going to evolve and change, so make sure that everything deployed and provisioned is managed as infrastructure as code. For InfoSec, as policies are defined, they should be categorized and implemented in your policy engine; and if you already have a policy engine, think about how it interoperates with your governance and ETL tooling, because otherwise you end up with a policy engine sitting separately that doesn't talk to your pipeline and governance, and in the end you won't get the outcome you are looking for.
On the catalog side, we already talked about classification, and that's an important area: as you bring in hundreds or thousands of data sets, make sure you have an automated, machine-learning-driven classification that tells you whether certain data is PII or non-sensitive, and that helps you build logical relevancy. There is no way you can query each table or data set by hand and figure out the relationships between those entities and tables, so getting a 70 or 80 percent recommendation and having a user approve it will streamline a lot of your effort. For automating the rules and pipelines, we already talked about making sure rules are defined once and not duplicated, and then federated through an API or a library into the ETL tool, if you have a separate ETL. Then, once the data is classified, make sure the PII is automatically pseudonymized as part of your policy, and make sure you have a systematic way of catching it if something goes wrong, meaning those sensitive attributes never reach your end users. And the last area is contextual search. If you are a user who has been working in a particular line of business or domain and you search for what is available, the search result ranking should be contextual, based on what you have done in the past, not just every result related to that particular string. These are all the areas where you can start thinking about automation, interoperability between tools, and machine learning to help you scale your governance program. Okay.
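As a simple illustration of the contextual search idea, here is a minimal Python sketch that boosts catalog search results from the domains a user has worked in before; the sample hits, the boost weight, and the ranking rule are assumptions for illustration only.

```python
# Minimal sketch: rank catalog search hits higher when they match the domains a
# user has worked in before, instead of returning every string match equally.
def rank_results(results, user_history_domains):
    """results: list of dicts with 'name' and 'domain'. Boost familiar domains."""
    def score(item):
        base = 1.0
        if item["domain"] in user_history_domains:
            base += 0.5  # assumed boost for the user's usual line of business
        return base

    return sorted(results, key=score, reverse=True)

if __name__ == "__main__":
    hits = [
        {"name": "orders_eu",      "domain": "Finance"},
        {"name": "orders_archive", "domain": "Legacy"},
        {"name": "orders_daily",   "domain": "Finance"},
    ]
    # This user has mostly worked with Finance data sets in the past.
    for hit in rank_results(hits, user_history_domains={"Finance"}):
        print(hit["name"], "-", hit["domain"])
```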
I think this is my last slide. If you are starting something soon, what budgetary parameters are required? It comes down to the criteria you can put into a sheet and start to attach numbers to. The first is whether this is really part of your overall data ecosystem program: what is it meant to do at the end of the day, and why are you doing it? Is there new revenue associated with it, a new product, and what is your anticipated revenue for the next six months or one year, whatever it may be? Capture that.
The second parameter is optimization, whether cost optimization or processing efficiency, and it gives you three or four sub-parameters. If you have already defined your metrics and have a benchmark, then fast-tracking some of the data delivery so the product gets out sooner translates into a number of days saved, which you can convert into cost depending on the number of people and the cost of hiring. That's the optimization side. On the people side, make sure you bring the right talent and skills onto your team, or that you already have them. One thing we always recommend is, once a year or so, which is probably a good cadence, bring in someone from outside: talk to the partners you are working with or to industry experts, and make sure you are still on the right path and not missing any industry trend. Industry expertise is very important. And the last parameter is the technology front: which tools and platforms you need, whether you go with SaaS or another platform-as-a-service model depending on the business, the license costs, some element of implementation, and the infrastructure that is part of the technology. Then, on the savings side, once you have things in place, you understand: I have been bringing in all these data sets and spending a lot of money on my cloud cost, and now I realize that a lot of these assets and their processing can be turned off, because people have moved on or they are not using them. That gives you a lot of license optimization and infrastructure savings, which is a big component for any data ecosystem.
Summary
So these are all the different considerations as you put numbers against each of the parameters. There may be other things you can consider, but this should give you a good starting point for planning your budget. Okay. I think that's all I have. Thank you for the opportunity to share some of our experience, and hopefully you find this useful for building a business case.