June 22nd, 2017
You know GDPR is coming, and with it come substantial penalties for noncompliance. What do you need to do to ensure that you are ready?
The General Data Protection Regulation (GDPR) is a European Union regulation set to go into effect May 25, 2018. This regulation requires that you strengthen data protection and management technologies and practices if you do business in the EU, have employees or customers who are EU citizens, or otherwise store or access data about European Union citizens. Among other things, GDPR addresses how personal data can be exported, the right of citizens to control and delete their own personal data, data protection requirements, how data breaches are to be treated, and a variety of other data- and process-related rules and standards.
In this webinar, Kelly Schupp, Vice President of Marketing at Zaloni, will discuss where GDPR sits in the world of big data, overall data lake strategies that help with compliance, and how metadata management is key to that strategy.
Topics covered:
– Metadata management
– GDPR compliance and best practices
– GDPR technologies
– Data lake governance
Ready to start lowering your data risk and increasing your GDPR compliance? Get your demo of Arena now!
Read the webinar transcript here:
[Brett]: Hello everyone, and thank you for joining today’s webinar on GDPR compliance and data management practices for success. My name is Brett Carpenter and I am the marketing strategist here at Zaloni. I’d like to introduce Kelly Schupp, Zaloni’s Vice President of Marketing, and Scott Gidley, our Vice President of Product, as today’s presenters. We’ve left time for questions at the end of the presentation, so don’t hesitate to ask them using the Ask a Question box below the presentation. And with that, I’ll turn it over to Kelly.
[Kelly Schupp] All right, thank you, Brett, and hello everyone. Thank you for joining today. Let’s quickly look at the agenda. For the next half hour I have two goals. The first is to provide as concise an overview of GDPR as possible: talk a little bit about the basics and break down some of the key components. The second goal is to convey the Zaloni perspective on what a company needs to do to set itself up for success with GDPR compliance. From our point of view, in the process you’re also going to set yourself up for future data governance initiatives. So let’s go into some of these basics first.
So, the GDPR, the General Data Protection Regulation, is the new European Union law on data protection. It replaces the EU Data Protection Directive and is going to be implemented by every individual country in the EU, and yes, this does include the United Kingdom. At its core it is really focused on protecting the personal information of EU citizens. It is all about giving control of your own personal data back to you. Every business in the world that actually handles personal data of an EU citizen is going to have to comply with it. That’s a big difference from the previous data protection laws, which applied just to companies operating in the EU. Now this is going to apply to everyone. The implementation deadline is May 25, 2018. The question a company might ask is, do I have to be compliant by May 2018 or just have my plan in place? The regulation does require full compliance. It was passed in May 2016, giving folks two years to comply. But as it stands today, we are getting to the point where the deadline is less than a year away.
Let’s talk a little bit about penalties. Noncompliance is a scary thing with GDPR. Unless there’s an extension, companies are going to find that penalties are much steeper than they were with any previous data initiative. As stated in the regulation, the noncompliance penalties are going to be either 4% of annual global revenue or 20 million euros, whichever is greater. At the bottom of this chart we show some of the logos of companies that have been impacted by the Data Protection Directive, the predecessor, and you’ll notice a wide variety of industries. I think a really interesting study can put this in perspective for folks, so that they can see how much more severe this is going to be. It is even more frightening when you understand that the requirements are also much more draconian. The study looked at fines against British companies under the Data Protection Directive and asked: what if the GDPR had been in place in 2016? What they found was that instead of 880,000 pounds under the Data Protection Directive, those companies would have been looking at fines of 69 million pounds.
There’s another report out there that forecasts that European banks could end up paying 4.7 billion in fines over the first three years after GDPR goes into effect. So we know this is significant, and there’s a lot to understand.
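As a rough illustration of the penalty rule described above (4% of annual global revenue or 20 million euros, whichever is greater), here is a minimal Python sketch. The function name and the sample revenue figures are hypothetical, and the calculation only gives the ceiling as stated in the talk, not how a regulator would actually set a fine.

```python
def max_gdpr_fine(annual_global_revenue_eur: float) -> float:
    """Ceiling as stated in the talk: the greater of 4% of revenue or EUR 20 million."""
    four_percent = annual_global_revenue_eur * 4 / 100
    return max(four_percent, 20_000_000.0)


# Hypothetical companies, for illustration only.
smaller_company = max_gdpr_fine(100_000_000)    # 4% is 4M, so the 20M figure applies
larger_company = max_gdpr_fine(2_000_000_000)   # 4% is 80M, which exceeds 20M
print(smaller_company, larger_company)
```

The point of the "whichever is greater" wording is that the 20 million figure acts as a floor on the ceiling, so smaller companies do not get a proportionally smaller maximum.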
So let’s look at this by breaking it down into three major areas. The sections we’ve pulled together here are a little bit arbitrary, but I think it helps to bucket them: individual rights, protection requirements, and process requirements. Under individual rights we have consent, the right to be forgotten, anonymization, and profiling. Under protection we have breach, security, and cross-border transfer. And then under process we are looking at roles and responsibilities, vendor management, and codes of conduct and certification.
Let’s go ahead and start with individual rights. The main thing here is consent, and it has to be explicit consent. It has to be opt-in, not opt-out. You can’t have a default opt-in anymore. You cannot check the box for them; they have to check it themselves. And it must be really easy to withdraw that consent once the individual changes their mind. Of course, for children these restrictions are going to be even more severe and will require consent from an adult guardian. Individual rights also include the right to be forgotten. Let’s say I change my mind. Not only do you have to stop tracking me, you also have to delete all the information you have about me on your systems. This one’s going to be quite a burden. There’s also the need to make sure that the individual understands and can restrict the use of their data. You have to be very transparent, and you have to be much more granular about what you want to do with their data, so that they can explicitly indicate what they agree you can and cannot do. You must also provide people much more access to their information and the ability to actually correct it. And then there are going to be certain things you just cannot do with that data. Certain profiling, for example segmentation, cannot be done if the information identifies the individual. So this is definitely going to restrict certain marketing campaigns, and you’re going to have to address all of the analytic algorithms you put in place. Finally, any personal data must be anonymized before it is physically stored if there’s no consent. If you have that explicit consent, then you don’t have to anonymize it. All right, let’s move on to protection.
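To make the opt-in and granularity points concrete, here is a small Python sketch of a per-purpose consent record. The class, the purpose names, and the subject identifier are all hypothetical and are not taken from any Zaloni product; the only ideas carried over from the talk are that nothing is pre-checked, that consent is granted per purpose, and that withdrawing it is as easy as granting it.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical per-subject consent record; empty by default, so nothing is opted in."""
    subject_id: str
    purposes: set = field(default_factory=set)

    def grant(self, purpose: str) -> None:
        # Called only when the individual ticks the box themselves.
        self.purposes.add(purpose)

    def withdraw(self, purpose: str) -> None:
        # Withdrawal is a single call, mirroring how easy granting is.
        self.purposes.discard(purpose)

    def allows(self, purpose: str) -> bool:
        return purpose in self.purposes


consent = ConsentRecord(subject_id="subject-1")
consent.grant("whitepaper_delivery")
print(consent.allows("whitepaper_delivery"))  # granted explicitly
print(consent.allows("email_marketing"))      # never granted, so not allowed
```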
A personal data breach is going to have to be reported within a 72-hour reporting period. It has to be reported to the authorities, but it is also going to have to be reported to the affected individuals. There are pretty strict guidelines on how it needs to be reported and what needs to be reported, so folks need to really make sure they understand that. Regarding data security, your technology and your processes are going to have to be able to ensure the confidentiality of personal data, and the regulation recommends that you encrypt and anonymize data. There are also new restrictions in GDPR regarding cross-border transfer of data. You can only transfer data to countries or territories that have been given a GDPR adequacy designation. There are going to be some explicitly approved cases where an individual company will be certified as GDPR compliant, in which case you can transfer across borders. The only other way you can transfer is if you’ve already received explicit consent to do that.
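The 72-hour reporting window mentioned above is simple date arithmetic, sketched below in Python. The function names and the sample detection time are hypothetical; the sketch assumes the clock starts when the breach is detected and uses timezone-aware timestamps so the deadline is unambiguous.

```python
from datetime import datetime, timedelta, timezone

REPORTING_WINDOW = timedelta(hours=72)  # the window described in the talk


def reporting_deadline(detected_at: datetime) -> datetime:
    """Latest time by which the breach should be reported."""
    return detected_at + REPORTING_WINDOW


def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Hours left in the window; negative once the deadline has passed."""
    return (reporting_deadline(detected_at) - now).total_seconds() / 3600


detected = datetime(2018, 5, 25, 9, 0, tzinfo=timezone.utc)  # hypothetical detection time
print(reporting_deadline(detected))
print(hours_remaining(detected, detected + timedelta(hours=24)))
```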
Okay. So let’s go ahead and dig a little bit into everybody’s favorite bucket, which is process. When it comes to roles and responsibilities, GDPR defines two roles. The first role is called controllers. Controllers are in charge of the security of the data, and they’re also in charge of determining the collection and processing of the data. An example of a typical controller would be someone in the IT department. The second group is processors, the people actually working with the data. An example might be anybody who’s in marketing. It’s important to know that a processor cannot subcontract processing to any other vendor. Also, if you support a GDPR-compliant organization, so if you are a cloud provider, a software provider, or an outsourcer, then even if you don’t have a direct relationship with the end customers, if you’re actually storing the data you’ve got to comply with this law. Another key requirement is that if you have 10 or more people in the organization, you must now appoint a data protection officer. That individual has to have true authority as it relates to the data, and also some level of independence from other parts of the organization.
Okay, so we’ve gone through the basics and broken them down into some of the key components. Let’s move on and talk about what it is you actually need to do to address this, and then we’ll get into some of the technology considerations. I want to talk about this in two ways. First I want to talk a little bit about process, and then a little bit about the data itself. With regards to process: first, as I mentioned on the last slide, if you have 10 or more people in your company you must appoint a data protection officer. You’ve got to adhere to a code of conduct that shows how you will be compliant. You must get explicit consent before moving any data, and you need to insist that third-party data is compliant. If there’s any question about consent, you’ve got to mask that data. You also need to establish and provide policies to customers that they can rely on. And lastly, make sure you get a process in place so you can quickly, within that 72-hour window, address any breaches that occur. All right, so I’ll talk a little bit about the data.
First of all, identify all sources of personal data. You’ve got to triage all of your existing data. You have to get consent from all current individuals whose data you’re storing, and ensure that you’re doing that moving forward. You have to identify what personal data actually exists in the organization today. You need to establish an anonymizing process. And you really should consider a single-view approach, which is basically a master data view of all the personal data. We find a lot of organizations are starting to look at data lakes, because that is where they may well end up wanting to establish their data governance and quality programs.
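The "mask the data when consent is in question" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the secret key, and both function names are hypothetical, and the keyed hash used here is pseudonymization rather than full anonymization, since anyone holding the key can recompute the same tokens.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"   # hypothetical; keep real keys in a vault
PERSONAL_FIELDS = {"name", "email"}             # hypothetical list of personal fields


def mask_value(value: str) -> str:
    """Replace a personal value with a keyed SHA-256 token (pseudonymization)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_storage(record: dict, has_consent: bool) -> dict:
    """Store the record as-is with consent; otherwise mask the personal fields first."""
    if has_consent:
        return dict(record)
    return {
        key: mask_value(str(value)) if key in PERSONAL_FIELDS else value
        for key, value in record.items()
    }


record = {"name": "Ada Example", "email": "ada@example.com", "country": "DE"}
print(prepare_for_storage(record, has_consent=False))
```

Because the same input always produces the same token, masked records can still be joined or counted per individual without exposing the underlying values.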
You’ve got to establish that right-to-be-forgotten process, and you must restrict access to only approved processors. Now that you know what to do, you have to ask the question: how are we going to make this happen? How are we going to actually do it? The technology considerations are pretty onerous.
The Zaloni perspective on this is around implementing a governed data lake. As we mentioned before, a lot of companies are thinking about data lakes today. They’re great for GDPR compliance, because they’re going to give you that central point of storage and, if you do it right, that central point of governance. The problem is with many data lakes today. Many organizations have implemented some level of Hadoop instantiation, or something of that nature, in order to do either some inexpensive storage or perhaps some small sandbox analytic projects. But these aren’t going to check the true governance box. The key to GDPR compliance is really data governance, and that’s enterprise-wide. A company working with data has to know where it came from, how it is being used, and how to keep it private. Without a comprehensive, standard process that can apply across all the applications, it’s pretty clear that GDPR compliance initiatives are going to be doomed. I think a reference architecture like the one we’re showing here gives a sense of the fact that there are ways now to architect a data lake that provide the opportunity to put good governance in place, and there are products, solutions and platforms that can take care of a lot of that work for you.
To talk a little bit more about this, I wanted to show you a data lake maturity model that we developed. We find there are several stages of maturity that an organization goes through as it implements a next-generation data architecture, and usually this is going to include a data lake. This maturity model shows you the stages and some of the characteristics of organizations as they move through them. Many organizations, I would probably say most, are today at stage one, which is what we call the ignore stage. They’re leveraging their limited but reliable data warehouse technology. They built these over decades to work for them, and now things are changing. In addition to the fact that we need a much more agile data platform than a data warehouse can provide, things like GDPR are going to really start to break those old architectures.
The second stage, where a lot of organizations have begun to work today, we call store. Folks are experimenting with Hadoop data lake technologies. Usually it’s for some level of inexpensive storage, or for some level of siloed analytical sandbox. The problem is that you don’t have a good handle on what’s really in there, and particularly if you want to grow it, that is very dangerous. We often call these data swamps. It’s really the next stage, the govern stage, that allows folks to begin to comply with directives such as GDPR. This is where you’re going to have full visibility into the data and full visibility into its lineage. You can apply security and privacy, you can implement rules and workflows, you can address quality, and you can address where to store the data and how you access it. A governed data lake is also going to set you up for more advanced environments in the future. If you look at stages four and five on this chart, those are automate and optimize. Very few companies have made it there today. Maybe some of the incredibly data-rich tech companies that we know, such as Google, are implementing some of these automated and optimized technologies. The bottom line is that the inflection point in the maturity model is really govern. Once you get to govern, whether you are in store today or whether you’re leapfrogging from ignore, you’ve got the foundation to then truly innovate, automate, and optimize. That is going to be really key for GDPR compliance, because this is a very taxing activity and there’s a lot of it. Accomplishing GDPR compliance without some level of automation and operationalization can be very taxing for today’s companies. So this middle point, the governed data lake, is really where Zaloni comes into play for companies.
Our platform and our approach to the governed data lake are holistic, and we break this out into three main buckets.
The first is building the data lake. This is about ingesting data in a coordinated way into the data lake. We’re not just concerned with cataloging or registering the data; we’re also very much concerned with cataloging and registering the metadata, because at the core of a well-governed data lake is metadata management. This is what enables everything else. Zaloni can assist an organization both by providing the managed ingestion that’s required and, if you’ve already built a data lake, by providing an automated discovery capability that can help you go into the existing instantiation and get that information tagged and cataloged. Once that is done, it moves you into the second phase, which is what we call govern the data. This is the ability to tag data, track data lineage, apply data quality, apply data privacy and security, and also manage the lifecycle of the data. Finally, once we’ve done a good job of getting that data organized and governed, it’s about exposing the data to the business, but in a controlled fashion, so that the business gets access that is right-sized to its needs and can be compliant with an instrument such as GDPR. We do that by providing a data catalog for the data in the lake and around the lake. We also provide a certain amount of self-service data preparation capability.
Now, having given just a high-level view of what we provide for an organization, I want to bring it back into the context of what it can do for a company addressing GDPR. We talked about enabling that ingestion. For an organization that’s looking at GDPR compliance, on the ingest side we support a wide variety of data types, structured and unstructured, coming into the data lake, both streaming and batch, in a converged platform. From the point of view of governance and data architecture, our products enable segmentation of the data according to data attributes as well as individual preferences. This is made easier by the metadata-based catalog that we provide, which gives continuous visibility into the data and the tracking mechanisms. You can apply the various governance policies that you want across the data, as well as secure access control. We provide audit and control logs. We provide lineage as well as impact analysis. We also implement masking and anonymization of the data, and the platform lets you define the roles and capabilities both for controllers and for processors. A well-governed data lake is also going to provide for data quality, stewardship and governance.
Finally, with regards to engage, you’ll be able to maintain that catalog for access and collaboration, which everybody wants in order to run the business, but do it in a way that’s going to ensure GDPR compliance and also provide for rapid discovery.
So we’re going to stop there. I wanted to mention that we have quite a bit of collateral that might be useful for you if you want to learn a little bit more, and I want to call out two pieces. Then I wanted to provide an opportunity to ask questions. I’ve got Scott Gidley here, my colleague in product development and product management, so if there are any questions that are more technical in nature, we can address those as well. Let me mention the two pieces quickly. One that’s very well suited to this conversation was recently written by Scott, and that is Understanding Metadata. The other is more general but very relevant, Architecting Data Lakes, and that’s written by our founder, Ben Sharma.
[Brett] Are the data governance capabilities within the Zaloni platform specific to GDPR compliance?
[Kelly] Yeah, good question. I think we answered that a little bit when we were talking about the fact that these data governance capabilities are foundational. They are going to apply to whatever governance requirements you have across your enterprise, whatever the case may be. In our opinion this is really about setting the stage for good governance, and in the process of doing that, it’s going to make GDPR compliance or any other data privacy initiative a lot easier.
[Brett] Why is the metadata-driven approach the best way to achieve GDPR compliance?
[Scott] Well, I think the key to almost anything from a governance perspective lies in metadata. As you’re building out rules for GDPR compliance and any other type of compliance initiative, having a healthy understanding of your metadata, and managing and monitoring it, is critical, because that’s going to provide the linkage between the technical bits and bytes of data that are being changed, who is making those changes, and how you want to audit and manage that. So I think it’s the glue that holds together all the other pieces, and without it you have to take a more piecemeal approach to the overall solution.
[Brett] How do I track a person’s data across the lifetime that it has been in my environment?
[Scott] So when it comes to an individual person’s information and tracking it throughout its lifetime within our environment, there are a couple of different ways that we would help you manage that. First and foremost, we could provide a watermark or some sort of unique identifier for each individual within the data. Then we provide very extensive lineage and monitoring of changes to that information as they occur over time, as well as logging capabilities. If there are specific things from a GDPR compliance perspective where we need to write out to a specific log, collecting and analyzing information on how a person’s data is changed, accessed, or ultimately sunsetted, we can help manage that. Again, that comes from a metadata perspective: understanding the type of data we’re looking at, and then very specific types of rules that manage and track the actual, physical representation of that information.
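The idea of a per-individual identifier plus a log of every change, access, and sunset event can be sketched generically in Python. This is not the Zaloni platform’s API; the class, its methods, the action names, and the dataset names are all hypothetical, and a real system would persist the log rather than hold it in memory.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditLog:
    """Hypothetical in-memory log of what happened to each data subject's records."""
    events: list = field(default_factory=list)

    def record(self, subject_id: str, action: str, actor: str, dataset: str) -> None:
        self.events.append({
            "subject_id": subject_id,
            "action": action,            # e.g. "changed", "accessed", "sunsetted"
            "actor": actor,
            "dataset": dataset,
            "at": datetime.now(timezone.utc),
        })

    def history(self, subject_id: str) -> list:
        """Everything recorded for one individual, in the order it happened."""
        return [e for e in self.events if e["subject_id"] == subject_id]


subject_id = str(uuid.uuid4())           # the unique identifier attached to the person
log = AuditLog()
log.record(subject_id, "accessed", actor="marketing_analyst", dataset="crm_contacts")
log.record(subject_id, "sunsetted", actor="lifecycle_job", dataset="crm_contacts")
print([e["action"] for e in log.history(subject_id)])
```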
[Kelly] What should technology teams prepare for GDPR in a SaaS or PaaS vendor model? Any guidelines?
[Scott] You know, that’s kind of an interesting one. I think that with a software-as-a-service or platform-as-a-service model, from a Zaloni perspective we can help you register information that can be either on or off premise. As changes and software updates happen over time, you want to make sure that you are still compliant at that point, so you’re not taking some new change, let’s say from an on-premise application, that’s going to somehow impact your overall GDPR compliance or readiness. So I think that’s something where you need to, one, check with the vendors that you work with and make sure that they are compliant upon any new changes or updates to that software as it moves forward, and, if there are any new rules that have been added, understand how those apply to your overall business. It’s something I think you need to manage from an accounting perspective and from an ongoing change management perspective as well.
[Kelly] What are some of your thoughts with regards to how an organization can address federated data? We’ve got all of the rules around cross-border transfer. Is there anything that you might call out as it relates to a governed data lake and that particular directive?
[Scott] Yeah, so I think that some of that boils down to the architecture of how you want to set up the data lake. Maybe it’s a multi-tenant environment that you’re managing, where you have different sections of the data lake that represent different borders or localities. Then you can manage the data from a singular perspective, but the access to information and the control of data managed in those environments maintain separation. We do that somewhat within projects and through the way we manage role-based access control to data and artifacts within our system.
[Kelly] Can we talk a little bit about the right to be forgotten and data retention in general? Can you talk a little bit about how, either through the Zaloni platform or just in general, data lifecycle management can be implemented to allow for them?
[Scott] Sure. The way that we manage that specifically within Zaloni is that we have the concept of lifecycle management, which is a policy-driven way to manage the lifecycle of information. Based on entities or data within your data lake, you could set up a rule that says, for this particular data set, or for this information that’s coming from this part of the organization, the lifecycle is time-based: the information is valid for a certain period of time, and at that point in time it should be archived in a certain way. If you have rules about the right to be forgotten, they can be tracked in a policy around that specific data, and you could set that up as well, so it could be event-driven. It could be time-driven, or there can be custom alerts or custom events that trigger that information to be, in some cases, pulled from the data lake or pulled from access to a system. That would help you manage that type of implementation.
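A policy-driven lifecycle with both a time-based and an event-driven trigger, as described above, might look roughly like the Python sketch below. It is a generic illustration rather than the Zaloni lifecycle management feature: the class, the event name, the action labels, and the 365-day retention period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LifecyclePolicy:
    """Hypothetical policy: archive after a retention period, purge on an erasure event."""
    retention: timedelta
    purge_on_event: str = "erasure_requested"


def next_action(policy: LifecyclePolicy, ingested_at: datetime,
                now: datetime, event: Optional[str] = None) -> str:
    # Event-driven rule: a right-to-be-forgotten request takes priority over age.
    if event == policy.purge_on_event:
        return "purge"
    # Time-driven rule: data older than the retention period is archived.
    if now - ingested_at >= policy.retention:
        return "archive"
    return "retain"


policy = LifecyclePolicy(retention=timedelta(days=365))
ingested = datetime(2017, 1, 1)
print(next_action(policy, ingested, now=datetime(2017, 6, 1)))
print(next_action(policy, ingested, now=datetime(2018, 6, 1)))
print(next_action(policy, ingested, now=datetime(2017, 6, 1), event="erasure_requested"))
```

Checking the event before the age means an erasure request is honored even for data that is still inside its retention period.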
[Kelly] Is GDPR compliance applicable just for user behavior data tracking, where you look at information that is being collected about a user who downloads resources from a company’s website? The example they gave was downloading a white paper. So, marketing organizations around the world are in for a rude awakening with regards to GDPR compliance, and will have to implement a lot of the things that we talked about earlier. If you go ahead and download a white paper from me, I have access to your information, but I cannot assume that I can leverage that in other ways. So there has to be a lot more transparency and a lot more explicit opt-in, and as a marketing organization you’re going to be greatly impacted by this. Scott, any final thoughts from your years of working in the data quality and data governance industry?
[Scott] I think that you did a really good job of explaining how the technology and the process have to come together to deliver this type of initiative. From my perspective, as I mentioned, I think having a solid foundation of metadata that can drive these governance policies is critical to making this work in organizations. Thank you very much for your time.