Webinars

Prepare and Protect GDPR Data using Elastic Stack

Read the webinar transcription here:

[Brett Carpenter] Hello everyone, and thank you for joining today’s webinar, Prepare and Protect GDPR Data using Elastic Stack. My name is Brett Carpenter. I’m a marketing strategist here at Zaloni, and I’ll be your emcee today. Joining us is one of our lead software architects, Sabby Gupta. We’ll have time to answer your questions at the end of the presentation, so don’t hesitate to ask them at any point using the “Ask a Question” tab below the player. Before we dive in, I’d like to introduce you to Zaloni. Put simply, Zaloni simplifies big data. We help customers modernize their data architecture and operationalize their data lakes to incorporate data into everyday business practices. We supply the Zaloni Arena DataOps platform, which provides comprehensive data management, governance, and self-service capabilities. We also provide professional services and solutions that help you get your big data projects up and running fast. And with that, I’ll turn it over to Sabby to tell us how Elastic Stack can help prepare, protect, and enforce your GDPR data.

[Sabby Gupta] My name is Sabby Gupta, and you already got a brief overview of what I do within Zaloni. Today we are going to learn a bit more about how GDPR can be implemented using Elastic Stack. Before we dive into the webinar, I would like to give a disclaimer. As we all know, GDPR comes with a lot of requirements, a lot of legal requirements, as part of its definition. So the disclaimer is: I am not a lawyer. This presentation just consolidates what I’ve learned over the months about what GDPR is and what we can do with the technology at hand. If you want legal advice, you should consult a competent attorney. Also, if you want to learn more about how Elastic Stack can help in your GDPR enforcement, please feel free to visit the Elastic website. There’s a lot of useful information there, and I have learned quite a bit from it. Some of this presentation is a consolidation of that information and my understanding of it, so that I can help others nail down how to enforce GDPR using Elastic Stack.

So, what I want to talk about today is roughly this. First, a GDPR primer, which I’m pretty sure most of you already know about. Maybe you have already experienced, or are already working on, GDPR enforcement projects. For those who haven’t, this will be a good few minutes on what GDPR is and what it entails to enforce. We’ll go through a brief overview of GDPR, and we’ll talk about the compliance processes. GDPR is a pretty involved legal document, so this is a concise, abridged version of what to expect if you want to enforce it. Some things will definitely be missed, but this is a high-level view of that legal document. Then we’ll talk about the three-stage process that we think you need to go through for GDPR enforcement. We call the stages prepare, protect, and enforce, and we will go into the details of what they are. Then we will see how we can apply Elastic Stack features on top of those stages of GDPR compliance in our projects, and eventually we’ll go to Q&A.

(4:00: What is GDPR)

So, what is GDPR in general? GDPR is not entirely new. It replaces the previous 1995 EU Data Protection Directive. The purpose of GDPR is to provide a set of standardized data protection laws across all the member countries in the EU. This should make it easier for EU citizens (that is, not US citizens, EU citizens) to understand how their data is being used, and also to raise a complaint, even if they are not in the country where the data is located. GDPR is increasingly recognized as a regulation that will be leveraged to stem the growing number of massive data breaches we have seen over the years, which cause a lot of pain for companies and damage an organization’s reputation.

GDPR defines a few specific roles, and defining them helps us understand it better. The topmost is what it calls a data subject. The data subject in this case is an EU resident, because the regulation is for the European Union. (The UK might not be part of the European Union in the future, but still, it is of interest for EU residents.) A data subject is an identifiable natural person, one who can be identified directly or indirectly. There are two more important terms: the data controller and the data processor. I will give an example of how to differentiate between the two. The data controller is the one who determines the purposes and means of processing your data, so it may be the place where you, as an individual, actually put your data in. The data processor acts on the instructions of the data controller, and it also carries responsibility as a data processor. Last but not least is an important role that every enterprise would need to fill, called the data protection officer. Data protection officers are responsible for overseeing the data protection strategy and its implementation to ensure compliance. They can be legal people or anybody else who understands the GDPR law and what needs to be enforced within the company. Now let’s see what the difference is between a controller and a processor.

For example, say you have a manufacturing company, any kind of manufacturing company. You need to target the market, and you need to do some research. Most likely you will give that work, the research into how to target customers to increase your revenue and sales, to a market research company. In this case, the manufacturing company passes some user information to the market research company, which does some analysis and then comes back and tells you what needs to be done. The manufacturing company is the controller, because it holds the required user information, and the market research company is the processor. In today’s digital world, it is possible for the same enterprise to act as both a controller and a processor. For example, a lot of social networking sites behave that way: they hold the data as the controller, and they also process it by targeting ads or different products.

In GDPR, the main focus is on personal data, so let’s see what personal data is and what is associated with it. GDPR moves away a bit from the older acronym PII, personally identifiable information, and adopts the term personal data, because it has a broader definition. Per the GDPR article, personal data means any information relating to an identified or identifiable natural person, which is the data subject whose definition we just saw. An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, or an online identifier, or by other factors such as genetic, mental, economic, or cultural ones. IP addresses can also help identify a person. So anything that can identify an individual is personal data. GDPR also has a special category of personal data that carries additional restrictions and requirements. If anyone is trying to gather racial, ethnic, religious, or biometric information, that is special category data, because it is sensitive, and there are different guidelines for handling it.

GDPR also gives data subjects some rights. These are not all of them, but they are the important ones. There is the right to be informed: data subjects need to be informed about where their data is and what is going on with it. They should be able to access it, so there is a right of access. If the data is wrong, the user should be able to request rectification. The user should be able to erase the data; we will go through some of these in detail in the next slide. There is the right to restrict processing: if you don’t want to be targeted by a particular ad or some other processing, you should be able to request that restriction. If you want to port your data from one processor to another, for whatever reason, there has to be a mechanism for doing it. You can object to the processing for various reasons. For example, you are not opting out, but there is litigation going on and you want to put a stop on the processing of your data at a particular controller for some time; you should be able to do that. And there is the right not to be subject to automated decision making, including profiling. This might not have been as relevant when the 1995 directive came out, but nowadays a lot of things are automated: systems use machine learning to target people and act on personal data. So this last right is very much at the forefront. If you want to learn more, you can go online to the EU site, which has all the articles and chapters, and see in detail what the law entails.

As I said, GDPR is not entirely new; it is built on an older Data Protection Directive. But a few new rights were created as part of GDPR: the right to erasure, the right to restrict processing, and the right to data portability. In a bit more detail, the right to erasure is designed to give individuals the right to be forgotten if the data held about them is no longer needed. Nowadays there is what we call a digital footprint. Even if you upload a picture and later delete it, you don’t know whether it is permanently gone from the system. So this right says that I have the right to erase my digital footprint. The right to restrict processing is designed to make it easier to contest the accuracy or lawful processing of data where there is an objection regarding the legitimacy of the processing. If an individual exercises this right, his or her data can only continue to be processed for legal claims, with the consent of the individual, for the protection of the rights of another person, or in the public interest. So you are restricting the scope of the processing of the data. And the right to data portability, as I mentioned, covers the case where my data needs to be ported from one place to another, and that needs to be done in a secure manner.

(13:00: Do I need to be GDPR compliant)

So now that we have a bit more insight into what GDPR is and what the rights are, the next question for an enterprise is: do I need to be GDPR compliant? First of all, if the data is anonymized and there is no way to identify an individual, then you are good and nothing much needs to be done, because there is no way to target a particular person. Then there are the other scenarios. If the data is not anonymized, do I have consent? If I don’t have consent or a legitimate basis to collect the data, then as an enterprise you should stop. You cannot collect the data, and you should remove it from the system; otherwise there will be legal implications. I might have consent, but if special category data is discovered in there, then we should get legal help, because it needs additional processing to handle. If you have consent and you know how to handle special category data, then you can go ahead and start capturing GDPR data, but you have to abide by the GDPR laws.
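The decision flow just described can be sketched in a few lines of Python. This is only an illustration of the spoken flowchart, not legal logic; the `DataSet` fields, the `Action` names, and the `gdpr_action` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OUT_OF_SCOPE = "anonymized, nothing much needs to be done"
    STOP_AND_REMOVE = "stop collecting and remove the data"
    GET_LEGAL_HELP = "special category data found, get legal help"
    CAPTURE_UNDER_GDPR = "capture the data and abide by GDPR"


@dataclass
class DataSet:
    # All fields are hypothetical flags used only for this sketch.
    anonymized: bool
    has_consent_or_legitimate_basis: bool
    has_special_category: bool
    special_category_handling_defined: bool = False


def gdpr_action(ds: DataSet) -> Action:
    """Walk the questions in the order the talk gives them."""
    if ds.anonymized:
        # No way to identify an individual.
        return Action.OUT_OF_SCOPE
    if not ds.has_consent_or_legitimate_basis:
        return Action.STOP_AND_REMOVE
    if ds.has_special_category and not ds.special_category_handling_defined:
        return Action.GET_LEGAL_HELP
    return Action.CAPTURE_UNDER_GDPR
```

For example, `gdpr_action(DataSet(False, True, True))` lands on the "get legal help" branch, because special category data is present and no handling has been defined for it.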

(14:00 GDPR Compliance – 3 stage process)

So, this is our take on what GDPR compliance requires. It is a three-stage process in our opinion: you have to prepare to capture personal data, then you have to protect it, and you have to enforce it.

The prepare stage is the initiation of the GDPR compliance cycle, where data flows are mapped for traceability, data controller and processor roles are defined, and data sharing legal agreements are understood. What are the major steps in it?

The first one is information mapping. For example, if you need to implement the right to be forgotten, then you need to know where your data is. In a big enterprise there might be a lot of data, and you may not know where it lives, but in the GDPR world you need to understand each bit of it. This is the process of identifying and documenting all the data flows and processes within your organization that control or process personal data. In this step you also need to confirm what role you are playing for each data flow identified, to see whether you are a data controller, a processor, or both.

Then comes the data protection impact assessment. This standard risk assessment process clarifies the level of risk associated with loss or disclosure of the personal data in a particular data flow, which helps determine the appropriate level of protection required.

Next is special categories. If your data is compromised, there are implications. Not only do you have to plan incident response and investigations, but you also have to know, if special category data is involved, what impact it will have on your processes and on the legal steps you need to execute.

Then you need a data protection officer: an officer responsible for overseeing the data protection strategy and implementation to ensure compliance. For non-compliance there are heavy fines, and the fines are pretty significant, so it is in the best interest of the enterprise to abide by the law.

You also need to do data retention planning. For each defined data flow, determine how long you want to preserve GDPR data, with the aim of minimizing the storage of personal data as much as possible. You are also expected to think about how to delete data once its retention period has been reached, or if the data subject subsequently requests erasure.

And last but not least is the agreement review of sub-processors. Organizations work with multiple data controllers and processors, and the processor may not be the last stop in the GDPR world. In fact, under GDPR the data controller and processor remain responsible for what their sub-processors do. So even if the data goes through multiple levels, with a controller, a processor, and then a sub-processor after that, each entity in the pipeline is responsible for enforcing GDPR. Don’t think that once you pass the data to somebody, it is gone. It’s not like that. It is important that an organization ensures that its sub-processors adequately protect personal GDPR data and that there is an appropriate data processing agreement in place to enforce such protection.

So that is the prepare phase. As you can see, in the prepare phase you create a blueprint for how you will enforce the laws mandated by GDPR.

Then comes the protect stage. So far we have a blueprint of how we want to protect the GDPR data; the protect stage is the implementation phase, where people start implementing the guidelines that came from the prepare phase. The protect stage is all about implementing the appropriate level of protection for the personal data associated with each data flow identified in the stage above. The controller and the processor need to implement appropriate technical and organizational measures to ensure an appropriate level of security.
In this stage you can use technologies like Elasticsearch, which is the focus of this conversation, to protect the GDPR data. There are various steps here.

The first one is data protection by design and by default. This is one of the main tenets of GDPR. It says we need to put data protection at the forefront, not treat it as an afterthought. We need to make sure that protection of personal data, through things like anonymization and pseudonymization, is maximized, while distribution of personal data is minimized. The lifespan of stored personal data should be as short as possible, and if you do have to store it, you need to think about how to secure it using different defense mechanisms.

Next is encryption and pseudonymization. As part of data protection by design, one of the mandates is to make sure you are encrypting and pseudonymizing the personal data in your data stores.

Then access controls. Obviously, not just anybody can come in and get access to the data. Wherever the data is stored, in a database, in Elasticsearch, or anywhere else, access controls need to be there to make sure only the required people can get a handle on the GDPR data.

Logging and auditing is a key design pattern anyway in the technology space. We need to leverage different security frameworks so that we can keep logs and audit what is going on with the data: who is controlling it, who is accessing it, and so on.

Then monitoring and reporting, or incident detection. In the past, when there was a data breach, enterprises might not report it, or might delay reporting it. It is not like that anymore. GDPR puts strict timelines and strict guidelines on what needs to be reported; otherwise there are very heavy non-compliance fines.

Last but not least is data loss prevention. To guard against personal data loss, you need infrastructure and processes designed to preserve data integrity in the event of system disruptions and failures. This covers things like making systems active-active, but you have to plan for business continuity and for how you maintain data integrity if one part of your system goes down or one of your data centers goes offline.

And the last stage is enforce. You have prepared a blueprint and implemented the GDPR mechanisms, but you need to keep on enforcing them.
The enforce stage covers the internal and external processes that need to be defined to comply with specific GDPR regulations. Some of the steps that we think are required are as follows.

One is to maintain data subject rights. GDPR defines several rights for data subjects; one of them is the right to erasure. So if sometime in the future an individual comes and says, “Remove my personal data,” there should be mechanisms to remove it from your system.

Another concerns parties outside your organization. During preparation for GDPR, this step involves the creation of GDPR data processing agreements. These need to define how the data will be transferred or processed, so that it is well defined and known how the data is being used by another entity. This is required because one of the rights of data subjects is to understand what is going on with their data. If an individual comes to a controller and asks, “How is my data being used?”, the controller needs to be able to say in what shape and form the data is being used.

And last but not least is incident response and notification. There need to be internal mechanisms to identify data breaches and report them in a timely manner.

So this is, at a theoretical level, what you have to do for GDPR compliance: what the different stages are and what sub-steps we need to take so that we can successfully comply with GDPR.

(20:00 Elastic Stack)

Now, let’s talk about how Elasticsearch can help realize the GDPR stages above. I will be overlaying Elastic Stack features on the stages I’ve showcased so far. Elasticsearch is one part of the stack family. Whenever we talk about it, we call it the ELK stack. Elasticsearch, in the middle, is your data store. Kibana is the visualization layer, and you can do a lot of cool stuff there. Logstash is the data transformation and data pipeline layer, which moves data from one place to another. Beats are lightweight data shippers, and there are different kinds. For example, Metricbeat carries metrics from your system to, say, Elasticsearch, and Filebeat ships files from one place to another. So that is the Beats framework within the ELK stack, and it can do various things. There is also something called Auditbeat, which can detect changes to your file system; we will see how that is used for monitoring.

Another part of the Elastic Stack is the X-Pack extension. It is a single-install extension for the Elastic Stack that adds value on top of it. You get things like security on Elasticsearch and alerting, so when things change on Elasticsearch indices you can get alerts. You can monitor the cluster, and you can build reports in Kibana. There is a newer feature called Graph, if you have graph use cases. And more recently they have started adding machine learning capabilities, such as anomaly detection, which, as we’ll see, is very useful for GDPR compliance. All of that is part of X-Pack.

Now, let’s overlay Elastic features on top of each of the stages we have seen so far. The first one is information mapping. Mapping data flows is the first step in dealing with the regulation, and if an org is unable to identify the relevant data flows, its GDPR initiatives may be incomplete and ineffective. Depending on where an org is storing personal data today, it may make sense to ingest a copy of all the data into Elasticsearch, where its powerful and fast full-text search capabilities enable quick identification of the tables, queries, reports, or applications that hold personal GDPR data. You might think it is a stretch to say, “Let’s put everything in Elasticsearch,” and that might be true. But if it is possible, I think Elasticsearch’s ability to scale out and store huge datasets would be very useful, and its indexing and searching capabilities are definitely an asset. At the very least, enterprises need to map their data flows, so a data flow blueprint could live in Elasticsearch, with the required identifiers. Then, if a request comes in to update or remove personal data, we know all the places we need to go, per user, to remove that digital footprint.
Next is data retention and planning. GDPR imposes limits on retention: orgs are required to delete personal data when it is no longer needed. The retention of personal data in Elasticsearch can easily be managed through index management. Elasticsearch supports time-based indices, which can be deleted after a controlled period has expired. There is also a newer offering (well, not that new) in the cloud space called Elastic Cloud Enterprise, or ECE for short. It is the central orchestration software from Elastic, which can be used to manage a fleet of Elasticsearch clusters and address the challenges that come with multi-tenancy, like data separation.

And last but not least in the prepare stage is the agreement review of sub-processors. These are complex legal documents that need to be mined and understood. If something needs to be searched, all of these documents can be put into Elasticsearch and searched by keyword, which lets the responsible team see what the different agreements with the sub-processors are. So all of this can be done through the Elastic Stack.
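To make the retention point about time-based indices concrete, here is a minimal sketch of how one might pick out daily indices that have aged past a retention period. It assumes a hypothetical naming scheme of `personal-data-YYYY.MM.DD` and a hypothetical helper `expired_indices`; it only computes the list and does not talk to a cluster. Actually removing the indices would be a separate step against Elasticsearch’s delete index API.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days, prefix="personal-data-"):
    """Return the daily indices whose date suffix is older than the retention window.

    The prefix and the YYYY.MM.DD suffix format are assumptions for this sketch.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the time-based personal data indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


names = [
    "personal-data-2018.01.01",
    "personal-data-2018.03.01",
    "other-2017.01.01",
]
to_delete = expired_indices(names, today=date(2018, 3, 15), retention_days=30)
```

Because each index holds one day of data, dropping a whole index is enough to retire everything from that day, which is why time-based indices are convenient for retention.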

Going into the protect stage: here almost every step can be achieved through the Elastic Stack. The first one is data protection by design and by default, which means that if you are starting fresh, you need to protect data from the beginning, not as an afterthought. It is a first-class citizen. So if an org is considering using the Elastic Stack as a data store for personal data, the capabilities of Elastic Cloud Enterprise can come in very handy, and we recommend that as the way to go. The principle of data protection by design is about treating personal data like valuable secrets by limiting access, maintaining accuracy, ensuring the data is secure, and limiting retention. Unlike traditional data architectures, with one massive data store holding large volumes of data under complex, overlapping access controls, ECE makes it practical to instantiate new clusters pretty seamlessly, each holding only the data relevant to a given project and its workflow. So different projects can each have a different set of data, and clusters for them can be easily spawned through the ECE framework.

Next is encryption and pseudonymization. Here we recommend using Logstash, because Logstash can move data from one place to another, and there are a lot of things you can do in Logstash because it has a plugin-based architecture. One of the important plugins it has is the fingerprint plugin, with its fingerprint algorithms, which can anonymize and pseudonymize data as needed. So while the data is flowing through Logstash, it can obfuscate the data and then send it to different destinations like Elasticsearch, where we can start monitoring and analyzing it.

Then access control. This is a pretty broad category, and it is key: we don’t want to give data to everyone, and we want to limit access to the data stored in Elasticsearch. To prevent unauthorized access to personal data stored in Elasticsearch clusters, there must be a way to authenticate users, meaning that a user is validated as being who they claim to be. X-Pack security is very handy here. Its features provide a standalone authentication mechanism that enables quick password protection for your cluster, at the bare minimum. If the org has an existing authentication mechanism for managing users, like LDAP, Active Directory, or PKI, X-Pack security features are able to integrate with those platforms. The security features also include IP-based filtering, so if you want to deny access from a bunch of machines and whitelist others, that is possible with X-Pack security.

Authentication is one thing. Obviously, once users are authenticated, you have to authorize what they can do. X-Pack security has role-based access control, which provides the ability to specify which users can perform read and write operations on the Elasticsearch indices holding personal data. The other is a concept called attribute-based access control. What it means is that you can define attributes on different fields and different parts of a document, and when you run a query with a user’s credentials, Elasticsearch runs those checks and returns data based on those attributes. For example, part of a document might only be visible to a person with a particular certification, say an AWS certification or some other certification. You can put attributes on the data sets, and users who don’t have those certifications won’t see that data. You may ask how the system would know. Those details need to be part of your system, maybe in LDAP or somewhere else, and you integrate with it. When the query runs, it can match those external sources against the attributes on the data and filter accordingly. In addition to IP-based filtering, if you are transferring data from one node to another, you can apply SSL/TLS encryption from node to node, so that your data is encrypted in transit and not compromised.
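Returning to the pseudonymization step described earlier: the idea behind a keyed fingerprint can be sketched in plain Python. This is not the Logstash fingerprint plugin itself, only an analogous keyed hash (HMAC-SHA256) applied to chosen fields before an event is shipped onward. The function names, the field names, and the key value are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed HMAC-SHA256 digest.

    The same value and key always give the same token, so records can
    still be correlated without exposing the original value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_event(event: dict, fields: list, key: bytes) -> dict:
    """Return a copy of the event with the listed fields pseudonymized."""
    out = dict(event)
    for field in fields:
        if out.get(field) is not None:
            out[field] = pseudonymize(str(out[field]), key)
    return out


# Hypothetical key; in practice it would come from a secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

event = {"email": "alice@example.com", "action": "login"}
safe_event = pseudonymize_event(event, ["email"], SECRET_KEY)
```

Using a keyed hash rather than a bare hash means someone without the key cannot simply hash a list of known email addresses and match them against the stored tokens.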

One of the last features is access control at the different granularities Elasticsearch provides. At the cluster level, you can check the cluster health and set up alerts to see whether the cluster is in the right shape and running properly, and you can control who has access to a particular cluster. One level down is the index: you can control who can add to, delete from, or look into the indices in Elasticsearch. Another level deeper is the document: you can put access controls on who can access sensitive documents. And at the lowest level you can control field-level access, restricting access to individual fields. So you can see that the access control we get out of Elasticsearch ranges from coarse-grained to fine-grained.
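The four levels above (cluster, index, document, field) map onto the shape of an Elasticsearch security role definition. The sketch below builds such a definition as a Python dictionary and adds a tiny local function that mimics what field-level filtering does to a document. The index pattern, field names, and query are hypothetical, and `visible_fields` is only an illustration, not how Elasticsearch implements it.

```python
import json

# Role body in the shape used by Elasticsearch security roles.
# All names and values here are hypothetical examples.
gdpr_analyst_role = {
    "cluster": ["monitor"],                 # cluster level: health and stats only
    "indices": [
        {
            "names": ["customers-*"],       # index level: which indices
            "privileges": ["read"],         # read only, no writes or deletes
            "query": {"term": {"region": "eu"}},  # document level: matching docs only
            "field_security": {             # field level: only these fields
                "grant": ["customer_id", "country", "signup_date"]
            },
        }
    ],
}


def visible_fields(document: dict, granted: list) -> dict:
    """Illustrate field-level security: keep only the granted fields."""
    return {name: value for name, value in document.items() if name in granted}


doc = {"customer_id": 1, "email": "alice@example.com", "country": "DE"}
granted = gdpr_analyst_role["indices"][0]["field_security"]["grant"]
filtered = visible_fields(doc, granted)

# The role body serializes to JSON, which is the form a role API request would carry.
role_json = json.dumps(gdpr_analyst_role, indent=2)
```

A user holding only this role would see `customer_id` and `country` for the sample document, while `email` stays hidden.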

On the data that is stored in Elasticsearch: to prevent unauthorized access to personal data stored in an Elasticsearch cluster, there must be a way to authenticate users, and there is. Authentication means that a user is validated as who they claim to be. X-Pack security is very handy in this case. At the bare minimum, it gives you password protection for clusters. If there is an existing authentication mechanism to manage users, like LDAP, Active Directory, or PKI, the X-Pack security features can integrate with those platforms. The security features also extend to IP-based filtering, so if you want to deny access from a bunch of machines and whitelist others, that is possible with X-Pack security.

Authentication is one thing; then comes authorization. Once a user is authenticated, you want to authorize what that user can do. X-Pack security has role-based access control, which provides a fine-grained ability to specify which users can perform read and write operations on the Elasticsearch indices containing personal data. There is also a newer concept called attribute-based access control. What it means is that you can define attributes on different fields and different parts of the document, and when you run a query with your user credentials, it will run those checks and return data based on those attributes. For example, say a document should only go to a person with a particular certification, say an AWS certification or some other certification. You can put attributes on the data set, and if you don't have those certifications, you cannot see it. You may ask how the system would know. Those user details need to be there as part of your system, maybe in LDAP or somewhere else, so that you can integrate with it. Then, when the query runs, it can match the attributes associated with the user against the attributes of the data and filter the results.

You can also apply, obviously, IP-based filtering, and if you are transferring data from one node to another, you can apply SSL/TLS encryption from node to node, so that your data is encrypted and not compromised.

One last feature is access control at the different granularities of Elasticsearch. At the cluster level, you can check the cluster health and set up alerts to see whether the cluster is in the right shape, whether it is running properly, and who has access to a particular cluster. One level down is the index: you can control who can add to, delete from, or look into the indices. Another level deeper, another level of granularity, is the document: you can put access controls on who can access sensitive documents. And at the lowest level, you can control field-level access and restrict access to individual fields. So you can see that the access control we can get out of Elasticsearch varies from coarse-grained to fine-grained.
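
As an illustration that was not part of the webinar, the index, document, and field levels of access control described above might be combined in a single role definition. The sketch below only builds the request body for Elasticsearch's create-role API (`PUT /_security/role/<role_name>`); the index pattern, field names, and region value are hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import json


def build_reader_role(index_pattern, allowed_fields, region):
    """Return a request body for the create-role API (PUT /_security/role/<role_name>).

    index_pattern, allowed_fields and region are hypothetical example inputs.
    """
    return {
        "indices": [
            {
                # Index level: which indices the role applies to and what it may do there.
                "names": [index_pattern],
                "privileges": ["read"],
                # Field level: only these fields are visible to holders of the role.
                "field_security": {"grant": list(allowed_fields)},
                # Document level: only documents matching this query are visible.
                "query": json.dumps({"term": {"region": region}}),
            }
        ]
    }


role = build_reader_role("customers-*", ["customer_id", "country", "region"], "eu-west")
print(json.dumps(role, indent=2))
```

A user holding this role could read only the three granted fields, and only in documents whose hypothetical `region` field matches, which is one way to keep the remaining personal fields out of broad use.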

Next up is logging and auditing. Obviously, we want to track who is doing what, and in order to track that we need mechanisms to capture logs. We need to see, if data is changed, who is changing it, and if a particular index was changed, what changed it. So we need to track the whole lineage of what was going on. To help with that, there is Auditbeat, which is part of the Beats framework. Auditbeat is a lightweight shipper that you can install on your servers to audit the activities of users and processes on your system, so you can see, for example, which files changed. It is very similar to, if you use Splunk, that kind of log aggregation. You can use Auditbeat to push all these events into Elasticsearch, and then you can start using machine learning, X-Pack, Kibana dashboards, and other features to find anomalies and do other interesting things. Once the data is being pushed, this is a sample Kibana dashboard you can create, and you can see how the data is flowing in, at what time of day, and what events preceded or succeeded a particular event. All of that can be done.

Once auditing is in place, we can start monitoring and detecting reportable incidents, which is mandated by GDPR. How do we do that? As part of X-Pack, there is a machine learning plugin which can help detect anomalies. It has been there for some time, but it is pretty new compared to how long Elasticsearch has been in the ecosystem. It helps us detect anomalies, and based on the events and patterns we see, we can tell whether data has been breached or whether somebody unauthorized has accessed the data. If so, the users can report that.

Part of monitoring is also monitoring the health of the service, and that is where X-Pack alerting comes in. The X-Pack alerting and monitoring features enable automated monitoring of logs and notifications when interruptions or failures occur, say if a server is down or a data center is down. You can put alerts on these logs so that you know something went wrong and can start diagnosing the cause.

And last, with every system comes disaster recovery and resilience. We need to ensure that whatever data has been stored remains available if something goes down. Since it is personal data, there should be data integrity and there should not be any loss of data.

So, whether or not the data store for personal data is centralized, because we can store any data in Elasticsearch, we need to ensure that resilience. Elasticsearch has been designed from the start to distribute data. By design, Elasticsearch is a distributed application. It splits data into what are called shards, and it keeps replicas of the data in different places. If you have a topology of nodes or data centers across a particular geographic region, Elasticsearch can make copies, or replicas, of the data. So if one region goes down, the data is not lost; it is still there in other places. Again, this is required as part of the GDPR guidelines and rules, and Elasticsearch meets those requirements through its distributed design and architecture.

The last stage is enforcement, and one of the important things there is data subject rights. There are subject rights that need to be enforced, and that can be done through the Elasticsearch APIs. Whenever a subject exercises the right to erasure, or withdraws their consent to the collection of their personal data, one of the biggest challenges generally is to find that data and delete it. Based on what we have covered so far, the information flow should help us locate the data. When the data is in Elasticsearch, we can use the Delete By Query API to satisfy the request, or if somebody wants to update or rectify their information, we can use the Update By Query API to update it very easily. So with that, we have seen the different stages of GDPR compliance and how we accomplish them through Elasticsearch.
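
As an illustration that was not part of the webinar, erasure and rectification requests against the Delete By Query and Update By Query APIs might be assembled as follows. This is a minimal sketch: the index name, the `customer_id` field, and the example values are hypothetical, the `term` query assumes the identifier is mapped as a keyword field, and the code only builds the HTTP method, path, and body rather than contacting a cluster.

```python
def erasure_request(index, subject_field, subject_id):
    """Build a Delete By Query request that removes every document of one data subject."""
    path = f"/{index}/_delete_by_query"
    body = {"query": {"term": {subject_field: subject_id}}}
    return "POST", path, body


def rectification_request(index, subject_field, subject_id, field, new_value):
    """Build an Update By Query request that overwrites one field for one data subject."""
    path = f"/{index}/_update_by_query"
    body = {
        "query": {"term": {subject_field: subject_id}},
        "script": {
            "lang": "painless",
            # The field name and value are passed as params, not spliced into the script text.
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": field, "value": new_value},
        },
    }
    return "POST", path, body


# Hypothetical index, field names and values, for illustration only.
print(erasure_request("customers", "customer_id", "c-42"))
print(rectification_request("customers", "customer_id", "c-42", "email", "new@example.com"))
```

Both requests depend on knowing which indices hold the subject's data, which is why the information flow mentioned above matters before any deletion can be trusted to be complete.
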
So just to summarize, and to put an emphasis on what is going on: organizations are in the hot seat with GDPR compliance, and the fines for non-compliance, as well as breach notification requirements with tight timelines, add to the complexity of adherence. Many reports have discussed the cost of becoming compliant. Becoming GDPR compliant will require focus, and it is certainly true that a certain amount of resources will need to be applied to the challenge. As shown, it is possible to use Elastic Stack technology to help the process and to ensure data management processes are fit for purpose in the long term. Using the Elastic Stack as a data store for GDPR personal data provides a strong starting position for building a GDPR-compliant data store with security, access control, resilience, and disaster recovery capabilities. So, with that, we come to an end, and I hope it was useful and informative as to how we can use the Elastic Stack to enforce GDPR in your organization.

[Brett Carpenter] Should I always remove or anonymize the usernames or IP addresses that I collect?

[Sabby Gupta] The short answer would be yes. But there can be scenarios, for example for legal compliance: if there is a data breach, the enterprise might want to see where the breach came from and take legal or other appropriate steps. If the data, for example IP addresses, is completely anonymized, it might be tough to do that, because there is no way to map back to the original data. In that case, the data cannot be anonymized. So when I said the answer is yes, generally it would be. But for those scenarios, the recommended approach is to store those fields of the data set without anonymizing them, and then you cannot make that open to everyone. So you would most probably have two different Elasticsearch indices: one with anonymized data, which is for broader use, and one which is not anonymized and which is tightly access controlled, open only to, for example, the security team of the enterprise.
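
As an illustration that was not part of the webinar, the two-index approach in this answer might look like the sketch below, which prepares one copy of each event without the identifying fields and one full copy for the restricted index. The index names, field names, and sample event are hypothetical, and dropping the fields is just one simple way to remove them from the broad-use copy.

```python
# Hypothetical names; adjust to your own mappings and index naming scheme.
SENSITIVE_FIELDS = ("username", "client_ip")
BROAD_INDEX = "events-anonymized"
RESTRICTED_INDEX = "events-restricted"


def split_event(event):
    """Return the documents to index, keyed by target index name.

    The broad-use copy drops the sensitive fields entirely; the restricted copy
    keeps the full event and is meant for a tightly access-controlled index.
    """
    redacted = {key: value for key, value in event.items() if key not in SENSITIVE_FIELDS}
    return {BROAD_INDEX: redacted, RESTRICTED_INDEX: dict(event)}


event = {"username": "jdoe", "client_ip": "203.0.113.7", "action": "login", "status": "failed"}
docs = split_event(event)
print(docs[BROAD_INDEX])
print(docs[RESTRICTED_INDEX])
```

Access to the restricted index would then be limited through the role-based controls discussed earlier in the talk.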

[Brett Carpenter] Where does Master Data Management take place, that is, the need to resolve an individual's identity and connect the individual to all their personal data within the enterprise?

[Sabby Gupta] Again, another very good question, because Master Data Management, or MDM in short, is pretty key nowadays. There were traditional applications in the past from big companies, but now machine learning and similar work is bringing Master Data Management into the enterprise. It is important because master data is built on top of the data that is captured, and some of that is personal data. So Master Data Management is part of the information flow. Whether the master data set is in a particular repository, in the Elastic Stack or a database, or, if you are in big data, in HDFS or somewhere else, you need to have it as part of the information flow. As long as you have tracked it and you know where the data is, then tomorrow, if somebody comes in and asks for rectification or erasure, you should be able to track that data and remove it from the master data, because we cannot keep storing personal data if somebody wants it erased. I hope that answers the question.

[Brett Carpenter] The next one says: as Elasticsearch is document based, how are relationships inherent in data sources discovered and retained? For example, connecting an email address to an IP address.

[Sabby Gupta] So yes, Elasticsearch is document based; at the lowest level, data is stored as documents. In the past, before graph capabilities came in as part of X-Pack, there were other plugins, for example the Siren or Kibi plugins, which were applications built on top of Elasticsearch. They had a means to link documents, so if you wanted to create a graph using some identifiers, those applications did that for you. We have used them in the past, and I think they are still there. They gave a graph structure to a document structure. Underneath is the document, but the abstraction over it is a graph, because you can link properties. With X-Pack Graph, you can start tying one attribute to another. So that can be done, say for example linking an IP address to a particular person, or to the individual it came from. This will become more prevalent once IPv6 picks up steam, because then each device will map to a particular entity or identity. So those things are possible, but they need to be maintained. Elasticsearch now provides this natively; previously it came from external applications.

[Brett Carpenter] I want to thank all of you for joining this presentation, and Sabby for taking the time to speak with us today about GDPR and the capabilities that the Elastic Stack can provide. This presentation will be available on demand, both on Zaloni's resources page and on BrightTALK, for future viewing. Thanks again, and have a wonderful day.

[Sabby Gupta] Thank you. It was a pleasure.

about the author

Team Zaloni
