Data Loss Prevention (DLP)
In the context of security, data loss is any event that causes either unintended exposure of data beyond the scope of granted rights, or malicious copying and/or deletion of important data. To combat such data loss, organizations implement a practice referred to as Data Loss Prevention (DLP). DLP is most often implemented via a class of applications that includes Intrusion Detection Systems (IDS), firewalls, and antivirus software.
How Data Loss Occurs
Data loss events can occur either accidentally (unintended exposure of an API, for example) or maliciously. Historically, organizations implementing DLP solutions have been most concerned about malicious data loss caused by an outside attacker or a malicious insider. eBay suffered one of the most infamous malicious data loss events when the credentials of just three corporate employees were compromised without their knowledge. For 229 days, outside attackers had access to these three user accounts, so it was assumed they had more than enough time to copy data for all 145 million users that those accounts were authorized to access.
Alternatively, many data loss events are the result of negligence and cause harm via material loss. One of the most interesting examples occurred during the production of Toy Story 2 at Pixar. An employee accidentally ran the Linux command “rm” with the arguments needed to delete all files and folders: “rm -rf *”
It was only a moment before employees noticed files disappearing, but it was too late. In only 20 seconds, 2 months of character files and animation scenes were deleted from Pixar servers. What saved Pixar, as it discovered only later, was yet another breach: in an interesting twist, an employee had copied all of the film’s files onto their personal computer, and the film was restored from that copy.
As is evident in both scenarios, a group of employees had far-reaching data grants, perhaps more access than they should have had. In both scenarios, the activity in question, deleting all files system-wide (Pixar) and sequentially exfiltrating user data records (eBay), stepped far outside the bounds of the typical activity of those accounts.
Traditional Data Loss Prevention Solutions
Data Loss Prevention products, or DLP as they are commonly known, became popular nearly two decades ago, as large organizations started using them to scan their storage media and network traffic for unexpected PII or other proprietary data. These solutions scanned data across the organization and matched it against specified data formats, to make sure that customer credit card numbers were not being stored on unsecured systems and that proprietary documents weren’t being shipped out of the organization. The challenges with all these solutions were the time they took to complete a full scan and the number of false positives they reported.
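To make the format-matching idea concrete, here is a minimal sketch of how a scanner might look for credit card numbers in a block of text. It is an illustration under stated assumptions, not the method of any particular DLP product: the pattern, the function names, and the choice to apply a Luhn checksum as a false-positive filter are all assumptions made for this example.

```python
import re

# Assumed candidate pattern: 13 to 16 digits, optionally separated by
# single spaces or hyphens. Real scanners use richer, issuer-specific rules.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that match the pattern and pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Matching on the pattern alone would flag any 16-digit order or tracking number, which is one source of the false positives described above. The checksum step discards many of those, but a scan like this still has to read every byte of every data store.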
Leading up to a data loss event, an attacker must first gain access to credentials or data authorization grants that they were never intended to have. This is often done via phishing, password lists, social engineering, or malware that a target accidentally installs on one of their devices. Even with adequate data security training and antivirus measures, it is never possible to account for every possible attack vector against an organization’s users. In some scenarios, the attacker may even be an insider who is taking data for their own nefarious purposes!
To protect against this, organizations have historically put in place Intrusion Detection Systems (IDS), in addition to solutions that protect the security of internal user accounts. These solutions observe and inspect, in real time, the way in which data is accessed and modified. In the eBay example above, a properly installed and configured IDS might have been able to observe an exfiltration pattern in the user data accessed from the three compromised accounts. By contrasting the behavior of the freshly compromised accounts with their earlier activity, an IDS would likely have flagged the sequential SELECT queries against the User table as unusual. This is not unlike receiving a fraud alert via SMS when traveling, because your bank has suddenly detected a charge in a location you have never been to before. With a simple SMS response, the bank can confirm that the transaction attempt really comes from the authorized account holder, saving untold damages to both the consumer and the financial services provider.
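The kind of baseline comparison described above can be sketched in a few lines. This is a simplified illustration, not how any specific IDS works: it assumes the monitor already has a per-account history of rows read per session, and the function name, the threshold of three standard deviations, and the sample figures in the usage note are all hypothetical.

```python
from statistics import mean, stdev


def is_unusual(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` if it exceeds the account's historical mean by more
    than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: anything above the baseline is unusual.
        return current > baseline
    return (current - baseline) / spread > threshold
```

For an account whose past sessions read around 40 to 60 rows each, `is_unusual([40, 55, 38, 60, 47], 50)` returns `False`, while a session that reads 250,000 rows returns `True`. A production system would track many more signals than row counts, such as tables touched, time of day, and source address.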
Several DLP approaches rely on Database Activity Monitoring (DAM), which retroactively analyzes SQL logs. Most SQL databases (MySQL, PostgreSQL, MS SQL, etc.) support logging functionality. However, this is not a perfect solution, for several reasons. Primarily, logging can degrade database performance severely, by as much as several hundred percent. Modern cloud applications that deliver services to end users over high-bandwidth connections require minimal database latency. Increasing database latency by 4 or 5 times would have a noticeable negative impact on end-user application performance and would increase infrastructure cost.
Furthermore, if an attacker has root-level access to a database, all bets are off. Once an attacker has gained access to a database, they can simply delete their queries from the database’s general query log. Therefore, there is no way to guarantee that the query log of a database is actually complete and has not been truncated by the attacker.
Cloud Infrastructure Challenges to DLP
All the above solutions assumed that data would be stored in relatively few, specified data repositories and that there would be clear entrance and exit pathways into the organization. The days when organizations stored data in a single data repository for all applications running inside a digital perimeter are long gone. Modern cloud native applications rely on sharded and replicated databases distributed across the entire planet. While this enables end users to experience lightning-fast application performance regardless of which continent they are on at any given moment, it means their data is replicated across as many as tens or hundreds of data centers globally.
To add further complications to the challenge of preserving the security of sensitive and valuable data, the same class of data for each user (address, social security number, etc.), is often stored in numerous types of data repositories. For example, when a consumer purchases utility service with their credit card and verifies identity by using their SSN, the utility will likely store many of these values in a data warehouse for analytic purposes, in addition to storing some of these values in an application database. Additionally, the credit card number will likely be stored by a third party, who also stores that value in multiple types of databases.
This modern cloud architecture example makes defining and identifying a nominal “pattern” of data activity nearly impossible. Given that a consumer’s SSN may be stored in two separate databases, each with multiple purposes, there is simply no successful means by which existing DLP solutions can classify risky behavior with high enough accuracy to not cause “alert fatigue,” or misclassify malicious behavior as “normal.”
Data Loss Prevention and Intrusion Detection for the Cloud
As cloud architecture becomes even more complicated, diverse, and distributed, the need for a Cloud Native DLP continues to grow. This trend in cloud architecture is expected to only increase as consumers expect organizations to further leverage increased bandwidth and computing power available to them.
The flip side of this is that the risk of data loss events continues to increase. Consumers and organizations alike are becoming more sensitive to this risk, as the material loss and damages from data loss continue to grow. The solution must be able to intercept data requests in real time, at the data repository endpoint itself.
Cyral has developed patented technology for observing data via endpoint interception. Rather than logging the query for analysis at a later date, Cyral analyzes the query and data within microseconds, in parallel to the request.
The benefits of such an approach address the challenges of traditional DLP technologies when applied to cloud native architecture. There is no risk of “missing” query events from a maliciously truncated log table, as the events are analyzed against policy on the spot, and for every request. To learn more about how Cyral can address your organization’s DLP needs for cloud architecture, sign up for a demo today!
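As a rough illustration of what checking a request at the repository endpoint can mean, the sketch below wraps a database connection so that every query is evaluated against a policy before its results are returned. It is a toy example and does not represent any vendor’s implementation: the policy (a row limit on a “users” table), the `guarded_query` and `PolicyViolation` names, and the use of Python’s built-in sqlite3 module are all assumptions chosen to keep it self-contained, and the check runs inline rather than in parallel to the request.

```python
import sqlite3

# Hypothetical policy: which tables are sensitive, and the most rows a
# single request may return from them.
SENSITIVE_TABLES = {"users"}
MAX_ROWS = 100


class PolicyViolation(Exception):
    """Raised when a request breaks the data access policy."""


def guarded_query(conn: sqlite3.Connection, account: str, sql: str, params=()):
    """Run a query, then check the result against policy before returning it."""
    rows = conn.execute(sql, params).fetchall()
    # Naive substring match on table names; a real interceptor would parse the SQL.
    touches_sensitive = any(table in sql.lower() for table in SENSITIVE_TABLES)
    if touches_sensitive and len(rows) > MAX_ROWS:
        raise PolicyViolation(
            f"{account} requested {len(rows)} rows from a sensitive table"
        )
    return rows
```

With a `users` table holding 500 rows, a lookup such as `guarded_query(conn, "app", "SELECT * FROM users WHERE id = ?", (1,))` returns normally, while `guarded_query(conn, "app", "SELECT * FROM users")` raises `PolicyViolation`. Because the check happens on the request path itself, there is no separate log for an attacker to edit afterwards.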