S3, the Amazon Simple Storage Service, is an essential data repository for most organizations. Developers today rely on the flexibility and scalability of storing data in S3 so that they can quickly build applications. This flexibility often comes at a cost in terms of security. It’s harder to secure an environment where there’s a lot of variety—like new sets of S3 buckets for each new application—and a lot of change, with new applications storing new data all the time.
More sensitive data flows into S3 buckets every day, and AWS offers well-established best practices for securing it. However, managing and protecting sensitive data in flexible data repositories like S3 raises important questions for security:
- Access control: Can you manage access consistently across S3 and other types of repositories for your team and the applications they use?
- Monitoring: Can your security and compliance teams know who’s got access to your S3 buckets, and what actions they’re taking?
- Auditability: Can you report to your users in detail about how you’re protecting their S3 data, and can you assure them of exactly who has access—and who’s had access—to their private data that they’ve entrusted to you?
Below, we’ll survey the built-in S3 data protections, look at common weaknesses that arise in practice, and provide an overview of some mechanisms organizations can leverage, including Cyral, to detect and prevent common threats against their data stored in S3.
S3’s popularity makes it a compelling target
Amazon Simple Storage Service (S3) is one of the most popular cloud storage services today. S3 stores objects, which means that an application or person can store a file in S3 much as they would in a conventional file system. Each S3 object can also be made reachable over the internet through its unique URL, which makes it easy to share files using the S3 service.
Over the years, S3 has become a critical component of the overall data lake architecture for organizations big and small. Today, popular services like Snowflake, Databricks, and Redshift can store or query data that lives in S3. With this rise in popularity, S3 has become a common attack target for hackers.
With flexibility comes risk
While AWS offers excellent security controls for S3, arguably the biggest risk that organizations face when they give a team of developers access to S3 is that data gets mistakenly exposed to the internet through misconfiguration.
In practice, when an S3 bucket is misconfigured, hackers might be able to:
- list the data it contains (that is, folders and files);
- download data, including recursive downloads to copy entire folders;
- replace data; and
- delete data.
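To make the four risks above concrete, here is a minimal sketch that scans a bucket policy document for unconditional, anonymous `Allow` statements and maps the granted S3 actions to the risks in the list. The bucket name and the example policy are hypothetical, and the check is deliberately simplified: it ignores ACLs, action wildcards such as `s3:Get*`, and conditional statements, so it is not a substitute for AWS's own policy evaluation.

```python
# S3 actions that correspond to the four risks listed above.
RISKY_ACTIONS = {
    "s3:ListBucket": "list",
    "s3:GetObject": "download",
    "s3:PutObject": "replace",
    "s3:DeleteObject": "delete",
}


def _as_list(value):
    """Policy fields may hold a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def _is_anonymous(principal):
    """True when the statement applies to everyone ("*" or {"AWS": "*"})."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in _as_list(principal.get("AWS", []))
    return False


def public_exposures(policy):
    """Return the risks opened by unconditional public Allow statements."""
    risks = set()
    for stmt in _as_list(policy.get("Statement", [])):
        # Simplification: statements with a Condition are skipped entirely.
        if stmt.get("Effect") != "Allow" or stmt.get("Condition"):
            continue
        if not _is_anonymous(stmt.get("Principal")):
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action in ("*", "s3:*"):
                risks.update(RISKY_ACTIONS.values())
            elif action in RISKY_ACTIONS:
                risks.add(RISKY_ACTIONS[action])
    return risks


# Hypothetical misconfigured policy for a bucket named "example-bucket".
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}
```

Calling `public_exposures(EXAMPLE_POLICY)` on the example reports that anyone on the internet could list and download the bucket's contents.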
Let’s look at some tools provided in AWS that allow organizations to avoid such exposure.
Native S3 protection mechanisms
S3 now provides a secure configuration by default, with each new bucket set to block all public access. Turning on default encryption for buckets ensures that your files are encrypted before they are stored in the AWS backend storage, and versioning gives you a way to roll back accidental changes or damage caused by malware such as ransomware.
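As an illustration, the three settings described above can be applied with the AWS SDK for Python (boto3). The sketch below assumes boto3 is installed and credentials are configured; the bucket name passed in is up to the caller, and the helper names (`public_access_block`, `default_encryption`, `versioning`, `harden_bucket`) are ours, not part of any AWS API. The `put_*` client calls are standard boto3 S3 operations.

```python
def public_access_block():
    """Settings that block every form of public access to a bucket."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def default_encryption(kms_key_id=None):
    """Default server-side encryption: SSE-KMS if a key is given, else SSE-S3."""
    if kms_key_id:
        rule = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    else:
        rule = {"SSEAlgorithm": "AES256"}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


def versioning():
    """Keep prior object versions so changes can be rolled back."""
    return {"Status": "Enabled"}


def harden_bucket(bucket, kms_key_id=None, client=None):
    """Apply all three protections to one bucket."""
    if client is None:
        import boto3  # third-party dependency, imported only when needed

        client = boto3.client("s3")
    client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=public_access_block(),
    )
    client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=default_encryption(kms_key_id),
    )
    client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration=versioning(),
    )
```

A call such as `harden_bucket("example-bucket")` would apply the settings to a (hypothetical) bucket of that name. Accepting an optional `client` keeps the function easy to exercise against a stub before pointing it at a real account.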
At the architecture level, you can use traffic segmentation to isolate an application server in a Virtual Private Cloud (VPC) and then define S3 access control rules so that your S3 buckets accept requests only if they originate from the VPC where the application server is hosted.
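One common way to express this rule is a bucket policy that denies any request not arriving from the expected VPC. The sketch below builds such a policy as a Python dictionary; the bucket name and VPC ID in the usage note are placeholders, and the function name `vpc_only_policy` is our own. It relies on the `aws:SourceVpc` condition key, which is only present when a request reaches S3 through a VPC endpoint.

```python
import json


def vpc_only_policy(bucket, vpc_id):
    """Bucket policy that denies every S3 request made from outside vpc_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRequestsFromOutsideVpc",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Matches when the request's source VPC is missing or different.
                "Condition": {"StringNotEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }


def policy_json(bucket, vpc_id):
    """Serialize the policy in the string form that S3 expects."""
    return json.dumps(vpc_only_policy(bucket, vpc_id), indent=2)
```

For example, `policy_json("example-bucket", "vpc-0abc1234")` produces a document that could be passed to boto3's `put_bucket_policy`. Be aware that a blanket deny like this also locks out administrators and the AWS console when they connect from outside the VPC, so in practice teams often add exceptions for specific roles.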
When configurations aren’t enough
When it comes to security, we should assume every target with internet access is hackable. Application servers are no different: they are subject to software weaknesses such as Server-Side Request Forgery (SSRF), unrestricted file upload, and improper restriction of XML external entities (XXE), just to name a few. These weaknesses open a direct path for hackers to access your organization’s data.
Let’s look at one of the most common of these attack vectors, Server-Side Request Forgery (SSRF). An application is vulnerable to SSRF when it takes a URL as input and makes requests to it without validating the target host. This attack vector gained notoriety as part of the infamous 2019 Capital One breach. Today, its impact can be mitigated with bucket policies and by requiring version 2 of the Amazon EC2 Instance Metadata Service (IMDSv2), which protects each metadata request with session authentication.
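The missing validation step can be sketched as follows. This is an illustrative check, not a complete SSRF defense: the allowlist entry `images.example.com` is hypothetical, and the function name `is_safe_url` is ours. It accepts a URL only if the scheme is HTTP(S), the host is on the allowlist, and every address the host resolves to is publicly routable, which rules out the instance metadata address 169.254.169.254 and other internal ranges.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the application is expected to fetch from.
ALLOWED_HOSTS = {"images.example.com"}


def is_safe_url(url, resolve=socket.getaddrinfo):
    """Return True only for allowlisted hosts that resolve to public addresses."""
    try:
        parts = urlsplit(url)
        port = parts.port
    except ValueError:
        return False  # malformed netloc or port
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    if host not in ALLOWED_HOSTS:
        return False
    if port is None:
        port = 443 if parts.scheme == "https" else 80
    try:
        infos = resolve(host, port)
    except OSError:
        return False  # resolution failed
    for info in infos:
        try:
            addr = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # Rejects private, loopback, and link-local ranges (including
        # the metadata service address 169.254.169.254).
        if not addr.is_global:
            return False
    return bool(infos)
```

The `resolve` parameter exists so the check can be exercised without real DNS. A production version would also need to connect to the exact address it validated, since a host name can resolve differently between the check and the request.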
Threats from compromised hosts
While the countermeasures mentioned above can protect against AppSec vulnerabilities like SSRF, they are ineffective when the attacker compromises a legitimate host or application that has access to S3. A common example of this is when your environment is running a legitimate (but clearly poorly vetted) application that allows insecure deserialization. Such an application blindly trusts information sent to it, and as a result deserializes any incoming data—in effect, unpacking that data and converting it into objects that can invoke unwanted actions.
If an adversary finds a hosted application that allows insecure deserialization, they can create a handcrafted object that runs arbitrary commands on the application server in order to access an S3 bucket, exposing your data. Even if the S3 bucket has all the right security features turned on, the adversary can still perform data exfiltration. The worst part is, as the administrator of this application, you probably won’t discover this breach until data has already been exposed.
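To see why insecure deserialization is so dangerous, consider Python's `pickle` format, used here purely as an illustration (the scenario above is not specific to any language). The sketch shows a harmless stand-in for an attacker's payload, a restricted unpickler that refuses to resolve any function or class named by the payload, and the safer option of accepting a data-only format such as JSON. The class and function names are our own.

```python
import io
import json
import pickle


class Payload:
    """Stand-in for a crafted object: unpickling it calls a function
    chosen by the sender rather than by the application."""

    def __reduce__(self):
        # Harmless here (computes 2 ** 10). A real attacker would name a
        # function that runs shell commands on the server instead.
        return (pow, (2, 10))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any callable or class by name."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def safe_loads(data):
    """Deserialize plain containers and scalars only."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


def parse_request_body(raw):
    """Preferred approach: accept a data-only format such as JSON."""
    return json.loads(raw)
```

With the default loader, `pickle.loads(pickle.dumps(Payload()))` runs the sender's chosen function and returns its result, whereas `safe_loads` raises `pickle.UnpicklingError` on the same bytes while still accepting plain dictionaries and lists.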
The root cause of this issue is that while most approaches focus on protecting the application and infrastructure, none of the built-in AWS tools represent a pervasive approach that protects S3 buckets directly, regardless of where the requests are coming from.
Post-facto offline mechanisms like CloudTrail logs can help monitor data activity on S3 buckets, but they do not block malicious exfiltration attempts in real time. As such, CloudTrail’s log-based monitoring helps with visibility for security teams, but it can’t stop an attack.
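As a rough illustration of what log-based monitoring looks like, the sketch below counts S3 `GetObject` events per principal and bucket in a batch of CloudTrail records and reports the pairs that reach a threshold. It assumes S3 data events are being logged, and the threshold is an arbitrary tuning parameter. The field names (`eventSource`, `eventName`, `userIdentity.arn`, `requestParameters.bucketName`) follow the CloudTrail record format; the function name `bulk_readers` is ours.

```python
from collections import Counter


def bulk_readers(records, threshold):
    """Return {(principal ARN, bucket): count} for heavy GetObject activity."""
    counts = Counter()
    for rec in records:
        if rec.get("eventSource") != "s3.amazonaws.com":
            continue
        if rec.get("eventName") != "GetObject":
            continue
        # Either field can be absent or null in a record.
        identity = rec.get("userIdentity") or {}
        params = rec.get("requestParameters") or {}
        principal = identity.get("arn", "unknown")
        bucket = params.get("bucketName", "unknown")
        counts[(principal, bucket)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Note what this does and does not do: by the time the records exist, the reads have already happened. The function can tell you that a role downloaded an unusual number of objects, but it cannot prevent the download.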
A modern approach to stopping attacks on S3 data
Protecting your data in S3 more fully requires surveying the ways your organization needs to access that data and the range of threat vectors that could compromise it. As we listed at the outset, this survey needs to take into account access control, monitoring, and auditability. Let’s look at each in turn.
- Access control: AWS’s IAM policies allow you to specify who has access to which buckets, and even support SSO, but more complete protection also calls for:
  - consistent access management across S3 and the other types of repositories your team uses;
  - granular, attribute-based access controls for both your team members and the applications they use, along with the ability to see which application and which user is responsible for each action in S3; and
  - identity and access management integrations that provide federated authentication for users of your data cloud, granting access based on their SSO group memberships.
- Monitoring: Your security and compliance teams need to know who has access to your S3 buckets and what actions they’re attempting, and a secure infrastructure will also be able to block potentially destructive or malicious actions before they happen. You can achieve basic monitoring with Amazon’s S3 server access logging and CloudTrail logging for S3 API calls, but a more complete solution also needs to address:
  - consistent interception of all activity across all data repositories;
  - alerting and blocking of unauthorized and policy-violating access; and
  - mechanisms to detect anomalous behavior.
- Auditability: With sensitive data stored in S3, auditability is important. Your organization needs the capability to report to customers, regulators, and other stakeholders in detail about how data is protected while in S3 and in every repository you manage, and you need to preserve audit logs showing who’s had access to the data entrusted to you. For class-leading security, this means logging not just data actions, but all changes in your policies that protect data.
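The three requirements above can be pictured together in a small sketch: a gatekeeper that decides each request from the caller's SSO groups (access control), makes that decision before the action happens (monitoring and blocking), and records who asked for what through which application (auditability). Everything here is hypothetical, including the `Rule` and `Gatekeeper` names, the group names, and the rule shape; it is not the API of AWS or of any product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Rule:
    """Grants an SSO group a set of actions on a bucket prefix."""

    group: str
    bucket: str
    prefix: str
    actions: frozenset


@dataclass
class Gatekeeper:
    rules: list
    audit_log: list = field(default_factory=list)

    def authorize(self, user, groups, app, action, bucket, key):
        """Decide a request up front and record the decision either way."""
        allowed = any(
            rule.group in groups
            and rule.bucket == bucket
            and key.startswith(rule.prefix)
            and action in rule.actions
            for rule in self.rules
        )
        # Every attempt is logged with both the user and the application.
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "app": app,
                "action": action,
                "bucket": bucket,
                "key": key,
                "decision": "allow" if allowed else "deny",
            }
        )
        return allowed
```

For instance, a rule granting the `data-eng` group `GetObject` on `reports/` in `example-bucket` would allow a data engineer's ETL job to read `reports/q1.csv`, deny the same read from a user in another group, and leave an audit entry for both attempts.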
When considering the full scope of data needs and security requirements, we can see that just using built-in tools, even in their best-practices configurations, leaves organizations at risk. A modern approach needs to address all three of these requirements.
How Cyral can protect your S3 buckets
Cyral’s lightweight interception service, the Data Cloud sidecar, is one tool for doing this. With a Cyral sidecar in place, every HTTP request that goes to S3 must pass through the Cyral sidecar before it hits the destination. By intercepting all requests, services like Cyral’s can help achieve the goals outlined above:
- For access control, Cyral’s class-leading identity federation means that your SSO authentication and role-based privileges work consistently across all types of repositories.
- For monitoring and alerts, Cyral’s lightweight agent intercepts every command sent to your repositories, and its multi-repository policies let you enforce policies based on the type of data, rather than where it happens to live. For example, you can enforce a Cyral policy to protect credit card data in the same way across your S3, RDS, and Snowflake repositories.
- For auditing, Cyral is one of the few solutions that can link every event in your Data Cloud, regardless of repository type, back to the SSO user who initiated it. Precise control over what gets logged, along with integrations with popular logging platforms, means you can strike the right balance in terms of how much history you keep.
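The idea of enforcing policy by data type rather than by location can be illustrated generically. The sketch below is not Cyral's policy format or API; the label names, data map, group names, and `decide` function are all invented for illustration. It shows a single rule for credit card data being applied identically whether the data sits in S3, RDS, or Snowflake.

```python
# Hypothetical policy keyed by data label rather than by repository.
POLICY = {
    "credit_card": {"allowed_groups": {"payments-team"}},
}

# Hypothetical map from (repository type, location) to a data label.
DATA_MAP = {
    ("s3", "example-bucket/exports/cards.csv"): "credit_card",
    ("rds", "billing.cards.pan"): "credit_card",
    ("snowflake", "ANALYTICS.PUBLIC.CARDS.PAN"): "credit_card",
}


def decide(repo_type, location, groups):
    """Return "allow" or "block" based on the label of the data requested."""
    label = DATA_MAP.get((repo_type, location))
    if label is None:
        # Assumption for this sketch: unlabeled data is not restricted.
        return "allow"
    rule = POLICY[label]
    return "allow" if groups & rule["allowed_groups"] else "block"
```

Because the rule hangs off the label, adding a new repository only requires new entries in the data map; the policy itself does not change.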
Final thoughts: S3 in the context of your Data Cloud
The biggest problem with protecting against data exfiltration is that S3 is just one of the many services that store your organization’s sensitive data. Most data engineering teams use a combination of data repositories like S3, RDS, Redshift, Snowflake, Kafka, and others to accomplish their goals. Engineers often need to read from or write to these repositories, either directly or through applications, while they design models, test services, and troubleshoot issues.
Given the large number of repositories and applications involved, it becomes unwieldy for organizations to manually grant access to the various users and services, and retire those privileges when the job is done. Beyond that, it’s difficult to monitor activity in a consistent way across different types of repositories and to identify and block unwanted activity in a comprehensive manner.
But there are effective answers, and a consistent, policy-based interception approach like Cyral’s is worth considering when protecting a wide range of data repository types.
Read our next blog post, where we discuss the technical details of the attack patterns listed above and show what can be done to neutralize these types of attacks on S3.