In 1981, Barry Boehm published Software Engineering Economics, which examined the relative cost of fixing an error at various stages of software development. In this seminal work, Boehm showed that the relative cost of a fix rises sharply with each phase of a standard waterfall development model that passes before the error is corrected. Since publication, numerous studies have debated the exact numbers Boehm put forth, but they generally agree that finding errors sooner decreases total costs. This is the holy grail we all strive for now: finding and fixing errors earlier in our agile workflows.
In today’s world of software and service delivery, we often focus not just on the monetary cost, but also on the cost to agility: the ability to deliver features more quickly, more easily, and in a more stable environment. Facebook famously changed their motto from “Move fast and break things” to “Move fast with stable infrastructure.” Sentiment across the industry has shifted the same way, toward embracing stability.
Security is a prerequisite for stable infrastructure
The words “stable infrastructure” encompass more than just a reliable daily cadence of delivery. To deliver stable infrastructure, you must be able to deliver secure infrastructure. The best way to achieve both of these goals is to implement a Security as Code program across your entire development pipeline. This means implementing automated testing, library validation, and static analysis so that they’re a routine part of your organization’s daily operations.
Maximize test coverage
One of the first steps you can take in your Security as Code journey is to increase source code test coverage. If you have test coverage of the business logic of your authentication and/or authorization mechanisms, then you’ve already started your journey. Work with your development teams to expand this type of coverage to other security-focused scenarios. One area you could review is your permissions model, to make sure you have full coverage for both positive control (limits that determine what actions can be done) and negative control (limits that prevent unwanted actions). In other words, you should test for both the presence of permissions and the lack of permissions. These tests should be high signal: a failure should mean that something is genuinely wrong, not that the test is unreliable.
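As a minimal sketch of testing both the presence and the lack of permissions, the Python example below uses a hypothetical role-to-permission mapping and a hypothetical `is_allowed` helper. Neither comes from a real framework; a real service would call into its own authorization layer instead.

```python
# Hypothetical role-to-permission mapping, for illustration only. A real
# service would query its own authorization layer.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role is known and explicitly grants the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def test_positive_control() -> None:
    # Presence of permissions: explicitly granted actions must succeed.
    assert is_allowed("admin", "delete")
    assert is_allowed("editor", "write")
    assert is_allowed("viewer", "read")


def test_negative_control() -> None:
    # Lack of permissions: anything not granted must be denied.
    assert not is_allowed("viewer", "write")
    assert not is_allowed("editor", "delete")
    # Unknown roles and unknown actions should fail closed.
    assert not is_allowed("guest", "read")
    assert not is_allowed("admin", "impersonate")


if __name__ == "__main__":
    test_positive_control()
    test_negative_control()
    print("permission tests passed")
```

The negative tests are the ones most often missing in practice. They also check that unknown roles and actions are denied by default, so a typo in a role name cannot silently grant access.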
Find vulnerable libraries
Another key area to focus on in your testing strategy is your use of third-party libraries. Today’s coding practices often include extensive use of both directly included third-party libraries and libraries that, in turn, include other libraries. To keep up with these practices, include vulnerability scanning of those libraries on every pull request. In 2019, Snyk reported an “88% increase in application library vulnerabilities over two years” in their annual state of open source report. Libraries should be reviewed automatically for both licensing issues and security vulnerabilities. A number of tools can help with this, and they are often built into your code repository or package repository manager. Efforts by organizations including NIST in the US to establish standards for a Software Bill of Materials (SBOM) should reduce third-party library risk in the future, but those efforts are still in their early stages.
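To illustrate the kind of check a pull request gate might run, here is a simplified Python sketch that compares pinned dependencies against a list of known-vulnerable versions. The advisory data, package names, and function names are all invented for this example. Real scanners query maintained vulnerability databases and also resolve transitive dependencies, which this sketch does not attempt.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical advisory data for illustration only. These package names and
# versions are made up and are not real vulnerability records.
KNOWN_VULNERABLE: Dict[str, Set[str]] = {
    "examplelib": {"1.0.0", "1.0.1"},
    "otherlib": {"2.3.0"},
}


def parse_pins(requirements_text: str) -> List[Tuple[str, str]]:
    """Extract (name, version) pairs from lines of the form name==version."""
    pins = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or blank lines are skipped in this sketch
        name, version = line.split("==", 1)
        pins.append((name.strip().lower(), version.strip()))
    return pins


def find_vulnerable(requirements_text: str) -> List[Tuple[str, str]]:
    """Return the pinned dependencies that appear in the advisory data."""
    return [
        (name, version)
        for name, version in parse_pins(requirements_text)
        if version in KNOWN_VULNERABLE.get(name, set())
    ]


if __name__ == "__main__":
    sample = "examplelib==1.0.1  # flagged\notherlib==2.4.0\nsafelib==0.9.0\n"
    for name, version in find_vulnerable(sample):
        # A CI job would typically fail the pull request at this point.
        print(f"vulnerable dependency: {name}=={version}")
```

Running a check like this on every pull request means a vulnerable version is flagged when it is introduced, rather than after it has shipped.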
Locate risks with static analysis
Static analysis tools are routinely used to monitor adherence to style and coding guidelines. When you adopt a Security as Code approach, you’ll want to implement a security-focused static analysis checker as well. There are a number of free and open source projects as well as commercial solutions to get you started here. Facebook recently open sourced their internal static analysis tool called Pysa for Python. According to Facebook, “in the first half of 2020, Pysa detected 44% of all security bugs in Instagram’s server-side Python code.” Tools like Pysa allow the security team to meet the developers where they already are, working with them directly in source code and testing. As you begin working with your dev team, you will find that many are more than happy to take up the mantle to increase code coverage.
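To give a feel for what a security-focused static check does, the sketch below uses Python’s standard `ast` module to flag two patterns that are often risky: direct calls to `eval` or `exec`, and any call that passes `shell=True`. This is a simple pattern match rather than the kind of analysis a tool like Pysa performs, and both the list of flagged patterns and the `find_risky_calls` name are assumptions made for illustration.

```python
import ast
from typing import List, Tuple

# Builtins this sketch treats as risky. The list is an illustrative assumption.
RISKY_BUILTINS = {"eval", "exec"}


def find_risky_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for each flagged call in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Direct calls such as eval(...) or exec(...).
        if isinstance(node.func, ast.Name) and node.func.id in RISKY_BUILTINS:
            findings.append((node.lineno, f"call to {node.func.id}()"))
        # Any call passing shell=True, e.g. subprocess.run(cmd, shell=True).
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append((node.lineno, "call with shell=True"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "import subprocess\n"
        "def run(cmd, expr):\n"
        "    subprocess.run(cmd, shell=True)\n"
        "    return eval(expr)\n"
    )
    for lineno, message in find_risky_calls(sample):
        print(f"line {lineno}: {message}")
```

Because the check only parses the code and never runs it, it can sit alongside existing style linters and report findings on the same pull request.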
Security culture that’s embedded in all teams, in all phases
For the security team, the next step in your Security as Code journey is to begin to develop a guidebook and your own trusted libraries for developers to reuse across your various services. You can start to develop code guardrails that embody the security standards you wish to see across your organization.
On the human side of the equation, it often falls to the security team to set the security standards and best-practice examples the rest of the organization will follow. Ideally, the security team should be built out so that it becomes a directly contributing engineering team like any other. This allows the security team not only to build security guardrails and offer guidance, but also to set examples in day-to-day work that show the effectiveness of a healthy security engineering culture.
For example, Netflix has championed the idea of paved roads that speed development. In 2016 they mentioned that for one of their projects, it took only 16 minutes from code check-in to a multi-region deployment. Prevent problems from happening in the first place by providing vetted and secure libraries and infrastructure that your developers can take full advantage of.
As you move up the stack and across the pipeline, one of the key areas you can focus on is earlier, more proactive vulnerability scanning as part of a vulnerability management program. This work starts in your testing and staging environments. The testing and staging phases offer the best opportunities to scan for vulnerabilities and incorporate fixes as part of your normal testing cadence. In particular:
- Library and package upgrades can and should simply be treated as new features or code updates.
- Containers and golden images should include your updated application code as soon as it is available.
- Containers and golden images should be updated automatically with full security updates as soon as they’re available.
Timely scans and updates help avoid delays. According to StackRox’s Winter 2020 report, The State of Container and Kubernetes Security, a full 44% of companies said “they’ve slowed or halted application deployment into production due to security concerns.” Many of these slowdowns could likely have been avoided by exposing issues as soon as possible and by adhering to a practice of automatic updates in the testing environment.
Replicate real-world usage
To catch issues sooner, you need to replicate your production environment as fully as possible. One of the operational risks we’ve seen repeatedly is a staging environment that does not include all of the same monitoring, checks, interactions with internal tools, and other attributes of the production environment.
Once you’ve faithfully replicated your production architecture in a staging environment, the next important element in replicating real-world usage lies in how you simulate production traffic and usage patterns. Doing this requires an in-depth understanding of production traffic and the ability either to replay historical traffic (if it maps to expected traffic) or to build simulations that can faithfully replicate the traffic.
Keep security policies consistent across dev, staging, and production
Staging environments need to replicate more than your production architecture and expected production traffic; they should also adhere to the same security policies and access rights (for internal and simulated external users) that you have in your production environment. Many times, the only metrics monitored in a canary deployment are those related to performance, but security issues are just as important to check before promotion. If you can deploy and monitor your security policies so that your staging environment’s security posture matches what you have in production, you are far more likely to catch and address problems before each release is rolled out.
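One way to keep the two environments aligned is to compare their security settings automatically and treat any difference as a failed check. The Python sketch below assumes each environment’s policy can be exported as a flat dictionary. The setting names and the `policy_drift` function are hypothetical, chosen only to show the comparison.

```python
from typing import Any, Dict, List


def policy_drift(production: Dict[str, Any], staging: Dict[str, Any]) -> List[str]:
    """List every setting where staging differs from production."""
    drift = []
    for key in sorted(set(production) | set(staging)):
        if key not in staging:
            drift.append(f"{key}: missing in staging")
        elif key not in production:
            drift.append(f"{key}: only in staging")
        elif production[key] != staging[key]:
            drift.append(
                f"{key}: production={production[key]!r}, staging={staging[key]!r}"
            )
    return drift


if __name__ == "__main__":
    # Hypothetical exported policies; the setting names are illustrative.
    production = {"tls_min_version": "1.2", "mfa_required": True, "admin_roles": ["sre"]}
    staging = {"tls_min_version": "1.0", "mfa_required": True, "debug_endpoints": True}
    for line in policy_drift(production, staging):
        print(line)
```

An empty result means the two exports agree. Anything else can block promotion until the drift is either fixed or explicitly accepted.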
Build in security now to prevent surprises later
Investing in testing early and often in your source code pipeline will pay great dividends in a more secure and stable product. It has long been known that fixing errors earlier is far cheaper, yet for too long many organizations haven’t been willing to make the necessary investments in testing infrastructure to make this a reality. If you’re already testing in production, bring that testing earlier, into your testing and staging environments, to find problems sooner.
The goal of predictable and predictably secure infrastructure is within reach. By building security procedures and testing into your organization’s product delivery infrastructure at every stage, starting in your source code pipeline, your team can ensure fewer surprises, more uptime, and a more predictable experience for customers and employees.
Image via the OpenIDEO Cybersecurity Visuals Challenge by Elio Reichert.