Observability
Observability, in real life, is your ability to see what something is doing. When it comes to securing applications, observability often means you can see what an application is doing. How many requests is it receiving? Who is using it? How much memory is it consuming?
There are three main ways most applications provide observability – through logs, traces, and metrics. These provide raw data about an application.
Logs
Logs are lines of text that an application outputs describing what it’s doing. They typically carry a timestamp, and have a level like info, warn, or error.
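For example, here's a minimal Python sketch (the logger name and output format below are purely illustrative) of log lines that carry a timestamp and a level:

```python
import logging

# Configure the root logger so every line carries a timestamp, a level, and a message.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)

logger = logging.getLogger("checkout")  # "checkout" is an illustrative component name

logger.info("order 1234 received")          # routine activity
logger.warning("payment provider is slow")  # something worth keeping an eye on
logger.error("order 1234 failed to save")   # a problem worth investigating

# Example output line:
# 2024-05-01 12:00:00,123 INFO checkout - order 1234 received
```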
Logs are incredibly useful when researching a problem your application had. They provide rich, contextual details about what your application was doing when a problem occurred. Often, they can be provided to the application developers for assistance, or they can be used by your own support team to help your customers.
Traces
Over the last 10 or 20 years, microservice and container-based applications have grown in popularity over the more monolithic architectures of the past. With this network of interdependent applications, attribution becomes a problem. Which user ran this report? Who was at the keyboard when someone SSH’d into that server? When this request travelled through 3 services on 4 computers, did we lose track of how the original request tied to the report that was eventually run?
There has been plenty of thinking about this problem – Google’s Dapper paper, for example, put forward the idea of passing tracing IDs in headers through all services. By combining tracing IDs with rich, relevant information at each step (like the ID of the user who initiated the call at the first server), it’s possible to attribute behavior directly back to a particular user.
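Here's a rough sketch of that idea in Python: the first service mints a trace ID and records the calling user, and every downstream call forwards the same ID. The X-Trace-Id header name and the service functions are invented for illustration; real systems often use standards such as the W3C traceparent header.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # illustrative header name, not a standard

def handle_frontend_request(user_id: str) -> dict:
    """First service in the chain: mint a trace ID and record who initiated the call."""
    headers = {
        TRACE_HEADER: str(uuid.uuid4()),
        "X-User-Id": user_id,  # rich context captured at the edge
    }
    print(f"[frontend] trace={headers[TRACE_HEADER]} user={user_id} starting report")
    return call_report_service(headers)

def call_report_service(headers: dict) -> dict:
    """Downstream service: reuse the incoming trace ID instead of minting a new one."""
    trace_id = headers[TRACE_HEADER]
    print(f"[reports]  trace={trace_id} user={headers['X-User-Id']} querying warehouse")
    # ...the next hop would receive the same headers, keeping the chain intact...
    return {"trace_id": trace_id, "status": "ok"}

if __name__ == "__main__":
    handle_frontend_request(user_id="alice")
```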
This helps with debugging a problem the user had, researching a security incident around that user ID, or calculating the user’s bill at the end of the month.
Metrics
Metrics provide numeric values regarding important information about your application. Often, they lack context around attribution – you don’t know exactly why something is happening, but you see that it is.
An example metric might be a “request count”. On its own, a request count isn’t that useful – it just goes up forever.
But pair that request count with timestamps and you can notice that, over the last 30 seconds, you’re suddenly receiving far more requests than normal. Hey, you’re being attacked by somebody sending a lot of requests! Or maybe you’re not sure if you’re being attacked, but you know you need to check right now.
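A hedged sketch of that idea in Python: sample the counter every 30 seconds, turn the deltas into a rate, and flag any window far above a normal baseline (the sample data and thresholds below are made up):

```python
from datetime import datetime

# (timestamp, cumulative request count) samples taken every 30 seconds.
# The counter only ever goes up; the *difference* between samples is what matters.
samples = [
    (datetime(2024, 5, 1, 12, 0, 0), 10_000),
    (datetime(2024, 5, 1, 12, 0, 30), 10_150),
    (datetime(2024, 5, 1, 12, 1, 0), 10_310),
    (datetime(2024, 5, 1, 12, 1, 30), 16_900),  # sudden burst
]

NORMAL_RATE = 5.0      # requests/second considered normal (assumed baseline)
SPIKE_MULTIPLIER = 10  # alert when the rate is 10x normal (arbitrary threshold)

for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
    rate = (c1 - c0) / (t1 - t0).total_seconds()
    if rate > NORMAL_RATE * SPIKE_MULTIPLIER:
        print(f"{t1}: {rate:.0f} req/s -- possible attack, check now")
    else:
        print(f"{t1}: {rate:.0f} req/s -- within normal range")
```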
Metrics also help you detect whether your application is slowly consuming more and more memory (indicating a potential memory leak), or quickly using bursts of memory (users are running huge reports that threaten to take down your current server size). They give you data in aggregate.
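A slow upward trend in a memory metric can be spotted in a similar way, for instance with a least-squares slope over recent samples. The numbers and threshold below are invented for illustration:

```python
def slope(points):
    """Least-squares slope of (x, y) points -- how fast the metric is growing."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

# (minutes elapsed, resident memory in MB) -- memory creeps up a little every sample.
memory_samples = [(0, 512), (10, 530), (20, 549), (30, 566), (40, 590), (50, 612)]

mb_per_hour = slope(memory_samples) * 60
if mb_per_hour > 50:  # arbitrary threshold for "growing suspiciously fast"
    print(f"memory growing ~{mb_per_hour:.0f} MB/hour -- possible leak")
```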
Using Metrics in Real Life
Just having a bunch of numbers and text isn’t enough by itself. That’s why most engineers launching critical applications combine metrics with dashboards, and combine logs with a search engine – and often a dashboard as well.
This is often one of the last steps an engineer takes before putting an application into production. At a production launch, sometimes there’s even a handoff of the application from one team to another, with observability being a key deliverable at handoff.
Some tools people use for consuming logs and metrics are Datadog, Splunk, Grafana, and the ELK stack (Elasticsearch, Logstash, and Kibana). Each tool has its own feature set, but generally speaking, they all visualize and search this data and bring attention to important events.
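These tools work best when logs arrive as structured data they can index. As a rough sketch of what that can look like, here's one way to emit one JSON object per log line in Python (the field names are illustrative, not a required schema for any particular tool):

```python
import json
import sys
from datetime import datetime, timezone

def log_event(level: str, message: str, **fields):
    """Write one JSON object per line so a log pipeline can index every field."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        **fields,  # arbitrary structured context: user IDs, trace IDs, durations...
    }
    print(json.dumps(record), file=sys.stdout)

log_event("info", "report generated",
          user_id="alice", trace_id="7f3c...", duration_ms=843)
# => {"timestamp": "...", "level": "info", "message": "report generated",
#     "user_id": "alice", "trace_id": "7f3c...", "duration_ms": 843}
```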
Observability for the Data Cloud
While several solutions have been brought to market in the last few years to address observability at the application and infrastructure layers, it remains a challenge for databases, data warehouses, and pipelines. Especially as organizations adopt cloud-native architectures, with ephemeral microservices communicating with SaaS-based data repositories, this observability has become more important than ever.
Simply turning on logging inside databases degrades performance, and it is not feasible to install agents on managed and SaaS databases like RDS, Aurora, Atlas, etc. Yet logs, traces, and metrics must still be captured to meet modern security standards.
Cyral has built an innovative, stateless interception technology that enables real-time logs, metrics, and traces for all data activity without impacting performance or scalability, providing unprecedented observability for the data cloud. To learn how Cyral’s patented technology can help your organization make this critical transformation, register for a demo.