May 27, 2020
❓ Despite doing everything right, the image has been compromised during runtime and has started to show suspicious activity!
🐳 How do container monitoring solutions make a decision and say:
Hey, this is bad behaviour! or
Hey, this is good and this is bad?
If those questions interest you, please grab your coffee ☕ and stick with me for another five minutes. I’ll make sure that you receive something meaningful from our conversation!
The aim of this article is to give an overview of runtime monitoring and security. We should never use runtime protection as a replacement for any other static up-front security practices: Attack prevention is always preferable to attack detection.
Everything with Docker is a set of processes and a way of implementing tags!
From a security point of view, Docker is designed to keep applications running in containers segregated and isolated from each other, and it gives you control over traffic flow and management.
By default, no Docker container can look into processes running inside another container.
From an architectural point of view, each container gets its own set of resources ranging from processing to network stacks.
Here’s a quick check:
• Docker containers are minimal
• Docker containers are task-specific
• Docker containers are isolated
• Docker containers are reproducible
This is the resource isolation magic of Linux kernel features such as:
• Cgroups = limits how much you can use (memory | CPU)
• Namespaces = limits what you can see (for example, process IDs)
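To make the two kernel features a little more concrete, here is a minimal Python sketch that inspects them through /proc on a Linux host. The function names parse_cgroup_file and namespace_ids are made up for this example, and the sketch assumes the usual /proc/<pid>/cgroup and /proc/<pid>/ns layout; on a system without /proc it simply returns empty results.

```python
import os


def parse_cgroup_file(text):
    """Parse the content of /proc/<pid>/cgroup into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        hierarchy_id, controllers, path = line.split(":", 2)
        controller_list = controllers.split(",") if controllers else []
        entries.append((hierarchy_id, controller_list, path))
    return entries


def namespace_ids(pid="self"):
    """Map each namespace type (pid, net, mnt, ...) to its identifier link target."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not on Linux, or /proc is not readable
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable
    return result
```

Two processes that report the same identifier for the pid namespace can see each other's process IDs; processes in different containers normally report different identifiers.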
Monitoring a scaled-out, enterprise production environment running in containers is where you will truly appreciate the complexity of the data challenge that comes with operating containers in production.
Containers are designed to be simple. That typically means a single process with only the required dependencies and libraries, often running as part of a microservices app and therefore performing a single task.
Some things you would want to monitor:
• Resource usage (CPU/Memory/Disk)
• Network activity
• File I/O activity
• Errors/faults
• Application activity
With the simplicity in design, it is relatively easy to define a behavior pattern through strict whitelists that detect any action that deviates from the norm.
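As a rough illustration of the whitelist idea, the Python sketch below compares observed events against a per-container profile and reports anything outside it. ContainerProfile, find_deviations and the event dictionaries are hypothetical names and shapes invented for this example; they are not the data model of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerProfile:
    """Hypothetical whitelist describing what a container is expected to do."""
    allowed_processes: frozenset
    allowed_ports: frozenset
    writable_prefixes: tuple


def find_deviations(profile, events):
    """Return a human-readable finding for every event outside the profile."""
    findings = []
    for event in events:
        kind, value = event["kind"], event["value"]
        if kind == "exec" and value not in profile.allowed_processes:
            findings.append(f"unexpected process: {value}")
        elif kind == "connect" and value not in profile.allowed_ports:
            findings.append(f"unexpected outbound port: {value}")
        elif kind == "write" and not value.startswith(profile.writable_prefixes):
            findings.append(f"unexpected write: {value}")
    return findings


# Example profile for a container that should only ever run nginx.
nginx = ContainerProfile(
    allowed_processes=frozenset({"nginx"}),
    allowed_ports=frozenset({443}),
    writable_prefixes=("/var/log/nginx/", "/var/cache/nginx/"),
)
events = [
    {"kind": "exec", "value": "nginx"},
    {"kind": "exec", "value": "bash"},
    {"kind": "write", "value": "/etc/passwd"},
]
print(find_deviations(nginx, events))
# prints ['unexpected process: bash', 'unexpected write: /etc/passwd']
```

Because a single-purpose container does so little, the profile stays short, which is what makes this kind of strict check practical.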
Sysdig provides a single, unified data platform that offers key operational capabilities for containers and microservices.
Here are a few of the key questions, taken from Sysdig’s architecture document, that will become incredibly challenging to answer in your new cloud-native environment:
What’s the response time of the Cassandra service, which is spread across three data centers and 45 containers?
Can we proactively alert in case one container has a different performance profile compared to its peers?
Can we break down a downtime issue to see both application and systems issues to quickly solve the problem and prevent future occurrences?
How effectively are we using compute resources? Which services will require additional resources, and when?
How do you see what’s happening across clouds, clusters and regions to ensure you’re meeting your business goals?
How can you identify the software in production where new vulnerabilities have been detected?
Do we see abnormal behaviors in any services that may indicate compromise?
Are we meeting CIS and PCI compliance requirements for our infrastructure and on a per-microservice basis?
What happened during that security incident? Can we replay it, even if the containers involved have already been killed off?
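The question about one container behaving differently from its peers can be sketched with a simple robust statistic. The Python example below flags containers whose metric is far from the peer median, using the median absolute deviation. This is only an illustrative approach with a commonly used threshold of 3.5; it is not how Sysdig implements its alerting, and the function name and sample numbers are made up.

```python
from statistics import median


def peer_outliers(metric_by_container, threshold=3.5):
    """Return container names whose metric deviates strongly from the peer median."""
    values = list(metric_by_container.values())
    if not values:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All peers agree exactly, so any different value stands out.
        return [name for name, v in metric_by_container.items() if v != center]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [
        name
        for name, v in metric_by_container.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical response times in milliseconds for five replicas of one service.
latencies_ms = {"c1": 10, "c2": 11, "c3": 9, "c4": 10, "c5": 48}
print(peer_outliers(latencies_ms))
# prints ['c5']
```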
Image source: sysdig
Sysdig chooses per-host agents for monitoring, which drastically reduces the resource consumption of monitoring agents and requires no modification to application code. It does, however, require a privileged container and a kernel module. That technical approach is called
Csysdig leverages the sysdig collection system, but exports it in a user interface that you can imagine as a combination of many popular command line monitoring tools: tools like strace, tcpdump, htop, iftop and lsof.
Just like sysdig, csysdig natively supports containers - cleanly and elegantly. Run csysdig in its own container, or directly on the host machine, for full visibility into all the containers running on the machine.
Csysdig is part of the sysdig package, so it will be included when you install sysdig on your machine.
View the list of containers running on the machine and their resource usage
View the CPU usage of the processes running inside the example container:
sysdig -pc -c topprocs_cpu container.name=example
Show all the interactive commands executed inside the example container:
sysdig -pc -c spy_users container.name=example
View the top files in terms of I/O bytes inside the example container:
sysdig -pc -c topfiles_bytes container.name=example
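If you want to collect this output from a script instead of typing the commands, a small Python wrapper is enough. The sketch below only assembles the same command lines shown above; the helper names and the timeout are assumptions for this example, and actually running sysdig usually needs root privileges and an installed sysdig binary.

```python
import shutil
import subprocess


def sysdig_chisel_command(chisel, container_name):
    """Build the argument list for one sysdig chisel scoped to a single container."""
    return ["sysdig", "-pc", "-c", chisel, f"container.name={container_name}"]


def run_chisel(chisel, container_name, timeout_seconds=10):
    """Run the chisel for a limited time and return its text output.

    Returns None when the sysdig binary is not found on this machine.
    """
    if shutil.which("sysdig") is None:
        return None
    command = sysdig_chisel_command(chisel, container_name)
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
        return completed.stdout
    except subprocess.TimeoutExpired as exc:
        # sysdig keeps capturing until stopped, so a timeout is the normal exit here.
        partial = exc.stdout or ""
        return partial.decode(errors="replace") if isinstance(partial, bytes) else partial


print(sysdig_chisel_command("topprocs_cpu", "example"))
# prints ['sysdig', '-pc', '-c', 'topprocs_cpu', 'container.name=example']
```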
Detect bad behaviour: 🚀 Rules!
We will take the example of another open source project by Sysdig, Sysdig Falco, and discuss this further.
Sysdig Falco: we can call it an intrusion detection system based on system calls.
Falco is an open source project for intrusion and abnormality detection on cloud-native platforms such as Kubernetes or Docker. Using Falco, you can create a Docker security policy to detect attacks and anomalous activity in production environments in real time, so you can react to unknown and 0-day vulnerabilities, attacks caused by weak or leaked credentials, or compliance breaches.
In simple words, Falco is just a container that runs alongside your deployment and acts as a secret watcher.
Falco can detect and alert on any behavior that involves making Linux system calls. Falco alerts can be triggered by the use of specific system calls, their arguments, and by properties of the calling process. For example, Falco can easily detect incidents including but not limited to:
/proc, from the host.
Using the required keys, we can write a very basic rule:
- rule: Detect bash in a container
  desc: You shouldn’t have a shell run in a container
  condition: container.id != host and proc.name = bash
  output: Bash ran inside a container (user=%user.name command=%proc.cmdline %container.info)
  priority: INFO
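To see what this rule does with an event, here is a toy Python version of it. This is not Falco's rule engine: the condition is translated by hand into a Python expression, the event is a plain dictionary keyed by the field names used in the rule, and all field values are invented for the example.

```python
import re

RULE = {
    "rule": "Detect bash in a container",
    "output": "Bash ran inside a container "
              "(user=%user.name command=%proc.cmdline %container.info)",
    "priority": "INFO",
}


def matches(event):
    """Hand-translated form of: container.id != host and proc.name = bash."""
    return event.get("container.id", "host") != "host" and event.get("proc.name") == "bash"


def render(template, event):
    """Replace %field tokens with event values; unknown fields become <NA>."""
    return re.sub(
        r"%([a-z_]+(?:\.[a-z_]+)+)",
        lambda match: str(event.get(match.group(1), "<NA>")),
        template,
    )


def evaluate(event):
    """Return an alert line when the rule fires, otherwise None."""
    if matches(event):
        return f'{RULE["priority"]} {render(RULE["output"], event)}'
    return None


# Made-up event: someone started bash inside a container.
shell_event = {
    "container.id": "3ad7b26ded6d",
    "proc.name": "bash",
    "proc.cmdline": "bash",
    "user.name": "root",
    "container.info": "web (id=3ad7b26ded6d)",
}
print(evaluate(shell_event))
# prints INFO Bash ran inside a container (user=root command=bash web (id=3ad7b26ded6d))
```

The same process started directly on the host would carry container.id set to host, so the condition would be false and no alert would be produced.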
While Falco comes with 25 rules for common best practices, you’ll quickly want to start writing custom rules to match your operational and application requirements. If you are curious to learn more, please do follow the links in
To recap: the overall intention here was to talk about monitoring during runtime, which happens in 2 phases:
Observe and take a decision (rules).
We have used Sysdig and Sysdig Falco here, just for example purposes. There are a bunch of monitoring tools that could give you protection during runtime and can act as a source of truth.
However, as we discussed, please do not use runtime protection as a replacement for any other static up-front security practices:
Attack prevention is always preferable to attack detection.