Persistence: Scenario 2 Defense
- Still overworked
- Still can only do the bare minimum
- Uses the defaults when configuring systems
- Usually gets blamed for stability or security issues
- A week after the first incident, Blue gets paged at 3am because “website is slow again”.
- Blue, puzzled, takes another look.
- Blue decides to dust off the résumé “just in case”.
Blue is paged again with the same message as last time. What is going on? Could this be the same problem again?
Identifying the Issue
Let's run some basic checks again to see if we can find random workloads:
kubectl get pods --all-namespaces
There do not appear to be any unusual workloads running on our cluster.
Just to be sure, let's check our cluster's resource consumption:
kubectl top node
kubectl top pod --all-namespaces
So far, everything looks normal. What gives?
Hold on. We installed falco last time, and it is throwing alerts at us in StackDriver.
In a new StackDriver window, let's run the query:
resource.type="k8s_container"
resource.labels.container_name:"falco"
jsonPayload.rule="Launch Privileged Container" OR jsonPayload.rule="Terminal shell in container"
We're looking for container logs from falco where the triggered rules are privileged containers or interactive shells.
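The same filter can also be run from Cloud Shell instead of the StackDriver UI. This is a minimal sketch, assuming the gcloud CLI is authenticated against the project that hosts the cluster. The helper name falco_alerts, the seven-day window, and the limit of 20 entries are illustrative choices, not part of the workshop.

```shell
# Same conditions as the StackDriver query, with explicit AND and parentheses
# so the precedence of OR is unambiguous.
FILTER='resource.type="k8s_container" AND resource.labels.container_name:"falco" AND (jsonPayload.rule="Launch Privileged Container" OR jsonPayload.rule="Terminal shell in container")'

# Hypothetical helper: print up to 20 matching entries from the last 7 days.
falco_alerts() {
  gcloud logging read "$FILTER" --freshness=7d --limit=20 --format=json
}
```

Calling falco_alerts in Cloud Shell should list the same entries that the StackDriver window shows.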
Huh. This is odd. A privileged alpine container, but no other information to go off of? What can the Kubernetes cluster logs tell us about this?
In a new StackDriver window, let's run this query:
So, we see a few things:
- A create event that was authorized with the system:serviceaccount:dev:default serviceaccount in the dev namespace.
- A pod named r00t was created.
- The pod command is nsenter --mount=/proc/1/ns/mnt -- /bin/bash
- hostPID is set to true.
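Sharing the host PID namespace and running in privileged mode are the settings that make a pod like this dangerous, so it can be worth listing every pod that uses either one. This is a rough sketch, assuming jq is available in Cloud Shell. The helper name risky_pods is made up for this example, and init containers are not inspected.

```shell
# Hypothetical helper: reads `kubectl get pods -o json` output on stdin and
# prints namespace/name for pods that share the host PID namespace or run
# at least one privileged container.
risky_pods() {
  jq -r '
    .items[]
    | select(
        .spec.hostPID == true
        or any(.spec.containers[]; .securityContext.privileged == true)
      )
    | "\(.metadata.namespace)/\(.metadata.name)"
  '
}

# Usage (in Cloud Shell):
#   kubectl get pods --all-namespaces -o json | risky_pods
```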
This is not looking good. Can we see what this container did?
In a new StackDriver window, let's search for this r00t container's logs:
Wow. We can see someone was running commands from this container.
But wait, they can run docker commands? How can they talk to the docker on the host from the container? OH NO! They must have broken out of the container and by this point they're on the host!
The bitcoinero container must be what's causing the slowness again. But they're trying to do something else, too.
They tried to create a pod, but failed. So, they created a Service and an Endpoint. They must be trying to open a backdoor of some sort to get back in later.
In cloud shell, let's check if those exist:
kubectl -n kube-system get svc,ep
That's one sneaky hacker, for sure. But the joke's on them: we're not using a service mesh.
Let's delete that service (the endpoint will be deleted too):
kubectl -n kube-system delete svc/istio-mgmt
But I want to know: how did they get in in the first place? The create event was authorized because of the dev:default serviceaccount. So, what is in the dev namespace that led to someone taking over the entire host?
kubectl -n dev get pods
Among them are a db and a dashboard. Wait a second! Could it be an exposed dashboard?
kubectl -n dev logs $(kubectl -n dev get pods -o name | grep dashboard) -c dashboard
kubectl -n dev logs $(kubectl -n dev get pods -o name | grep dashboard) -c authproxy
It is an exposed dashboard. That's how they got in. There is a GET /webshell request in the authproxy logs, along with the source IP.
We might want to revoke that serviceaccount token:
kubectl -n dev delete $(kubectl -n dev get secret -o name| grep default)
And perhaps disable the automatic mounting of serviceaccount tokens by setting automountServiceAccountToken: false in the pod spec, if the dashboard doesn't need it.
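As a sketch of that setting, the manifest below disables token mounting at the pod level. The pod name, image, and file name are placeholders chosen for illustration; the real dashboard manifest is not shown in this walkthrough.

```shell
# Write a placeholder pod manifest that opts out of the serviceaccount token.
cat > no-token-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: no-token-example   # hypothetical name
  namespace: dev
spec:
  automountServiceAccountToken: false   # do not mount the API token
  containers:
  - name: app
    image: ubuntu          # placeholder image
    command: ["sleep", "3600"]
EOF

# To try it out in Cloud Shell:
#   kubectl apply -f no-token-pod.yaml
#   kubectl -n dev exec no-token-example -- ls /var/run/secrets/kubernetes.io/serviceaccount
```

With the setting in place, the ls command should fail because the token directory is never mounted. The same field can also be set on the ServiceAccount object itself, which changes the default for every pod that uses it.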
But, how can we mitigate this further?
The attacker ran a privileged container, which they shouldn't have been able to. So, we should block that. I remember a talk at KubeCon this week about Open-Policy-Agent/Gatekeeper that gets deployed as an admission controller.
That should work, because an admission controller is a piece of code that intercepts requests to the Kubernetes API server after the request is authenticated and authorized, but before the object is persisted, so it can reject anything that violates our policies.
So, we should set two policies:
- Deny privileged containers.
- Allow only the images we expect to have in
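To make the first policy concrete, here is a rough sketch of how a "deny privileged containers" rule can be expressed for Gatekeeper as a ConstraintTemplate plus a Constraint. It follows the general Gatekeeper pattern and is not necessarily what the workshop's policy manifest contains. The kind name K8sDenyPrivileged, the object names, and the file name are invented for this example, and the API versions may differ between Gatekeeper releases.

```shell
# Write the example policy to a file; nothing is applied to the cluster here.
cat > deny-privileged.yaml <<'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdenyprivileged        # must be the lowercased constraint kind
spec:
  crd:
    spec:
      names:
        kind: K8sDenyPrivileged  # hypothetical kind name
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sdenyprivileged

      # Report a violation for every container that asks for privileged mode.
      violation[{"msg": msg}] {
        c := input.review.object.spec.containers[_]
        c.securityContext.privileged
        msg := sprintf("privileged container is not allowed: %v", [c.name])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyPrivileged
metadata:
  name: deny-privileged-pods     # hypothetical name
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
EOF
```

The template defines the rule in Rego, and the constraint says which objects it applies to. The constraint's kind only exists once Gatekeeper has processed the template, so the template has to be in place before the constraint can be created.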
First, let's apply Gatekeeper itself:
kubectl apply -f https://raw.githubusercontent.com/securekubernetes/securekubernetes/master/manifests/security2.yaml
Second, let's apply the policies. If you receive an error about no matches for kind... in version ..., this means Gatekeeper has not kicked into gear yet. Wait a few seconds, then re-apply the policies:
kubectl apply -f https://raw.githubusercontent.com/securekubernetes/securekubernetes/master/manifests/security2-policies.yaml
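The wait-and-retry step can also be scripted. The retry helper below is a generic sketch, not part of the workshop: it re-runs a command up to a fixed number of times and pauses after each failed attempt.

```shell
# Hypothetical helper: retry <attempts> <delay-seconds> <command...>
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (in Cloud Shell): try up to 12 times, 5 seconds apart.
#   retry 12 5 kubectl apply -f https://raw.githubusercontent.com/securekubernetes/securekubernetes/master/manifests/security2-policies.yaml
```

Bounding the number of attempts keeps the loop from spinning forever if the failure is caused by something other than Gatekeeper still starting up.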
Let's see if this actually works by trying to run some containers that violate these policies.
First, let's try to run a privileged container:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    securityContext:
      privileged: true
EOF
We see that Kubernetes denied this request for two reasons (the image is not whitelisted and the container is privileged), as expected.
Let's try running a non-whitelisted image:
kubectl -n dev run alpine --image=alpine --restart=Never
We see that Kubernetes rejected this request too, because the image is not whitelisted, as expected.
Can we still run pods that satisfy the Gatekeeper policies? Let's find out:
kubectl -n dev run ubuntu --image=ubuntu --restart=Never
Yes, looks like we can run pods that satisfy the policies and requirements we set on our cluster.
Even though we applied Falco and Gatekeeper, we should not continue to use this cluster since it has been compromised. We should create a new cluster and re-deploy our applications there once we've hardened and secured it enough.