Chaos Monkey Alternatives - Kubernetes
Kube Monkey
Kube-monkey is an open-source implementation of Chaos Monkey for Kubernetes clusters, written in Go. Like the original Chaos Monkey, Kube-monkey performs just one task: it randomly deletes Kubernetes pods within the cluster as a means of injecting failure into the system and testing the stability of the remaining pods. It follows pseudo-random rules: at a pre-defined hour on weekdays it builds a schedule of randomly selected pod targets, and each target is then attacked and killed at a random time during that same day. The time range is configurable.
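The scheduling idea can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Kube-monkey's actual Go code: the function name, the 10:00 to 16:00 window, and the rule that an app is picked with probability 1/mtbf are all assumptions made for the example.

```python
import random
from datetime import date, datetime, time, timedelta

# Hypothetical window for illustration; not Kube-monkey's real configuration keys.
START_HOUR = 10  # earliest hour at which a kill may be scheduled
END_HOUR = 16    # kills are scheduled strictly before this hour


def build_schedule(apps, day, rng, start_hour=START_HOUR, end_hour=END_HOUR):
    """Return a sorted list of (kill_time, app_name) pairs for one day.

    `apps` maps an app name to a mean time between failures in days.
    Picking each app with probability 1/mtbf is an assumption about how
    such a value could be turned into a daily decision.
    """
    if day.weekday() >= 5:  # Saturday or Sunday: nothing is scheduled
        return []
    window_seconds = (end_hour - start_hour) * 3600
    window_start = datetime.combine(day, time(hour=start_hour))
    schedule = []
    for name, mtbf_days in apps.items():
        if rng.random() < 1.0 / mtbf_days:
            offset = rng.randrange(window_seconds)  # random second inside the window
            schedule.append((window_start + timedelta(seconds=offset), name))
    return sorted(schedule)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs build the same schedule
    apps = {"monkey-victim": 2, "other-app": 5}  # hypothetical app names
    for kill_time, name in build_schedule(apps, date(2024, 1, 8), rng):
        print(kill_time.isoformat(), name)
```

Passing the random generator in as an argument keeps the sketch deterministic under test while still behaving pseudo-randomly in normal use.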
Kube-monkey will only terminate pods that have explicitly opted in by specifying certain Kube-monkey metadata labels. The following illustrates the basic labels that can be specified to allow Kube-monkey to kill pods within the application.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: monkey-victim
    kube-monkey/mtbf: '2'
    kube-monkey/kill-mode: "fixed"
    kube-monkey/kill-value: '1'
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
    # ...
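To make the opt-in rule concrete, here is a minimal Python sketch of how a tool might read these labels. The function names are hypothetical, only the "fixed" kill mode from the example is handled, and capping the count at the number of running pods is an assumption rather than documented Kube-monkey behaviour.

```python
def is_opted_in(labels):
    """True when the labels carry the opt-in value used in the manifest."""
    return labels.get("kube-monkey/enabled") == "enabled"


def pods_to_kill(labels, running_pods):
    """Return how many pods a "fixed" kill-mode would target.

    Only the "fixed" mode from the example is handled; anything else raises.
    Capping at `running_pods` is an assumption made for this sketch.
    """
    mode = labels.get("kube-monkey/kill-mode")
    if mode != "fixed":
        raise ValueError(f"unsupported kill-mode in this sketch: {mode!r}")
    requested = int(labels["kube-monkey/kill-value"])  # label values are strings
    return min(requested, running_pods)


example_labels = {
    "kube-monkey/enabled": "enabled",
    "kube-monkey/identifier": "monkey-victim",
    "kube-monkey/mtbf": "2",
    "kube-monkey/kill-mode": "fixed",
    "kube-monkey/kill-value": "1",
}
```

With `example_labels`, `is_opted_in` returns `True`, while a deployment without the `kube-monkey/enabled` label is left alone.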
Check out the GitHub repository for more information on installing and using Kube-monkey.
Engineering Chaos In Kubernetes with Gremlin
Gremlin’s Failure as a Service simplifies your Chaos Engineering workflow for Kubernetes by making it safe and effortless to execute Chaos Experiments across all nodes. As a distributed architecture, Kubernetes is particularly sensitive to instability and unexpected failures. Gremlin can perform a variety of attacks on your Kubernetes clusters, including overloading CPU, memory, disk, and IO; killing nodes; modifying network traffic; and much more.
Check out this tutorial over on our community site to get started!
Kubernetes Pod Chaos Monkey
Kubernetes Pod Chaos Monkey is a Chaos Monkey-style tool for Kubernetes. The code itself is a local shell script that issues kubectl commands to periodically locate and then delete Kubernetes pods. It targets the pods in the configurable NAMESPACE and attempts to destroy a pod every DELAY seconds (defaulting to 30).
Since Kubernetes Pod Chaos Monkey is essentially a simple shell script, it can be modified quite easily.
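The same loop is easy to express in other languages. The following Python sketch mirrors the behaviour described above; it is not the project's script, and the function names are hypothetical. It assumes kubectl is installed and already configured for the target cluster.

```python
import random
import subprocess
import time


def list_pods_command(namespace):
    """kubectl invocation that prints the pod names separated by spaces."""
    return ["kubectl", "--namespace", namespace, "get", "pods",
            "-o", "jsonpath={.items[*].metadata.name}"]


def delete_pod_command(namespace, pod):
    """kubectl invocation that deletes a single pod."""
    return ["kubectl", "--namespace", namespace, "delete", "pod", pod]


def pick_victim(kubectl_output, rng=random):
    """Choose one pod name from space-separated output, or None if there are none."""
    pods = kubectl_output.split()
    return rng.choice(pods) if pods else None


def run_forever(namespace, delay=30):
    """Delete one random pod in `namespace` every `delay` seconds."""
    while True:
        result = subprocess.run(list_pods_command(namespace),
                                capture_output=True, text=True, check=True)
        victim = pick_victim(result.stdout)
        if victim is not None:
            subprocess.run(delete_pod_command(namespace, victim), check=True)
        time.sleep(delay)
```

Calling `run_forever("app-namespace")` would start the loop against that namespace; the command builders and `pick_victim` are kept separate so they can be checked without touching a cluster.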
The Chaos Toolkit
The Chaos Toolkit is an open-source and extensible tool that is written in Python. It uses platform-specific drivers to connect to your Kubernetes cluster and execute Chaos Experiments. Every experiment performed by Chaos Toolkit is written in JSON using a robust API. Experiments are made up of a few key elements that are executed sequentially and allow the experiment to bail out if any step in the process fails.
- Steady State Hypothesis: This element defines the normal or “steady” state of the system before the Method element is applied. Here we’ve defined a basic application with a steady state hypothesis titled “Service should have nodes.”
  {
    "version": "1.0.0",
    "title": "Gremlin EKS App",
    "description": "Gremlin EKS App",
    "tags": ["service", "kubernetes"],
    "steady-state-hypothesis": {
      "title": "Service should have nodes.",
      "probes": [
        {
          "type": "probe",
          "name": "nodes_found",
          "tolerance": true,
          "provider": {
            "type": "python",
            "module": "chaosk8s.node.probes",
            "func": "get_nodes",
            "arguments": { "label_selector": "eks-gremlin-chaos" }
          }
        }
      ]
    }
  }
- Probe: A Probe is an element that collects system information, such as checking the health status of a node. Here we define a Probe element, which we’ve added to our steady state Probes list above, that calls the get_nodes function and retrieves the list of nodes for the specified label_selector.
  {
    "type": "probe",
    "name": "nodes_found",
    "tolerance": true,
    "provider": {
      "type": "python",
      "module": "chaosk8s.node.probes",
      "func": "get_nodes",
      "arguments": { "label_selector": "eks-gremlin-chaos" }
    }
  }
- Action: An Action element performs an operation against the system, such as draining or deleting a node. In the example we call the delete_nodes function, passing the label_selector argument and setting all to true so that we delete all matching nodes.
  {
    "type": "action",
    "name": "delete_all_nodes",
    "provider": {
      "type": "python",
      "module": "chaosk8s.node.actions",
      "func": "delete_nodes",
      "arguments": { "all": true, "label_selector": "eks-gremlin-chaos" }
    }
  }
- Method: A Method element defines the series of Probe and Action elements that make up the experiment. Here we’re first using the nodes_found Probe to make sure nodes exist, then executing the delete_all_nodes Action to delete the nodes, and finally performing another explicit Probe to verify that no nodes remain.
  "method": [
    { "ref": "nodes_found" },
    {
      "type": "action",
      "name": "delete_all_nodes",
      "provider": {
        "type": "python",
        "module": "chaosk8s.node.actions",
        "func": "delete_nodes",
        "arguments": { "all": true, "label_selector": "eks-gremlin-chaos" }
      }
    },
    {
      "type": "probe",
      "name": "nodes_not_found",
      "tolerance": false,
      "provider": {
        "type": "python",
        "module": "chaosk8s.node.probes",
        "func": "get_nodes",
        "arguments": { "label_selector": "eks-gremlin-chaos" }
      }
    }
  ]
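Putting the four elements together, the fragments above can be assembled into a single experiment file. The Python sketch below is one way to do that. The helper names (`nodes_probe`, `unresolved_refs`, `write_experiment`) are hypothetical, and it uses the `label_selector` spelling from the Probe for both functions, on the assumption that the arguments are handed to the Python functions as keyword arguments.

```python
import json

LABEL = "eks-gremlin-chaos"


def nodes_probe(name, tolerance):
    """Build a get_nodes Probe like the ones shown in the fragments above."""
    return {
        "type": "probe",
        "name": name,
        "tolerance": tolerance,
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.probes",
            "func": "get_nodes",
            "arguments": {"label_selector": LABEL},
        },
    }


delete_all_nodes = {
    "type": "action",
    "name": "delete_all_nodes",
    "provider": {
        "type": "python",
        "module": "chaosk8s.node.actions",
        "func": "delete_nodes",
        "arguments": {"all": True, "label_selector": LABEL},
    },
}

experiment = {
    "version": "1.0.0",
    "title": "Gremlin EKS App",
    "description": "Gremlin EKS App",
    "tags": ["service", "kubernetes"],
    "steady-state-hypothesis": {
        "title": "Service should have nodes.",
        "probes": [nodes_probe("nodes_found", True)],
    },
    "method": [
        {"ref": "nodes_found"},                 # reuse the steady-state Probe by name
        delete_all_nodes,
        nodes_probe("nodes_not_found", False),
    ],
}


def unresolved_refs(exp):
    """Return names used by {"ref": ...} steps that no Probe or Action defines."""
    defined = {p["name"] for p in exp["steady-state-hypothesis"]["probes"]}
    defined |= {step["name"] for step in exp["method"] if "name" in step}
    return [step["ref"] for step in exp["method"]
            if "ref" in step and step["ref"] not in defined]


def write_experiment(exp, path):
    """Serialize the experiment as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(exp, fh, indent=2)
```

After `write_experiment(experiment, "experiment.json")` (the file name is arbitrary), the result could be passed to the Chaos Toolkit CLI with `chaos run experiment.json`. Checking `unresolved_refs` first catches a Method step that refers to a Probe name that was never defined.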
Those are the basics you need to begin experimenting with the Chaos Toolkit. Chaos Toolkit also has a fault injection plugin for Gremlin, so you can easily perform attacks while utilizing the safety and security of the Gremlin platform.