| Type | Description | Tested K8s Platform |
| --- | --- | --- |
| OpenEBS | Induce latency into the cStor target/Jiva controller container | GKE, EKS, Konvoy(AWS), Packet(Kubeadm), Minikube, OpenShift(Baremetal) |
Note: In this example, we are using nginx as a stateful application that stores static pages on a Kubernetes volume.
Ensure that the Kubernetes cluster uses the Docker runtime.
Ensure that the Litmus Chaos Operator is running by executing `kubectl get pods` in the operator namespace (typically, `litmus`). If not, install from here.
Ensure that the `openebs-target-network-delay` experiment resource is available in the cluster. If not, install from here.
The DATA_PERSISTENCE check can be enabled by providing the application's info in a configmap volume so that the experiment can perform the necessary checks. Currently, LitmusChaos supports data consistency checks only for MySQL and Busybox.
- For the MySQL data persistence check, create a configmap as shown below in the application namespace (replace with actual credentials):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: openebs-target-network-delay
      data:
        parameters.yml: |
          dbuser: root
          dbpassword: k8sDem0
          dbname: test

- For the Busybox data persistence check, create a configmap as shown below in the application namespace (replace with the desired values):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: openebs-target-network-delay
      data:
        parameters.yml: |
          blocksize: 4k
          blockcount: 1024
          testfile: exampleFile

Ensure that the chaosServiceAccount used for the experiment has cluster-scope permissions, as the experiment may carry out the chaos in the `openebs` namespace while performing application health checks in the application's own namespace.
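
The prerequisite checks above can be scripted. The following is a minimal sketch, assuming `kubectl` is already configured for the target cluster; the function names are hypothetical and not part of Litmus.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions; the names are illustrative, not part of Litmus.

# List the pods in the operator namespace (defaults to "litmus").
check_operator() {
  local ns="${1:-litmus}"
  kubectl get pods -n "$ns"
}

# Confirm that the ChaosExperiment resource exists in the given namespace.
check_experiment() {
  local ns="${1:?namespace required}"
  kubectl get chaosexperiments openebs-target-network-delay -n "$ns"
}

# Print the container runtime reported by each node (for example "docker://<version>").
node_runtimes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}
```

For example, `check_operator` followed by `check_experiment default` and `node_runtimes` covers the operator, experiment and runtime prerequisites in one pass.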
- Application pods are healthy before chaos injection
- Application writes are successful on OpenEBS PVs
- Stateful application pods are healthy post chaos injection
- OpenEBS Storage target pods are healthy
If the experiment tunable DATA_PERSISTENCE is set to 'enabled':
- Application data written prior to chaos is successfully retrieved/read
- Database consistency is maintained as per db integrity check utils
- This scenario validates the behaviour of stateful applications and the OpenEBS data plane under high latencies/network delays in accessing the storage controller pod
- Injects latency on the specified container in the controller pod by starting traffic control (`tc`) `netem` rules that add egress delays
- Latency is injected via the pumba library with the command `pumba netem delay`, passing the relevant network interface, latency, chaos duration and regex filter for the container name
- Can test the stateful application's resilience to lost/slow iSCSI connections
- Network delay is achieved using the `pumba` chaos library in the case of the docker runtime. Support for other runtimes via direct invocation of `tc` will be added soon.
- The desired lib image can be configured in the env variable `LIB_IMAGE`.
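
To make the injection step concrete, the sketch below assembles a `pumba netem delay` command line from the experiment tunables without running it. The interface name `eth0`, the container-name regex and the function name are assumptions for illustration; the experiment derives its own values at run time.

```shell
#!/usr/bin/env bash
# Build (but do not run) a pumba command line from the experiment tunables.
# Assumptions: the interface and the container-name regex passed in below are
# illustrative; build_pumba_cmd is a hypothetical helper, not part of Litmus.
build_pumba_cmd() {
  local interface="$1" delay_ms="$2" duration_s="$3" name_regex="$4"
  local tc_image="${5:-gaiadocker/iproute2}"
  printf 'pumba netem --interface %s --tc-image %s --duration %ss delay --time %s re2:%s\n' \
    "$interface" "$tc_image" "$duration_s" "$delay_ms" "$name_regex"
}

# NETWORK_DELAY is given in milliseconds, TOTAL_CHAOS_DURATION in seconds.
build_pumba_cmd eth0 30000 60 'cstor-istgt'
```

Printing the command first makes it easy to check that the delay (milliseconds) and the duration (seconds) are not swapped before anything is injected.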
Steps to Execute the Chaos Experiment
This Chaos Experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer to Getting Started.
Follow the steps in the sections below to prepare the ChaosEngine & execute the experiment.
Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary cluster role permissions to execute the experiment.
Sample RBAC Manifest

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: target-network-delay-sa
      namespace: default
      labels:
        name: target-network-delay-sa
        app.kubernetes.io/part-of: litmus
    ---
    # Source: openebs/templates/clusterrole.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: target-network-delay-sa
      labels:
        name: target-network-delay-sa
        app.kubernetes.io/part-of: litmus
    rules:
    - apiGroups: ["","apps","litmuschaos.io","batch","extensions","storage.k8s.io"]
      resources: ["pods","pods/exec","pods/log","events","jobs","configmaps","secrets","services","persistentvolumeclaims","storageclasses","persistentvolumes","chaosexperiments","chaosresults","chaosengines"]
      verbs: ["create","list","get","patch","update","delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: target-network-delay-sa
      labels:
        name: target-network-delay-sa
        app.kubernetes.io/part-of: litmus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: target-network-delay-sa
    subjects:
    - kind: ServiceAccount
      name: target-network-delay-sa
      namespace: default

- Provide the application info in `spec.appinfo`
- Provide the auxiliary applications info (ns & labels) in `spec.auxiliaryAppInfo`
- Override the experiment tunables if desired in `experiments.spec.components.env`
- Provide the configMaps and secrets in `experiments.spec.components.configMaps/secrets`. For more info, refer to the Sample ChaosEngine
- To understand the values to provide in a ChaosEngine specification, refer to ChaosEngine Concepts
Supported Experiment Tunables
| Variable | Description | Mandatory/Optional | Notes |
| --- | --- | --- | --- |
| APP_PVC | The PersistentVolumeClaim used by the stateful application | Mandatory | The PVC may use either the OpenEBS Jiva or cStor storage class |
| LIB_IMAGE | The chaos library image used to inject the latency | Optional | Defaults to `gaiaadm/pumba:0.6.5`. Supported: `docker : gaiaadm/pumba:0.6.5` |
| CONTAINER_RUNTIME | The container runtime used in the Kubernetes cluster | Optional | Defaults to `docker`. Supported: `docker` |
| TARGET_CONTAINER | The container into which delays are injected in the storage controller pod | Optional | Defaults to `cstor-istgt` |
| TOTAL_CHAOS_DURATION | Total duration for which network latency is injected | Optional | Defaults to 60 seconds |
| DEPLOY_TYPE | Type of Kubernetes resource used by the stateful application | Optional | Defaults to `deployment`. Supported: `deployment`, `statefulset` |
| TC_IMAGE | Image used for traffic control in Linux | Optional | Defaults to `gaiadocker/iproute2` |
| NETWORK_DELAY | Egress delay injected into the target container | Optional | Defaults to 60000 milliseconds (60s) |
| DATA_PERSISTENCE | Flag to perform data consistency checks on the application | Optional | Disabled by default (empty/unset). Supports only `mysql` and `busybox`. Ensure that the configmap with the app details is created |
| INSTANCE_ID | A user-defined string that holds metadata/info about the current run/instance of chaos. Ex: 04-05-2020-9-00. This string is appended as a suffix to the chaosresult CR name. | Optional | Ensure that the overall length of the chaosresult CR name is still < 64 characters |
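
The length limit on the chaosresult name can be checked before setting INSTANCE_ID. The sketch below assumes the name takes the form `<engine>-<experiment>`, with the INSTANCE_ID appended as a further `-<id>` suffix; the exact separator is an assumption, and `chaosresult_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Compose the ChaosResult name and check that it stays under 64 characters.
# Assumption: the name is <engine>-<experiment>, with INSTANCE_ID (if set)
# appended as a further "-<id>" suffix. chaosresult_name is a hypothetical helper.
chaosresult_name() {
  local engine="$1" experiment="$2" instance_id="${3:-}"
  local name="${engine}-${experiment}"
  if [ -n "$instance_id" ]; then
    name="${name}-${instance_id}"
  fi
  if [ "${#name}" -ge 64 ]; then
    echo "ChaosResult name too long (${#name} characters): ${name}" >&2
    return 1
  fi
  echo "$name"
}

chaosresult_name target-chaos openebs-target-network-delay 04-05-2020-9-00
```

With the engine and experiment names used in this document, the example INSTANCE_ID still fits within the limit; a longer engine name or ID makes the function return a non-zero status.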
Sample ChaosEngine Manifest

    apiVersion: litmuschaos.io/v1alpha1
    kind: ChaosEngine
    metadata:
      name: target-chaos
      namespace: default
    spec:
      # It can be active/stop
      engineState: 'active'
      # ex. values: ns1:name=percona,ns2:run=nginx
      auxiliaryAppInfo: ''
      appinfo:
        appns: 'default'
        applabel: 'app=nginx'
        appkind: 'deployment'
      chaosServiceAccount: target-network-delay-sa
      experiments:
        - name: openebs-target-network-delay
          spec:
            components:
              env:
                - name: TARGET_CONTAINER
                  value: 'cstor-istgt'
                - name: APP_PVC
                  value: 'demo-nginx-claim'
                - name: DEPLOY_TYPE
                  value: 'deployment'
                - name: NETWORK_DELAY
                  value: '30000' # in milliseconds
                - name: TOTAL_CHAOS_DURATION
                  value: '60' # in seconds

Create the ChaosEngine Resource
Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.
kubectl apply -f chaosengine.yml
If the chaos experiment is not executed, refer to the troubleshooting section to identify the root cause and fix the issues.
Watch Chaos progress
View network delay in action by setting up a ping to the storage controller in the OpenEBS namespace
Watch the behaviour of the application pod and the OpenEBS data replica/pool pods by setting up a watch on the respective namespaces
watch -n 1 kubectl get pods -n <application-namespace>
Check Chaos Experiment Result
Once the experiment (job) is completed, check whether the application is resilient to the target network delays. The ChaosResult resource is named `<ChaosEngine-Name>-<ChaosExperiment-Name>`:
kubectl describe chaosresult target-chaos-openebs-target-network-delay -n <application-namespace>
- If the verdict of the ChaosResult is `Fail`, and/or the OpenEBS components do not return to a healthy state after the chaos experiment, refer to the OpenEBS troubleshooting guide for more info on how to recover them.
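
In a script, the verdict can be pulled out of the `kubectl describe chaosresult` output. The following is a minimal sketch, assuming the output contains a line of the form `Verdict: <value>`; `chaosresult_verdict` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Extract the verdict from `kubectl describe chaosresult` output read on stdin.
# Assumption: the output contains a line of the form "Verdict: <value>".
# chaosresult_verdict is a hypothetical helper, not part of Litmus.
chaosresult_verdict() {
  awk '$1 == "Verdict:" { print $2; exit }'
}

# Usage (requires cluster access):
#   kubectl describe chaosresult target-chaos-openebs-target-network-delay \
#     -n <application-namespace> | chaosresult_verdict
```

The function prints nothing when no verdict line is present, so an empty result can be treated as "experiment not finished yet" by the calling script.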
OpenEBS Target Network Delay Demo [TODO]
- A sample recording of this experiment execution is provided here.