| Type | Description | Tested K8s Platform |
| --- | --- | --- |
| OpenEBS | Kill the cStor pool pod and check if it gets created again | GKE, Konvoy (AWS), Packet (Kubeadm), Minikube, OpenShift (Baremetal) |
Note: In this example, we are using nginx as a stateful application that stores static pages on a Kubernetes volume.
- Ensure that the Litmus Chaos Operator is running by executing `kubectl get pods` in the operator namespace (typically, `litmus`). If not, install from here
- Ensure that the `openebs-pool-pod-failure` experiment resource is available in the cluster. If not, install from here
- `DATA_PERSISTENCE` can be enabled by providing the application's info in a configmap volume so that the experiment can perform the necessary checks. Currently, LitmusChaos supports data consistency checks only for MySQL and Busybox.
- For the MySQL data persistence check, create a configmap as shown below in the application namespace (replace with actual credentials):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: openebs-pool-pod-failure
      data:
        parameters.yml: |
          dbuser: root
          dbpassword: k8sDem0
          dbname: test

- For the Busybox data persistence check, create a configmap as shown below in the application namespace (replace with actual values):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: openebs-pool-pod-failure
      data:
        parameters.yml: |
          blocksize: 4k
          blockcount: 1024
          testfile: exampleFile

- Ensure that the chaosServiceAccount used for the experiment has cluster-scope permissions, as the experiment may involve carrying out the chaos in the `openebs` namespace while performing application health checks in its respective namespace.
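
The first two prerequisites above can be checked from the command line. The following is a minimal sketch, assuming the operator runs in the `litmus` namespace and the experiment resource was installed in the `default` namespace; `check_litmus_prereqs` is a hypothetical helper name, not something Litmus ships.

```shell
# Hypothetical helper: checks the Litmus prerequisites described above.
# $1 = operator namespace (assumed default: litmus)
# $2 = namespace holding the ChaosExperiment resource (assumed default: default)
check_litmus_prereqs() {
    operator_ns="${1:-litmus}"
    experiment_ns="${2:-default}"

    # List the operator pods; inspect the output for a Running status.
    kubectl get pods -n "$operator_ns" || return 1

    # Exits non-zero if the experiment resource is not present.
    kubectl get chaosexperiments openebs-pool-pod-failure -n "$experiment_ns" || return 1
}
```

Calling `check_litmus_prereqs litmus default` runs both checks and stops at the first one that fails.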
- Application pods are healthy before chaos injection
- Application writes are successful on OpenEBS PVs
- Application pods are healthy post chaos injection
- OpenEBS Storage target pods are healthy
If the experiment tunable DATA_PERSISTENCE is set to 'enabled':
- Application data written prior to chaos is successfully retrieved/read
- Database consistency is maintained as per db integrity check utils
- This scenario validates the behaviour of stateful applications and OpenEBS data plane upon forced termination of the target pod
- The target pool pod is killed using the litmus chaoslib random pod delete
- Can test the stateful application's resilience to momentary iSCSI connection loss
- Pod delete is achieved using the
Steps to Execute the Chaos Experiment
This Chaos Experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to be provided in a ChaosEngine specification, refer to Getting Started.
Follow the steps in the sections below to prepare the ChaosEngine & execute the experiment.
Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary cluster role permissions to execute the experiment.
Sample RBAC Manifest
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: pool-pod-failure-sa
      namespace: default
      labels:
        name: pool-pod-failure-sa
    ---
    # Source: openebs/templates/clusterrole.yaml
    # Note: rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22;
    # use rbac.authorization.k8s.io/v1 on newer clusters.
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: pool-pod-failure-sa
      labels:
        name: pool-pod-failure-sa
    rules:
    - apiGroups: ["","apps","litmuschaos.io","batch","extensions","storage.k8s.io","openebs.io"]
      resources: ["pods","jobs","deployments","pods/log","events","configmaps","secrets","replicasets","persistentvolumeclaims","storageclasses","cstorvolumereplicas","chaosexperiments","chaosresults","chaosengines"]
      verbs: ["create","list","get","patch","update","delete"]
    - apiGroups: [""]
      resources: ["nodes"]
      verbs: ["get","list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: pool-pod-failure-sa
      labels:
        name: pool-pod-failure-sa
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: pool-pod-failure-sa
    subjects:
    - kind: ServiceAccount
      name: pool-pod-failure-sa
      namespace: default
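
Once the RBAC manifest is applied, it can be useful to confirm that the service account really has the cluster-scope permissions the experiment needs. The sketch below wraps `kubectl auth can-i` with impersonation; `sa_can_i` is a hypothetical helper name, and it assumes the user running it is allowed to impersonate service accounts.

```shell
# Hypothetical helper: asks the API server whether a service account may
# perform a verb on a resource in a given namespace.
# $1 = service account namespace, $2 = service account name,
# $3 = verb, $4 = resource, $5 = namespace to check in
sa_can_i() {
    kubectl auth can-i "$3" "$4" \
        --as="system:serviceaccount:$1:$2" \
        -n "$5"
}
```

For example, `sa_can_i default pool-pod-failure-sa delete pods openebs` checks whether the sample service account can delete pods in the `openebs` namespace, where the pool pods run.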
- Provide the application info in
- Provide the auxiliary applications info (ns & labels) in
- Override the experiment tunables if desired
Supported Experiment Tunables
| Variable | Description | Type | Notes |
| --- | --- | --- | --- |
| APP_PVC | The PersistentVolumeClaim used by the stateful application | Mandatory | PVC must use an OpenEBS cStor storage class |
| TOTAL_CHAOS_DURATION | Amount of soak time for I/O post pod kill | Optional | Defaults to 600 seconds |
| DEPLOY_TYPE | Type of Kubernetes resource used by the stateful application | Optional | Defaults to `deployment`. Supported: `deployment`, `statefulset` |
| DATA_PERSISTENCE | Flag to perform data consistency checks on the application | Optional | Disabled by default (empty/unset). Supports only `mysql` and `busybox`. Ensure that a configmap with the app details is created |
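
Since `APP_PVC` must point at a claim backed by a cStor storage class, it may help to look up the claim's storage class before running the experiment. This is a minimal sketch; `pvc_storage_class` is a hypothetical helper name, and the claim name and namespace in the usage note are taken from the sample manifest.

```shell
# Hypothetical helper: prints the storage class name recorded on a PVC.
# $1 = namespace of the claim, $2 = claim name
pvc_storage_class() {
    kubectl get pvc "$2" -n "$1" -o jsonpath='{.spec.storageClassName}'
}
```

Running `pvc_storage_class default demo-nginx-claim` prints the storage class name, which can then be inspected with `kubectl get storageclass <name> -o yaml` to confirm it is a cStor class.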
Sample ChaosEngine Manifest
    apiVersion: litmuschaos.io/v1alpha1
    kind: ChaosEngine
    metadata:
      name: target-chaos
      namespace: default
    spec:
      # It can be true/false
      annotationCheck: 'false'
      # It can be active/stop
      engineState: 'active'
      #ex. values: ns1:name=percona,ns2:run=nginx
      auxiliaryAppInfo: ''
      appinfo:
        appns: 'default'
        applabel: 'app=nginx'
        appkind: 'deployment'
      chaosServiceAccount: pool-pod-failure-sa
      monitoring: false
      # It can be delete/retain
      jobCleanUpPolicy: 'delete'
      experiments:
        - name: openebs-pool-pod-failure
          spec:
            components:
              env:
                - name: FORCE
                  value: 'true'
                - name: APP_PVC
                  value: 'demo-nginx-claim'
                - name: DEPLOY_TYPE
                  value: 'deployment'
Create the ChaosEngine Resource
Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.
kubectl apply -f chaosengine.yml
Watch Chaos progress
View pod restart count by setting up a watch on the pods in the OpenEBS namespace
watch -n 1 kubectl get pods -n <application-namespace>
Check Chaos Experiment Result
Check whether the application is resilient to the pool pod failure, once the experiment (job) is completed. The ChaosResult resource naming convention is:
kubectl describe chaosresult target-chaos-openebs-pool-pod-failure -n <application-namespace>
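
Judging from the sample command above and the sample ChaosEngine (`target-chaos`) running the `openebs-pool-pod-failure` experiment, the ChaosResult name appears to be the engine name followed by the experiment name, joined with a hyphen. The sketch below builds the name on that assumption; both function names are hypothetical helpers.

```shell
# Hypothetical helper: builds the ChaosResult name.
# Assumed convention: <ChaosEngine name>-<experiment name>.
# $1 = ChaosEngine name, $2 = experiment name
chaosresult_name() {
    printf '%s-%s\n' "$1" "$2"
}

# Hypothetical helper: describes the ChaosResult for an engine/experiment pair.
# $1 = ChaosEngine name, $2 = experiment name, $3 = namespace
describe_chaosresult() {
    kubectl describe chaosresult "$(chaosresult_name "$1" "$2")" -n "$3"
}
```

With the sample manifest, `describe_chaosresult target-chaos openebs-pool-pod-failure default` is equivalent to the command shown above with `default` as the application namespace.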
OpenEBS Pool Pod Failure Demo [TODO]
- A sample recording of this experiment execution is provided here.