Version: Next

Construct chaos experiment YAML without ChaosCenter

A chaos experiment is a set of operations coupled together to achieve the desired chaos impact on a Kubernetes cluster.

A basic chaos experiment consists of these steps:

Install the ChaosExperiment CR
Install the ChaosEngine CR
Clean up chaos resources

Before we begin

Before constructing a chaos experiment without ChaosCenter, make sure you are familiar with the ChaosExperiment CR, the ChaosEngine CR, and the steps that make up a chaos experiment.

Steps to construct a chaos experiment

LitmusChaos uses the workflow engine Argo Workflows to orchestrate chaos experiments. Argo lets you combine different chaos faults into a single chaos experiment that is simple to set up and use.

The structure of a chaos experiment is similar to that of any Kubernetes object. It contains the mandatory fields apiVersion, kind, metadata, and spec.
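For orientation, the skeleton below is a minimal sketch of how these top-level fields fit together in an Argo Workflow. The workflow name is hypothetical, the templates that the steps call are left out, and so is the controller-instanceid label discussed later, so this fragment is not meant to be applied as-is.

```yaml
# Skeleton of an Argo Workflow used as a chaos experiment (not applyable as-is).
apiVersion: argoproj.io/v1alpha1      # Argo Workflows API version
kind: Workflow
metadata:
  name: my-chaos-experiment           # hypothetical name
  namespace: litmus
spec:
  entrypoint: custom-chaos            # template that runs first
  serviceAccountName: argo-chaos
  templates:
    - name: custom-chaos
      steps:
        - - name: install-chaos-experiments
            template: install-chaos-experiments
        - - name: pod-delete
            template: pod-delete
        - - name: revert-chaos
            template: revert-chaos
    # The install-chaos-experiments, pod-delete, and revert-chaos
    # templates must also be defined here; they are omitted for brevity.
```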

A few additional terms used in Argo-based chaos experiments are:

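Template: A named unit of work. A template either runs a container or, like custom-chaos below, consists of steps that each call another template to perform a specific operation.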

templates:
  - name: custom-chaos
    steps:
      - - name: install-chaos-experiments
          template: install-chaos-experiments
      - - name: pod-delete
          template: pod-delete
      - - name: revert-chaos
          template: revert-chaos

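Steps: A step is a single unit inside a chaos experiment that runs a template, typically a container, based on its input parameters. Each item of the outer steps list is a group: groups run sequentially, while the steps within one group run in parallel, as pod-delete and pod-cpu-hog do here.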

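steps:
  # Group 1: runs first.
  - - name: install-chaos-experiments
      template: install-chaos-experiments
  # Group 2: pod-delete and pod-cpu-hog start at the same time.
  - - name: pod-delete
      template: pod-delete
    - name: pod-cpu-hog
      template: pod-cpu-hog
  # Group 3: starts only after both parallel steps above have finished.
  - - name: revert-chaos
      template: revert-chaos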

Entrypoint: The entrypoint is the template that a chaos experiment executes first.

entrypoint: custom-chaos

Here, the custom-chaos template is executed first.

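Artifacts: Artifacts are files consumed or produced by the container in a step. In the example below, a raw input artifact writes the pod-delete ChaosExperiment manifest to /tmp/pod-delete.yaml inside the step's container.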

- name: install-chaos-experiments
  inputs:
    artifacts:
      - name: pod-delete
        path: /tmp/pod-delete.yaml
        raw:
          data: >
            apiVersion: litmuschaos.io/v1alpha1

            description:
              message: |...

Ensuring Your Workflow is Recognized by the Argo Workflow Controller

When applying a Workflow manually without ChaosCenter, you must include the workflows.argoproj.io/controller-instanceid label in the manifest. The Argo Workflow controller uses this label to identify and reconcile the Workflow once it is created; a Workflow without a matching label is ignored. The label value must match the instanceID key in the workflow-controller-configmap.
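To make the relationship concrete, the sketch below shows a workflow-controller-configmap next to the Workflow metadata that must match it. It assumes instanceID is stored as a top-level key under data, which is one layout Argo accepts; your installation may nest it differently, so read the actual value from the ConfigMap in your chaos infrastructure namespace (for example with kubectl get configmap workflow-controller-configmap -n litmus -o yaml). The ID is the placeholder used throughout this guide.

```yaml
# Reference only: how the controller's instanceID relates to the Workflow label.
# Do not apply this ConfigMap; it would replace your controller configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: litmus
data:
  # Assumption: instanceID is a top-level key here; some installations
  # nest it inside a "config: |" block instead.
  instanceID: 86a4f130-d99b-4e91-b34b-8f9eee22cb63
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: pod-delete-experiment
  namespace: litmus
  labels:
    # Must equal the instanceID above; otherwise the controller skips this Workflow.
    workflows.argoproj.io/controller-instanceid: 86a4f130-d99b-4e91-b34b-8f9eee22cb63
spec:
  entrypoint: custom-chaos
  # templates: ... (as in the full example)
```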

Once constructed, the chaos experiment should look similar to this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: pod-delete-experiment
  namespace: litmus
  labels:
    workflows.argoproj.io/controller-instanceid: 86a4f130-d99b-4e91-b34b-8f9eee22cb63
spec:
  arguments:
    parameters:
      - name: adminModeNamespace
        value: litmus
  entrypoint: custom-chaos
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: argo-chaos
  templates:
    - name: custom-chaos
      steps:
        - - name: install-chaos-experiments
            template: install-chaos-experiments
        - - name: pod-delete
            template: pod-delete
        - - name: revert-chaos
            template: revert-chaos
    - name: install-chaos-experiments
      inputs:
        artifacts:
          - name: pod-delete
            path: /tmp/pod-delete.yaml
            raw:
              data: >
                apiVersion: litmuschaos.io/v1alpha1

                description:
                  message: |
                    Deletes a pod belonging to a deployment/statefulset/daemonset
                kind: ChaosExperiment

                metadata:
                  name: pod-delete
                  labels:
                    name: pod-delete
                    app.kubernetes.io/part-of: litmus
                    app.kubernetes.io/component: chaosexperiment
                    app.kubernetes.io/version: 3.0.0
                spec:
                  definition:
                    scope: Namespaced
                    permissions:
                      - apiGroups:
                          - ""
                          - apps
                          - apps.openshift.io
                          - argoproj.io
                          - batch
                          - litmuschaos.io
                        resources:
                          - deployments
                          - jobs
                          - pods
                          - pods/log
                          - replicationcontrollers
                          - statefulsets
                          - daemonsets
                          - replicasets
                          - deploymentconfigs
                          - rollouts
                          - pods/exec
                          - events
                          - chaosengines
                          - chaosexperiments
                          - chaosresults
                        verbs:
                          - create
                          - list
                          - get
                          - patch
                          - update
                          - delete
                          - deletecollection
                    image: litmuschaos/go-runner:3.0.0
                    imagePullPolicy: Always
                    args:
                      - -c
                      - ./experiments -name pod-delete
                    command:
                      - /bin/bash
                    env:
                      - name: TOTAL_CHAOS_DURATION
                        value: "15"
                      - name: RAMP_TIME
                        value: ""
                      - name: FORCE
                        value: "true"
                      - name: CHAOS_INTERVAL
                        value: "5"
                      - name: PODS_AFFECTED_PERC
                        value: ""
                      - name: LIB
                        value: litmus
                      - name: TARGET_PODS
                        value: ""
                      - name: SEQUENCE
                        value: parallel
                    labels:
                      name: pod-delete
                      app.kubernetes.io/part-of: litmus
                      app.kubernetes.io/component: experiment-job
                      app.kubernetes.io/version: 3.0.0
      container:
        args:
          - kubectl apply -f /tmp/pod-delete.yaml -n
            {{workflow.parameters.adminModeNamespace}} && sleep 30
        command:
          - sh
          - -c
        image: litmuschaos/k8s:latest
    - name: pod-delete
      inputs:
        artifacts:
          - name: pod-delete
            path: /tmp/chaosengine-pod-delete.yaml
            raw:
              data: |
                apiVersion: litmuschaos.io/v1alpha1
                kind: ChaosEngine
                metadata:
                  namespace: "{{workflow.parameters.adminModeNamespace}}"
                  generateName: pod-delete
                  labels:
                    instance_id: 86a4f130-d99b-4e91-b34b-8f9eee22cb63
                spec:
                  appinfo:
                    appns: default
                    applabel: app=nginx
                    appkind: deployment
                  jobCleanUpPolicy: retain
                  engineState: active
                  chaosServiceAccount: litmus-admin
                  experiments:
                    - name: pod-delete
                      spec:
                        components:
                          env:
                            - name: TOTAL_CHAOS_DURATION
                              value: "30"
                            - name: CHAOS_INTERVAL
                              value: "10"
                            - name: FORCE
                              value: "false"
                            - name: PODS_AFFECTED_PERC
                              value: ""
      container:
        args:
          - -file=/tmp/chaosengine-pod-delete.yaml
          - -saveName=/tmp/engine-name
        image: litmuschaos/litmus-checker:latest
    - name: revert-chaos
      container:
        image: litmuschaos/k8s:latest
        command:
          - sh
          - -c
        args:
          - "kubectl delete chaosengine -l 'instance_id in
            (86a4f130-d99b-4e91-b34b-8f9eee22cb63, )' -n
            {{workflow.parameters.adminModeNamespace}} "

Install Experiment

ChaosExperiment CR:
The install-chaos-experiments step carries the ChaosExperiment CR in its artifact. The ChaosExperiment CR is the heart of LitmusChaos and contains the low-level execution information. ChaosExperiments serve as off-the-shelf templates that must be "pulled" (installed on the cluster) before they can be included in a chaos run against a target application; the binding between an experiment and its target is defined in the ChaosEngine. Experiments are installed on the cluster as Kubernetes custom resources and hold granular details of the experiment, such as the image, the chaos library, the necessary permissions, and the chaos parameters (set to their default values). Most ChaosExperiment parameters are tunables that can be overridden from the ChaosEngine resource.
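The trimmed ChaosExperiment below maps the details listed above (image, library, permissions, default parameters) to their fields. It is cut down from the pod-delete manifest in this guide; the permissions list in particular is shortened for illustration and is likely insufficient for a real run, so use the full definition from ChaosHub when installing the experiment.

```yaml
# Trimmed pod-delete ChaosExperiment showing where defaults live.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: pod-delete
spec:
  definition:
    scope: Namespaced
    permissions:                         # shortened for illustration only
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "update", "deletecollection"]
    image: litmuschaos/go-runner:3.0.0   # experiment image
    command: ["/bin/bash"]
    args: ["-c", "./experiments -name pod-delete"]
    env:
      # Default tunables; a ChaosEngine can override these per run.
      - name: TOTAL_CHAOS_DURATION
        value: "15"
      - name: CHAOS_INTERVAL
        value: "5"
      - name: LIB                        # chaos library
        value: litmus
```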
ChaosEngine CR:
The ChaosEngine is the main user-facing, namespace-scoped chaos custom resource, and holds information about how chaos experiments are executed. It connects an application instance with one or more chaos experiments and lets users specify run-level details, such as overriding experiment defaults, providing new environment variables and volumes, and choosing whether to delete or retain experiment pods. The ChaosEngine is also updated with the status of its chaos experiments, making it the single source of truth for the chaos run.
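As a sketch of those run-level details, the ChaosEngine below targets the same nginx deployment as the full example, deletes the experiment pods afterwards, and overrides one ChaosExperiment default. The engine name is hypothetical and the values are placeholders to adapt to your own application.

```yaml
# Sketch of a ChaosEngine that overrides ChaosExperiment defaults.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-pod-delete          # hypothetical name
  namespace: litmus
spec:
  engineState: active             # set to "stop" to abort a running experiment
  appinfo:                        # the application under test
    appns: default
    applabel: app=nginx
    appkind: deployment
  chaosServiceAccount: litmus-admin
  jobCleanUpPolicy: delete        # "retain" keeps the experiment pods for debugging
  experiments:
    - name: pod-delete            # must match the installed ChaosExperiment name
      spec:
        components:
          env:
            # Overrides the default of "15" set in the ChaosExperiment.
            - name: TOTAL_CHAOS_DURATION
              value: "30"
```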

Resources

The ChaosExperiment and ChaosEngine CRs for different experiments are available on ChaosHub.

Learn More

What are the different Probes
