The first step in creating a new litmusbook is to clearly define the experiment, i.e., identify the test intent, the flow, and the execution environment. This translates into understanding the following requirements:
Entry criteria of the test
For example, it may be necessary for the application under test (AUT) to be in the "Running" state and for storage health checks to pass before the experiment steps are executed. In the case of chaos experiments, this corresponds to identifying the steady state and the means to ascertain it.
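As an illustration, such a pre-check might look like the following kubectl-based sketch; the `APP_NS` and `APP_LABEL` variables are placeholders for the experiment's inputs, not actual litmusbook variables.

```bash
# Sketch of an entry-criteria check: verify the AUT pod is Running before
# proceeding. APP_NS and APP_LABEL are illustrative placeholders.
APP_NS="${APP_NS:-default}"
APP_LABEL="${APP_LABEL:-app=nginx}"

phase=$(kubectl get pods -n "$APP_NS" -l "$APP_LABEL" \
          -o jsonpath='{.items[0].status.phase}')
if [ "$phase" != "Running" ]; then
  echo "Entry criteria not met: AUT pod phase is '$phase'"
  exit 1
fi
```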
Experiment/Test business logic
This consists of the main test procedure corresponding to the user action: typically provisioning or other administrative/functional workflows, or failure injection (chaos) against specific cluster components.
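A minimal failure-injection step of this kind, sketched with kubectl (the selector variables are illustrative, not defined by Litmus):

```bash
# Sketch of a simple chaos step: forcefully delete one replica of the AUT
# to simulate a pod failure.
APP_NS="${APP_NS:-default}"
APP_LABEL="${APP_LABEL:-app=nginx}"

target=$(kubectl get pods -n "$APP_NS" -l "$APP_LABEL" \
           -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$target" -n "$APP_NS" --grace-period=0 --force
```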
Exit criteria of the test
Evaluation against success conditions based on the hypothesis about the impact on the service/system. In most cases, this is successful (re)configuration for functional workflows, or uninterrupted service availability for chaos. Data integrity verification can be an important post-test check.
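A post-check of this kind could, for instance, wait for the AUT to return to a Ready state within a timeout (again using placeholder selector variables):

```bash
# Sketch of an exit-criteria check: confirm the AUT recovers to Ready
# within a timeout after the experiment step.
APP_NS="${APP_NS:-default}"
APP_LABEL="${APP_LABEL:-app=nginx}"

kubectl wait --for=condition=Ready pod -n "$APP_NS" -l "$APP_LABEL" --timeout=120s \
  || { echo "Exit criteria not met: AUT did not recover"; exit 1; }
```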
A majority of the Litmus tests are written with "kubectl" as the main tool through which the experiment's steps are executed. In some cases, the test business logic needs additional tools; for example, injecting packet loss into the pod network requires pumba or, at a minimum, tc with netem.
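For instance, a packet-loss fault could be approximated with tc/netem roughly as follows, assuming the target container image ships iproute2 and runs with the NET_ADMIN capability (the pod name and namespace variables are placeholders):

```bash
# Sketch: inject 20% packet loss on the target pod's eth0 interface for the
# chaos interval, then revert. Assumes iproute2 in the image and NET_ADMIN.
APP_NS="${APP_NS:-default}"
TARGET_POD="${TARGET_POD:-nginx-0}"

kubectl exec -n "$APP_NS" "$TARGET_POD" -- tc qdisc add dev eth0 root netem loss 20%
sleep 60   # keep the fault active for the chaos duration
kubectl exec -n "$APP_NS" "$TARGET_POD" -- tc qdisc del dev eth0 root netem
```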
Most Litmus experiments are designed to be platform agnostic: they can run on any Kubernetes cluster regardless of the underlying infrastructure, whether a cloud provider or on-premise. However, some experiments use provider-specific tooling (gcloud, awscli) to execute certain steps, for example disk failure tests, where the nature of the underlying disk/block volume depends on the platform.
In such cases, separate "utils" that achieve the requirement are created for each platform and are selectively invoked based on the user-supplied platform type, as sketched below.
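The selection logic could be as simple as the following sketch; the util script names and the `PLATFORM` variable are hypothetical placeholders, not actual Litmus artifacts:

```bash
# Illustrative dispatch of platform-specific utils based on a user-supplied
# platform type. Script names and PLATFORM are hypothetical.
PLATFORM="${PLATFORM:-GKE}"

case "$PLATFORM" in
  GKE) ./utils/gcloud_detach_disk.sh ;;    # would wrap gcloud
  AWS) ./utils/awscli_detach_volume.sh ;;  # would wrap awscli
  *)   echo "Unsupported platform: $PLATFORM"; exit 1 ;;
esac
```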