The Chaos Operator is a Kubernetes Operator. Operators are custom controllers with direct access to the Kubernetes API that manage the lifecycle of specific resources or applications, continuously working to bring each resource to its "desired state". The logic that does this is commonly called the "reconcile" function.
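To make the "desired state" idea concrete, here is a minimal, stdlib-only Go sketch of a level-based reconcile loop. It does not use the Kubernetes API or Operator-SDK. The `State` and `Cluster` types and the replica count are hypothetical stand-ins, chosen only to show how repeated reconcile calls converge the observed state on the desired one.

```go
package main

import "fmt"

// State is a hypothetical, simplified view of a managed resource:
// here just the number of replicas it should run.
type State struct {
	Replicas int
}

// Cluster is a hypothetical stand-in for the Kubernetes API server,
// holding the observed (actual) state of the resource.
type Cluster struct {
	Observed State
}

// reconcile compares the desired state with the observed state and takes
// one corrective action per call, returning true once they match.
func reconcile(desired State, c *Cluster) bool {
	switch {
	case c.Observed.Replicas < desired.Replicas:
		c.Observed.Replicas++ // e.g. create one missing pod
		return false
	case c.Observed.Replicas > desired.Replicas:
		c.Observed.Replicas-- // e.g. delete one surplus pod
		return false
	default:
		return true // already in the desired state; nothing to do
	}
}

func main() {
	desired := State{Replicas: 3}
	cluster := &Cluster{Observed: State{Replicas: 0}}
	// A controller re-invokes reconcile (on events or periodic resyncs)
	// until the observed state converges on the desired state.
	for steps := 1; ; steps++ {
		if reconcile(desired, cluster) {
			fmt.Printf("converged after %d calls: %+v\n", steps, cluster.Observed)
			// prints "converged after 4 calls: {Replicas:3}"
			break
		}
	}
}
```

Because each call compares the two states afresh rather than replaying a history of events, calling `reconcile` again once the states match is a no-op, which is why a controller can safely re-run it on every event or resync.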
The Chaos Operator is built with the popular Operator-SDK framework, which scaffolds new operator projects so that teams can focus on the business and operational logic.
The Litmus Chaos Operator reconciles the state of the ChaosEngine, a custom resource that holds the chaos intent a developer or DevOps engineer specifies against a particular stateless or stateful Kubernetes deployment. The operator performs specific actions upon CRUD operations on the ChaosEngine, its primary resource. It also defines secondary resources, the engine runner pod and the engine monitor service, which it creates and manages in order to implement the reconcile function.
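The relationship between the primary resource and its secondary resources can be sketched as follows. This is a hypothetical, in-memory Go illustration rather than the operator's actual code: the `ChaosEngine` struct, the `Store` map standing in for the API server, the `Deleted` flag modelling a delete event, and the `-runner`/`-monitor` naming scheme are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// ChaosEngine is a hypothetical, heavily simplified stand-in for the
// ChaosEngine custom resource (the primary resource).
type ChaosEngine struct {
	Name    string
	Deleted bool // models a deletion event for the primary resource
}

// Store is a hypothetical in-memory substitute for the API server,
// tracking which secondary resources currently exist, by name.
type Store map[string]bool

// secondaryNames returns the runner pod and monitor service names owned
// by an engine. The "-runner"/"-monitor" suffixes are illustrative only.
func secondaryNames(e ChaosEngine) []string {
	return []string{e.Name + "-runner", e.Name + "-monitor"}
}

// reconcileEngine ensures the secondary resources exist while the engine
// exists and removes them once it is deleted. It is idempotent: calling it
// repeatedly with the same input leaves the store unchanged.
func reconcileEngine(e ChaosEngine, s Store) {
	for _, name := range secondaryNames(e) {
		if e.Deleted {
			delete(s, name)
		} else if !s[name] {
			s[name] = true // create only if missing
		}
	}
}

// keys lists the existing secondary resources in sorted order.
func keys(s Store) []string {
	out := make([]string, 0, len(s))
	for k := range s {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := Store{}
	engine := ChaosEngine{Name: "demo-engine"}
	reconcileEngine(engine, s) // create event
	reconcileEngine(engine, s) // resync: no change
	fmt.Println(keys(s))       // [demo-engine-monitor demo-engine-runner]
	engine.Deleted = true
	reconcileEngine(engine, s) // delete event
	fmt.Println(keys(s))       // []
}
```

In a real cluster, setting owner references on the secondary objects lets Kubernetes garbage-collect them when the owning resource is deleted, so an operator does not always need an explicit cleanup step like the one modelled here.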
Engine Runner Pod: The runner pod spawns the experiment executors and runs a Prometheus exporter sidecar that collects the chaos metrics: the total number of experiments scheduled, passed, and failed, as well as the run status of each individual experiment.
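The metrics listed above can be modelled as a simple aggregation over experiment results. The following Go sketch is illustrative only: the `ExperimentResult` and `ChaosMetrics` types, the `collect` function, and the `"Pass"`/`"Fail"`/`"Running"` status strings are assumptions, not the exporter's real data model.

```go
package main

import "fmt"

// ExperimentResult is a hypothetical record of one experiment run.
type ExperimentResult struct {
	Name   string
	Status string // assumed values: "Running", "Pass", "Fail"
}

// ChaosMetrics mirrors the metrics described above: totals for scheduled,
// passed and failed experiments plus a per-experiment run status.
type ChaosMetrics struct {
	Scheduled int
	Passed    int
	Failed    int
	PerRun    map[string]string
}

// collect aggregates experiment results into ChaosMetrics, as an exporter
// sidecar might do before exposing them. Every result counts as scheduled;
// only "Pass" and "Fail" contribute to the outcome totals.
func collect(results []ExperimentResult) ChaosMetrics {
	m := ChaosMetrics{PerRun: make(map[string]string, len(results))}
	for _, r := range results {
		m.Scheduled++
		switch r.Status {
		case "Pass":
			m.Passed++
		case "Fail":
			m.Failed++
		}
		m.PerRun[r.Name] = r.Status
	}
	return m
}

func main() {
	m := collect([]ExperimentResult{
		{Name: "exp-a", Status: "Pass"},
		{Name: "exp-b", Status: "Fail"},
		{Name: "exp-c", Status: "Running"},
	})
	fmt.Printf("scheduled=%d passed=%d failed=%d\n", m.Scheduled, m.Passed, m.Failed)
	// prints "scheduled=3 passed=1 failed=1"
}
```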
Engine Monitor Service: The monitor service exposes the /metrics endpoint so that Prometheus, or another compatible monitoring platform, can scrape it.