Job Recruitment Website - Zhaopincom - Chaos Engineering

Chaos Engineering

Before the formal introduction, let me tell you a short story. During a test conducted before the launch of Apollo 13, the oxygen in the No. 2 oxygen tank could not be completely emptied. Controllers decided to switch on the heater in the tank to boil off the remaining oxygen. This operation used a 65-volt power supply, but the circuit inside the oxygen tank had originally been designed for 28 volts. After 8 hours of heating, the temperature of the wiring near the heater reached 538 degrees Celsius, and the insulation on the wires was damaged. Later, in space, the wires shorted out, igniting the insulation and causing an explosion.

In day-to-day development and operations work, if you do not discover and solve problems early, the problems will eventually "solve you" on a weekend or in the middle of the night.

Viewed in terms of how system architecture evolves, a system starts out fragile, slowly evolves, is gradually strengthened, and eventually becomes more intelligent: an anti-fragile system.

In 2010, Netflix internally developed a chaos experiment tool for randomly terminating EC2 instances on the AWS cloud: Chaos Monkey

In 2011, Netflix released its Monkey Army tool set: Simian Army

In 2012, Netflix open-sourced Simian Army, written in Java, to the community, including Chaos Monkey V1

In 2014, Netflix began to officially recruit Chaos Engineers

In 2014, Netflix proposed Fault Injection Testing (FIT), which uses the characteristics of microservice architecture to control the blast radius of chaos experiments

In 2015, Netflix released Chaos Kong to simulate AWS region outage scenarios

In 2015, Netflix and the community formally proposed the guiding ideology of chaos engineering: Principles of Chaos Engineering

In 2016, Kolton Andrus (former Netflix and Amazon Chaos Engineer) founded Gremlin to officially commercialize chaos experiment tools

In 2017, Netflix open-sourced Chaos Monkey V2, rewritten in Go; it must be integrated with the CD tool Spinnaker to be used

In 2017, Netflix released ChAP (Chaos Automation Platform), which can be regarded as an enhanced version of Fault Injection Testing (FIT)

In 2017, a new book "Chaos Engineering" written by a former Netflix chaos engineer was published online

In 2017, Russell Miles founded ChaosIQ and open-sourced the chaostoolkit chaos experiment framework

Many people are afraid to perform experiments in a production environment because they worry that the impact of a failure will be uncontrollable. Running experiments is only a means; building confidence in the system through experiments is the goal. How to reduce the impact of experiments is explained in the "Minimizing Explosion Radius" section.

Chaos engineering also does not suggest that you run experiments in production from the very beginning. Proceed gradually: start in the test environment, move on to a sandbox, and only then run experiments in the production environment.

1. Architects can verify the currently designed architecture through chaos engineering

2. Development and operations staff can use chaos engineering to practice handling online incidents and build up experience

3. Testers can use chaos engineering to expose online problems early, reduce the recurrence rate of faults, and turn a passive stance into a proactive one

4. UI designers can use chaos engineering to see what happens when these online problems occur: what feedback the interface gives, whether the product fails to display, how the product behaves for users, and whether that is acceptable

Quoted from Netflix

In the past year, the chaos engineering project discovered 2 major faults and 8 minor faults in advance, avoiding a loss of approximately US$700,000 for the entire organization. The chaos engineering team has a total of 3 members, with a salary expenditure of US$150,000 per person. Carrying out the chaos engineering experiments themselves costs US$10,000. What is the return on investment?
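The question can be worked through with simple arithmetic. The sketch below rests on two assumptions that the quoted example does not state: ROI is defined as (avoided loss minus total cost) divided by total cost, and the US$10,000 is the total experiment cost for the year rather than a per-experiment figure. The function name `chaos_roi` is hypothetical.

```python
def chaos_roi(avoided_loss, team_size, salary_per_person, experiment_cost):
    """Return ROI as a fraction, assuming ROI = (benefit - cost) / cost."""
    # Total cost = salaries for the whole team + cost of running experiments
    # (assumed here to be a single yearly total).
    total_cost = team_size * salary_per_person + experiment_cost
    return (avoided_loss - total_cost) / total_cost


# Figures taken from the quoted example.
total_cost = 3 * 150_000 + 10_000          # 460,000
roi = chaos_roi(700_000, 3, 150_000, 10_000)

print(f"Total cost: US${total_cost:,}")     # Total cost: US$460,000
print(f"Net benefit: US${700_000 - total_cost:,}")  # Net benefit: US$240,000
print(f"ROI: {roi:.1%}")                    # ROI: 52.2%
```

Under these assumptions, the team costs US$460,000, the net benefit is US$240,000, and the return on investment is roughly 52%. A different reading of the experiment cost (for example, US$10,000 per experiment) would lower the result.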