Here are some interesting tools I have come across in the last couple of days for creating chaos. Chaos engineering is one of the key SRE practices for determining whether your production site can withstand failures and excess load.
- ChaosBlade: An easy-to-use and powerful chaos engineering experiment toolkit from Alibaba. https://github.com/chaosblade-io/chaosblade
- Chaos Toolkit: Create chaos in your Spring apps. https://github.com/chaostoolkit-incubator/chaostoolkit-spring. See also Chaos Monkey for Spring Boot apps: https://codecentric.github.io/chaos-monkey-spring-boot/
- Gremlin: Chaos as a Service, offering resiliency through orchestrated chaos. https://www.gremlin.com/docs/application-layer/attacks/ This service is worth paying for if you have low confidence in the production readiness of your code, or if you lack the SRE practices needed to shock the organization into operational readiness.
- General Load Testing tools: http://cloud.rohitkelapure.com/2019/05/load-testing-tools.html
- Istio Fault Injection: https://istio.io/docs/tasks/traffic-management/fault-injection/
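Several of the tools above work by injecting latency or errors into calls that would otherwise succeed. The sketch below shows that idea in miniature, assuming a plain Python function stands in for a downstream dependency. The `chaos` decorator, `InjectedFault`, and `get_price` are hypothetical names for illustration only; this is not the API of ChaosBlade, Chaos Toolkit, Chaos Monkey for Spring Boot, Gremlin, or Istio.

```python
import random
import time
from functools import wraps


class InjectedFault(Exception):
    """Hypothetical error raised to simulate a failing dependency."""


def chaos(latency_s=0.0, latency_rate=0.0, error_rate=0.0, rng=None):
    """Hypothetical decorator that injects latency and errors into a call.
    Illustrative only; not any real chaos tool's API."""
    rng = rng if rng is not None else random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # With probability latency_rate, delay the call (slow dependency).
            if rng.random() < latency_rate:
                time.sleep(latency_s)
            # With probability error_rate, fail instead of running the call.
            if rng.random() < error_rate:
                raise InjectedFault(f"chaos: injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@chaos(latency_s=0.01, latency_rate=0.2, error_rate=0.3, rng=random.Random(42))
def get_price(item_id):
    # Stand-in for a real downstream call (hypothetical).
    return 9.99


if __name__ == "__main__":
    failures = 0
    for i in range(100):
        try:
            get_price(i)
        except InjectedFault:
            failures += 1
    print(f"{failures} of 100 calls failed")
```

Seeding the random generator makes an experiment repeatable, which helps when you want to compare the system's behavior before and after a fix.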
Now that you have succeeded in creating chaos, how should you instrument and fix the system to deal with it? To understand how to deal with chaos, start with "Health Checks and Graceful Degradation in Distributed Systems" and "Testing in Production, the Safe Way".
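As a rough illustration of graceful degradation, the sketch below wraps a dependency so that a failure serves the last known good value (or a static default) instead of propagating an error, and exposes a coarse health signal. `DegradingClient`, its `health` levels, and the failure threshold are hypothetical choices made for this example, not a design taken from the articles above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class DegradingClient:
    """Hypothetical wrapper: serve fresh data when the dependency works,
    fall back to the last known good value or a static default when it fails."""
    fetch: Callable[[str], str]
    default: str = "unavailable"
    cache: Dict[str, str] = field(default_factory=dict)
    consecutive_failures: int = 0

    def get(self, key: str) -> Tuple[str, bool]:
        """Return (value, degraded)."""
        try:
            value = self.fetch(key)
        except Exception:
            self.consecutive_failures += 1
            # Degrade: stale cached value first, then the static default.
            return self.cache.get(key, self.default), True
        self.consecutive_failures = 0
        self.cache[key] = value
        return value, False

    def health(self, max_failures: int = 3) -> str:
        """Hypothetical health signal: 'ok', 'degraded', or 'failing'."""
        if self.consecutive_failures == 0:
            return "ok"
        if self.consecutive_failures < max_failures:
            return "degraded"
        return "failing"
```

Reporting "degraded" separately from "failing" lets a load balancer or orchestrator keep routing traffic to an instance that can still serve stale data, rather than treating every dependency hiccup as a dead process.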
To understand the theory and implementation of SRE practices for dealing with chaos, read the chapters "Handling Overload" and "Addressing Cascading Failures" in the SRE books. As a bonus, read the chapter "Non-Abstract Large System Design" to understand the process for designing large-scale, fault-tolerant systems.
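One technique from the "Handling Overload" chapter is client-side adaptive throttling: each client tracks how many requests it attempted and how many the backend accepted, then rejects new requests locally with probability max(0, (requests - K * accepts) / (requests + 1)). The sketch below is a minimal Python rendering of that formula, assuming a sliding time window; the class name, `should_send`/`record` methods, and the default `k` and `window_s` values are illustrative assumptions, not code from the book.

```python
import random
import time
from collections import deque


class AdaptiveThrottler:
    """Sketch of client-side adaptive throttling (hypothetical class).
    Rejects locally with probability max(0, (requests - k*accepts) / (requests + 1)).
    k and window_s are illustrative, tunable defaults."""

    def __init__(self, k=2.0, window_s=120.0, rng=None, clock=time.monotonic):
        self.k = k
        self.window_s = window_s
        self.rng = rng if rng is not None else random.Random()
        self.clock = clock
        self.history = deque()  # (timestamp, accepted_by_backend) per attempt

    def _trim(self, now):
        # Drop attempts older than the sliding window.
        while self.history and now - self.history[0][0] > self.window_s:
            self.history.popleft()

    def rejection_probability(self):
        self._trim(self.clock())
        requests = len(self.history)
        accepts = sum(1 for _, ok in self.history if ok)
        return max(0.0, (requests - self.k * accepts) / (requests + 1))

    def should_send(self):
        """Return False if this request should be failed locally."""
        return self.rng.random() >= self.rejection_probability()

    def record(self, accepted):
        """Record every attempt; locally rejected attempts count as not accepted."""
        self.history.append((self.clock(), accepted))


def call_backend(throttler, send):
    """Hypothetical helper: send() returns True if the backend accepted."""
    if not throttler.should_send():
        throttler.record(False)
        raise RuntimeError("throttled locally")
    ok = send()
    throttler.record(ok)
    return ok
```

While the backend accepts most requests the rejection probability stays at zero; once rejections pile up, the client starts shedding load itself, so an overloaded backend spends less effort turning requests away.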
Lastly, if you are in the Bay Area, this looks like an awesome conference: https://chaosconf.io/
Happy SRE Practices!