About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Saturday, July 28, 2018

Day 2 Application Operations

What set of practices do you need to adopt and internalize to keep your app alive in production? Here is a checklist that can help.

1/ Use Spring Boot Actuator in production. Apps Manager has some secret sauce here: it integrates with Actuator, which makes your app on PCF easier to manage and gives you visibility into all of your Actuator endpoints.

2/ Where it makes sense, add custom health indicators that supplement the default ones.

3/ Add Spring Cloud Sleuth to your classpath dependencies so that you can visualize the call flow across microservices and threads with Zipkin in PCF Metrics: http://docs.pivotal.io/pcf-metrics/1-4/using.html

4/ Plug a syslog drain into the end of the Firehose using the appropriate nozzle, such as the Splunk nozzle: https://docs.pivotal.io/partners/splunk/index.html

5/ Emit application- or domain-level metrics using Micrometer: https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html

6/ Starting with PCF 2.2, you can configure alerting and autoscaling on any custom app metric (see Custom rules for autoscaling).

7/ Use VisualVM and its sister JDK tools to get deeper insight, particularly into performance issues. Here is a good article on setting up VisualVM with an app running on Cloud Foundry. The ability to take thread dumps and heap dumps is also critical when analyzing performance issues. Specifically, see "How to generate Java Application thread dump from Cloud Foundry container", "How to generate and download Java Application heap dump from Cloud Foundry container", and "How to know if an app is responsible for high CPU in a Diego Cell and how to find it".

8/ Use canaries to gauge the quality of a new version of the app, then scale it up with a blue/green deployment (see concourse-pipeline).

9/ Create an SLI/SLO dashboard to get immediate visibility into your error budgets and uptime.



10/ Rehearse outage handling: run through debugging common scenarios such as OOMs, hangs, and deadlocks using cf ssh and the Spring Boot Actuator endpoints. Then build a program around incident response management.
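Item 1 is easier to picture with an example. This is a minimal sketch of what a monitoring client sees when it polls Actuator's health endpoint. It assumes Spring Boot 2.x defaults, where the endpoint lives at /actuator/health and answers HTTP 200 with a status of UP when healthy or 503 when not. It also assumes Java 11+ for java.net.http.HttpClient. ActuatorHealthProbe is a hypothetical class and is not part of Actuator or Apps Manager.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical probe; assumes Spring Boot 2.x Actuator defaults (/actuator/health).
public class ActuatorHealthProbe {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    /** Returns true when GET {baseUrl}/actuator/health answers 200 with an overall status of UP. */
    public boolean isUp(String baseUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // By default Actuator maps DOWN and OUT_OF_SERVICE to HTTP 503.
        // The string check is deliberately naive; a real client would parse the JSON.
        String compact = response.body().replaceAll("\\s", "");
        return response.statusCode() == 200 && compact.contains("\"status\":\"UP\"");
    }
}
```

This is roughly the check that an external monitor or load balancer runs against your app's route.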
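For item 2, a custom health indicator is a Spring bean that implements Actuator's HealthIndicator interface. The sketch below assumes Spring Boot 2.x with spring-boot-actuator on the classpath. BacklogHealthIndicator and its backlog supplier are hypothetical stand-ins for whatever downstream dependency your app actually cares about, and the threshold is an assumption.

```java
import java.util.function.IntSupplier;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;

// Hypothetical indicator: reports DOWN when a work backlog grows past a threshold.
public class BacklogHealthIndicator implements HealthIndicator {
    private final IntSupplier backlogSize; // hypothetical source of the current backlog depth
    private final int maxBacklog;          // assumed threshold; tune for your app

    public BacklogHealthIndicator(IntSupplier backlogSize, int maxBacklog) {
        this.backlogSize = backlogSize;
        this.maxBacklog = maxBacklog;
    }

    @Override
    public Health health() {
        int backlog = backlogSize.getAsInt();
        Health.Builder builder = backlog <= maxBacklog ? Health.up() : Health.down();
        // Details show up in /actuator/health only if show-details permits it.
        return builder
                .withDetail("backlog", backlog)
                .withDetail("maxBacklog", maxBacklog)
                .build();
    }
}
```

Register it from a @Bean method named backlogHealthIndicator, and Actuator lists it as "backlog" next to the default indicators. Its status also feeds into the aggregate /actuator/health status. The details are visible only if management.endpoint.health.show-details allows it.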
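For item 5, Spring Boot 2.x Actuator auto-configures a Micrometer MeterRegistry that you can inject into any bean. This sketch uses hypothetical metric names (orders.placed, checkout.duration) and a hypothetical CheckoutMetrics class. The channel tag shows how a domain-level dimension becomes something you can slice dashboards by.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.function.Supplier;

// Hypothetical domain metrics; metric and tag names are illustrative assumptions.
public class CheckoutMetrics {
    private final MeterRegistry registry;
    private final Timer checkoutTimer;

    public CheckoutMetrics(MeterRegistry registry) {
        this.registry = registry;
        this.checkoutTimer = Timer.builder("checkout.duration")
                .description("Time taken to complete a checkout")
                .register(registry);
    }

    /** Counts a placed order, tagged by sales channel. */
    public void orderPlaced(String channel) {
        Counter.builder("orders.placed")
                .description("Orders successfully placed")
                .tag("channel", channel)
                .register(registry) // returns the existing counter if this name+tags is already registered
                .increment();
    }

    /** Times a checkout operation and returns its result. */
    public <T> T timeCheckout(Supplier<T> checkout) {
        return checkoutTimer.record(checkout);
    }
}
```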
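Item 7's thread and heap dumps can also be captured from inside the JVM through its management beans, in addition to the approaches in the linked articles. If Actuator is in place, its threaddump and heapdump endpoints expose the same data over HTTP. This sketch assumes a HotSpot-based JVM, because HotSpotDiagnosticMXBean lives in com.sun.management. JvmDiagnostics is a hypothetical helper name. The heap dump is written to the container's filesystem, so you still have to copy it out, for example over cf ssh.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; assumes a HotSpot-based JVM for the heap dump.
public final class JvmDiagnostics {
    private JvmDiagnostics() {}

    /** Returns a text thread dump of this JVM (ThreadInfo.toString truncates long stacks). */
    public static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            dump.append(info);
        }
        return dump.toString();
    }

    /** Writes a heap dump to path (use a .hprof suffix; the file must not exist yet). */
    public static void heapDump(String path, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(path, liveObjectsOnly);
    }
}
```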
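For item 8, deciding whether to promote a canary comes down to comparing it against the version already in production. This gate is deliberately simple and hypothetical. The CanaryGate name and both thresholds are assumptions, not something Concourse or PCF provides. The gate first requires a minimum amount of canary traffic. It then promotes only if the canary's error rate exceeds the baseline's by no more than a fixed tolerance.

```java
// Hypothetical canary gate; thresholds are illustrative assumptions.
public final class CanaryGate {
    private final long minCanaryRequests; // evidence required before deciding
    private final double tolerance;       // allowed absolute increase in error rate

    public CanaryGate(long minCanaryRequests, double tolerance) {
        if (minCanaryRequests <= 0 || tolerance < 0) {
            throw new IllegalArgumentException("minCanaryRequests must be > 0 and tolerance >= 0");
        }
        this.minCanaryRequests = minCanaryRequests;
        this.tolerance = tolerance;
    }

    /** True if the canary looks no worse than the baseline and can be scaled up. */
    public boolean shouldPromote(long canaryRequests, long canaryErrors,
                                 long baselineRequests, long baselineErrors) {
        if (canaryRequests < minCanaryRequests || baselineRequests == 0) {
            return false; // not enough traffic yet to judge
        }
        double canaryRate = (double) canaryErrors / canaryRequests;
        double baselineRate = (double) baselineErrors / baselineRequests;
        return canaryRate <= baselineRate + tolerance;
    }
}
```

In practice, the counts would come from the metrics emitted in item 5. A failing gate is the signal to route traffic back to the blue side instead of scaling up the green one.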
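For item 9, the number an SLO dashboard should show most prominently is how much error budget is left. For an availability SLO, the budget is (1 - SLO) of the window. At 99.9% over 30 days, that is 0.001 × 43,200 minutes = 43.2 minutes of downtime, or 1,000 failed requests out of every 1,000,000. Here is a small helper for both views; ErrorBudget and its method names are hypothetical.

```java
// Hypothetical error-budget helper for an SLI/SLO dashboard.
public final class ErrorBudget {
    private ErrorBudget() {}

    private static void checkSlo(double slo) {
        if (slo <= 0.0 || slo >= 1.0) {
            throw new IllegalArgumentException("slo must be strictly between 0 and 1");
        }
    }

    /** Allowed downtime in minutes for an availability SLO over a window of days. */
    public static double allowedDowntimeMinutes(double slo, int windowDays) {
        checkSlo(slo);
        return (1.0 - slo) * windowDays * 24 * 60;
    }

    /** Fraction of the request-based budget still unspent (negative once overspent). */
    public static double remainingBudget(double slo, long totalRequests, long failedRequests) {
        checkSlo(slo);
        if (totalRequests <= 0) {
            throw new IllegalArgumentException("totalRequests must be positive");
        }
        double allowedFailures = (1.0 - slo) * totalRequests;
        return (allowedFailures - failedRequests) / allowedFailures;
    }
}
```

For example, 250 failures out of 1,000,000 requests at a 99.9% SLO leaves 75% of the budget.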
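For item 10, the first question when rehearsing a hang is usually whether any threads are deadlocked. The JVM can answer that itself through ThreadMXBean.findDeadlockedThreads(). The sketch below wraps that call in a hypothetical DeadlockDetector helper, which you might call from a diagnostic endpoint or a scheduled check during a drill.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper built on the standard ThreadMXBean deadlock detection.
public final class DeadlockDetector {
    private DeadlockDetector() {}

    /** Names of threads deadlocked on monitors or ownable synchronizers; empty if none. */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // null if the thread has since terminated
                names.add(info.getThreadName());
            }
        }
        return names;
    }
}
```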


