About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Friday, June 26, 2015

Warden and Diego container preservation in Cloud Foundry

There are several use cases for preserving Warden containers in Cloud Foundry:

  1. Post-mortem analysis of a compromised or hacked container
  2. Problem determination and troubleshooting
  3. Audit and compliance
There are two ways to achieve this, depending on whether you are running before Diego (BD) or after Diego (AD).

Before Diego

This is a world where apps run in Warden containers on DEAs. Crashed Warden containers are garbage collected after a predetermined period set by dea_next.crash_lifetime_secs; by default, a crashed container stays on the DEA for about an hour before being removed. In addition to setting dea_next.crash_lifetime_secs, also set the container_grace_time parameter to 3600 in the Warden configuration file, warden.yml. The container_grace_time parameter defines how long a container is kept before it is deleted.
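To make the timing concrete, here is a toy Java model of the retention window, assuming both dea_next.crash_lifetime_secs and container_grace_time are set to 3600 seconds. It is not DEA or Warden code, and the class and method names (CrashRetention, isRetained, removalTime) are hypothetical; it only shows the arithmetic of "kept until the grace period has elapsed".

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical model of the pre-Diego retention window: a crashed Warden
 * container remains on the DEA until its grace period has elapsed.
 * Illustration only; this is not the DEA or Warden implementation.
 */
final class CrashRetention {
    // Assumed value, matching the 3600 seconds suggested in the text.
    static final Duration GRACE = Duration.ofSeconds(3600);

    private CrashRetention() {}

    /** True while a container that crashed at crashedAt could still be inspected at now. */
    static boolean isRetained(Instant crashedAt, Instant now, Duration grace) {
        Duration elapsed = Duration.between(crashedAt, now);
        // Kept from the moment of the crash until the grace period ends.
        return !elapsed.isNegative() && elapsed.compareTo(grace) < 0;
    }

    /** Instant at which the container becomes eligible for removal. */
    static Instant removalTime(Instant crashedAt, Duration grace) {
        return crashedAt.plus(grace);
    }
}
```

With a one-hour grace period, anyone doing post-mortem work has roughly that hour after the crash to get onto the DEA and inspect or copy the container.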

After Diego

Diego does not keep containers around after crashes. Diego kills a container when the app "crashes", which can happen because:
  • the application exited
  • the health check failed (today this is a port check; custom health checks will be possible in the future)
  • the application exceeded its memory limit
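A port check simply asks whether something is accepting TCP connections on the app's port. The Java sketch below illustrates that idea only; it is not Diego's health-check code, and the class name PortCheck is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Illustration of a port-based health check: the instance counts as healthy
 * if something accepts a TCP connection on its port. Not Diego's code.
 */
final class PortCheck {
    private PortCheck() {}

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Succeeds only if a listener accepts the connection within the timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: the check fails.
            return false;
        }
    }
}
```

An app whose process has exited no longer accepts connections on its port, so a check like this fails.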

Until Cloud Foundry explicitly supports keeping crashed containers around in Diego, there are several options for implementing post-mortem container forensics:

  1. Modify the Java buildpack's release Ruby script to upload the contents of /home/vcap in the container's root filesystem to an S3-compatible blob store when the JVM process exits. Also add an OOM heap dump script that uploads the heap dump to S3 storage.
  2. Create a JVM shutdown hook, registered with Runtime.getRuntime().addShutdownHook(shutdownHook), that introspects the app and copies its bits and file system to a Riak S2 blob store.
  3. Use a Tomcat lifecycle listener (org.apache.catalina.LifecycleListener) to perform post-mortem activities. For a prototype, look at the current Java buildpack's ApplicationStartupFailureDetectingLifecycleListener.
  4. As part of the release phase, a postStop.sh script runs after the actual server has stopped or crashed. This script reports the death of the instance, along with instructions for downloading any relevant files from that specific instance, and provides a default grace period of 30 seconds before the Warden container is destroyed. The Cloud Foundry WebLogic buildpack implements this approach.
  5. For the app in question, explicitly specify a start command that appends "; sleep 1d" to the original command: cf push <app_name> -c "<original_command>; sleep 1d". This keeps the container around for a day after the process within the container has exited. For a complete guide to troubleshooting CF app issues, take a look at 10-common-errors-when-pushing-apps-to-cloud-foundry.
  6. The easiest way to preserve a container without any extra work is to snapshot the DEA VM that hosts the Warden container. To find which DEA VM hosts the container in question, search the DEA logs for the app GUID. This lookup can be scripted in a log analytics engine such as Splunk or ELK.
  7. For apps that are NOT running on the JVM, you can generate a raw binary dump of the process memory by issuing kill -6 ${PID} (SIGABRT) or kill -11 ${PID} (SIGSEGV). These dumps will either need to be pulled manually or pushed out of the Warden container by an async task scheduled by the buildpack of the app runtime. For a detailed discussion of troubleshooting options for JVMs and operating systems, refer to this cookbook from Kevin Grigorenko.
  8. To download the contents of a running app's file directory from the Warden container, use the cf-download plugin, for example cf download spring-music. Usage: cf download APP_NAME [PATH] [--overwrite] [--verbose] [--omit omitted_path] [-i instance]
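Options 1 and 2 both boil down to "collect the app's files when the JVM exits and ship them somewhere durable". The Java sketch below shows one way to do the collection step: a shutdown hook that zips every regular file under a directory. The class name PostMortemArchiver is hypothetical, and the upload to an S3-compatible store such as Riak S2 is deliberately left as a placeholder comment because it depends on whichever client library you choose.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * Sketch: a JVM shutdown hook that archives the app's files before the
 * container goes away. The upload step is a hypothetical placeholder.
 */
final class PostMortemArchiver {
    private PostMortemArchiver() {}

    /** Zips every regular file under root into zipFile (zipFile should live outside root). */
    static void archive(Path root, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // ZIP entry names are relative and use '/' as the separator.
                String name = root.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    /** Registers a shutdown hook that archives root into zipFile when the JVM exits. */
    static void register(Path root, Path zipFile) {
        Thread hook = new Thread(() -> {
            try {
                archive(root, zipFile);
                // Hypothetical next step: upload zipFile to an S3-compatible
                // blob store such as Riak S2 (client code not shown).
            } catch (IOException | UncheckedIOException e) {
                // Hooks should not throw; report the failure and let the JVM exit.
                System.err.println("post-mortem archive failed: " + e);
            }
        }, "post-mortem-archiver");
        Runtime.getRuntime().addShutdownHook(hook);
    }
}
```

An app would call something like PostMortemArchiver.register(Paths.get("/home/vcap"), Paths.get("/tmp/postmortem.zip")) early in its startup. Keep in mind that JVM shutdown hooks run on normal exit, System.exit and SIGTERM, but never on SIGKILL, so a container killed for exceeding its memory limit may not run the hook at all.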

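Option 1 also calls for a heap dump. For OutOfMemoryError specifically, HotSpot JVMs can write one automatically when started with -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=<path>. To take a dump on demand instead, for example just before uploading files, the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean can be used. This is a minimal sketch assuming a HotSpot-based JVM; the class name HeapDumper is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

/**
 * Hypothetical helper that writes a heap dump of the running JVM so it can be
 * copied out of the container. Requires a HotSpot-based JVM.
 */
final class HeapDumper {
    private HeapDumper() {}

    /**
     * Writes a heap dump to target. The file must not already exist; use a
     * ".hprof" suffix, since some JDK versions reject other names.
     *
     * @param liveOnly if true, dump only objects that are still reachable
     */
    static void dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
    }
}
```

Calling HeapDumper.dump from a JVM shutdown hook, before the files are uploaded, is one way to combine this with option 2. Heap dumps can be large (on the order of the heap in use), so the container's disk quota needs room for them.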