About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Friday, November 25, 2016

Current State of Persistence in Cloud Foundry and Pivotal Cloud Foundry


----LATEST----

Below is the current status of the NFS integration with ERT.

The ERT alpha available right now has the CC property for volume services enabled by default. Customers can then follow the documentation to deploy the NFSv3 driver/broker pair manually. 

ERT has a radio button to enable NFS volume services on the Advanced tab. When the button is enabled, two things have to happen:
1 - The NFS broker is deployed as a CF app and registered with the marketplace. (Work is being tracked in the OSS backlog to deploy the driver as an app.) https://docs.cloudfoundry.org/adminguide/deploy-vol-services.html
2 - The NFS driver is colocated on Diego Cells, disabled by default, and enabled when configured. (Windows Cells are not supported.)


In PCF 1.10, the driver can be deployed via a bosh add-on and the broker can be pushed separately for testing.


Sample app thanks to Adam Zwickey
https://github.com/azwickey-pivotal/volume-demo


Container persistence shipped in PCF 1.8 as a closed beta and will GA in PCF 1.10. It is trivial to modify a PCF 1.9 installation to enable volume services. See above. To play with it locally, use PCF Dev.

There are a number of use-cases where we have to remediate a dependency on a persistent filesystem when replatforming apps to PCF. We run into these use-cases in all our app replatforming engagements:
  • Sharing data across apps as an integration layer
  • Two-phase commit (2PC) transaction logs
  • Existing datasets on disk
  • Disk caches
  • Content management systems that read/write from mounted file volumes
  • 3rd party dependencies and libraries that rely on persistent file systems
  • Composite apps, where some services run on PCF and others run on the IaaS
For persistence orchestration, a new project called Volman (short for Volume Manager) has become the newest addition to the Diego release. Volman is part of Diego and lives on a Diego Cell. At a high level, Volman is responsible for picking up special flags from the Cloud Controller, invoking a Volume Driver to mount a volume onto a Diego Cell, and then providing access to that directory from the runtime container.

As of cf-242 and Service Broker API v2.10, Cloud Foundry now ships with support for Volume Services: filesystem-based data services. The v2.10 API is a release candidate, and will be considered GA unless a bug is found in the implementation. An experimental version of the API was added in v2.9.

What is included in CF itself is the plumbing required to plug in driver/broker pairs that add support for specific kinds of external volumes. Support for EFS, NFS, Isilon, etc. is added through separate BOSH releases not tied to a particular CF version. In https://github.com/cloudfoundry-incubator there is a local-volume-release and an efs-volume-release. In the Persi tracker there is an almost-complete nfsv3 epic: a "broker driver pair for existing nfs shares that can be mounted with nfsv3".

Until recently, the only data services allowed were ones with network-based interfaces, such as a SQL database. With Volume Services, brokers can now attach data services which have a filesystem-based interface.

Currently, we have platform support for Shared Volumes. Shared Volumes are distributed filesystems, such as NFS-based systems, which allow all instances of an application to share the same mounted volume simultaneously and access it concurrently.

This feature adds two new concepts to CF: Volume Mounts for Service Brokers and Volume Drivers for Diego Cells. Details can be found in the links below.
Slack: If you're interested in rolling out a volume service, please ask questions on the OSS #persi Slack channel.

Finally, if you want to play with persistence support in Cloud Foundry, check out PCF Dev. The PCF Dev team released a new version that includes local-volume services out of the box. This release gives you an option to get up and running with volume services that is an order of magnitude easier than any of the options we had before. Here is a post detailing the steps to try out volume services with your Cloud Foundry applications.

This feature does not mean we automatically revert to using persistent mounts when replatforming applications; I look at this as another weapon in our arsenal. To be clear, this is just a stepping stone to a more cloud-native architecture: you have to treat blobs (files) as a construct supported by a backing service. All app instances will now see a common NFS mount, so app instances have to manage consistency when talking to the mount. Each app instance does NOT get its own mount.

Adam Zwickey, a Pivotal platform architect, validated the persistence feature in CF following these steps:
1) Enable volume-services on the cloud controller. Cloud Foundry must be deployed with the cc.volume_services_enabled BOSH property set to true.
2) Deploy a volume driver colocated with each Cell (using bosh add-ons).
3) Deploy a service broker that implements the volume API.

For sample apps that require disk persistence, you can employ Spring apps that leverage the @Cacheable abstraction and write the cache to disk. On app restart you should see cache hits for the content written to the disk. See 1 and 2.
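If you want to see the mount from inside an app, here is a minimal sketch, assuming a broker registered under the service label "nfs" and the volume_mounts/container_dir structure described in the volume services docs; the file name is illustrative:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class VolumeMountDemo {
    public static void main(String[] args) throws Exception {
        // VCAP_SERVICES carries the mount point for a bound volume service;
        // "nfs" is an assumption: use whatever label your broker registers.
        JsonNode services = new ObjectMapper().readTree(System.getenv("VCAP_SERVICES"));
        String dir = services.path("nfs").path(0)
                             .path("volume_mounts").path(0)
                             .path("container_dir").asText();

        // All instances of the app share this mount, so a file written by one
        // instance is visible to the others (they must coordinate for consistency).
        Path file = Paths.get(dir, "cache-entry.txt");
        Files.write(file, "hello from a shared volume".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}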

References:

Credit: Thanks to Greg Oehman, Julian Hjortshoj and Adam Zwickey.

Migrating 1TB of Data from DB2 to MySQL

I would advise that you first insulate the application against this change by making all the app services interact with the backend using a Repository pattern, i.e. put a repository abstraction in place that will allow you to switch the DB internally. I also like to follow the expand/contract pattern for consumers, explained in http://martinfowler.com/articles/evodb.html, for existing data.
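A minimal sketch of that insulation step, with all names hypothetical: app services depend only on the repository interface, so the DB2-backed implementation can later be swapped for a MySQL one via configuration while consumers stay untouched.

public interface CustomerRepository {
    Customer findById(long id);
    void save(Customer customer);
}

class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// Legacy implementation, bound to the DB2 schema and SQL dialect.
class Db2CustomerRepository implements CustomerRepository {
    public Customer findById(long id) { /* DB2-specific SQL */ return null; }
    public void save(Customer customer) { /* DB2-specific SQL */ }
}

// New implementation targeting MySQL; selected via configuration, not code changes.
class MySqlCustomerRepository implements CustomerRepository {
    public Customer findById(long id) { /* MySQL-specific SQL */ return null; }
    public void save(Customer customer) { /* MySQL-specific SQL */ }
}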

When you refactor your application to use MySQL, leverage something like Flyway or jOOQ to manage future DB migrations.

At a raw DB2 data and schema level there are tools like DBConvert that can be used to move and sync the data.

So it's a combination of both app and data patterns, along with some blue/green routing magic, that will be needed to move the data from DB2 to MySQL.

Wednesday, November 16, 2016

Breaking Apart a Complex Domain

One of the key problems when you are breaking apart a monolith is identifying the microservices and their respective domains. This is critical to get right, otherwise you end up solving the wrong problem. How do we know which part of the monolith/megalith to attack first? How does one identify the core domain and its supporting sub-domains?

There are a few techniques that can serve as a launch point for this discovery. These techniques revolve around the idea of Visual Thinking, i.e. mapping the domain as events or Lego blocks and visualizing the data flow and system architecture. Three such techniques have emerged recently: Event Storming, Visible Architecture, and the C4 model.

Before breaking apart a complex domain, it is key to get a shared understanding of its current state. We need to establish a baseline where we label ALL the red flags and pain points of the system. Thereafter we move into problem solving and collaboration, where we come up with solutions and approaches to move the system closer to our end-state, picking the features and approaches that give us the best chance of success. We need scientific techniques to break down a complex domain among a group of business and technical stakeholders. Fostering collaboration among people is a hard problem to solve. Event Storming and the Visible Architecture process outlined below start the conversation, initiating the scientific process that leads to minimum viable microservices being carved from the monolith.

Event Storming

Event Storming is a group exercise to scientifically explore the domains and problem areas of a monolithic application. The most concise description of the process comes from Vaughn Vernon's DDD Distilled book, and the color around the process comes from Alberto Brandolini's book Event Storming. I have taken liberally from those two sources in this blog post. So how does one run an event storming exercise?

1. Storm the business process by creating a series of domain events on sticky notes. The most popular color to use for domain events is orange. A DomainEvent is a verb stated in the past tense and represents a state transition in the domain. Write the name of the DomainEvent on an orange sticky note. Place the sticky notes on your modeling surface in time order, from left to right. As you go through the storming session, you will find trouble spots in your existing business process. Clearly mark these with purple/red sticky notes. Use vertical space to represent parallel processing.


  src: Event Storming from Alberto Brandolini http://leanpub.com/introducing_eventstorming

2. Create the commands that cause each domain event. A command should be stated in the imperative. Place the light blue sticky note of the command to the left of the domain event that it causes. They are associated in pairs: Command/Event, Command/Event, ... If there is a specific user role that performs an action and it is important to specify it, place a small, bright yellow sticky note on the lower left corner of the blue sticky. Sometimes a command can cause a Process to run. It is possible that creating commands will cause you to think of more domain events. Place newly discovered events on the modeling surface along with the corresponding command. Sometimes one command may cause multiple events; model that one command and place it to the left of the multiple domain events. Once you have all commands associated with the domain events, you are ready to move to the next step.

3. Associate the Entity/Aggregate on which the command is executed and that produces the domain event as its outcome. This is the data holder on which commands are executed and from which domain events are emitted. Other business-friendly words for Aggregate are Data or Entity. Place the aggregate sticky behind the Command and Domain Event pair. If aggregates are repeatedly used, write the same aggregate noun on multiple sticky notes and place them repeatedly on the timeline wherever the corresponding Command/Event pairs occur. It is possible that as you think of the data associated with the various actions you will discover more domain events. Never ignore new events; rather, follow the same process as before to integrate them with commands and aggregates.
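In code, the Command/Aggregate/Event triple from this step maps onto three small types. A minimal Java sketch, with all names hypothetical:

// A command is imperative: a request to change state.
final class PlaceOrderCommand {
    final String orderId;
    PlaceOrderCommand(String orderId) { this.orderId = orderId; }
}

// A domain event is past tense: a state transition that has happened.
final class OrderPlacedEvent {
    final String orderId;
    OrderPlacedEvent(String orderId) { this.orderId = orderId; }
}

// The aggregate is the data holder that executes the command and emits the event.
class OrderAggregate {
    OrderPlacedEvent handle(PlaceOrderCommand command) {
        // Validate invariants here, then record the state transition as an event.
        return new OrderPlacedEvent(command.orderId);
    }
}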

4. Draw boundaries and lines with arrows to show flow on the modeling surface. You will have discovered that there are multiple models in play and that domain events flow between models. Use solid lines for bounded contexts and dashed lines for subdomains. Draw lines with arrowheads to show the direction of domain events flowing between bounded contexts. If you want to start bounding models with less permanence, use pink stickies to mark general areas and withhold drawing boundaries with permanent markers until your confidence justifies it.

5. Identify the various views that your users will need to carry out their actions, and the important roles for the various users. Use bright yellow stickies to identify user roles or personas.

An exercise in event storming may lead you to a CQRS Event Sourced architecture. If you pursue the path of using domain events as the system of record, then leverage the resources here to get deep into the world of Event Sourcing.


Visible Architecture:


src: Screenshot of Luke Hohmann's talk from Mucon

A visible architecture is a physical model of a system created by architecture teams using Duplo® bricks, with strings representing data flows. Visible architectures enable teams to collaboratively understand the “as-is” architecture and make better choices on the “to-be” architecture.

Quoting Luke ... 
The structured, yet creative freedom afforded by the technique enables teams to explore challenging concepts. For example, in one project, teams from Cisco represented known problems as “monsters” and outdated technologies as “dinosaurs” using plastic toys. In another project, teams from Rackspace used Visible Architectures as means to rapidly integrate acquired technologies.

Luke Hohmann has pioneered this technique and in this talk from Mucon, he presents an overview of the process, how to document desired changes in a structured manner, and how to augment Visible Architectures with powerful business frameworks that enable architects to "speak the business language" necessary to convert models into realities.

C4 Model:
A software system is made up of one or more containers, each of which contains one or more components, which in turn are implemented by one or more classes. Some other techniques that I see as valuable in this space are the work done by Simon Brown on Modular Monoliths and the C4 model. Check out his book Software Architecture for Developers, which focuses on the visual communication and documentation of software architecture. The key facets of the software architecture are:
1. Context: A high-level diagram that sets the scene, including key system dependencies and people (actors/roles/personas/etc).
2. Container: A container diagram shows the high-level technology choices, how responsibilities are distributed across them and how the containers communicate.
3. Component: For each container, a component diagram lets you see the key logical components and their relationships.
4. Classes (or Code): A small number of high-level UML class diagrams that explain how a particular pattern or component will be (or has been) implemented. These can be elicited from the code or drawn from scratch using tools like Structurizr.

Saturday, November 12, 2016

Moving Data Off a Mainframe

Recent developments have reinforced the need for change in our political system. In the same vein, the blog post below provides a path to unlocking your data and compute from the mainframe. The content in this post is a straight ripoff from my buddy and colleague David Malone's postulation on this topic. The steps below are from Dave and are the most cogent explanation of how to move data away from the mainframe to distributed systems, drawing from Dave's direct experience at a major retailer. The same approach can be leveraged to starve and strangle data locked in any data store.

Monday, November 7, 2016

The Thin Line Between Application Replatforming and Refactoring

When migrating applications to the cloud there is a very thin line between application replatforming and refactoring. These terms are defined as follows: 
  • Replatforming: Make the least amount of changes to move it to the cloud aka "lift-tinker-and-test-and-shift" 
  • Refactoring: Modify the applications such that it becomes cloud native. Reimagine how the application is architected and developed, typically using cloud-native features.
What defines cloud native? We can use the 12-factor/15-factor app as one set of heuristics to determine cloud-nativeness. We will therefore explore the boundary between replatforming and refactoring along the 15 factors, since they are as good a ubiquitous language as any for talking about the effort and the changes that differentiate lift-and-shift from refactoring. The examples below illustrate what it means to replatform vs. refactor an application. In some cases the context of these words shifts from applications to environments.

TL;DR: Always replatform first, and then refactor based on business and technical strategic objectives. The line between replatforming and refactoring is fuzzy.

One Codebase, One Application:
Adding an automated Maven or Ant based build such that each build runs consistently and generates a deployable artifact is replatforming. If the code is not in source control, checking in the code and applying a bug tracking system counts as replatforming. Moving the code from svn to git or across SCMs is refactoring. Modifying the build such that the single codebase is split and reorganized among teams dedicated to individual apps and microservices is refactoring. If you split your codebase into multiple microservices or shared libraries or a common API, then it counts as refactoring.

API First:
If you apply the extract-API-from-class pattern and use it to strangle the monolith, that is replatforming. If you have an API designed by a product designer based on UX or user story flow mapping and are retrofitting an implementation to this new API, it counts as refactoring. Putting API management tools or ambitious API gateways in place counts as refactoring. If you are fixing the implementation of an existing API, i.e. modernizing from Axis 1/2 to Spring REST MVC or switching the payload from XML to JSON, then it counts as replatforming; a sketch of this case follows below. If you are making semantic changes, additions or deletions to an existing API, that falls in the category of refactoring. Getting messaging working with WebSphere MQ counts as replatforming, whereas replacing WebSphere MQ with JMS or AMQP is refactoring.
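Here is what the replatformed end of that spectrum might look like: an existing API surface re-implemented as a Spring REST MVC controller returning JSON instead of an Axis 1/2 SOAP endpoint. Paths and types are illustrative:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/orders")
class OrderController {
    // Same contract as the old SOAP endpoint, now served as JSON.
    @GetMapping("/{id}")
    public Order get(@PathVariable long id) {
        return new Order(id, "NEW");
    }
}

class Order {
    public final long id;
    public final String status;
    Order(long id, String status) { this.id = id; this.status = status; }
}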

Dependency Management: 
Modifying the existing packaging of the application such that all of the dependencies, including the application server, are vendored into the app, like the Spring Boot fat jar packaging, qualifies as refactoring. Modifying the packaging of the application such that a single monolithic ear file is broken into multiple war files that are deployed separately falls under replatforming. Using an application server buildpack to vendor in your app dependencies, like the TomEE or JBoss buildpacks, counts as replatforming. Removing dependence on older frameworks and JavaEE and building the app with spring-boot-starters counts as refactoring. Spring bootification, i.e. the process of converting an app to Spring Boot, ultimately leads to mavenization conflicts where the newer dependencies dragged in by Boot and its cousins fight with the older frameworks ossified in the app. Surprisingly, a large part of refactoring is spent harmonizing these dueling sets of dependencies.

Design, Build, Release, Run: 
Using a CI tool like Jenkins to manage the CI pipeline counts as replatforming; however, if you define the deployment pipeline through code instead of configuring a running CI/CD tool, then you are refactoring. If you leverage blue/green deployment, then you are replatforming; however, if you are using feature flags and dark launching, i.e. adhering to NO breaking changes, then you are refactoring. If you do no upfront design and dive straight into making the code run on the platform, then you are replatforming; however, if you indulge in model design with bounded contexts and all the good stuff from DDD, you are in refactoring territory. If you run the application with the actuators that come by default with Spring Boot, you are replatforming; however, if you find yourself writing a ton of custom actuator metrics and health endpoints, you may be refactoring.

Configuration, Credentials and Code: 
If you have brute-forced your configuration by getting rid of all of your configuration files and then gone back through your codebase and modified it to expect all of those values to be supplied by environment variables, then you are replatforming. If you have modified your code to expect configuration based on profiles from an external configuration server, then you are refactoring. If your credentials are fetched from an external service, then you are refactoring; however, if your credential management requires encrypted keys as environment variables or hard-coded in environment-specific property files, then you are replatforming.
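A minimal sketch of the replatforming end of that spectrum: a value that used to live in a bundled property file is injected from an environment variable instead. The property name is illustrative:

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
class PaymentGatewayConfig {
    // Resolved from the PAYMENT_URL environment variable set on the platform,
    // instead of a value baked into a property file inside the artifact.
    @Value("${PAYMENT_URL}")
    private String paymentUrl;

    String paymentUrl() { return paymentUrl; }
}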

Logs: 
If you modify your application to send logs to stdout and stderr, you have replatformed the app. If you have modified the traces across your apps to take advantage of distributed tracing frameworks like Zipkin and corresponding log aggregation, indexing and visualization tools like ELK or Grafana, you have refactored the app.
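The replatforming step can be as small as this sketch: log through SLF4J to the console (the Spring Boot default appender) instead of a file, and let the platform's log aggregation pick up the stream. Names are illustrative:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class OrderService {
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    void placeOrder(String id) {
        // Goes to stdout, not a file on the ephemeral disk.
        log.info("order placed id={}", id);
    }
}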

Disposability: 
If your application starts slowly and you have tinkered with the timeout settings of the platform to allow the health checks to pass, then you have replatformed. If you have fixed the slow app startup and shutdown issues, then you have refactored the application. If your application is a mix of web request processing and batch tasks, then you have simply replatformed the app; however, if you have separated the batch processing from the request or message processing bits into separate apps, then you have refactored.

Backing Services:
If your cache or sessions are within the JVM, or you have offloaded them to an external DB, then you have replatformed. If you have eliminated the need to keep persistent state across requests, then you have indulged in refactoring. If the configuration of the backing service happens outside of the application, i.e. there is no coupling between the app and the specific backing service, then you have refactored the application. If the configuration and binding of the backing service is done explicitly within the application, then you have replatformed the app. If you have protected your backing service interaction with circuit breakers (see the sketch below), you have refactored, whereas if you have simply gotten the backing service configured correctly with the app via externalized configuration, then you have replatformed the app.
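A sketch of the circuit breaker case, using the Hystrix javanica annotations from Spring Cloud Netflix (current at the time of this post); the service URL and names are illustrative:

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
class CatalogClient {
    private final RestTemplate rest = new RestTemplate();

    // Calls to the backing service are wrapped in a circuit breaker.
    @HystrixCommand(fallbackMethod = "fallbackCatalog")
    public String catalog() {
        return rest.getForObject("http://catalog-service/api/items", String.class);
    }

    // Served when the call fails or the circuit is open.
    String fallbackCatalog() {
        return "[]";
    }
}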

Environment Parity: 
If you have 8 different environments across 3 different tiers, each with their own databases, firewall rules and app-specific configurations in properties files, then you have replatformed the environment; however, if you have a one-touch CI/CD deployment to production with appropriate process gates, then you have refactored your environment. Unless every commit is a candidate for deployment and the gap between production and test environments basically comes down to differently prioritized infrastructure resource pools, you are replatforming.

Administrative Processes:
If you have separated your admin processes, like crons, database migrations and singleton services, from your parent app, then you are a refactorer; otherwise, if your app is a jack of all trades implementing different concerns, then you are a replatformer. If you have extracted your one-off processes and implemented them using batching frameworks like Spring Batch, Spring Cloud Stream or Spring Cloud Data Flow, then you have certainly refactored the app.

Port Binding: 
Table stakes here is eliminating hard-coded dependencies on specific ports for startup. When you have eliminated all other network protocols and refactored the application to communicate exclusively over HTTP, then you have refactored the app. If you have NOT micromanaged your port assignments and can run with the ports exposed by the container, you have replatformed the app (see the sketch below). If you rely on a dynamic service registry like Eureka or Consul to discover other microservices and their ports, then you have refactored the app.
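The table stakes can be shown in a few lines of plain Java: bind to whatever port the platform hands you via the PORT environment variable instead of a hard-coded one. A minimal sketch using the JDK's built-in HTTP server:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class Main {
    public static void main(String[] args) throws Exception {
        // Bind to the port the platform assigns, falling back to 8080 locally.
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}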

Stateless Processes:
If all the long-running state is external to the application, persisted in backing services, then you have replatformed the application. If you have rearchitected the application to get rid of the need to carry long-running state, then you have refactored the application. Remediating an app's dependency on a persistent filesystem like NFS or local disk to instead leverage S3 or SFTP is replatforming (a sketch follows); relying on an external service that provides a file caching abstraction or a user-level file system like HDFS is refactoring the application.
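A sketch of that remediation, assuming the AWS SDK for Java (or any S3-compatible client); bucket and key names are illustrative:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

class ReportStore {
    // Credentials and region come from the environment / bound service, not code.
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    void save(File report) {
        // Replaces a write to the local (ephemeral) filesystem.
        s3.putObject("reports-bucket", report.getName(), report);
    }
}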

Concurrency:
If the most efficient way to scale the application is to vertically scale resources like memory, CPU or I/O, then all you have done is replatform the app; however, if the app can be horizontally scaled without bound to maintain the same SLAs under a linearly increasing load, then you have refactored the application.

Telemetry: 
If your approach to monitoring involves third-party APM tools and other health checks baked into the platform with Spring Boot actuator libraries, then you have replatformed; however, if you have baked in business-domain-specific monitoring and metrics that can be leveraged by the business for A/B testing and used by the team to discuss feature rollout or blue/green deployment, then you have reached refactoring nirvana. A sketch of such a domain metric follows.
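A sketch of a business-domain metric using the Spring Boot 1.x actuator CounterService (contemporary with this post); the metric name and event are illustrative:

import org.springframework.boot.actuate.metrics.CounterService;
import org.springframework.stereotype.Service;

@Service
class CheckoutMetrics {
    private final CounterService counters;

    CheckoutMetrics(CounterService counters) {
        this.counters = counters;
    }

    // e.g. counter.checkout.A vs counter.checkout.B for A/B analysis.
    void recordCheckout(String variant) {
        counters.increment("checkout." + variant);
    }
}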

Authentication & Authorization:
If you rely on integration with SiteMinder or other agent-based protocols like Kerberos and have them working in the cloud via a route service, an API gateway or a route interceptor, then you are replatforming. On the other hand, if the application is refactored to use federated, lighter-weight, user- and application-space identity and authorization protocols like OAuth2, OpenID Connect and JWT, then you have a reformed app on your hands.

Monday, October 31, 2016

Why Bother? The Need for Rabbit and Redis through Pivotal Cloud Foundry

1. The need for self-provisionable, reliable and production-capable messaging and non-relational backing services in a time-bound fashion (typically less than 5 minutes).

2. Reduce time to value for application features and increase developer flow by giving developers the ability to create and bind services during application development, avoiding ticket-based speed bumps.

3. Existing IBM-based messaging and relational stores are too heavyweight for cloud-based applications and don't have a quick turnaround for new instances.

4. Pre-canned recipes and collateral to leverage Redis and Rabbit to accelerate the migration to the cloud. See the blog posts https://medium.com/@KevinHoffman/migrating-apps-to-the-cloud-shunting-the-event-stream-8c2f6f309242#.5h3dpckl4 and https://medium.com/@KevinHoffman/orchestration-to-choreography-19686684fd44#.nxp73bxxk

5. Redis and RabbitMQ are best-in-breed when it comes to session caching, event sourcing and messaging. They enjoy deep integration and ease of use with the Spring and Spring Boot frameworks, which automatically configure connections and credentials to Redis and Rabbit when an application is deployed to Pivotal Cloud Foundry (see the sketch after this list).

6. Developer ramp-up time for Redis and RabbitMQ is much lower than for other technologies in the same space.

7. These Pivotal technologies will help reduce IBM spend on DB2 and provide a pathway for migration away from MQ and DB2.
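To make point 5 concrete, here is a minimal sketch: with a Redis service instance bound to the app on PCF, Spring Boot and the connectors wire the connection automatically, so the code never sees hosts or credentials. Class and key names are illustrative:

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
class SessionTokenStore {
    private final StringRedisTemplate redis;

    SessionTokenStore(StringRedisTemplate redis) {
        this.redis = redis;
    }

    void save(String sessionId, String token) {
        // The Redis connection behind this template was auto-configured from
        // the bound service's credentials; no host or password in the code.
        redis.opsForValue().set(sessionId, token);
    }
}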

Wednesday, October 26, 2016

Snap Analysis of Applications

We are often called upon to analyze applications for suitability of porting to Cloud Foundry. We quickly draw up the following chart and prepare a Forrester-style 4-quadrant graph of technical suitability vs. business criticality. I am embedding the technical criteria for evaluating an app below:









The quadrants on the SWOT analysis graph below have as their axes whatever is important to you in prioritizing the apps; typically the axes are technical complexity/effort vs. strategic business importance. It is a way of categorizing your apps based on the goals and interests that come out during the inception.

Typically the X axis is technical complexity (how much do you know about the app?) and the Y axis is business criticality (how important is this to the business?); other factors come into play as well.


image credit goes to my buddy Joe Szodfridt

CQRS and Event Sourcing in Java with Spring Framework

Recently I have had the opportunity to delve deep into the world of Event Sourcing and CQRS for a modernization project where we are refactoring a 1M LOC, decade-old, classic Java application that does more than a billion dollars of revenue across multiple lines of business. If you have to embark on a similar journey, these are the resources I found useful:

1. DDD Distilled: This is the best summary of the classic Eric Evans DDD book and Vernon's iDDD book. There is also a more practical Tools and Techniques section which summarizes the Event Storming technique very well. We used a 3-day event storming inception to discover the complex business and technical domain and figure out the bounded contexts and problem areas.

2. Event Storming from Alberto Brandolini: A very easy read on the why and how of event storming. Alberto's writing style is very entertaining; this is probably the most interesting book written in this space. Alberto's and Paul Rayner's videos (1, 2) on event storming provide a good getting-started guide to the world of event storming.

3. This Gist from Szymon provides an excellent collection of videos to understand the basics of modeling, Domain-Driven Design, CQRS and Event Sourcing.

4. Online Class: Greg Young's entire Domain Driven Design, CQRS, and Event Sourcing class all online.

5. Read the Event Sourcing and CQRS chapter from Cloud Native Go: Building Web Applications and Microservices for the Cloud with Go and React on Safari Online.

6. I can't say enough good things about Michael Ploed's talk on building microservices with event sourcing and CQRS at SpringOne 2016. His practical advice is sorely needed for understanding the production concerns when refactoring a monolith to event sourcing.

7. Peruse significant discussions on the DDD CQRS Group google mailing list.

8. Blogs & Articles

9. Code Repositories:

10. Videos & Webinars:

Tuesday, August 2, 2016

Running WebSphere Liberty Profile or Tomcat in Cloud Foundry

Why should you choose to run your apps on plain vanilla Tomcat/Jetty/Undertow servers with the Spring Framework instead of on WebSphere Liberty Profile or any proprietary app server?
  • Oracle has stopped investing in JavaEE [Java Guardians]. Java EE innovation driven through the JCP process has ground to a complete halt, thanks to a lack of investment by its major supporters and Oracle holding the Java EE standard hostage with the proprietary TCK accessible only to Oracle and corporate licensees. [reboot]. Meanwhile, innovation in the Spring community has taken off.
  • Don't take my word on it. Read the analyst Stephen O'Grady's opinion on Spring Boot: ... Better than ten years removed from the initial release of Rails, it seems strange to be writing about the "new" emphasis on projects intended to simplify the bootstrapping process. But in spite of the more recent successes of projects like Bootstrap and Spring Boot, such projects are not the first priority for most technical communities. Perhaps because of the tendency to admire elegant solutions to problems of great difficulty, frameworks and on-ramps for new community entrants tend to be an afterthought. In spite of this, they frequently prove to be immensely popular, because in any given community the people who know little about it far outnumber the experts. Even in situations, then, when the boot-oriented project and its opinions are outgrown, boot-style projects can have immense value simply because they widen the funnel of incoming developers.
  • Thoughtworks technology radar now recommends Spring Boot as an ADOPT technology.  A lot of work has gone into Spring Boot to reduce complexity and dependencies, which largely alleviates our previous reservations. If you live in a Spring ecosystem and are moving to microservices, Spring Boot is now the obvious choice. For those not in Springland, Dropwizard is also worthy of serious consideration.
  • Developing applications using the full feature set of the WebSphere Liberty Profile will result in your application being non-portable to other app servers, and complicates migration from WebSphere Classic versions to the Liberty Profile. Furthermore, you are offloading control of your application's features to the app server. Using Spring and vendored frameworks instead of dependencies provided at runtime yields control back to the application, leading to a deterministic set of compile and runtime dependencies. Packaging apps with Spring leads to immutable artifacts.
  • Applications written in an idiomatic fashion with the Spring Framework and Spring Boot, running on a plain vanilla app server on Pivotal Cloud Foundry, are inherently 15-factor cloud-native apps. The applications are production-ready from the get-go.
  • Organizations are now comfortable running business-critical applications on Tomcat in production. Take a look at a recent Java application server survey report from Plumbr, where the Tomcat installation base exceeded the 50% threshold for the second year in a row. Its 58.22% share of the pie left no question about the winner.


Tuesday, June 14, 2016

Multi-line Java stack traces out of order in Logstash and Splunk

Do you have customers frustrated by multi-line Java stack traces arriving out of order? We're working on a clean solution in our enhanced metrics work, but here is a workaround courtesy of @DavidLaing.
With the Java Logback library you can do this by adding

"%replace(%xException){'\n','\u2028'}%nopex"

to your logging config [1], and then using the following logstash conf [2] to replace the Unicode line separator \u2028 with \n, which Kibana will display as a new line.


mutate {
  # Passing a string with an actual newline in it seems to be the only way to make gsub emit one.
  gsub => [ "[@message]", '\u2028', "
"]
}

Thursday, June 9, 2016

Top 10 Cloud Foundry KPIs

The Cloud Foundry firehose and syslog streams generate tons of metrics and logs. What should a busy devops engineer look at? My colleague Ford Donald put together this awesome list of the top 10 KPIs of Cloud Foundry.

+-----------------------------------+-----------------------------------------------------+
| KPI                               | Description                                         |
+-----------------------------------+-----------------------------------------------------+
| rep.capacityRemainingMemory       | Available cell memory (sum for all cells)           |
| rep.capacityRemainingDisk         | Available cell disk (sum for all cells)             |
| bbs.LRPsRunning                   | Equivalent to # apps running                        |
| bbs.RequestLatency                | CF API is slow if this rises                        |
| bbs.ConvergingLRPduration         | Apps or staging crashing if this rises              |
| stager.StagingRequestFailed       | Indicates bad app or buildpack pushes               |
| auctioneer.LRPauctionsFailed      | CF can't place an app (out of container space)      |
| router.totalRoutes                | Size in routes of a PCF install (indicates uptake)  |
| router.requests                   | Tracks traffic flow through a PCF                   |
| bosh.healthmonitor.system.healthy | Indicates BOSH/VM health                            |
+-----------------------------------+-----------------------------------------------------+

New PCF v1.10 "Monitoring PCF" docs section went live!  - https://docs.pivotal.io/pivotalcf/1-10/monitoring/index.html




Replatforming .NET Applications to Cloud Foundry


A lot of developers do not realize that Cloud Foundry can run a process from any runtime, including .NET. Windows applications are managed side-by-side with Linux applications. .NET apps get support for key PCF features like scaling (including autoscaling), monitoring and high availability, logs and metrics, notifications, services (including user-provided services) and PCF-set environment variables. BOSH and Ops Manager now support deployments on Microsoft Azure via pre-built ARM templates. This blog post provides details of the Windows support and guidance on migrating .NET apps to Cloud Foundry. This post builds extensively on the work done by Pivotal product engineering and the solutions team, particularly Kevin Hoffman, Chris Umbel, Zach Brown, Sam Cooper and Aaron Fortener.

There are two ways to run ASP.NET apps on Cloud Foundry:

1. Windows 2012 R2 Stack
 • Support introduced in Nov. 2015
 • Supports .NET 3.5-4.5+
 • Requires Diego-Windows, Garden-Windows
 • BOSH-Windows support currently underway
 • App is pre-compiled, no detect; leverage the binary_buildpack to push the app
 • cf push mydotNetApp -s windows2012R2 -b binary_buildpack
 • Runs in a Garden-Windows container
 • Win 2k12 provides process isolation primitives similar to those found in the Linux kernel
 • IronFrame abstracts these primitives into an API utilized by Garden-Windows
 • Win 2k12's low-level API ≈ Linux kernel's API
 • Therefore Garden-Windows process isolation functionality ≈ Garden-Linux
 • Developer OS is Windows

2. Linux cflinuxfs2 Stack
 • ASP.NET Core CLR only
 • Runs on Diego or DEA
 • Standard CF Ubuntu Stack
 • Community ASP.NET 5 Buildpack
 • cf push mydotNetApp -s cflinuxfs2 -b <asp.net5 bp>
 • Garden-Linux is the backing container
 • Garden-Linux containers are based on LXC (Linux cgroups)
 • Developer OS is Win, Mac OSX and Linux

ASP.NET Core vs ASP.NET 4.6


 ASP.NET Core
  • Web API
  • ASP.NET MVC
  • No Web Forms (yet)
  • No SignalR
ASP.NET 4.6
  • Battle-tested, hardened
  • Many years of testing and improvement
  • MVC, Web Forms, SignalR, etc.

Migrating Existing Applications

• For greenfield applications, try a "Core First" approach
• For replatforming and migrating existing apps, probably ASP.NET 4.x; dependencies and framework requirements will likely dictate the choice
• All versions of the .NET SDK supported by Windows Server 2012 R2 are supported on PCF, including 3.5, 4.5, 4.5.1 and 4.6

Here are some of the details on how Garden-Windows secures applications


Non Cloud-Native stuff you will want to change

 • Reading/writing to the registry.
 • Super old versions of ASP.NET. Upgrade to .NET Core or ASP.NET 4.x.
 • Reading/writing to the local disk. File systems are ephemeral. Replace usage of the file system, e.g. disk logging and even temp files, with an S3-compatible blob store or external file store.
 • Does the app use Integrated Windows Authentication? Replace Integrated Windows Auth with ADFS/OAuth2 to leverage the UAA and Pivotal SSO tiles.
 • In-process session state / sticky sessions. Replace InProc and StateServer with an out-of-process data store, e.g. Redis or SQL Server.
 • Externalize environment-specific config in web.config into VCAP environment variables. If a value changes between environments, don't get it from web.config. Don't use externalization methods that only work in Azure. If the app relies on machine.config or applicationHost.config, it could have issues in PCF.
 • For MSI-installed services or drivers, bin-deploy the dependencies with the app.
 • Nearly every app will need refactoring to use console logging and tracing. Customers who mess with applicationHost.config can interfere with tracing, logging, etc.
 • 32-bit builds, references or apps depending on 32-bit libraries will not work. Use of 32-bit dependencies, or the app itself being 32-bit, is a red flag.
 • Does the app run as a Windows service or a WCF self-hosted service as opposed to IIS-hosted? These services will need to be hosted in IIS as part of the replatforming to PCF.
 • Apps requiring assemblies in the Global Assembly Cache won't work in CF.
 • Backing service drivers requiring native installers / DLLs may not work in CF.
 • On-demand apps and ones triggered by a job controller will need to be refactored to use the CF one-off task support coming in the CC v3 API.
 • MSMQ and tight integration with other MSFT server products via on-server dependencies will need to be refactored to use external backing services. .NET Core is now supported in the RabbitMQ .NET client.
 • Routing to non-HTTP WCF endpoints and listeners for third-party middleware will not work, since only HTTP-based inbound traffic is routed to the Windows containers.
 • Reliance on Microsoft-specific infrastructure (e.g. MSMQ, BizTalk, SharePoint, etc.)? These apps do NOT make good candidates to port to PCF.

.NET Application Selection Criteria


References

Wednesday, June 8, 2016

Creating Chaos in Cloud Foundry

One of the key tenets of operational readiness is to be prepared for every emergency. The best way to institutionalize this discipline is by repeatedly creating chaos in your own production deployment and monitoring the system's recovery. The list below is a listing of tools from the PCF solutions team @ Pivotal and others to cause chaos at all levels of the Cloud Foundry stack.

Tools, Presentations & Repos:
https://github.com/xchapter7x/chaospeddler
https://github.com/xchapter7x/cf-app-attack
https://github.com/strepsirrhini-army/chaos-lemur
https://github.com/FidelityInternational/chaos-galago
https://github.com/skibum55/chaos-as-a-service
Monkeys & Lemurs and Locusts Oh My - Anti-Fragile Platforms

Type of test/event/task



1. BOSH
* bosh target (director ip)
* bosh login (director username/password obtained from Ops Man)
* bosh download manifest cf-(hash) ~/cf.yml
* bosh deployment ~/cf.yml
* bosh vms/cck
* bosh ssh
* bosh logs
* bosh debug (gives you the job/task logs)
2. VM Recovery
* Terminate a VM by deleting it in vSphere, watch it come back up
3. App Recovery
* Terminate an app by using cf plugin, watch it come back up.
4. Correlate logs?
* Watch logs for steps above
5. Chaos Monkeys
* Execute Chaos Lemur and watch bosh/cf respond
6. Director
* Shut VM down/delete in vCenter
* When it's down, what apps still run?
* Once VM is gone, how do you get it back/rebuild?
7. Network switch
8. Hypervisor
9. Credentials that expire:
* Certs that have expiration date
* System Accounts (internal CF system accounts)
* vCenter API Account that CF uses
10. Log Insight goes down
11. Kill container
12. Kill VM
13. Kill DEA
14. Kill Router
15. Kill Health Manager
16. Kill Binary Repository
* Then scale
17. Over-allocate Hardware (how do we do it?)
18. Execute and backout a change to CF
19. Buildpack Upgrade and Roll Back
20. Right apps have the right buildpack
21. Licensing server scenario (for example, can't connect)
22. Double single components (for example, 2 BOSH's)
23. Kill internal message bus
24. DNS
25. Clock drift



Chaos Testing Procedure: 
Killed VMs from vSphere; used bosh tasks --no-filter in a loop to watch the resurrector bring them up.
bosh ssh and sudo kill -9 -1 are also fun.
bosh ssh'd into a DEA and killed a container.