All Things Cloud: 2019-08-04

Saturday, August 10, 2019

Event Storming - A Pivotal Practice for decomposing applications

FIELDS + GUIDANCE	Guidance
Name of method	Event Storming https://speakerdeck.com/rkelapure/event-storming
What is this method ?	Event Storming is a cross-functional facilitation technique for revealing the bounded contexts, microservices, vertical slices, trouble spots and starting points for a system or business process.
Phases	Discovery, Kick Off
Suggested Time	1 - 2 hours.
Who participates?	SMEs, Core Team (see facilitator notes)
Why do it?	Event Storming enables decomposing monoliths into microservices. It allows for modeling new flows and ideas, synthesizing knowledge, and facilitating active group participation without conflict, so the group can time travel and ideate the next generation of a software system.
When to do it?	When you need to make sense of a huge mess and enable cross-perspective communication as a forcing function for clarity.
What supplies are needed?	People, tools and supplies needed to conduct an ES session: a large wall (for stickies); at least 4 different colored stickies; Sharpies; blue painter's tape; refreshments like water, soda and juices for hydration; paper flip boards for readouts and breakouts.
How to Use this Method	Event Storming is a group exercise to scientifically explore the domains and problem areas of a monolithic application. The most concise description of the process of event storming comes from Vaughn Vernon's DDD-Distilled book, and the color around the process comes from Alberto Brandolini's book Event Storming. Storm the business process by creating a series of domain events on sticky notes. The most popular color to use for domain events is orange. A DomainEvent is a verb stated in the past tense and represents a state transition in the domain. Write the name of the DomainEvent on an orange sticky note. Place the sticky notes on your modeling surface in time order from left to right. As you go through the storming session, you will find trouble spots in your existing business process. Clearly mark these with purple/red sticky notes. Use vertical space to represent parallel processing. After all the events are posted, experts will post locally ordered sequences of events and enforce a timeline. Enforcing a timeline triggers long-awaited conversations and eventually STRUCTURE will emerge. These event clumps or common groupings give us our notional service candidates (actors or aggregates depending on how rigid the team is with DDD definitions). These will be used during the Boris Exercise.
Success/Expected Outcomes “You know you are done when…”	- Event Storming generates an immense backlog of user stories. - Perform User Story Mapping to map and organize stories into MVPs. - Define the scope of the problem. - Confirm that you are solving the right problem.
Facilitator Notes & Tips	Event Storming is a technique used to visualize complex systems and processes. These could range from monoliths to value streams. Event Storming is a gamestorming technique for harnessing and capturing the information held in a group’s minds. It surfaces conflicts and different perspectives of a complex system and bubbles up the top constraints and problem spots. As an event storming facilitator you have one job: create a safe environment for the exchange and output of ideas and data. The job is 50% technical facilitation and 50% soft people facilitation where you are reading body language. A single facilitator can typically orchestrate groups of 15-20. For a group of 30 or more you need two facilitators. ES is usually conducted in two phases: a high level event storm to identify the domains, and then a subsequent ES into a top constraint, the core domain. The language of ES is stickies. In its simplest form ES is basically facilitated group storytelling. The stickies represent domain events, or things that happened in the past. The trouble spots are identified by purple/red stickies. The color of the stickies does not matter. What does matter is that you start simple and then add the notation and information incrementally, in layers. ES can serve many goals: breaking down a monolith into constituent bounded contexts, creating a value stream, onboarding employees, etc. There is no ONE correct style of ES. Every session is different from another based on the desired goals and outcomes. So don’t worry about getting it right; just do it and roll your own style. An ES is only successful if the right people are involved. This is a mix of business domain experts, customer executives, stakeholders, business analysts, software developers, architects, testers, and folks who support the product in production: subject matter experts, product owners and developers who know and understand the application domain.
This process enables cross perspective conversation throughout the team as well as a standard definition of the terms used by both technical and non-technical team members.
Related practices	Boris, SNAP
Real world example	Deconstructing Monoliths With Domain Driven Design
Recommended reading	The motivation behind ES can be found in Gamestorming: A Playbook for Innovators, Rulebreakers, and Changemakers. Note this book is available on Safari Books Online: https://www.safaribooksonline.com/library/view/gamestorming/9781449391195/ This is the book we read on the flight to Boston. Eventstorming: https://leanpub.com/introducing_eventstorming This book is written by Alberto Brandolini, the father and inventor of Eventstorming. Domain Driven Design (DDD) provides the theoretical underpinnings of decomposing monoliths. DDD Distilled is the perfect book to understand the science of DDD and how ES fits into the grander scheme of things: how the ES artifacts translate into software design, architecture and an actual backlog.

Thursday, August 8, 2019

Learnings from Implementing enterprise event driven architecture

There are four different types of event driven architecture [3]. The Docket Based Choreography pattern is one of our inventions that allows us to design and operationalize an event driven architecture for a legacy system. It involves event notification but is a specific implementation. Typical event driven reengineering of a monolith involves both sync and async flows. CQRS and Event Sourcing, one of the en vogue forms of event driven architecture, is hard, resulting in a 50% failure rate in projects. Practicing domain driven design and carving out bounded contexts and vertical slices is hard. You have to stick to first principles after the system is decomposed to stay true to the event driven architecture. We leverage Kafka a lot, primarily as a messaging broker. Developers struggle with ACID guarantees in event driven systems. Online event processing provides a way to make a system eventually consistent [6].

Typical pitfalls encountered in reengineering monoliths into microservices are incidental coupling of microservices and a shared data model across microservices. In some cases the microservices reverted to shared canonical domain models. CDC driven decomposition of monoliths is rare due to data silos. Successful architecture and app transformation requires a change in culture. The Event Shunting Pattern [4] allows for gradual transformation from a legacy to a modern streaming, event driven architecture. Among many things, the pace of change dictates the boundary of the microservice. Leverage past experience to see which features/components change over time. Avoid the glorious central model, where a central authority makes changes to the canonical model. Avoid the trap of making a giant canonical model by staying true to guiding principles. The choice between large messages and small messages as events builds on this. Large canonical messages can force many changes to the model. Managing regulatory, security, PII and encrypted-data-at-rest concerns is painful. Small messages that trigger data fetches through an API have been much more successful towards a sustainable architecture.
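The small-message approach can be sketched roughly as follows. This is a minimal illustration rather than the implementation used in the projects described above: the `OrderShipped` event, the in-memory `ORDERS` store and the `fetch_order` function (standing in for an API call to the owning service) are all hypothetical names.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical system of record, standing in for the owning service's database.
ORDERS = {
    "o-42": {"id": "o-42", "status": "SHIPPED", "customer_email": "pat@example.com"},
}


@dataclass(frozen=True)
class ThinEvent:
    """Small notification event: says what happened and to which entity."""
    event_type: str
    entity_id: str


def fetch_order(order_id: str, fields: tuple) -> dict:
    """Stand-in for an API call to the service that owns the order.

    Returning only the requested fields keeps sensitive data (PII) out of
    consumers that do not need it.
    """
    order = ORDERS[order_id]
    return {name: order[name] for name in fields}


def handle(raw_message: str) -> dict:
    """Consumer: decode the thin event, then fetch just the data it needs."""
    event = ThinEvent(**json.loads(raw_message))
    return fetch_order(event.entity_id, fields=("id", "status"))


# The message on the broker carries no payload beyond type and identifier.
message = json.dumps(asdict(ThinEvent("OrderShipped", "o-42")))
result = handle(message)
```

Because the event carries only an identifier, the order's data model can change without touching the message schema, and PII never lands on the broker.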

References

Wednesday, August 7, 2019

Death To the Kubernetes YAML! Long Live the Manifest Generators

You are wallowing in a wall of YAML, wondering where your inner loop developer productivity disappeared. You are wandering the annals of the internet to find the kubectl equivalent of cf push. You are a refugee from the land of the Platform As A Service wandering aimlessly in the Container As a Service world. The land of Kubernetes is intimidating for those of us who are used to the higher order abstractions afforded by platforms like Cloud Foundry or Heroku.

In this series of blog posts a refugee from PaaS who has crossed the chasm will be your coach as you navigate this perilous journey. We will cover equivalence of concepts from Cloud Foundry to Kubernetes. We will draw out the distinctions of architecture & developer workflows across K8s and Cloud Foundry. You will get a deep understanding of the tools & the confidence necessary to make YAML your new best friend and conquer K8s to 10x your productivity developing in the Kubernetes native way.

So here it goes - episode 1 - Death to YAML long live the manifest generators!

What are the options when it comes to a cf push like experience on Kubernetes? I am going to keep a running list of K8s YAML template generators here that generate K8s manifests for app deployments. Developers can start with the tools below to escape death by YAML on first contact with Kubernetes.

Fabric8.io - The fabric8.io Kubernetes Java client has a Java DSL for Kubernetes. Istio and other projects use it. The fabric8 library is a single jar, so it works in airgapped environments. It provides a typesafe Kubernetes-manifest DSL for JVM-based apps.
Pulumi - The primary interface to kubectl is YAML. Pulumi exposes a rich, multi-language SDK to create API resources, and additionally supports execution of Kubernetes YAML manifests and Helm charts.
kf - kf provides Cloud Foundry users a familiar workflow experience on top of Knative.
Google Cloud Code - Google Cloud Code provides IDE support for the full development cycle of Kubernetes applications, from creating a cluster to deploying your finished application. https://cloud.google.com/code/docs/intellij/
Lift - Spring 2 Cloud Internal hygen inspired YAML generator. Templating engine is pluggable - and based on mustache/handlebars ATM. Pivotal internal only at the moment.
Docker Enterprise 3.0 Simplifies Kubernetes Management - It can identify and build, from Docker Hub, the containers needed and creates the Docker Compose and Kubernetes YAML files, Helm charts, and other required configuration settings.
Develop with Java on Kubernetes using Azure Dev Spaces - Generate the Docker and Helm chart assets for running the application in Kubernetes using the azds prep command.
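
What these tools have in common is that a handful of inputs are expanded into a full manifest. The sketch below shows the idea in its simplest form. It is not the API of any tool listed above: `deployment_manifest` is a hypothetical helper, the image name is a placeholder, and the output is JSON, which kubectl accepts in place of YAML.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 1, port: int = 8080) -> dict:
    """Expand a few cf-push-style inputs into a Kubernetes Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Placeholder app name and image, for illustration only.
manifest = deployment_manifest("hello", "example.com/hello:1.0", replicas=2)
manifest_json = json.dumps(manifest, indent=2)
```

Written to a file, `manifest_json` could then be handed to `kubectl apply -f`. The tools above layer type safety, templating and IDE support on top of this basic move.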

In the next blog post I will cover the automation toolchain for building Docker images from source. Tools like jib, Pivotal Build Service, cloud-native-buildpacks, s2i etc. fall in that category.

Honorable Mentions

The Future Of Observability and Developer Business Intersect Dashboards

Not sure if y'all came across https://thenewstack.io/observability-a-3-year-retrospective/

I really liked this part, and for me it resonated with the value of PCF Metrics. We have a single unified firehose of information available to us which helps us achieve the things Susan mentions, i.e. figure out the unknown unknowns ...

The Future of Observability

Three short years into this ride, I ponder the question; What’s next and where will this movement take us? I believe that in the next ~3 years, all three of those categories — APM, monitoring/metrics, logs, and possibly others — are likely to cease to exist. There will only be one category: observability. And it will contain all the insights you need to understand any state your system can get itself into.
After all, metrics, logs, and traces can trivially be derived from arbitrarily wide structured events; the reverse is not true.
Users are going to start to figure out that they are paying multiple times to store single data sets they should only have to store once. There is no reason to invest budget with separate monitoring vendors, logs vendors, tracing vendors, or APM vendors. If you collect data in arbitrarily wide structured events, you can infer metrics from those, and if you automatically append some simple span identifiers, you can use those same events for tracing views. Not only can you cut spending by 3-4X, but it’s phenomenally more powerful if you can use a single tool and fluidly flip back and forth between the big picture (“there’s a spike”) and drilling down to the exact raw events with the errors. Next, compute what outlier values they have in common, trace one of them, locate where in the trace a problem lives, and figure out who else is impacted by that specific outlier behavior. All conducted in one single solution with all teams getting the same level of visibility.
Right now this is either a) impossible, or b) a human being has to copy-paste an ID from one system to another to the next. This is wasteful, slow, and cumbersome, and extremely frustrating for the teams that have to do this when trying to solve a problem. Tools create silos and siloed teams spend too much time arguing about the nature of reality instead of the problem at hand.
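
The claim that metrics and traces fall out of wide structured events can be illustrated with a small sketch. The event field names (`trace_id`, `span_id`, `parent_id` and so on) and the sample data are assumptions made for this example, not a schema taken from the article or from PCF Metrics.

```python
from collections import defaultdict

# Hypothetical wide events: one record per unit of work, with span identifiers.
EVENTS = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "web", "duration_ms": 120, "status": 200},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "db", "duration_ms": 80, "status": 200},
    {"trace_id": "t2", "span_id": "c", "parent_id": None, "service": "web", "duration_ms": 900, "status": 500},
]


def error_rate_by_service(events):
    """Metric view: aggregate the raw events into an error rate per service."""
    totals, errors = defaultdict(int), defaultdict(int)
    for event in events:
        totals[event["service"]] += 1
        if event["status"] >= 500:
            errors[event["service"]] += 1
    return {service: errors[service] / totals[service] for service in totals}


def trace_view(events, trace_id):
    """Trace view: (depth, service, duration_ms) rows for one trace, parents first."""
    children = defaultdict(list)
    for event in events:
        if event["trace_id"] == trace_id:
            children[event["parent_id"]].append(event)
    rows = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            rows.append((depth, span["service"], span["duration_ms"]))
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return rows


metrics = error_rate_by_service(EVENTS)
trace = trace_view(EVENTS, "t1")
```

Both views read the same list of events. Going the other way, from the aggregated error rate back to the individual failing request, is not possible, which is the asymmetry the quote points out.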

In the same vein, I wonder what a perfect App Metrics dashboard looks like for the organization.
Here is a sample soup-to-nuts, source to business OKRs dashboard that you should emulate.

Sunday, August 4, 2019

Failures in Microservices

As microservices evolve into a tangled mess of synchronous and asynchronous flows with multi-level fanouts, it becomes important to think about failure and resiliency. Failure is pretty much a guaranteed outcome when the availability of the whole system is the product of the availabilities of all its downstream microservices and dependencies.
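
A quick calculation shows how fast availability erodes as dependencies multiply. The sketch assumes that failures are independent and that every dependency sits on the critical path of a request; the 99.9% figure is an illustrative number, not a measurement.

```python
import math


def composite_availability(availabilities):
    """Availability of a request path that needs every dependency to be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return math.prod(availabilities)


# Ten services, each available 99.9% of the time (illustrative numbers).
ten_services = composite_availability([0.999] * 10)

# Expected unavailability over a 365-day year, in hours.
downtime_hours_per_year = (1 - ten_services) * 365 * 24
```

Ten "three nines" services in series give roughly 99.0% availability, which is on the order of 87 hours of downtime per year.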

How does one systematically think about handling load, graceful degradation and load shedding in the face of impaired operation and sustained high load? Google's SRE books contain excellent high level advice as it pertains to handling load and addressing cascading failures. I have prepared an actionable summary of a couple of chapters dealing with resiliency to win in the face of failure. Follow the notes here to create rigor and governance around microservices frameworks and templates, enabling systematic resiliency through circuit breakers and autoscaling for sustainable scale-out of your System of Systems.
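
For readers who have not met the circuit breaker pattern, here is a minimal sketch of the idea. It is not the API of any particular resiliency library: the class name, the thresholds and the injectable `clock` parameter are assumptions chosen to keep the example self-contained.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so the cool-down can be simulated
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Open: fail fast instead of piling load on a struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping every outbound call in something like `breaker.call(fetch, url)` is the kind of behavior a shared microservice template can provide, so individual teams do not have to reinvent it.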

https://landing.google.com/sre/sre-book/chapters/addressing-cascading-failures/

https://landing.google.com/sre/workbook/chapters/managing-load/

Different types of resources can be exhausted

Insufficient CPU > all requests become slower > various secondary effects

1. Increased number of inflight requests

2. Excessively long queue lengths

- steady state rate of incoming requests > rate at which the server can process requests

3. Thread starvation

4. CPU or request starvation

5. Missed RPC deadlines

6. Reduced CPU caching benefits
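
The queue-length effect in item 2 can be seen with a deliberately crude model. The sketch treats arrivals and completions as constant per-second rates and ignores randomness; the rates are made-up numbers.

```python
def queue_length_over_time(arrival_rps, service_rps, seconds, initial=0):
    """Queue length at the end of each second for constant arrival and service rates."""
    lengths = []
    queued = initial
    for _ in range(seconds):
        # Whatever arrives beyond what the server can process stays queued.
        queued = max(0, queued + arrival_rps - service_rps)
        lengths.append(queued)
    return lengths


# 10% over capacity: the backlog grows without bound.
overloaded = queue_length_over_time(arrival_rps=110, service_rps=100, seconds=5)

# 10% under capacity: the queue stays empty.
healthy = queue_length_over_time(arrival_rps=90, service_rps=100, seconds=5)

# Time a newly arrived request waits behind the backlog, in seconds.
wait_after_5s = overloaded[-1] / 100
```

Even a small sustained excess grows the backlog linearly, and latency grows with it until requests start missing their RPC deadlines.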

Memory Exhaustion - more in-flight requests consume more RAM for request, response, and RPC objects

1. Dying containers due to OOM Killers

2. A vicious cycle - (Increased rate of GC in Java, resulting in increased CPU usage)

3. Reduction in app level cache hit rates

Threads (Tomcat HTTP)

1. Thread starvation can directly cause errors or lead to health check failures.

2. If the server adds threads as needed, thread overhead can use too much RAM.

3. In extreme cases, thread starvation can also cause you to run out of process IDs.

File descriptors

Running out of file descriptors can lead to the inability to initialize network connections, which in turn can cause health checks to fail.

Dependencies among resources

Resource exhaustion scenarios feed from one another
DB Connections (Negative Indicator)

All this can ultimately lead to service unavailability: resource exhaustion can crash servers, and each crash shifts load onto the remaining servers, leading to a snowball effect.

How To Prevent Server Overload

Load test the server’s capacity limits
Serve degraded results
Instrument servers to reject requests when overloaded - fail early and cheaply
Instrument higher-level systems to reject requests: at the reverse proxies, by limiting the volume of requests by criteria such as IP address; at the load balancers, by dropping requests when the service enters global overload; and at individual tasks
Perform capacity planning
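
Two of the items above, serving degraded results and rejecting requests early when overloaded, can be combined in one small sketch. The `LoadShedder` class and its thresholds are hypothetical; real limits should come from the load tests and capacity planning mentioned in the list.

```python
import threading


class LoadShedder:
    """Degrade, then reject, based on the number of requests currently in flight."""

    def __init__(self, degrade_at, reject_at):
        if not 0 < degrade_at <= reject_at:
            raise ValueError("expected 0 < degrade_at <= reject_at")
        self.degrade_at = degrade_at
        self.reject_at = reject_at
        self.in_flight = 0
        self._lock = threading.Lock()

    def handle(self, full_handler, degraded_handler):
        with self._lock:
            if self.in_flight >= self.reject_at:
                # Fail early and cheaply: no work is done for this request.
                return 503, "overloaded"
            self.in_flight += 1
            degraded = self.in_flight > self.degrade_at
        try:
            # Under pressure, serve the cheaper answer (for example cached data).
            body = degraded_handler() if degraded else full_handler()
            return 200, body
        finally:
            with self._lock:
                self.in_flight -= 1
```

A server would call `shedder.handle(render_full, render_cached)` per request. The same check can sit at a reverse proxy or load balancer so that rejected requests never reach the task at all.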