
Saturday, August 3, 2019

Making Eureka Service Discovery Responsive on PCF


Eureka service registries and Eureka clients are tuned for cloud-scale application deployments. Their server-side and client-side caches are sized to ride out network brownouts, and self-preservation guards against network partitions and failed compute or storage.

All of this can leave the service registry stale, especially during rapid scale-up or scale-down. When instances come and go quickly, REST/HTTP clients sometimes experience timeouts because the cached service IPs are outdated and the registry has not yet caught up with the current set of microservice app instances.

We ran into exactly this issue on PCF and configured service discovery in three ways to eliminate the timeouts. Much of the detailed experimentation was done by my colleague Rohit Bajaj.
  1. Ribbon Ping Configuration
  2. BOSH DNS Polyglot discovery 
  3. Eureka server configuration to eliminate server timeouts 
Summary

We determined that the native polyglot service discovery provided by Cloud Foundry is the optimal configuration for service discovery: the error rate drops to 0-1% in rapid scale-down scenarios, as opposed to more than 2% with the other settings.

For the average Spring or Spring Boot Java app that requires service discovery, the Eureka service registry fits the bill nicely. However, if your workload is highly dynamic and you need a near-zero error rate when load balancing across transient service instances, BOSH DNS is the better choice. Note that BOSH DNS is used instead of Ribbon: a custom circuit breaker + BOSH DNS replaces Ribbon + Eureka.

1. Ribbon Ping Configuration
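
The idea behind this approach is to stop trusting Eureka's cached view and have Ribbon actively ping each instance, so dead instances are dropped as soon as a ping fails. A minimal sketch, assuming a Spring Cloud Netflix app calling a hypothetical "orders" service that exposes a standard Spring Boot Actuator health endpoint (names and paths are illustrative, not the exact configuration we ran):

import com.netflix.loadbalancer.IPing;
import com.netflix.loadbalancer.PingUrl;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Keep this class out of the main component scan; otherwise it applies to
// every Ribbon client in the app.
@Configuration
public class RibbonPingConfiguration {

    @Bean
    public IPing ribbonPing() {
        // Replace the default NIWSDiscoveryPing, which merely reflects
        // Eureka's cached registry, with a real HTTP GET against each server.
        // "/actuator/health" is an assumption; use your app's health endpoint.
        return new PingUrl(false, "/actuator/health");
    }
}

Attach it to the Ribbon client for the target service with @RibbonClient(name = "orders", configuration = RibbonPingConfiguration.class). Pairing this with a low ribbon.ServerListRefreshInterval (see the settings in section 3) shortens the window during which Ribbon routes to dead IPs.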


2. BOSH DNS Discovery

Polyglot service discovery introduces new capabilities with a familiar workflow. An app developer can configure an internal route to create a DNS entry for their app, which makes the app discoverable on the container network. A DNS lookup on an internal route returns a list of container IPs for the app instances mapped to that route (see the Cloud Foundry docs).

We recommend pairing BOSH DNS service discovery with a robust circuit breaker such as Resilience4j.
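
A minimal sketch of that pairing, assuming a hypothetical "orders" app mapped to the apps.internal domain and Resilience4j 1.x APIs (the route name, port, and thresholds are illustrative):

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import org.springframework.web.client.RestTemplate;

import java.time.Duration;
import java.util.function.Supplier;

public class OrdersClient {

    // Hypothetical internal route, created with something like:
    //   cf map-route orders apps.internal --hostname orders
    // plus a network policy allowing this app to reach orders on port 8080.
    private static final String ORDERS_URL = "http://orders.apps.internal:8080/api/orders";

    private final RestTemplate rest = new RestTemplate();

    private final CircuitBreaker breaker = CircuitBreaker.of("orders",
            CircuitBreakerConfig.custom()
                    .failureRateThreshold(50)                       // open after 50% of calls fail
                    .waitDurationInOpenState(Duration.ofSeconds(5)) // probe again after 5s
                    .build());

    public String fetchOrders() {
        // BOSH DNS resolves orders.apps.internal to the live container IPs on
        // each lookup; the breaker absorbs the brief window where a
        // scaled-down instance's IP is still being handed out.
        Supplier<String> guarded = CircuitBreaker.decorateSupplier(
                breaker, () -> rest.getForObject(ORDERS_URL, String.class));
        return guarded.get();
    }
}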

3. Eureka Settings

Ideal settings for tuning Eureka clients and servers to be highly responsive:

# Ribbon Settings 

ribbon.ServerListRefreshInterval = 50

# Eureka Client

eureka.instance.lease-renewal-interval-in-seconds = 10
eureka.client.initialInstanceInfoReplicationIntervalSeconds = 10
eureka.client.instanceInfoReplicationIntervalSeconds = 10
eureka.client.registryFetchIntervalSeconds = 10

# Eureka Server

eureka.instance.lease-expiration-duration-in-seconds = 30
eureka.server.eviction-interval-timer-in-ms = 20000
eureka.server.responseCacheUpdateIntervalMs = 10000
eureka.server.getWaitTimeInMsWhenSyncEmpty = 10000

Note that changing Eureka server-side settings is typically not possible when the Eureka server is provisioned by the Spring Cloud Services tile.

# Eureka Server Self-Preservation

eureka.server.enableSelfPreservation = false

Eureka recalculates its self-preservation threshold every 15 minutes. The threshold is the expected number of heartbeats per minute multiplied by the renewal percent threshold (0.85 by default):

  DEFAULT: lease-renewal-interval = 30s, i.e., 2 heartbeats per instance per minute. With N = 24 instances,
  self-preservation triggers when renewals fall below 2 * 24 * 0.85 ≈ 41 heartbeats per minute.

  NEW: lease-renewal-interval = 10s, i.e., 6 heartbeats per instance per minute. With N = 24 instances,
  self-preservation triggers when renewals fall below 6 * 24 * 0.85 ≈ 122 heartbeats per minute.
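
The same arithmetic as a runnable sketch; 0.85 is Eureka's default renewal-percent-threshold, and the instance count and intervals are the assumed values from above:

public class SelfPreservationMath {

    public static void main(String[] args) {
        int instances = 24;             // N, the assumed fleet size
        double percentThreshold = 0.85; // default renewal-percent-threshold

        // Expected heartbeats per minute = (60 / renewal interval) * instances
        double defaultRenewsPerMin = (60.0 / 30) * instances; // 30s interval -> 48/min
        double tunedRenewsPerMin   = (60.0 / 10) * instances; // 10s interval -> 144/min

        // Self-preservation engages when observed renewals/min fall below:
        System.out.println(defaultRenewsPerMin * percentThreshold); // 40.8 (~41)
        System.out.println(tunedRenewsPerMin * percentThreshold);   // 122.4 (~122)
    }
}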

# References

- Spring Cloud Netflix Eureka - The Hidden Manual https://blog.asarkar.org/technical/netflix-eureka/
- The Mystery of Eureka self-preservation https://medium.com/@fahimfarookme/the-mystery-of-eureka-self-preservation-c7aa0ed1b799
- Eureka http://netflix.github.io/ribbon/ribbon-eureka-javadoc/index.html
- Spring Cloud Ribbon https://cloud.spring.io/spring-cloud-netflix/multi/multi_spring-cloud-ribbon.html
- Blue-green deployments with Eureka https://app-transformation-cookbook-internal.cfapps.io/duplicates/replatforming/blue-green-with-eureka/3e7f94f6c49795ef347b70141a36c134/

# Code

https://github.com/Netflix/eureka/blob/master/eureka-client/src/main/java/com/netflix/discovery/EurekaClientConfig.java
https://github.com/Netflix/eureka/wiki/Eureka-REST-operations
https://github.com/Netflix/eureka/tree/master/eureka-core/src/main/java/com/netflix/eureka/registry

# Monitoring

https://docs.newrelic.com/docs/apis/get-started/intro-apis/understand-new-relic-api-keys
https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-new-relic/src/main/java/io/micrometer/newrelic/NewRelicMeterRegistry.java
https://docs.newrelic.com/docs/insights/insights-data-sources/custom-data/send-custom-events-event-api
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html#production-ready-metrics-export-newrelic
https://micrometer.io/docs/registry/new-relic
https://github.com/TechPrimers/spring-boot-1.5-micrometer-prometheus-example
https://github.com/cloudfoundry/java-buildpack-metric-writer/tree/master/java-buildpack-metric-writer-common/src/main/java/org/cloudfoundry/metrics
https://docs.pivotal.io/pivotalcf/2-6/metric-registrar/using.html

# Autoscaling

https://docs.pivotal.io/pivotalcf/2-6/appsman-services/autoscaler/using-autoscaler.html
https://www.toptal.com/devops/scaling-microservices-applications
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing?slide=3
https://github.com/TechPrimers/spring-boot-1.5-micrometer-prometheus-example/blob/master/pom.xml
https://medium.com/finc-engineering/autoscaling-microservices-on-aws-part-1-c8488c64f6d1
https://docs.pivotal.io/pivotalcf/2-4/appsman-services/autoscaler/using-autoscaler-cli.html
https://nephely-io.github.io/app-autoscaling-calculator/?source=post_page

# Further Reading

How to improve the eviction policy in the Eureka Service Registry
https://thepracticaldeveloper.com/2017/06/28/how-to-fix-eureka-taking-too-long-to-deregister-instances/

Working with load balancers
https://github.com/Netflix/ribbon/wiki/Working-with-load-balancers

Autoscaling using HTTP Throughput & Latency metrics
https://community.pivotal.io/s/article/autoscaling-using-http-throughput-latency-metrics

Client Side Load Balancer: Ribbon
https://cloud.spring.io/spring-cloud-netflix/multi/multi_spring-cloud-ribbon.html

Spring Boot Actuator: Production-ready features
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html

Observability
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing?slide=21

Sunday, July 28, 2019

On Scaling Microservices


Much of this is a rehash of Susan Fowler's excellent book Production-Ready Microservices (O'Reilly Media, 2016): http://shop.oreilly.com/product/0636920053675.do

# Qualitative and Quantitative Growth Scales of a Microservice

## Qualitative

Qualitative growth scales allow the scalability of a service to tie in with higher-level business metrics: a microservice may, for example, scale with the number of users, with the number of people who open a phone application (“eyeballs”), or with the number of orders (for a food delivery service). These metrics, these qualitative growth scales, aren’t tied to an individual microservice but to the overall system or product(s).
Examples:

  •     Business metrics
  •     Number of health care claims adjudicated
  •     Number of insurance claims processed

## Quantitative

If the qualitative growth scale of our microservice is measured in “eyeballs”, and each “eyeball” results in two requests to our microservice and one database transaction, then our quantitative growth scale is measured in terms of requests and transactions, resulting in requests per second and transactions per second as the two key quantities determining our scalability. 

  •     Requests per second (RPS), queries per second (QPS), transactions per second (TPS)
  •     HTTP throughput
  •     CPU utilization
  •     Memory utilization
  •     Latency
  •     (negative scaling signal) Thread pool saturation
  •     (negative scaling signal) Number of open database connections (is it near the connection limit?)
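
A toy calculation showing how the "eyeballs" example above turns into capacity targets (all traffic numbers are made up):

public class GrowthScaleMath {

    public static void main(String[] args) {
        int eyeballsPerSecond = 1000; // assumed peak, from product forecasts
        int requestsPerEyeball = 2;   // each eyeball -> two service requests
        int dbTxPerEyeball = 1;       // each eyeball -> one database transaction

        int targetRps = eyeballsPerSecond * requestsPerEyeball; // 2,000 requests/s
        int targetTps = eyeballsPerSecond * dbTxPerEyeball;     // 1,000 transactions/s
        System.out.printf("Provision for %d req/s and %d txn/s%n", targetRps, targetTps);
    }
}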

When dealing with complex dependency chains, ensuring that every microservice team ties the scalability of its services to high-level business metrics keeps all services prepared for expected growth, even when cross-team communication becomes difficult.

## What to Monitor for Each Microservice

### Infrastructure Metrics

  • CPU utilized by the microservice across all containers
  • RAM utilized by the microservice across all containers
  • Available threads
  • Open file descriptors (FDs)
  • Number of database connections the microservice holds to each database it uses

### Availability Metrics

  •     Service-level agreement (SLA) compliance of the service
  •     Latency, of both the service as a whole and its individual API endpoints
  •     Success rate of API endpoint responses
  •     Average response time of API endpoints, and the client services from which requests originate (along with the endpoints they call)
  •     Errors and exceptions (both handled and unhandled), and the health and status of dependencies
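
As a small illustration of instrumenting these, here is a minimal sketch using Micrometer (referenced in the Monitoring links above); the metric and tag names are made up:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.TimeUnit;

public class AvailabilityMetrics {

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Per-endpoint latency, tagged so dashboards can slice by route and outcome.
        Timer timer = Timer.builder("api.latency")
                .tag("endpoint", "/api/orders")
                .tag("outcome", "success")
                .register(registry);

        timer.record(42, TimeUnit.MILLISECONDS); // one observed request

        System.out.println(timer.mean(TimeUnit.MILLISECONDS)); // average response time
    }
}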

# Monitoring Advice


  1. Build a custom dashboard for each microservice, with alerts on its key metrics
  2. Define Normal, Warning, and Critical alert thresholds
  3. Maintain an on-call runbook with procedures for remediating every alert
  4. Automate low-level remediations

A microservice should never experience the same exact problem twice.