About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Saturday, August 3, 2019

Making Eureka Service Discovery Responsive on PCF

Making Service Discovery More Responsive on PCF

Eureka service registries and Eureka clients are tuned for cloud-scale deployment of applications. This means their server-side and client-side caches are tuned to account for network brownouts, and for self-preservation in case of network partitions or failed compute or storage.

As a result, the service registry sometimes goes stale, especially in autoscaling or auto descaling scenarios. If scaling up and down happens very fast, REST or HTTP clients sometimes experience timeouts because the service IPs they hold are outdated and the service registry has not yet been updated with the latest set of microservice app instances.

Much of the detailed experimentation described here was done by my colleague Rohit Bajaj. We ran into one such issue on PCF and configured service discovery in three ways to eliminate timeouts:
  1. Ribbon Ping Configuration
  2. BOSH DNS Polyglot discovery 
  3. Eureka server configuration to eliminate server timeouts 
Summary

We determined that the native polyglot service discovery provided by Cloud Foundry is the optimal configuration for service discovery. The error rate drops to 0-1% in the auto descaling scenarios, as opposed to more than 2% with the other settings.

For your average Spring, Spring Boot, or Java app that requires service discovery, the Eureka Service Registry fits the bill nicely. However, if your workload is highly dynamic and you need a close to 0 error rate when load balancing across transient service instances, then BOSH DNS is better. When using BOSH DNS you don't use Ribbon, so a custom circuit breaker + BOSH DNS replaces Ribbon + Eureka.

1. Ribbon Ping Configuration


2. BOSH DNS Discovery

Polyglot service discovery introduces new capabilities with a familiar workflow. An app developer can configure an internal route to create a DNS entry for their app. This makes the app discoverable on the container network. A DNS lookup on an internal route returns a list of container IPs for the applications corresponding to that particular internal route.
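To make the DNS lookup concrete, here is a minimal sketch in plain Java that resolves every address behind a host name. The class name `InternalRouteLookup` and the default host are illustrative assumptions, not part of the original setup; on PCF you would pass the internal route mapped to your app.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class InternalRouteLookup {

    /**
     * Returns the IP addresses the resolver reports for hostName,
     * or an empty list if the name does not resolve.
     */
    static List<String> resolveAll(String hostName) {
        List<String> ips = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(hostName)) {
                ips.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // No DNS entry: treat as "no instances available right now".
        }
        return ips;
    }

    public static void main(String[] args) {
        // Hypothetical default; pass your own internal route as the first argument.
        String route = args.length > 0 ? args[0] : "localhost";
        System.out.println(route + " -> " + resolveAll(route));
    }
}
```

One thing to keep in mind: the JVM caches successful lookups (controlled by the `networkaddress.cache.ttl` security property), so a long cache lifetime can reintroduce the staleness this approach is meant to avoid.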

We recommend pairing BOSH service discovery with a robust circuit breaker like Resilience4j. Sample code: Resilience4j + BOSH polyglot discovery.
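For readers who want to see the idea behind a circuit breaker without pulling in a library, the following is a deliberately simplified, hand-rolled sketch using only the JDK. It is not Resilience4j and not the sample code referenced above; the class `SimpleCircuitBreaker`, its thresholds, and the placeholder action are all assumptions made for illustration. After a number of consecutive failures it stops calling the backend for a fixed period, then lets a trial call through.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openNanos;
    private final LongSupplier clock; // nanosecond clock, injectable for tests
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openDuration.toNanos();
        this.clock = clock;
    }

    /** Current state; an OPEN breaker becomes HALF_OPEN once the open period has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openNanos) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs action unless the breaker is OPEN; uses fallback on rejection or failure. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call, or too many failures in a row, (re)opens the breaker.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(3, Duration.ofSeconds(10), System::nanoTime);
        // Placeholder for an HTTP call to a hypothetical internal route.
        String body = breaker.call(() -> "response from backend", () -> "cached fallback");
        System.out.println(body + " (state: " + breaker.state() + ")");
    }
}
```

A production-grade breaker such as Resilience4j adds sliding-window failure rates, limits on concurrent trial calls, and metrics, which this sketch omits.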

3. Eureka Settings

Ideal settings for tuning Eureka clients and servers to be super responsive:

# Ribbon Settings 

ribbon.ServerListRefreshInterval = 50

# Eureka Client

eureka.instance.lease-renewal-interval-in-seconds = 10
eureka.client.initialInstanceInfoReplicationIntervalSeconds = 10
eureka.client.instanceInfoReplicationIntervalSeconds = 10
eureka.client.registryFetchIntervalSeconds = 10

# Eureka Server

eureka.instance.lease-expiration-duration-in-seconds = 30
eureka.server.eviction-interval-timer-in-ms = 20000
eureka.server.responseCacheUpdateIntervalMs = 10000
eureka.server.waitTimeInMsWhenSyncEmpty = 10000

Changing Eureka server-side settings is typically not possible when the Eureka server is provisioned by the Spring Cloud Services tile.
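To get a feel for what these intervals mean together, the sketch below adds them up as a rough upper bound on how long a client might keep routing to an instance that died without deregistering. It assumes each stage contributes at most its full interval and ignores network latency and Eureka implementation details, so treat the result as an estimate rather than a guarantee. The class name `StaleInstanceWindow` is hypothetical.

```java
final class StaleInstanceWindow {
    // Values taken from the settings listed above, in milliseconds.
    static final long LEASE_EXPIRATION_MS = 30_000;      // lease-expiration-duration-in-seconds
    static final long EVICTION_INTERVAL_MS = 20_000;     // eviction-interval-timer-in-ms
    static final long RESPONSE_CACHE_UPDATE_MS = 10_000; // responseCacheUpdateIntervalMs
    static final long REGISTRY_FETCH_MS = 10_000;        // registryFetchIntervalSeconds
    static final long RIBBON_REFRESH_MS = 50;            // ribbon.ServerListRefreshInterval

    /** Sum of all stages a removal passes through before the client stops using the instance. */
    static long worstCaseMs() {
        return LEASE_EXPIRATION_MS
                + EVICTION_INTERVAL_MS
                + RESPONSE_CACHE_UPDATE_MS
                + REGISTRY_FETCH_MS
                + RIBBON_REFRESH_MS;
    }

    public static void main(String[] args) {
        System.out.println("Estimated worst-case stale window: " + worstCaseMs() + " ms");
    }
}
```

Under these assumptions the window comes to roughly 70 seconds, which helps explain why some errors can remain during fast descaling even with aggressive Eureka tuning.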

# Eureka Server Self-Preservation

eureka.server.enableSelfPreservation = false

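The self-preservation threshold is recalculated every 15 minutes. With N = 24 registered instances and a renewal threshold of 0.85:
  DEFAULT = 30s interval for `lease-renewal-interval` (2 heartbeats per minute per instance)
  Self-preservation triggers when heartbeats per minute < 2 * 24 * 0.85 = 40.8, roughly 41 heartbeats

  NEW = 10s interval (6 heartbeats per minute per instance)
  Self-preservation triggers when heartbeats per minute < 6 * 24 * 0.85 = 122.4, roughly 123 heartbeats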
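The arithmetic above can be reproduced with a small helper. This is only a sketch of the formula as written in this post (instances times heartbeats per minute times the 0.85 threshold); the class and method names are hypothetical, and the exact rounding Eureka applies internally may differ.

```java
final class SelfPreservationThreshold {

    /** Heartbeats per minute below which self-preservation would trigger. */
    static double renewalsPerMinuteThreshold(int instances,
                                             int renewalIntervalSeconds,
                                             double renewalPercentThreshold) {
        double heartbeatsPerInstancePerMinute = 60.0 / renewalIntervalSeconds;
        return instances * heartbeatsPerInstancePerMinute * renewalPercentThreshold;
    }

    public static void main(String[] args) {
        // Default 30s renewal interval versus the tuned 10s interval, both with 24 instances.
        System.out.println("30s interval: " + renewalsPerMinuteThreshold(24, 30, 0.85));
        System.out.println("10s interval: " + renewalsPerMinuteThreshold(24, 10, 0.85));
    }
}
```

Rounding the results up gives the 41 and 123 heartbeats quoted above.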

# References

- Spring Cloud Netflix Eureka - The Hidden Manual https://blog.asarkar.org/technical/netflix-eureka/
- The Mystery of Eureka self-preservation https://medium.com/@fahimfarookme/the-mystery-of-eureka-self-preservation-c7aa0ed1b799
- Eureka http://netflix.github.io/ribbon/ribbon-eureka-javadoc/index.html
- Spring Cloud Ribbon https://cloud.spring.io/spring-cloud-netflix/multi/multi_spring-cloud-ribbon.html
- B/g for Eureka https://app-transformation-cookbook-internal.cfapps.io/duplicates/replatforming/blue-green-with-eureka/3e7f94f6c49795ef347b70141a36c134/

# CODE

https://github.com/Netflix/eureka/blob/master/eureka-client/src/main/java/com/netflix/discovery/EurekaClientConfig.java
https://github.com/Netflix/eureka/wiki/Eureka-REST-operations
https://github.com/Netflix/eureka/tree/master/eureka-core/src/main/java/com/netflix/eureka/registry

# Monitoring

https://docs.newrelic.com/docs/apis/get-started/intro-apis/understand-new-relic-api-keys
https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-new-relic/src/main/java/io/micrometer/newrelic/NewRelicMeterRegistry.java
https://docs.newrelic.com/docs/insights/insights-data-sources/custom-data/send-custom-events-event-api
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html#production-ready-metrics-export-newrelic
https://micrometer.io/docs/registry/new-relic
https://github.com/TechPrimers/spring-boot-1.5-micrometer-prometheus-example
https://github.com/cloudfoundry/java-buildpack-metric-writer/tree/master/java-buildpack-metric-writer-common/src/main/java/org/cloudfoundry/metrics
https://docs.pivotal.io/pivotalcf/2-6/metric-registrar/using.html

# Autoscaling

https://docs.pivotal.io/pivotalcf/2-6/appsman-services/autoscaler/using-autoscaler.html
https://www.toptal.com/devops/scaling-microservices-applications
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing?slide=3
https://github.com/TechPrimers/spring-boot-1.5-micrometer-prometheus-example/blob/master/pom.xml
https://medium.com/finc-engineering/autoscaling-microservices-on-aws-part-1-c8488c64f6d1
https://docs.pivotal.io/pivotalcf/2-4/appsman-services/autoscaler/using-autoscaler-cli.html
https://nephely-io.github.io/app-autoscaling-calculator/?source=post_page

# References 

How to improve the eviction policy in the Eureka Service Registry
https://thepracticaldeveloper.com/2017/06/28/how-to-fix-eureka-taking-too-long-to-deregister-instances/

Working with load balancers
https://github.com/Netflix/ribbon/wiki/Working-with-load-balancers

Autoscaling using HTTP Throughput & Latency metrics
https://community.pivotal.io/s/article/autoscaling-using-http-throughput-latency-metrics

Client Side Load Balancer: Ribbon
https://cloud.spring.io/spring-cloud-netflix/multi/multi_spring-cloud-ribbon.

Spring Boot Actuator: Production-ready features
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html

Observability
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing?slide=21
