About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Saturday, September 7, 2019

Data: The Forgotten Middle Child of the Cloud Native Family

Most of the discussion around cloud native revolves around writing cloud-native or cloud-friendly 15-factor apps, decomposing monolithic applications with DDD, or containerizing legacy apps with Docker and moving them to Kubernetes, Cloud Foundry or other container orchestration platforms.

Unfortunately, everyone forgets about data!

The best resources I have found for data migration are https://www.martinfowler.com/articles/evodb.html and https://www.slideshare.net/ThoughtWorks/data-patterns-andrew-jones-by-thoughtworks

The last chapter of the Cloud Native Patterns book by Cornelia Davis is also dedicated to this topic.


Q. How do we manage the migration of data from the legacy services to the modernized services?  Some of our tables have millions of records and hundreds of columns.  

[Branch-by-abstraction] enables rapid deployment even when feature development requires large changes to the codebase. For example, consider the delicate issue of migrating data from an existing store to a new one. This can be broken down as follows:
  1. Plan for a transition period during which both the original and new schemas exist in production.
  2. Encapsulate access to the data in an appropriate data type. Expose a Facade service to encapsulate DB changes.
  3. Modify the implementation to store data in both the old and the new stores. Move logic and constraints to the edge, that is, into the services.
  4. Bulk migrate existing data from the old store to the new store. This is done in the background, in parallel with writing new data to both stores.
  5. Modify the implementation to read from both stores and compare the obtained data. Implement retries and compensations. Cataloged database transformation patterns such as data sync, data replication and data migration apply here.
  6. Leverage techniques like a TCP proxy for JDBC to understand the flow of data and transparently intercept traffic. Use Change Data Capture tooling to populate alternate datastores.
  7. When convinced that the new store is operating as intended, switch to using the new store exclusively (the old store may be maintained for some time to safeguard against unforeseen problems).
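
The dual-write, bulk-migrate, compare-reads and cutover steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: both stores are plain in-memory dicts standing in for real databases, and the names CustomerFacade, backfill and cutover are hypothetical, not taken from any of the cited sources.

```python
from typing import Dict, List, Optional


class CustomerFacade:
    """Hypothetical facade that hides which store backs customer data."""

    def __init__(self, old_store: Dict[int, dict], new_store: Dict[int, dict]) -> None:
        self.old_store = old_store          # stand-in for the legacy database
        self.new_store = new_store          # stand-in for the modernized database
        self.cut_over = False               # flipped once the new store is trusted
        self.mismatches: List[int] = []     # ids where the two stores disagreed

    def save(self, customer_id: int, record: dict) -> None:
        # Always write to the new store; keep dual-writing until cutover.
        self.new_store[customer_id] = dict(record)
        if not self.cut_over:
            self.old_store[customer_id] = dict(record)

    def backfill(self) -> int:
        # Bulk migration: copy rows that exist only in the old store.
        # Rows already written through save() are newer and are left alone.
        copied = 0
        for customer_id, record in self.old_store.items():
            if customer_id not in self.new_store:
                self.new_store[customer_id] = dict(record)
                copied += 1
        return copied

    def get(self, customer_id: int) -> Optional[dict]:
        new = self.new_store.get(customer_id)
        if self.cut_over:
            return new                      # new store is used exclusively
        # Shadow read: the old store stays authoritative, differences are recorded.
        old = self.old_store.get(customer_id)
        if old != new:
            self.mismatches.append(customer_id)
        return old

    def cutover(self) -> None:
        self.cut_over = True


old = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
new: Dict[int, dict] = {}
facade = CustomerFacade(old, new)
facade.save(3, {"name": "Linus"})   # dual write during the transition period
facade.backfill()                   # background copy of pre-existing rows
facade.get(1)                       # compared read, no mismatch expected
facade.cutover()                    # only after mismatches stay empty
```

In a real migration the backfill would run in batches, and the mismatch list would feed the retry and compensation logic rather than sit in memory.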

Managing Persistence: You will need to choose between creating a new DB and letting the old and new implementations share the same datastore. Separating the DBs is more complex if you need to keep them in sync, but it gives you a lot more freedom. If your old and new applications share a datastore, you'll need to build a translation layer to translate between the old and new models. If you give your old and new applications separate datastores, be prepared to invest a lot of effort in tooling to synchronize the two DBs. If your DB synchronization mechanism writes directly to the DB, be careful you don't violate any assumptions the application makes about being the sole writer. [Re-engineering Legacy Software]

Splitting data for microservices involves breaking foreign key relationships and managing constraints in the resulting services rather than at the database level. For shared mutable data, you may need to split the schemas first and keep the service together before splitting the application out into separate microservices. By splitting the schemas but keeping the application code together, we can revert our changes or continue to tweak things without impacting any consumers of our service. Once we are satisfied that the DB separation makes sense, we can then think about splitting out the application code into two services. [Refactoring Databases]
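
To make "managing constraints in the services rather than at the database level" a little more concrete, here is a minimal sketch. It assumes a customers/orders split in which the former foreign key from orders to customers has been removed; CustomerService, OrderService and their methods are hypothetical names, and the in-memory dicts stand in for each service's own datastore.

```python
from typing import Dict, List


class CustomerService:
    """Owns customer data; the only component allowed to touch it."""

    def __init__(self) -> None:
        self._customers: Dict[int, str] = {}

    def add(self, customer_id: int, name: str) -> None:
        self._customers[customer_id] = name

    def exists(self, customer_id: int) -> bool:
        return customer_id in self._customers


class OrderService:
    """Owns order data and enforces the old foreign key rule in code."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers
        self._orders: Dict[int, dict] = {}

    def place_order(self, order_id: int, customer_id: int, total: float) -> None:
        # The database can no longer reject an unknown customer_id,
        # so the service checks with the owning service before writing.
        if not self._customers.exists(customer_id):
            raise ValueError(f"unknown customer {customer_id}")
        self._orders[order_id] = {"customer_id": customer_id, "total": total}

    def orders_for(self, customer_id: int) -> List[int]:
        return [oid for oid, o in self._orders.items()
                if o["customer_id"] == customer_id]


customers = CustomerService()
customers.add(1, "Ada")
orders = OrderService(customers)
orders.place_order(100, customer_id=1, total=25.0)
```

Across a network this check is no longer atomic with the write, which is one reason retries and compensations matter once the data is split.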

Role of Streaming in Data Decomposition
The road to data flows through Kafka. We count on Kafka for consistency, strict ordering within a partition, replay, durability and auditability. Using microservices gives us the ability to plug in new modules for encoding and enrichment in real time. CDC, data pumps, facades, caching and event shunting are emerging patterns that allow streaming data platform teams to offer mainframe and legacy RDBMS events to microservices teams. Each team can build appropriate persistence and achieve multi-DC replication with the streaming platform.
https://www.confluent.io/blog/spring-for-apache-kafka-deep-dive-part-2-apache-kafka-spring-cloud-stream
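
As a rough illustration of how a microservices team might build its own persistence from change events, the sketch below replays an ordered log into a local store. It deliberately does not use a Kafka client: a Python list stands in for a single topic partition, and ChangeEvent, AccountProjection and replay are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int                 # position in the ordered log
    op: str                     # "upsert" or "delete"
    key: str
    row: Optional[dict] = None  # new row image for upserts


class AccountProjection:
    """A service-owned store populated purely from change events."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.last_offset = -1

    def apply(self, event: ChangeEvent) -> bool:
        # Skip anything already applied so that replays are idempotent.
        if event.offset <= self.last_offset:
            return False
        if event.op == "upsert":
            self.rows[event.key] = dict(event.row or {})
        elif event.op == "delete":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown op {event.op!r}")
        self.last_offset = event.offset
        return True


def replay(log: List[ChangeEvent], projection: AccountProjection) -> int:
    """Apply the log in order and return how many events changed state."""
    return sum(projection.apply(event) for event in log)


log = [
    ChangeEvent(0, "upsert", "a", {"balance": 10}),
    ChangeEvent(1, "upsert", "a", {"balance": 25}),
    ChangeEvent(2, "upsert", "b", {"balance": 5}),
    ChangeEvent(3, "delete", "b"),
]
projection = AccountProjection()
replay(log, projection)
```

Because the projection is derived only from the log, a new service can be bootstrapped by replaying from the first offset, which is the property the replay and auditability claims above rely on.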
