07 Apr 2020
Cross region, high-performing microservices architecture using NATS
Vijendra Singh Tomar
Software Engineer

One of the challenges with microservices-based applications lies in choosing the mechanism for inter-service communication. A microservices-based application is a distributed system running across multiple processes or services, so interaction between services is more than a simple function call and requires a protocol such as HTTP, AMQP, or a binary protocol over TCP. Furthermore, to achieve the full advantages of a microservices-based architecture, such as being able to scale, operate, and evolve each service independently, this communication must also be very loosely coupled.

Services typically interact using one of three common patterns:

  • Request-response messaging: A producer publishes a message and expects a response from each consumer that receives it, so that it knows the result of every execution.
  • One-way messaging: A producer sends a message to a messaging channel and does not expect or want a response from any consumer.
  • Publish-Subscribe messaging: A message is published to a topic and immediately received by all of the subscribers.
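
To make the three patterns concrete, here is a minimal in-process sketch. It is a toy stand-in for a broker, not a NATS client; the `InMemoryBus` class, the subjects `orders.created` and `math.double`, and the `_INBOX.` reply-subject naming are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import count


class InMemoryBus:
    """Toy, single-process stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # subject -> list of callbacks
        self._inbox_ids = count(1)

    def subscribe(self, subject, callback):
        self._subscribers[subject].append(callback)

    def publish(self, subject, data, reply_to=None):
        # One-way and publish-subscribe: deliver to every current subscriber
        # and return nothing to the publisher.
        for callback in list(self._subscribers[subject]):
            callback(data, reply_to)

    def request(self, subject, data):
        # Request-response: publish with a unique reply subject and collect
        # whatever the responders publish back on it.
        inbox = f"_INBOX.{next(self._inbox_ids)}"
        replies = []
        self.subscribe(inbox, lambda reply, _reply_to: replies.append(reply))
        self.publish(subject, data, reply_to=inbox)
        del self._subscribers[inbox]
        return replies


bus = InMemoryBus()
audit_log = []

# Publish-subscribe: two independent subscribers both receive the message.
bus.subscribe("orders.created", lambda data, _r: audit_log.append(("billing", data)))
bus.subscribe("orders.created", lambda data, _r: audit_log.append(("shipping", data)))
bus.publish("orders.created", "order-1")

# Request-response: the responder answers on the reply subject it was given.
bus.subscribe("math.double", lambda data, reply_to: bus.publish(reply_to, data * 2))
answers = bus.request("math.double", 21)
```

In this sketch, one-way messaging is simply `publish` with no reply subject: the producer never learns whether anyone was listening.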

All three are very common communication patterns, and all apply to various use cases at Unbxd. However, each team or service currently builds its own solution, with no central platform-level support.

NATS

NATS is an open-source cloud-native messaging system. NATS solves the problems of performance and availability while staying incredibly lean. It is always on and available and uses a fire-and-forget messaging pattern. Its simplicity, focus, and lightweight characteristics make it a prime candidate for the microservices ecosystem.

  • Highly performant and lightweight
  • Clustered servers
  • Cluster aware clients
  • Text-based simple protocol
  • No external dependencies such as ZooKeeper
  • Unlike many other messaging systems, no need to create topics and subscriptions before use.
  • Easy to extend and expand clusters using Gateways.
  • Wildcard support on subscription topics.

NATS supports all three communication patterns described earlier.

b0b1754d-9d10-4185-8780-34c0a892005d_42.1.webp

2c572187-89c5-445c-8ee5-f9ff3006a972_42.2.webp

ebf47ab8-b075-4f01-b7fc-720443cdf832_42.3.webp

Use-Cases at Unbxd

Here we discuss a couple of use cases for leveraging NATS as a central messaging and communication platform at Unbxd.

Configuration management

We have a central configuration service responsible for managing all customer configurations. Various other services consume these configurations. Configuration updates need to be available to all consumer services in real time. Consumer services should also be able to subscribe to updates on only a subset of configurations.

a704ca4a-fa5e-48fb-a7c1-ff1fd0238e33_42.4.webp

NATS provides all the capabilities required to meet these requirements.

Configuration updates are published on the NATS notifier with subjects based on service names, for example service.service-a.property-x or service.service-b.property-y, or on a particular configuration type, such as property.type-a or property.type-b. Consumer services subscribe to the subjects they are interested in and receive real-time updates. The NATS wildcard subscription feature greatly simplifies the subscription logic: subscribing to service.service-a.* covers updates to all service-a properties. A service can subscribe to new configurations simply by adding the new subject to its list of subjects of interest.
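
The wildcard behaviour can be illustrated with a small matcher. This is a sketch of NATS-style subject matching written for this article, not code taken from NATS itself: `*` stands for exactly one dot-separated token, and `>` stands for one or more trailing tokens. The function name `subject_matches` is hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style subscription `pattern`.

    `*` matches exactly one dot-separated token; `>` matches one or more
    trailing tokens and is only accepted as the last token of the pattern.
    """
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for index, token in enumerate(pattern_tokens):
        if token == ">":
            # Must be the final pattern token, with at least one subject
            # token left for it to consume.
            return index == len(pattern_tokens) - 1 and len(subject_tokens) > index
        if index >= len(subject_tokens):
            return False  # pattern is longer than the subject
        if token != "*" and token != subject_tokens[index]:
            return False  # literal token mismatch
    # Without a trailing `>`, both sides must have the same number of tokens.
    return len(pattern_tokens) == len(subject_tokens)


matches_property_x = subject_matches("service.service-a.*", "service.service-a.property-x")
matches_other_service = subject_matches("service.service-a.*", "service.service-b.property-y")
```

Note that `service.service-a.*` would not match a deeper subject such as `service.service-a.property-x.nested`; covering arbitrary depth would need `service.service-a.>`.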

Here is what happens, step by step:

  • The client publishes a configuration update to the config service.
  • Config service updates the local store and publishes the message on the NATS cluster on certain subjects.
  • NATS pushes the data to cluster nodes connected to the consumer interested in these subjects.
  • Consumer service receives the message and updates its local store.
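
The steps above can be sketched in a single process as follows. `ToyBroker` stands in for the NATS cluster and only does exact-subject matching; the class names, method names, and property values are assumptions for illustration, not Unbxd's actual services.

```python
from collections import defaultdict


class ToyBroker:
    """In-process stand-in for the NATS cluster (exact-subject matching only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, subject, handler):
        self._handlers[subject].append(handler)

    def publish(self, subject, payload):
        # Step 3: push the data to every subscriber interested in the subject.
        for handler in self._handlers.get(subject, []):
            handler(subject, payload)


class ConfigService:
    def __init__(self, broker):
        self.broker = broker
        self.store = {}

    def update(self, service, prop, value):
        # Steps 1 and 2: accept the client's update, persist it locally,
        # then publish it on a subject derived from the service and property.
        self.store[(service, prop)] = value
        self.broker.publish(f"service.{service}.{prop}", value)


class ConsumerService:
    def __init__(self, broker, subjects):
        self.local_store = {}
        for subject in subjects:
            broker.subscribe(subject, self._on_update)

    def _on_update(self, subject, value):
        # Step 4: apply the update to the consumer's local store.
        self.local_store[subject] = value


broker = ToyBroker()
config_service = ConfigService(broker)
consumer_a = ConsumerService(broker, ["service.service-a.property-x"])

config_service.update("service-a", "property-x", "42")
config_service.update("service-b", "property-y", "7")  # consumer_a is not subscribed
```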

Cross-region synchronization and replication

Unbxd serves customers globally, with its services deployed across multiple AWS regions. A customer's data resides in a specific region and is served only from that region. We have an application-level mechanism that redirects a request that has landed in the wrong AWS region, for various routing reasons, back to the customer's home region. This adds considerable latency for the end user. In addition, the inability to serve requests from a different region means that a region or service failure can cause an outage.

Cross-region replication thus becomes essential, both to serve requests from any region and to build a fault-tolerant system.

To be able to serve requests from any region, we need to replicate all our data, configurations, and services across multiple regions. Most of our services are stateless; hence, it becomes relatively easy to replicate services across regions with proper deployment practices. The major challenge lies in replicating configuration and data, which needs to be replicated in near real-time.

When we started the design, we decided that each region would be an isolated entity and that there would be no inter-region service calls. Any request is served completely by a single region. We also decided that strong consistency is not a requirement. We rely on an eventual consistency model for replication, in which differences across regions are acceptable for a short period. For example, it is okay if two regions occasionally have slightly different configuration or data. These assumptions simplify the design, which does not have to deal with global locking, transactional updates, inter-region reads, rollbacks, etc.

9d0f6ef6-f162-4e5a-8d82-cadeb7418874_42.5.webp

The diagram above shows the high-level design of cross-region infrastructure.

The design involves building a layer on top of the existing stack in each region, which can intercept the actions, publish them on a message broker across the regions, and replay the action.

Bridge

This service acts as a proxy between the actual service and the client. It proxies HTTP requests (actions) to downstream services. On a successful response, it converts the request to an event entity, decides the topic on which the request needs to be published, and publishes the request on a message broker. Eventually, this service will be responsible for handling concurrency and retries.
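
A minimal sketch of the bridge's core logic, under stated assumptions: the event fields, the `bridge.<region>.<service>` subject scheme, and the `call_downstream` and `publish` callables are all hypothetical and stand in for the real HTTP proxying and broker client.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ActionEvent:
    """Hypothetical event entity built from a proxied HTTP request."""
    method: str
    path: str
    body: dict
    service: str
    origin_region: str
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: float = field(default_factory=time.time)


def subject_for(event: ActionEvent) -> str:
    # Assumed subject scheme: bridge.<origin region>.<target service>.
    return f"bridge.{event.origin_region}.{event.service}"


def handle_action(method, path, body, service, region, call_downstream, publish):
    """Proxy the action; publish an event only if the downstream call succeeded."""
    status = call_downstream(method, path, body)
    if 200 <= status < 300:
        event = ActionEvent(method, path, body, service, region)
        publish(subject_for(event), json.dumps(asdict(event)).encode())
    return status


published = []
status = handle_action(
    "PUT", "/config/property-x", {"value": "42"},
    service="config-service", region="region-1",
    call_downstream=lambda method, path, body: 200,  # fake downstream service
    publish=lambda subject, payload: published.append((subject, payload)),
)
```

Publishing only after a successful response means failed actions are never replicated, at the cost of the event being lost if the bridge crashes between the downstream call and the publish.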

Broadcaster

This service subscribes to the messages on the broker published by the bridge. Based on message metadata, it decides which other regions the event needs to be published in and publishes it on the message broker for different regions.
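
The routing decision might look like the sketch below. The metadata fields `origin_region` and `replicate_to`, the `replay.<region>.<service>` subject scheme, and the region names are assumptions; the source does not specify what metadata the broadcaster reads.

```python
def target_regions(event, all_regions):
    """Pick the regions an event should be replicated to.

    Assumption: an optional `replicate_to` list restricts the targets;
    when it is absent, the event goes to every region except its origin.
    """
    origin = event["origin_region"]
    requested = event.get("replicate_to") or all_regions
    return [region for region in all_regions if region in requested and region != origin]


def broadcast(event, all_regions, publish):
    """Publish the event once per target region and return the subjects used."""
    subjects = [
        f"replay.{region}.{event['service']}"
        for region in target_regions(event, all_regions)
    ]
    for subject in subjects:
        publish(subject, event)
    return subjects


REGIONS = ["region-1", "region-2", "region-3"]
sent = []
event = {"service": "config-service", "origin_region": "region-1"}
subjects = broadcast(event, REGIONS, lambda subject, _event: sent.append(subject))
```

Excluding the origin region keeps an event from being replayed where the action was already applied.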

Replayer

This service listens to the messages on the broker. Based on message metadata, it converts the event into an HTTP request and makes the appropriate downstream HTTP call.
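
As an illustration, the conversion from event back to HTTP request could be sketched like this. The event layout, the `REGION_LOCAL_SERVICES` lookup table, and the internal URL are hypothetical; the sketch only builds the request object and does not send it.

```python
import json
import urllib.request

# Hypothetical per-region lookup: the event names a service, and each region
# resolves that name to its own local endpoint.
REGION_LOCAL_SERVICES = {"config-service": "http://config-service.internal:8080"}


def to_http_request(raw_event: bytes, service_urls: dict) -> urllib.request.Request:
    """Rebuild the original action as an HTTP request against the local service."""
    event = json.loads(raw_event)
    base_url = service_urls[event["service"]]
    return urllib.request.Request(
        url=base_url + event["path"],
        data=json.dumps(event["body"]).encode(),
        method=event["method"],
        headers={"Content-Type": "application/json"},
    )


raw = json.dumps({
    "method": "PUT",
    "path": "/config/property-x",
    "body": {"value": "42"},
    "service": "config-service",
}).encode()
request = to_http_request(raw, REGION_LOCAL_SERVICES)
# Sending it would be `urllib.request.urlopen(request)`, omitted here.
```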

NATS cluster and gateway

NATS is responsible for asynchronous communication between the bridge, the broadcaster, and the replayer. NATS clusters are deployed independently in each AWS region; the NATS gateway forms a communication link between clusters in different regions. When a message is published on the NATS cluster on a given topic, if a subscription exists for that topic in another region, the NATS gateway is responsible for making that message available to that cross-region subscriber.
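
The interest-based forwarding described here can be modelled with a deliberately simplified sketch: a message crosses to another region only when that region has a subscriber for the subject. `ToyCluster` and its methods are hypothetical, and the way real NATS gateways track remote interest is more involved than this.

```python
class ToyCluster:
    """Simplified model of one region's cluster with gateway links."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subject -> list of callbacks
        self.remotes = []        # clusters reachable through a gateway
        self.forwarded = 0       # messages sent across a gateway

    def connect_gateway(self, other):
        self.remotes.append(other)
        other.remotes.append(self)

    def subscribe(self, subject, callback):
        self.subscriptions.setdefault(subject, []).append(callback)

    def has_interest(self, subject):
        return bool(self.subscriptions.get(subject))

    def _deliver_local(self, subject, data):
        for callback in self.subscriptions.get(subject, []):
            callback(data)

    def publish(self, subject, data):
        self._deliver_local(subject, data)
        # Cross the gateway only when the remote side has a subscriber.
        for remote in self.remotes:
            if remote.has_interest(subject):
                self.forwarded += 1
                remote._deliver_local(subject, data)


region_1 = ToyCluster("region-1")
region_2 = ToyCluster("region-2")
region_1.connect_gateway(region_2)

received_in_region_2 = []
region_2.subscribe("replay.region-2.config-service", received_in_region_2.append)

region_1.publish("replay.region-2.config-service", "event-1")  # forwarded
region_1.publish("replay.region-3.config-service", "event-2")  # no interest, stays local
```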

The diagram above shows how a single Action in Region 1 is replicated in Region 2.

  • The client requests an action in region 1. It submits the request to the bridge.
  • Bridge proxies the request to downstream services in the same region.
  • Upon successful response, the bridge converts the action into an event and publishes it on the NATS cluster in the same region.
  • Broadcaster in the same region receives the event, reads the metadata, and determines which other regions the event needs to be published to.
  • Broadcaster publishes the message on the subjects (topics) of those regions.
  • NATS identifies subscribers in the other region and sends the data across regions via a gateway.
  • Replayer in Region 2 receives the event, converts it back into action, and calls the downstream (downstream info is embedded into the event in a region-agnostic manner).

Future

The NATS server, designed for high performance and simplicity, doesn't provide a persistent message store. NATS Streaming adds a persistent store that keeps a log of the messages published over the NATS server. To make the system built on top of NATS reliable and resilient, we will explore NATS Streaming as an alternative.