Istio is a powerful platform for creating a service mesh on a single cluster. However, there is an increasing need to support topologies in which multiple clusters interact with each other and share microservices. Furthermore, in some cases a cluster might even be private (for example, IBM Cloud Private), which makes the integration between clusters even more challenging.
One option for Istio multi-cluster was introduced in Istio 0.8 and provides a way to expand the service mesh of a local cluster with services from one or more remote clusters. In this approach, the user installs only the critical components necessary to connect remote services to the local Istio mesh (for example, Sidecar Injector and Citadel). This approach creates a flat, single mesh of microservices. The advantages of one mesh are clear, yet there are scenarios in which this solution isn’t feasible. One of its prerequisites is that Pod and Service CIDR ranges are unique across the connected clusters. Some cloud providers can guarantee this, while others, like IBM Cloud Kubernetes Service, can’t. IBM Cloud Kubernetes Service clusters have the same subnet ranges, which prevents users from using this approach to connect services across two IBM Cloud Kubernetes Service clusters, for example.
In this blog post, we present a different concept for Istio multi-cluster that leverages Istio’s core capabilities of routing and ingress/egress gateways to support sharing services between clusters. For the sake of simplicity, we describe it with a topology of two clusters, but it can scale to a larger number of clusters. We illustrate this in our new code pattern, where we create a hybrid cloud by using Istio to connect services between IBM Cloud Private and IBM Cloud Kubernetes Service clusters.
Running a separate Istio mesh on each cluster allows operators to use Istio components to connect the clusters while keeping the service meshes separate. One of the major benefits of separate meshes over a single mesh is that operators can choose which services to expose to remote clusters.
Traffic can travel between the clusters because both have ingress and egress gateways that allow traffic to enter or exit the cluster. With those in place, we can use Istio’s routing capabilities to selectively route traffic with a remote destination through the local egress gateway to the remote ingress gateway. How do we do this? The cluster operator explicitly exposes one or more selected services in a cluster under a common custom DNS suffix (e.g., *.remote) via the ingress gateway. Access to a remote cluster is granted by adding an Istio ServiceEntry object that points to that remote cluster’s ingress gateway for all hosts associated with the remote cluster. Routing rules (VirtualServices) are set up so that traffic to a remote service always traverses the local egress gateway. If necessary, you can dedicate an egress gateway to carry all traffic bound for the remote cluster.
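As a rough sketch of the ServiceEntry step, the following Python snippet builds such a manifest as a plain dictionary and prints it as JSON. The helper name `remote_service_entry`, the resource name `cluster-b`, the gateway address `cluster-b-ingress.example.com`, and port 443 are hypothetical placeholders rather than values from our code pattern. The field names follow the `networking.istio.io/v1alpha3` ServiceEntry schema as we understand it, so check them against the Istio version you run.

```python
import json


def remote_service_entry(name, suffix, gateway_address, gateway_port=443):
    """Build a ServiceEntry that maps every host under `suffix`
    to a single remote ingress gateway (all arguments are illustrative)."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "ServiceEntry",
        "metadata": {"name": name},
        "spec": {
            # Wildcard host covering all services exposed by the remote cluster.
            "hosts": [f"*.{suffix}"],
            "location": "MESH_EXTERNAL",
            "ports": [
                {"number": gateway_port, "name": "tls", "protocol": "TLS"}
            ],
            "resolution": "DNS",
            # Assumed placeholder: the remote cluster's ingress gateway address.
            "endpoints": [
                {"address": gateway_address, "ports": {"tls": gateway_port}}
            ],
        },
    }


entry = remote_service_entry(
    "cluster-b", "remote", "cluster-b-ingress.example.com"
)
print(json.dumps(entry, indent=2))
```

Generating the manifest from a small function like this makes it easy to emit one entry per remote cluster, each with its own suffix and gateway address. The VirtualService and Gateway resources that steer the traffic through the local egress gateway are not shown here.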
For example, a client (a curl command) in Cluster A calls http://service_b.remote, which gets routed from the local proxy to the local egress gateway. From the egress gateway, the request travels to the remote ingress gateway of Cluster B, which then routes the request to the appropriate service in Cluster B.
There is one hidden assumption in this approach: the hostnames with the custom suffix must be resolvable. Our services need to be able to resolve these custom hostnames to an IP address at runtime; otherwise, a resolution error will be thrown. We therefore need to configure the cluster DNS to return some IP address for all hostnames that have the custom suffix. Kube-DNS itself is limited, but we can instead use CoreDNS with its Kubernetes plugin. CoreDNS provides the configuration extensibility needed to resolve hostnames with the custom DNS suffix. We can configure CoreDNS to resolve every queried host with the *.remote suffix to a fixed, arbitrary IP address (e.g., 10.1.1.1). The sidecar’s Envoy proxy simply ignores this placeholder IP address: the traffic is captured by the sidecar, which routes it based on the HTTP Authority header or the SNI name.
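To make the division of labor between DNS and the sidecar concrete, here is a toy Python model of the behavior described above. It is not CoreDNS or Envoy configuration. The function names `resolve` and `sidecar_route`, the hop labels, and the sample cluster record are hypothetical and serve only as illustration.

```python
FIXED_IP = "10.1.1.1"      # arbitrary placeholder address, as in the text
REMOTE_SUFFIX = ".remote"  # custom DNS suffix, as in the text


def resolve(hostname, cluster_records):
    """Toy DNS: any *.remote name gets the fixed IP; other names are
    looked up in the cluster's own records."""
    if hostname.endswith(REMOTE_SUFFIX):
        return FIXED_IP
    # None models a resolution error for an unknown name.
    return cluster_records.get(hostname)


def sidecar_route(host_header):
    """Toy sidecar: decides the next hop from the Host/Authority header
    alone and never looks at the destination IP."""
    if host_header.endswith(REMOTE_SUFFIX):
        return "local-egress-gateway"
    return "in-mesh-service"


# Hypothetical in-cluster record, for contrast with the remote name.
records = {"service-a.default.svc.cluster.local": "10.0.0.12"}

ip = resolve("service_b.remote", records)
next_hop = sidecar_route("service_b.remote")
print(ip, next_hop)
```

The point of the model is that the IP address only has to exist so that the client's lookup succeeds. The routing decision depends entirely on the hostname, which is why a single fixed address can stand in for every remote service.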
Each cluster can be configured to use mTLS connections between the Istio control plane components and between the mesh services. Can we also have end-to-end mTLS all the way between a local calling service and a remote called service? The answer is yes, as long as both clusters share a root of trust that is accessible for CA validation. Having a common root CA for all locally generated CAs enables the mTLS connection between the egress gateway of Cluster A and the ingress gateway of Cluster B, and vice versa. Each local Citadel can be configured with the common root CA as well as an upstream CA address, and Istio uses the same mechanism as before to generate and rotate certificates through the local Citadel. The only difference is that the locally generated CAs will have the common root CA in their certificate chains. If necessary, Istio Citadel can also play the role of the root CA; this can be achieved by running Citadel as a standalone service on a cluster that is accessible by both clusters.
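The shared-root requirement can be pictured with a small Python model of issuer chains. This is only a sketch of the trust topology: real validation also checks signatures, validity periods, and more, none of which is modeled here. All certificate names and the helpers `chain_root` and `share_root_of_trust` are hypothetical.

```python
def chain_root(cert, issuers):
    """Follow issuer links until reaching a self-issued (root) certificate."""
    seen = set()
    while issuers[cert] != cert:
        if cert in seen:
            raise ValueError("issuer loop detected")
        seen.add(cert)
        cert = issuers[cert]
    return cert


def share_root_of_trust(cert_a, cert_b, issuers):
    """Two peers can validate each other only if their chains end at the same root."""
    return chain_root(cert_a, issuers) == chain_root(cert_b, issuers)


# subject -> issuer; a root certificate is issued by itself.
issuers = {
    "common-root": "common-root",
    "citadel-a": "common-root",        # local CA of Cluster A
    "citadel-b": "common-root",        # local CA of Cluster B
    "egress-gateway-a": "citadel-a",
    "ingress-gateway-b": "citadel-b",
    "isolated-root": "isolated-root",  # a cluster without the common root
    "ingress-gateway-c": "isolated-root",
}

print(share_root_of_trust("egress-gateway-a", "ingress-gateway-b", issuers))
print(share_root_of_trust("egress-gateway-a", "ingress-gateway-c", issuers))
```

In this model, the gateways of Cluster A and Cluster B hold certificates issued by different local CAs, yet both chains end at `common-root`, so the check succeeds. The gateway whose chain ends at `isolated-root` fails the same check.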
The concepts discussed above are shown in the following schematic component diagram:
Service A.2 in Cluster A is able to access Service B.1 in Cluster B with mTLS all the way from the source to the destination. In the other direction, Service B.2 accesses Service A.1, which hides its internal logic (calls to A.3) from the remote cluster. Cluster A’s operator can choose which services to expose. For instance, she may choose to expose only Service A.1, making Services A.2 and A.3 inaccessible to Cluster B.