Database failover is critical to building a highly available DDI network infrastructure. It refers to the mechanisms in place to eliminate a single point of failure, specifically with regard to the data containers in a Private DNS/DDI configuration.
NS1 supports both automated and manual failover:
- Manual failover: An operator designates one data container as the primary, to which all read-write operations are directed. The other data containers act as replicas. If the primary becomes unavailable, an operator can manually switch one of the replicas to act as the primary instead.
- Automatic failover: Introduced in version 2.2, automatic failover lets operators configure a cluster of three or five data containers (a.k.a. “cluster mode”) that establishes consensus and automatically promotes one of its members to become the primary data container. If the primary becomes unavailable, one of the remaining replicas is automatically promoted to primary, putting the cluster in a “degraded” but functional state so that read-write operations can continue. A cluster in a degraded state is restored to a healthy, steady state when 3 of 3 (or 5 of 5) containers have joined the cluster.
You cannot go from unclustered to clustered mode. This setting must be defined during the initial system setup via the CLI using environment variables, and it cannot be undone.
If you configure automatic failover via cluster mode, you cannot manually change the assigned primary container after the initial configuration. In other words, in the event of a failover, the remaining replicas elect a new primary among themselves; you cannot specify which container becomes the primary.
Key terms
- Primary data container: This is the data container against which read/write operations occur. The designation of “primary” container is subject to change depending on configuration settings. If automatic failover is enabled, the primary container will change dynamically if the original primary container is unhealthy or unavailable.
- Replica data container(s): Replica data containers serve as “hot standby” data containers, receiving updates from the primary data container and remaining able to take over as the primary data container through automated or manual failover.
- Data peers: This is an array of IP addresses or hostnames of the other data containers in the cluster. WARNING: This setting must not include the IP address or hostname of the data container itself (in other words, you cannot configure a data container to be its own peer); doing so risks corrupting the database. See the sketch after this list.
- Cluster mode: Cluster mode (true/false or clustering_on/clustering_off) indicates whether automated failover is used. When configuring cluster mode, you must have installed Private DNS/DDI software version 2.2.1 or later, and you must have three or five DATA containers, ideally on separate hosts, meeting minimum recommended specifications.
- Cluster ID (automatic failover only): For configuration purposes, this is the unique identification value for each data container within a cluster. It is defined by the system administrator during setup and must be a unique value from 1-3 or 1-5, depending on the number of data containers in your configuration. NOTE: The “cluster ID” is not a unique value for each cluster; it is the ID of each data container within a cluster. In other words, one cluster may include 3 or 5 data containers, each with its own cluster ID.
- Cluster size (automatic failover only): This is the number of data containers in the cluster. An odd number of data containers is required to establish consensus for the purposes of automatic failover. NS1 supports cluster sizes of 3 or 5 nodes.
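To illustrate the data peers warning above, the following is a minimal sketch for the host running data1 in a 3-node cluster. The hostnames data1, data2, and data3 mirror the sample tables below; port mappings and other runtime flags are omitted here for brevity and are shown in full under Option A.
# Sketch only: data1 lists every OTHER member of the cluster as a peer,
# never itself. See the full command under Option A below.
docker run ns1inc/privatedns_data:$TAG \
--cluster_mode clustering_on \
--cluster_size 3 \
--cluster_id 1 \
--data_peers data2,data3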
How it works
You cannot go from unclustered to clustered mode; this setting must be defined during the initial system setup via the CLI using environment variables.
Configuring cluster mode (automatic failover)
Prerequisites
- To implement automatic failover, you must have installed Private DNS/DDI software version 2.2.1 or higher (see the version check after this list).
- You must have three or five DATA containers, ideally on separate hosts, meeting minimum recommended specifications.
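As a quick way to confirm the first prerequisite (a sketch, assuming your data container image is ns1inc/privatedns_data as in the Option A example below), list the locally installed image tags and verify that the tag is 2.2.1 or later:
# List locally pulled data container images and their version tags.
docker images ns1inc/privatedns_data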
Recommendations
- Install clustered data peers so that they are proximate: within the same data center or, if using cloud infrastructure, within the same region across two availability zones.
WARNING
All data hosts MUST be within 10 ms round-trip time of each other for clustered data (see the latency check after this list).
- When using cluster mode, perform container configuration via the command-line interface (CLI) to run the data containers on each host.
- Use independent hosts for each DATA container in the cluster.
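To verify the round-trip requirement in the warning above, one simple check (a sketch; data2 and data3 stand in for the hostnames or IP addresses of the other data hosts) is to ping each peer from each data host and confirm the average RTT stays under 10 ms:
# Run from the host that will run data1; repeat from every data host.
ping -c 10 data2
ping -c 10 data3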
Depending on your setup (3-node or 5-node), use the tables below as a reference for configuring the data containers during initial setup.
Table 1: Sample configuration parameters for 3-node cluster
| Parameter | data1 | data2 | data3 |
| --- | --- | --- | --- |
| --data_peers | data2,data3 | data1,data3 | data1,data2 |
| --cluster_id | 1 | 2 | 3 |
| --cluster_size | 3 | 3 | 3 |
| --cluster_mode | clustering_on | clustering_on | clustering_on |
Table 2: Sample configuration parameters for 5-node cluster
| Parameter | data1 | data2 | data3 | data4 | data5 |
| --- | --- | --- | --- | --- | --- |
| --data_peers | data2,data3,data4,data5 | data1,data3,data4,data5 | data1,data2,data4,data5 | data1,data2,data3,data5 | data1,data2,data3,data4 |
| --cluster_id | 1 | 2 | 3 | 4 | 5 |
| --cluster_size | 5 | 5 | 5 | 5 | 5 |
| --cluster_mode | clustering_on | clustering_on | clustering_on | clustering_on | clustering_on |
Instructions
When enabling cluster mode on the containers, use the code below to start each container individually (Option A) or use the provided Terraform module for multi-node configuration (Option B).
Option A: Configure and start each container individually.
To start each container individually, run the following command on the first host, replacing the variable $TAG with the version of the software you are running:
docker run -p 3301:3300 -p 9091:9090 -p 8681:8686 \
--sysctl net.ipv6.conf.lo.disable_ipv6=0 \
ns1inc/privatedns_data:$TAG \
--cluster_size 3 \
--cluster_mode clustering_on \
--data_peers data2,data3 \
--cluster_id 1
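As a sketch derived from Table 1 (the port mappings simply mirror the example above; adjust them to your environment), the commands on the second and third hosts differ only in the --data_peers and --cluster_id values:
# On the second host (data2):
docker run -p 3301:3300 -p 9091:9090 -p 8681:8686 \
--sysctl net.ipv6.conf.lo.disable_ipv6=0 \
ns1inc/privatedns_data:$TAG \
--cluster_size 3 \
--cluster_mode clustering_on \
--data_peers data1,data3 \
--cluster_id 2
# On the third host (data3):
docker run -p 3301:3300 -p 9091:9090 -p 8681:8686 \
--sysctl net.ipv6.conf.lo.disable_ipv6=0 \
ns1inc/privatedns_data:$TAG \
--cluster_size 3 \
--cluster_mode clustering_on \
--data_peers data1,data2 \
--cluster_id 3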
Option B: Configure all containers in the cluster at once.
To stand up all containers in the cluster at once, use the NS1 Terraform module and example multi-node configuration.