Note: The following instructions apply to users with non-enterprise HAProxy accounts. If you have an HAProxy Enterprise account, refer to these instructions.
Introduction
To improve resiliency and application performance, modern enterprises are deploying increasingly complex and distributed applications across multiple data centers. Global server load balancing (GSLB) is the intelligent steering of traffic across multiple, geographically-distributed points of presence (PoPs).
While most GSLB services route based solely on proximity and binary up/down monitoring, NS1 can take a more nuanced approach by ingesting relevant metrics directly from your load balancers to perform intelligent load shedding. In any distributed application environment, load shedding is a critical tool that can help optimize application delivery by preventing outages related to load or capacity constraints at the data center level.
NS1's load balancer integrations allow you to push metrics like system load, client response times, or connection counts to the edge of our DNS platform where the metrics are automatically taken into account when computing traffic steering decisions.
How it works
Load balancers are designed to distribute traffic reliably across multiple backend servers based on policies that look at metrics like connection counts, load, or response times. NS1's load shedding capabilities work on the same principle, but at a global, cross-data center level.
For example, if you have load balancers that are able to safely handle up to 10,000 requests per second before performance degrades, you can use HAProxy's Send Metrics module to send RPS numbers to NS1. On the relevant DNS records you can then configure a low watermark of 8,000 and a high watermark of 10,000 for that data center.
If a load balancer reports that it has reached the low watermark of 8,000, NS1 recognizes that the endpoint is in redline territory. Gracefully and automatically, NS1 starts to deprioritize the endpoint, instead responding to queries with the next best-performing answer chosen by other filters, such as geotargeting or Pulsar's latency-based routing. If the high watermark of 10,000 is reached, NS1 stops sending new users to that PoP entirely until the load balancer indicates that the workload has returned to acceptable levels.
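As a rough mental model (this is an illustration only, not NS1's exact internal algorithm), you can think of the behavior between the two watermarks as a linear ramp from "serve normally" to "stop serving new traffic":
#!/bin/bash
# Illustrative sketch only: approximate how a PoP is deprioritized as its
# reported load climbs from the low watermark toward the high watermark.
# Usage: ./shed_illustration.sh <current_connections>
LOW=8000      # low watermark
HIGH=10000    # high watermark
CONNS=$1      # current metric reported by the load balancer

if [ "$CONNS" -le "$LOW" ]; then
  echo "0% shed: PoP continues to receive traffic normally"
elif [ "$CONNS" -ge "$HIGH" ]; then
  echo "100% shed: PoP receives no new traffic until load drops"
else
  PCT=$(( (CONNS - LOW) * 100 / (HIGH - LOW) ))
  echo "~${PCT}% shed: other answers are preferred progressively more often"
fi
Running this sketch with a value of 9000, for example, reports roughly 50% shed, reflecting an endpoint halfway between its watermarks.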
In this article:
- Example: Load shedding with NS1 + HAProxy
- Step 1: Setting up NS1 API data source & feeds
- Step 2: Setting up HAProxy to send metrics to the data feeds
- Step 3: Configuring the domain and A record
- Step 4: Configuring the NS1 filter chain
Example: Load shedding with NS1 + HAProxy
In this example, we’ll walk through the process of setting up load shedding with NS1 using HAProxy* across a simulated global network of three PoPs.
*Note: Any software or hardware from which you can extract metrics can be used.
The three points of presence (PoPs) in this example are:
- LGA01 – New York
- LAX01 – California
- LHR01 – United Kingdom
Actionable metric: Total connections to the HAProxy frontend
Step 1: Setting up NS1 API data source & feeds
1. Generate an API key in the NS1 portal (https://my.nsone.net).
2. Create an NS1 API data source, which acts as a container for the data feeds, with this call:
curl -sH "X-NSONE-Key: ${API_KEY}" -X PUT 'https://api.nsone.net/v1/data/sources' -d '{"sourcetype": "nsone_v1", "name": "HA_PROXY_CONNECT"}'
Note: ${API_KEY} is your unique API key generated via the NS1 portal (https://my.nsone.net).
The returned JSON body will look like this:
{
  "status": "ok",
  "name": "HA_PROXY_CONNECT",
  "feeds": [],
  "config": {},
  "id": "760e670096f4f59dec045bed383aac5c",
  "sourcetype": "nsone_v1"
}
Note: The “id” field contains the UUID value we will use later in creating our webhook POST address. For example:
https://api.nsone.net/v1/feed/{id}...
3. Create the individual data feeds inside of the newly created data source. For example, the following call creates the data feed for the LGA01 HAProxy instance:
curl -sH "X-NSONE-Key: ${API_KEY}" -X PUT 'https://api.nsone.net/v1/data/feeds/760e670096f4f59dec045bed383aac5c' -d '{"name": "lga01", "config": {"label": "lga01"}, "destinations": []}'
Notes:
- The last section of the URL is our data source ID (from the response in step 2).
- The name field is for human-readability purposes.
- The config.label value will be used in the POST values later.
4. Repeat the previous step (step 3) two more times, replacing lga01 with lax01 (California) and then lhr01 (United Kingdom), to finish creating all three data feeds.
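For reference, the calls for the remaining two feeds follow exactly the same pattern, changing only the name and label values:
curl -sH "X-NSONE-Key: ${API_KEY}" -X PUT 'https://api.nsone.net/v1/data/feeds/760e670096f4f59dec045bed383aac5c' -d '{"name": "lax01", "config": {"label": "lax01"}, "destinations": []}'
curl -sH "X-NSONE-Key: ${API_KEY}" -X PUT 'https://api.nsone.net/v1/data/feeds/760e670096f4f59dec045bed383aac5c' -d '{"name": "lhr01", "config": {"label": "lhr01"}, "destinations": []}'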
Step 2: Setting up HAProxy to send metrics to the data feeds
Note: HAProxy does not have the ability to send to outbound webhooks directly—however, with a few simple modifications and basic CLI utilities, we can start sending HAProxy metrics to NS1 data feeds.
1. Access the LGA01 HAProxy server.
2. Edit /etc/haproxy/haproxy.cfg by adding the following line in the “global” section:
stats socket /var/run/haproxy.sock mode 600 level admin
Note: If you need to use the socket interactively, also add a timeout value to the global section to allow the socket to wait for input, for example:
stats timeout 1m
3. Reload HAProxy.
4. Verify the HAProxy stats socket is functioning appropriately by running the following commands.
echo "show info" | nc -U /var/run/haproxy.sock stdio
echo "show stat" | nc -U /var/run/haproxy.sock stdio
You should see general information and comma-delimited statistics returned non-interactively via netcat.
5. Generate a new API key via the NS1 portal (https://my.nsone.net). This API key should be limited to data sources and data feeds. This key is used in the following steps (6-8).
6. Record the data source “id” from step 1.2. We’ll use this as the “SOURCEUUID.”
7. Construct three Bash scripts, one for each PoP, to gather the current connection count from the frontend. Note: Any language may be used for this script.
8. The following script queries the socket, extracts the value we want, and POSTs it to the associated data feed:
#!/bin/bash
# Read the current frontend connection count from the HAProxy stats socket
# and POST it to the matching NS1 data feed.
APIKey='apikeygoeshere'
SourceUUID='760e670096f4f59dec045bed383aac5c'
Region='lga01'
CurrConns=$(echo "show info" | nc -U /var/run/haproxy.sock | grep CurrConns | cut -d " " -f2)
curl -sX POST -H "X-NSONE-Key: $APIKey" \
  "https://api.nsone.net/v1/feed/$SourceUUID" \
  -d '{"'$Region'": {"connections": '$CurrConns'}}'
Note: The Region value changes to reflect the specific PoP on which the script is installed.
9. Save the script as /root/shed_load.sh, and set it up to fire off once per minute in the root crontab by executing:
crontab -e
and then adding the following task:
*/1 * * * * /root/shed_load.sh
10. Repeat steps 1 through 9 on the LAX01 and LHR01 servers, reusing the API key and source UUID from steps 5 and 6. Once complete, the metrics are ingested by the NS1 API data feeds.
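As noted above, the Region assignment in each copy of /root/shed_load.sh must match the data feed label for that PoP:
Region='lax01'   # on the LAX01 server
Region='lhr01'   # on the LHR01 server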
Step 3: Configuring the domain and A record
1. Each data feed can be attached to as many targets (records, answers, or groups) as needed. As an example, we created a record with three possible answers, one corresponding to each PoP (an example API call follows the list):
- 1.1.1.1 – LGA01
- 2.2.2.2 – LAX01
- 3.3.3.3 – LHR01
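If you prefer the API over the portal for this step, a call along the following lines creates the record with all three answers. This is a minimal sketch: the zone example.com and record www.example.com are placeholder names, so substitute your own, and the 30-second TTL anticipates the setting discussed in Step 4.
curl -sH "X-NSONE-Key: ${API_KEY}" -X PUT 'https://api.nsone.net/v1/zones/example.com/www.example.com/A' \
  -d '{"zone": "example.com", "domain": "www.example.com", "type": "A", "ttl": 30,
       "answers": [{"answer": ["1.1.1.1"]}, {"answer": ["2.2.2.2"]}, {"answer": ["3.3.3.3"]}]}'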
Step 4: Configuring the NS1 filter chain
As a basic filter chain, we start with the UP filter, with up/down metadata attached to each answer. On a production record, the answer-level metadata would be driven by NS1 monitoring or another third-party health check service. For this example, we have statically configured each answer as up.
The next filter is GEOTARGET_COUNTRY, with location metadata of New York attached to 1.1.1.1, California to 2.2.2.2, and United Kingdom to 3.3.3.3. This establishes an initial baseline for steering each requester to the geographically closest PoP.
Our third filter is the SHED_LOAD filter, which will work in conjunction with the data feeds we have just configured. In order for load shedding to be effective, we need the record to be refreshed often, so we set the base record TTL at 30 seconds. In the SHED_LOAD filter itself, we select the Load metric of Active connections.
There will be three distinct metadata types that work in conjunction with this filter: high watermark, low watermark, and active connections. Each watermark can be different, or even dynamic, for each answer. This example assumes that all of our PoPs have equivalent capacity, so we assigned a high watermark of 1,900 and a low watermark of 1,600 at the record level.
These limits are in turn inherited by all answers on this record: NS1 begins to progressively remove an answer from service at 1,600 concurrent connections and stops serving it entirely at 1,900 concurrent connections.
The Active connections metadata is attached to each answer individually by selecting the HA_PROXY_CONNECT data source and then the matching data feed for that answer (lga01, lax01, or lhr01).
Finally, our last filter is SELECT_FIRST_N, which we set equal to 1. This serves only the single best answer to the requesting recursive resolver, which might otherwise introduce its own shuffling behavior to a sorted list of answers.
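Pulled together, the record configuration described in this step can also be applied via the API. The call below is a sketch rather than a definitive configuration: the filter and metadata keys (up, geotarget_country, shed_load, select_first_n, low_watermark, high_watermark, connections) follow NS1's documented conventions and should be confirmed against the API reference for your account, the state/country codes are our assumed mappings for the three PoPs, and LGA01_FEED_ID (and the others) are placeholders for the feed IDs returned in Step 1.
curl -sH "X-NSONE-Key: ${API_KEY}" -X POST 'https://api.nsone.net/v1/zones/example.com/www.example.com/A' -d '{
  "ttl": 30,
  "meta": {"low_watermark": 1600, "high_watermark": 1900},
  "filters": [
    {"filter": "up", "config": {}},
    {"filter": "geotarget_country", "config": {}},
    {"filter": "shed_load", "config": {"metric": "connections"}},
    {"filter": "select_first_n", "config": {"N": 1}}
  ],
  "answers": [
    {"answer": ["1.1.1.1"], "meta": {"up": true, "us_state": ["NY"], "connections": {"feed": "LGA01_FEED_ID"}}},
    {"answer": ["2.2.2.2"], "meta": {"up": true, "us_state": ["CA"], "connections": {"feed": "LAX01_FEED_ID"}}},
    {"answer": ["3.3.3.3"], "meta": {"up": true, "country": ["GB"], "connections": {"feed": "LHR01_FEED_ID"}}}
  ]
}'
The same configuration can be built in the portal by adding the filters to the record's filter chain and attaching the watermark and connections metadata through the record and answer settings.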
Once the record configuration is complete and the answers are attached to the three data feeds, you can see that the connections are active in the NS1 portal (Integrations > Incoming).
This process can be extended to work with any metric you wish, from any system that can POST simple JSON to NS1 API data feeds.