Load Balance NSX-T Managers using NSX-ALB (AVI)

The NSX-T Management Cluster comprises three NSX-T Manager nodes to provide high availability and scalability. To support a single access point for the NSX-T Manager user interface and API, you can assign a VIP address to the NSX-T Management Cluster. Once the VIP is set, any UI and API requests to NSX-T are redirected to the virtual IP address of the cluster, which is owned by the leader node. This option has some disadvantages:

  • VIP does not perform load balancing across the NSX Managers. This is especially important in large deployments where there is a significant amount of change, because one node (the leader) receives every request.
  • VIP requires all the NSX Managers to be in the same subnet.
  • VIP recovery takes about 1 – 3 minutes in the event of a manager node failure.

To mitigate these disadvantages you can deploy an external load balancer. An external load balancer can provide the following benefits:

  • Load balance across the NSX Managers.
  • The NSX Managers can be in different subnets.
  • Fast recovery time in the event of a Manager node failure.

With this in mind, it is logical to use NSX-ALB (AVI), especially if you are already licensed for the product.

This guide will explain the fundamental elements for successfully deploying a virtual service for your NSX-T managers. It will not explain how to deploy a virtual service or create NSX-ALB pools.

Note: At the time of writing, customers who have purchased NSX-T Advanced or Enterprise licenses can deploy NSX-ALB Basic Edition as an alternative to the existing NSX-T LB.

Management Plane:

For NSX-ALB, the control plane (the NSX-ALB/AVI Controllers) needs to reside outside an NSX-T managed segment or a VLAN-backed transport zone. There are a number of reasons for this, including:

  • Supporting services outside of the NSX-T Cloud.
  • Ensuring that any NSX issues do not impact your ability to manage your load balancer/GSLB estate.

VIPs

Based on VMware's recommendations for high API/script usage, you should create two VIPs:

  • One VIP with source-IP persistence configured to handle all the authentication methods.
  • A second VIP without source-IP persistence for API and script usage.

Note: Use the first VIP for browser access to NSX Manager only.
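To illustrate why the two VIPs behave differently, here is a minimal Python sketch of the two selection behaviours. It is purely conceptual and is not how NSX-ALB implements persistence internally; the node names and the hash-based mapping are assumptions chosen for illustration. With source-IP persistence a given client always lands on the same manager (useful for authenticated browser sessions), while without it requests are spread across all managers (useful for API and script traffic).

```python
import hashlib
from itertools import cycle

# Hypothetical NSX Manager node names, for illustration only.
MANAGERS = ["nsx-mgr-01", "nsx-mgr-02", "nsx-mgr-03"]


def pick_with_source_ip_persistence(client_ip, servers):
    """Return the same server for the same client IP on every call."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def round_robin(servers):
    """Return an iterator that hands out servers in rotation."""
    return cycle(servers)


# A browser session from one IP sticks to one manager.
first = pick_with_source_ip_persistence("10.0.0.25", MANAGERS)
second = pick_with_source_ip_persistence("10.0.0.25", MANAGERS)

# API calls through the non-persistent VIP rotate across the managers.
rotation = round_robin(MANAGERS)
api_targets = [next(rotation) for _ in range(4)]
```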

Health Checks:

How to Check NSX-T manager health:

NSX-ALB (AVI) uses health checks to validate that servers are working correctly and can accommodate additional workloads before load balancing to them. These health checks can be basic ICMP, HTTP or HTTPS checks.

Any of the above checks can be used, but because the NSX-T control plane is so critical, extra checks are required, especially if APIs and automation are consumed within the platform.

NSX has a vast array of APIs that you can call directly. If you have an API tool such as Postman or Paw (Mac), you can run the following call to get the node health based on the key services:

GET /api/v1/reverse-proxy/node/health

The output below is an example of a healthy NSX-T Manager node. Notice that each key service is reported as “UP” and the health status is “true”:

HTTP/1.1 200
Set-Cookie: JSESSIONID=3A1EFA33D7A802529E46CCDD674EDCC0; Path=/; Secure; HttpOnly
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
Strict-Transport-Security: max-age=31536000 ; includeSubDomains
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
vary: accept-encoding
Content-Type: application/json
Transfer-Encoding: chunked
Date: Mon, 06 Dec 2021 10:55:45 GMT
Connection: close
Server: NSX

{
  "healthy" : true,
  "components_health" : "MANAGER:UP, POLICY:UP, SEARCH:UP, NODE_MGMT:UP, UI:UP"
}
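If you prefer to script this check rather than read the JSON by eye, the response above can be interpreted with a few lines of Python. This is a minimal sketch that only parses a response body you have already fetched; the function names are hypothetical, and it assumes a degraded node uses the same comma-separated NAME:STATE layout as the healthy example.

```python
import json

# Body copied from the healthy example response above.
SAMPLE_BODY = """
{
  "healthy" : true,
  "components_health" : "MANAGER:UP, POLICY:UP, SEARCH:UP, NODE_MGMT:UP, UI:UP"
}
"""


def parse_node_health(body):
    """Return (healthy flag, {component name: state}) from a health response body."""
    payload = json.loads(body)
    components = {}
    for item in payload.get("components_health", "").split(","):
        if ":" not in item:
            continue  # skip empty or unexpected fragments
        name, _, state = item.strip().partition(":")
        components[name] = state
    return bool(payload.get("healthy", False)), components


def components_not_up(components):
    """List the components whose state is anything other than UP."""
    return [name for name, state in components.items() if state != "UP"]


healthy, components = parse_node_health(SAMPLE_BODY)
```

Calling components_not_up(components) on a degraded node would point you straight at the service that needs attention.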

You can see the same services in the NSX Manager GUI. See the NSX-T Manager Appliances Health and NSX-T Manager node health pictures below:

NSX-T Manager Appliances Health
NSX-T Manager node health

Create an NSX-T manager health check within AVI:

As mentioned above, we can run an API health check against each node of the pool (the NSX Manager nodes). However, we would like our load balancer to run these health checks for us to determine each node's availability within the server pool. This is where health monitors come in. Health monitors perform this function by either actively sending a synthetic transaction to a server or by passively monitoring client experience with the server.
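The active monitoring described above can be pictured as a small state machine per pool member: each probe result is recorded, and the member is only marked down or up after several consecutive results agree. The Python sketch below is a conceptual illustration, not AVI's implementation; the class name, function name and threshold values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemberState:
    """Tracks the monitor's view of one pool member (hypothetical structure)."""
    name: str
    up: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


def record_probe(member, probe_ok, failed_checks=3, successful_checks=2):
    """Update a member with one probe result and return whether it is up.

    failed_checks and successful_checks are illustrative thresholds:
    the number of consecutive results needed to change state.
    """
    if probe_ok:
        member.consecutive_successes += 1
        member.consecutive_failures = 0
        if not member.up and member.consecutive_successes >= successful_checks:
            member.up = True
    else:
        member.consecutive_failures += 1
        member.consecutive_successes = 0
        if member.up and member.consecutive_failures >= failed_checks:
            member.up = False
    return member.up


# Three failed probes take the node out of the pool; two good ones restore it.
node = MemberState("nsx-mgr-02")
history = [record_probe(node, ok) for ok in [False, False, False, True, True]]
```

Requiring several consecutive results before changing state keeps a single dropped probe from removing a healthy manager from the pool.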

To create a health monitor click on the following within the AVI GUI:

  • 1: Templates
  • 2: Health Monitors
  • 3: Create
Create an NSX-ALB Health Monitor

When you create a health monitor the below window appears:

AVI New Health Monitor

Add a name and description, then set the type to https.

NSX-ALB Monitoring the Health of NSX Managers

Now that the new health monitor is set to https, a new set of options appears. Set the following:

  • 1: Health Monitor Port: 443
  • 2: Authentication Type: Basic
  • 3: Username: the NSX username used to query the NSX API
  • 4: Password: the password for the user above
AVI Health Monitor https settings
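For context on the Basic authentication type, the username and password are sent with every probe in an Authorization header as a base64-encoded "username:password" pair, as defined by the HTTP Basic scheme (RFC 7617). The short Python sketch below shows how that header value is built; the function name is hypothetical and the credentials are the well-known example from the RFC, not real NSX credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Authorization header for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# Example credentials taken from RFC 7617, not a real account.
header_value = basic_auth_header("Aladdin", "open sesame")
# header_value is "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```

Base64 is an encoding, not encryption, which is one reason the monitor runs over https on port 443.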

Once the https settings are complete, we now need to fill out the API query and response. In the Client Request Header field enter the following:

GET /api/v1/reverse-proxy/node/health HTTP/1.1

In the Client Request Body field:

application/json

In the Server Response Data field:

"healthy" : true

In the Response Code field:

2xx 

The completed form should look like the below output:

This health check runs the same API query that we ran manually above. This enables AVI to determine the health status of the NSX-T Managers within the pool.
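To make the pass/fail logic concrete, the Server Response Data and Response Code fields above can be thought of as two conditions that must both hold. The Python sketch below is an approximation under the assumption that the response data is matched as a plain substring of the body; the function names are hypothetical and this is not AVI's actual matching code.

```python
def response_code_matches(status, response_class="2xx"):
    """Check an HTTP status against a class such as 2xx."""
    if len(response_class) != 3 or not response_class[0].isdigit():
        return False
    return status // 100 == int(response_class[0])


def monitor_passes(status, body, expected_data='"healthy" : true', response_class="2xx"):
    """A probe passes only if the status class matches and the body contains the expected text."""
    return response_code_matches(status, response_class) and expected_data in body


healthy_body = '{\n  "healthy" : true,\n  "components_health" : "MANAGER:UP, POLICY:UP"\n}'
unhealthy_body = '{\n  "healthy" : false,\n  "components_health" : "MANAGER:UP, POLICY:DOWN"\n}'

ok = monitor_passes(200, healthy_body)
degraded = monitor_passes(200, unhealthy_body)
server_error = monitor_passes(503, healthy_body)
```

Under that substring assumption, the spacing in the Server Response Data field matters: the value should be entered exactly as it appears in the API response, spaces around the colon included.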

Once complete, click Save and add the monitor to the virtual service that is set up for your NSX Manager nodes.

Validation

To validate the NSX-ALB health check, we need to simulate a real-life failure of an NSX service. For this blog, I will stop the policy service on one of my NSX-T Manager nodes.

Before any service is stopped, check the health of the pool. Below is an example of a healthy pool before we fail any service.

NSX-T Manager Pool

To stop the NSX-T manager policy service run the following:

stop service policy

To confirm the node is unhealthy, test the API health check manually. You can also check using the NSX Manager GUI, as demonstrated below.

NSX-T API Manager Node Down
NSX-T Manager Cluster Degraded
NSX-T Manager UI Policy Down

With the API query confirming that the policy service is down, we can see that the health of the pool has diminished. Most importantly, the health check recognised the member that is causing the issue.

AVI NSX-T Manager Node Down

To bring the service back up, run the following:

start service policy

Confirm the node is healthy by running the API query manually:

NSX-T API Manager Node Up

Check the pool status in the AVI GUI:

For more details of the outage we can look at the events within the pool.

AVI NSX-T Manager Pool Events

References:

VMware’s NSX-T manager external load balancer documentation can be found here:

https://docs.vmware.com/en/VMConfiguring an External Load Balancerdcdd-1D77-46CF-9F1E-AD9BE6BC55C1.html

AVI health check documentation:

https://avinetworks.com/docs/21.1/overview-of-health-monitors/