In my past blog articles, I wrote about the integration of Antrea with NSX-T and how to implement NSX ALB as an ingress controller for a K8s cluster.
This blog is about the integration of NSX-T and NSX ALB as a combined solution for the networking within Tanzu and K8s.
In this blog you will learn what is required to deploy Tanzu with NSX-T and NSX ALB and how to deploy it. As a prerequisite, you should already know how to deploy NSX-T and NSX ALB, since this will not be covered here.
Since vSphere with Tanzu became available, the best practices for the deployment have changed several times. Starting with vSphere 7 it was possible to deploy vSphere with Tanzu in two flavors: based on the NSX-T integration, or based on the vSphere networking stack with HA Proxy as load balancer.
A few years ago, VMware acquired Avi Networks and its Avi Vantage platform as an advanced load balancing solution to increase the feature set for load balancing requirements in the product line of VMware NSX. Starting with vSphere 7 Update 2, VMware added NSX ALB (previously Avi Vantage) as a new flavor for the vSphere with Tanzu deployment, and HA Proxy was no longer officially supported.
Also, VMware by Broadcom announced the deprecation of the native NSX-T Load Balancer and its replacement with NSX ALB. This also means that all customers who are using the flavor with the NSX-T integration will be forced to migrate the load balancing features towards NSX ALB after some time. The combined integration was not available for a while, but it was already possible to take advantage of NSX ALB as a K8s ingress controller, as described in the blog article mentioned above.
Since September 2023 it is finally possible to combine the features of NSX ALB and NSX-T Data Center for the deployment of vSphere with Tanzu. My recommendation for all future deployments of vSphere with Tanzu is to use this combined flavor of the integration, because the native NSX-T Load Balancer will be removed in the NSX 5.x releases and a migration from other flavors of the integration might be challenging. So, if you fulfill the requirements mentioned in the next section, you should take advantage of the combined integration; if not, you should plan the upgrade of your environment to fulfill the requirements.
The deployment of vSphere with Tanzu based on the combined integration with NSX-T and NSX ALB is recommended from my point of view, but you need to fulfill the following requirements to take advantage of the combined integration.
The networks might be different based on your decision whether you want to use NAT to hide the Namespace CIDR or want to have the Namespace CIDR routed.
If you are using NAT, you require an Egress CIDR and the Namespace CIDR will not be routed; otherwise the Egress CIDR is not required, but the Namespace CIDR has to be routable.
The following networks should always be routable through the NSX-T Multi-Tier setup towards the underlay.
Hint: If you fulfill all the requirements except the required versions, NSX ALB will not be used for the deployment of the load balancing services. Instead, the deployment just uses the NSX-T native Load Balancer. Further, you need to onboard the NSX ALB within your NSX-T Manager, but this will be described in detail in the following sections.
First, you should deploy your vSphere cluster with a minimum of three ESXi hosts within your vCenter. Afterwards you should deploy your NSX-T Manager, NSX-T Edges and NSX ALB Controllers. I will not cover the basic setup of these components here.
After the basic setup is done, the NSX ALB must be configured with the NSX-T Cloud as shown below.
First, you will define a name for the NSX-T Cloud within the NSX ALB as well as the credentials. Further, you need to check the “DHCP” checkbox, since the Service Engines will get IPs assigned from a DHCP server within NSX-T later on.
In the next step you will choose the overlay transport zone as well as a T1 router and an overlay segment for the management IP assignment of the Service Engines.
The same needs to be done for the default data network of the Service Engines, which can be used for some LB virtual services.
Hint: In the screenshot below, you will see several data networks. The network “seg-alb-data” is the one chosen manually as a default, the other two are automatically created through Tanzu.
After the connection towards NSX-T is done, you also need to connect the vCenter Server, which will be used to deploy the Service Engines later. Further, you need to assign an IPAM profile which will be used for the IP assignment of the LB virtual services.
Within the IPAM profile you can add the data segment you chose as the default; the networks required for the Tanzu ingress do not have to be configured and assigned manually.
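If you want to verify that the NSX-T Cloud was created correctly without clicking through the UI, you can also query the NSX ALB REST API. The following call is only a minimal sketch: it assumes that basic authentication is enabled on the controller, and the controller FQDN and credentials are placeholders for your environment.
# List the clouds configured on the NSX ALB controller and check the NSX-T Cloud entry
curl -k --location 'https://alb-controller.lab.local/api/cloud' \
--user 'admin:<password>'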
After the NSX-T Cloud is configured and the connection between NSX ALB and NSX-T Manager is established, the NSX ALB additionally needs to be onboarded via the NSX-T Manager API. The required onboarding process is done by the following API call.
curl --location --request PUT 'https://10.255.51.84/policy/api/v1/infra/alb-onboarding-workflow' \
--header 'X-Allow-Overwrite: True' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==' \
--data '{
    "owned_by": "LCM",
    "cluster_ip": "10.255.51.86",
    "infra_admin_username": "admin",
    "infra_admin_password": "VMware1!",
    "ntp_servers": [
        "depool.ntp.org"
    ],
    "dns_servers": [
        "10.255.51.66"
    ]
}'
Warning: The certificate of the NSX ALB Controller should not be replaced before the onboarding process is completed. If you replace the certificate in advance, the API call will still be successful, but the connection between NSX ALB and NSX-T Manager is somehow broken and the later Tanzu deployment will either fail or use the NSX-T native Load Balancer. The reason might be that the onboarding process also replaces the controller certificate.
After the onboarding is finally completed you can check the status with the following API call.
curl --location 'https://10.255.51.84/policy/api/v1/infra/sites/default/enforcement-points/alb-endpoint' \
--header 'X-Allow-Overwrite: True' \
--header 'Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==' \
--header 'Cookie: JSESSIONID=FDA0832265D393BB21C3A9AA5B1D457D'
This will generate an output similar to the one shown below, where a valid certificate should be included. If no valid certificate is shown, this might be a hint that the onboarding process did not work as expected.
{
    "connection_info": {
        "username": "",
        "tenant": "admin",
        "expires_at": "2024-01-20T21:00:05.589Z",
        "managed_by": "LCM",
        "status": "DEACTIVATE_PROVIDER",
"certificate": "-----BEGIN CERTIFICATE-----\nMIIEFjCCAv6gAwIBAgIUZ33j5u3G6eyPXRPPh9cgvFcLJxYwDQYJKoZIhvcNAQEL\nBQAwgagxCzAJBgNVBAYTAkRFMRgwFgYDVQQIDA9SaGVpbmxhbmQtUGZhbHoxDjAM\nBgNVBAcMBU1haW56MRQwEgYDVQQKDAtldm9pbGEgR21iSDEWMBQGA1UECwwNQlUg\nRGF0YWNlbnRlcjEeMBwGA1UEAwwVYWxiY3RhbnNzMDEubGFiLmxvY2FsMSEwHwYJ\nKoZIhvcNAQkBFhJzc2NocmFtbUBldm9pbGEuZGUwHhcNMjMxMDE2MDYzNDU3WhcN\nMjQxMDE1MDYzNDU3WjCBqDELMAkGA1UEBhMCREUxGDAWBgNVBAgMD1JoZWlubGFu\nZC1QZmFsejEOMAwGA1UEBwwFTWFpbnoxFDASBgNVBAoMC2V2b2lsYSBHbWJIMRYw\nFAYDVQQLDA1CVSBEYXRhY2VudGVyMR4wHAYDVQQDDBVhbGJjdGFuc3MwMS5sYWIu\nbG9jYWwxITAfBgkqhkiG9w0BCQEWEnNzY2hyYW1tQGV2b2lsYS5kZTCCASIwDQYJ\nKoZIhvcNAQEBBQADggEPADCCAQoCggEBAKMT4qRbH++kB470x3BjRv/SeqJ9T04a\nLy3D+PgSZ7Gp+VR+y7k9KwbFjx6LCMP4OZ7mm820VHCLEwWCZ19cIxjJRNn4Me9a\no92Zk8Yk+iuIPdo6/ccfUicrXd/1AS3Cn39M7qINgfG6IY2S5BvDu0OJHcrG7NUa\nd1aR8ZlEiekgEyPrh+voppQSu+RmlNdVAI8B06aWVsfNyCRutkVgd6NFRhP5KNhi\nXK6VlSh4AWdJumcmgpFKxBiJl6Hz00fO1JZKf0MuwNOZxgqJfG1fz98FUiqp88AB\nQJhptrUS9aTeGZV3dVWJ0HuiNWTRv3dUXI7HoDsr7XggcY/Zv/CJSL0CAwEAAaM2\nMDQwMgYDVR0RBCswKYILYWxiY3RhbnNzMDGCFGFsYmN0YW5zczAxLmxhYi5sb2Ns\nhwQK/zNWMA0GCSqGSIb3DQEBCwUAA4IBAQAPubXVA9N5N2tUpBJSPrREjEXAt05x\ncioAHpCzNX7y2wRQMDVBgP4D5cS1JFHhXZv7f6+0L/7eKeH5EmjssBdJle1weQ1Y\nAxh9UEmuSRztNoRwnGsKVysbAAcxSW8L/XG1/Lgau7VbFRxv30j06GyI+BHUEOX6\n/qv1y6iBORbzOymMXdUHRSX/GqgyqfisuJ5l/a0ELtByAT6vrybOhwTBH3eJ+8Ay\nCj0qyxZecuuYLG+jlYtDW/oFZXq79YC5/NHZbKNOj+oKX7gWayYJ8ic6xI/5TnBd\n3yEaIx4XsOdRtYlRkCW64Ctecla01c4kuGuiRxESSm2508Jlk4DzINvA\n-----END CERTIFICATE-----\n",
"enforcement_point_address": "10.255.51.86",
"resource_type": "AviConnectionInfo"
},
"auto_enforce": true,
"resource_type": "EnforcementPoint",
"id": "alb-endpoint",
"display_name": "alb-endpoint",
"path": "/infra/sites/default/enforcement-points/alb-endpoint",
"relative_path": "alb-endpoint",
"parent_path": "/infra/sites/default",
"remote_path": "",
"unique_id": "5484649a-f7c1-498d-85d7-b1543ad2a3af",
"realization_id": "5484649a-f7c1-498d-85d7-b1543ad2a3af",
"owner_id": "00923316-f084-4c03-9927-c443c4275376",
"marked_for_delete": false,
"overridden": false,
"_create_time": 1697616121021,
"_create_user": "admin",
"_last_modified_time": 1705762805860,
"_last_modified_user": "system",
"_system_owned": false,
"_protection": "NOT_PROTECTED",
"_revision": 472
}
After the onboarding process is done, you are finally allowed to replace the NSX ALB controller certificate. Replacing the certificate is not optional, since the Tanzu deployment will check the Common Name and the Subject Alternative Names within the certificate, which need to match the FQDN used for the NSX ALB.
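To verify that the replaced certificate really presents the expected Common Name and Subject Alternative Names, a quick check with openssl can help. This is just a sketch; the FQDN below is the controller FQDN from my lab and needs to be replaced with yours, and the -ext option requires OpenSSL 1.1.1 or later.
# Show the CN and SAN entries of the certificate currently served by the NSX ALB controller
echo | openssl s_client -connect albctanss01.lab.local:443 -servername albctanss01.lab.local 2>/dev/null \
| openssl x509 -noout -subject -ext subjectAltName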
After NSX ALB is prepared, you should also prepare your NSX-T setup. As already mentioned in the requirements section, you need a T0 router which has routing established towards the underlay router. Further, Tanzu requires an edge cluster with at least large-sized edge nodes and a T1 router to connect the management network for the Tanzu Supervisor Control Plane nodes. An example of the T1 router and the required overlay segment is shown in the screenshots below.
As shown in the screenshot above, the T1 router is “Distributed only” since no stateful services are required. Further, the route advertisement for “All Connected Segments and Service Ports” is required. The route advertisement for “All IPsec Local Endpoints” is enabled by default, but not required.
As shown above, the segment is based on a transport zone of type “Overlay” and has a subnet configured as well as an attachment to the previously created T1 router. All other settings are left at their defaults.
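If you prefer the NSX-T Policy API over the UI for this preparation, the two calls below are a minimal sketch of the same configuration. The T1 and segment IDs, the T0 path, the gateway subnet and the transport zone path are placeholder values and have to be adapted to your environment.
# Create a distributed-only T1 (no edge cluster assigned) and attach it to the existing T0
curl --location --request PATCH 'https://10.255.51.84/policy/api/v1/infra/tier-1s/t1-tanzu-mgmt' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==' \
--data '{
    "tier0_path": "/infra/tier-0s/t0-lab",
    "route_advertisement_types": [
        "TIER1_CONNECTED"
    ]
}'
# Create the overlay segment for the Supervisor management network on that T1
curl --location --request PATCH 'https://10.255.51.84/policy/api/v1/infra/segments/seg-tanzu-mgmt' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ==' \
--data '{
    "connectivity_path": "/infra/tier-1s/t1-tanzu-mgmt",
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<overlay-tz-id>",
    "subnets": [
        { "gateway_address": "10.255.52.1/24" }
    ]
}'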
After all preparations are done, you can start with the Tanzu deployment.
The deployment is exactly the same as for the deployment with the NSX-T-only integration. Tanzu will decide whether NSX ALB will be used as Load Balancer instead of the native NSX-T Load Balancer, based on the requirements and preparations mentioned above.
If you have everything prepared, but NSX-T is installed in a version below 4.1.1, the deployment will use the native NSX-T Load Balancer and not the NSX ALB. So take care that all of the mentioned requirements are fulfilled.
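A quick way to double-check the installed NSX-T version before starting the deployment is the node version API of the NSX Manager, as sketched below for the manager from my example.
# Return the installed NSX-T version of the manager node
curl --location 'https://10.255.51.84/api/v1/node/version' \
--header 'Authorization: Basic YWRtaW46Vk13YXJlMSFWTXdhcmUxIQ=='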
In the screenshot below you can see the parameters configured for the management network.
The next screenshot shows the parameters for the workload network, like the Edge Cluster, the T0 gateway and the provided network CIDRs.
In my case NAT mode is enabled and the Egress CIDR is therefore required.
All the other parameters for the Tanzu deployment are still required, but since this blog focuses on the NSX-T and NSX ALB integration, they are not described in detail.
In the following screenshot you can see an overview of the NSX architecture after the Tanzu deployment, such as the Supervisor cluster management T1 router and segment, as well as the T1 router “domain-***” automatically created by the Tanzu deployment.
Further, you can see a dedicated T1 router called “t1-ako-ingress”, which will be used for the AKO integration within a TKC workload cluster. This additional AKO integration is required to use NSX ALB for ingress services within a TKC workload cluster, since this is not covered by the deployment of the Supervisor cluster.
The deployment of AKO is not described here, but a dedicated blog article is linked at the beginning.
Hint: For the AKO deployment in each TKC workload cluster you can use dedicated Service Engine Groups. This might also be required to prevent scaling issues, since each vSphere Namespace causes a dedicated virtual network card to be assigned to the Service Engines, and the limit of “Virtual NICs per virtual machine“ is 10. The limit is also shown under the following link.
https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=1-0
Based on this limit you can create a maximum of 8 vSphere Namespaces (10 vNICs minus 1 vNIC for Service Engine management and 1 vNIC for the Supervisor cluster deployment).
This is at least a limitation for the moment, but hopefully there will be a solution for this in future versions.
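If you want to keep an eye on how close a Service Engine already is to this limit, you can count its network adapters, for example with the govc CLI. This is only a sketch under the assumption that govc is installed and configured against your vCenter; the VM name is a placeholder for one of your Service Engine VMs.
# Count the network adapters currently attached to a Service Engine VM
govc device.ls -vm 'Avi-se-example' | grep -c ethernet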
As shown in the next screenshot, no Load Balancers except the Distributed Load Balancer are created within NSX-T.
The Distributed Load Balancer is not covered by NSX ALB, but as shown below, all other load balancing services like the Load Balancer for the K8s API are created within the NSX ALB.
As you can see in the screenshot above, the IPs used for the load balancing services are from the Ingress CIDR used within the deployment. The two other load balancing services with the subnet 10.255.56.x belong to a specific vSphere Namespace created after the deployment as a day-2 operation.
This means the “overwrite network settings” feature for each vSphere Namespace is still available and optional. An example is shown in the screenshot below.
As you can see, dedicated subnets are defined for the different networks and NAT mode is disabled. These dedicated subnets will be used for all TKC clusters created within this vSphere Namespace.
If the deployment fails or the NSX-T native Load Balancer is used instead of the NSX ALB, double-check the requirements and tasks described in the preparation section first.
In case the deployment did not complete as expected, you should check the log file “/var/log/vmware/wcp/wcpsvc.log”.
You will find details about the LB selection process inside this log, as well as many more possible error messages which give you a hint why the deployment failed.
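The exact log messages differ between versions, but grepping for load balancer related entries is usually a good starting point, for example:
# Search the WCP service log on the vCenter appliance for load balancer related entries
grep -iE 'avi|alb|loadbalancer|load balancer' /var/log/vmware/wcp/wcpsvc.log | tail -n 50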
The second possible problem might be the NSX ALB controller certificate, which should match the configured FQDN of the controller and should be visible in the API call for checking the ALB onboarding status, mentioned in the preparation section.