Change Witness when a host is disconnected

Recently I got a question about this particular case described in Stretched Cluster Design Considerations:

When a host is disconnected or not responding, you cannot add or remove the witness host. This limitation ensures that vSAN collects enough information from all hosts before initiating reconfiguration operations.

It seems to be true and valid for vSAN 6.7. Is it a problem that we cannot change the witness in this kind of corner case?

It would be a problem if we were not able to change the witness when a host is in maintenance mode, because then it would not be possible to replace a witness during an upgrade or a planned reboot. But we CAN change a witness in such a situation.

Let’s check what will happen if a host disconnects.

vCenter web gui
RVC for vCenter

This is exactly the case described in documentation and confirms we need to have all the hosts connected to vCenter to be able to replace a witness.

Is there a workaround for a failed host scenario? Sure. We can always disconnect the host, remove it from the vSAN cluster, and then change our witness. If a host is not available, vSAN will rebuild data on some other host anyway (if we have enough hosts), so this should not be a problem.

I have no idea why anyone would want to change witness exactly at the same time when a host is not responding. I hope to find an answer someday.

vSAN Health Check: “After 1 additional host failure” – how it is calculated?

This Health Check is very useful because it predicts how the vSAN cluster will behave from the storage and component utilization perspective after one host fails.

I have always wondered how it is calculated.

3-Node cluster

Let’s take a 3-node cluster as an example. In this case, there would be no additional host to rebuild data on, so I guess the Health Check will just decrease the size of the vsanDatastore.

My hosts have two disk groups, each with a size of 2×3,64 TB (TiB actually), so each host contributes 14,56 TB to the vsanDatastore.

The overall sum of the disk sizes is 3×14,56 TB = 43,68 TB. File system overhead is 54,66 GB (it should be no more than 3% of the cluster’s capacity). The vsanDatastore capacity is 43,68 TB - 54,66 GB = 43,62 TB.

When one host is offline, the capacity will be 2/3 of 43,62 TB = 29,08 TB, exactly as shown in the report below.
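The capacity part of this calculation can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above, not the Health Check's actual implementation; the function name is made up for illustration, and it assumes identical hosts and 1 TB = 1000 GB for the overhead conversion.

```python
def capacity_after_host_failure(hosts, raw_tb_per_host, overhead_tb):
    """Datastore capacity (TB) left when one of `hosts` identical hosts is offline."""
    # Usable capacity with all hosts online: raw disk space minus file system overhead.
    total_tb = hosts * raw_tb_per_host - overhead_tb
    # Each identical host contributes an equal share, so one failure removes 1/hosts.
    return total_tb * (hosts - 1) / hosts


# Values from the 3-node example: 14.56 TB per host, 54.66 GB of overhead.
remaining = capacity_after_host_failure(hosts=3, raw_tb_per_host=14.56, overhead_tb=0.05466)
print(f"{remaining:.2f} TB")  # prints "29.08 TB"
```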

What about data? If I put one of the hosts in MM, I will have no option to do a full migration. I can use Ensure Accessibility, which migrates my FTT=0 VMDKs before MM. But when the same host goes offline unexpectedly, I will lose access to those VMDKs. There will be no option to migrate or to rebuild.

It looks like the second option is taken into consideration. Hosts could have slightly different disk utilisation, and we do not know which host may fail and which one should be used as the example for this Health Check. This is a 6.7 U1 environment; in 6.7 U3 this is more straightforward, because the Health Check shows what happens when the most utilized host fails.

After one randomly chosen host fails we will have roughly 2/3 of the data available. Out of 43,62 TB available, we have 15,71 TB of free space, so 43,62 TB - 15,71 TB = 27,91 TB is used. Roughly 2/3 × 27,91 TB = 18,60 TB of data remains written.

18,60/29,08 = ~64%.

4+ node cluster

For a larger number of hosts, this Health Check calculates disk space utilization after a host failure and after the data is rebuilt.

Let’s say we have a 20 TB vsanDatastore on 4 identical ESXi hosts. This means each host contributes 5 TB of storage. If one host goes offline, vsanDatastore size will drop from 20TB to 15 TB.

Imagine we have 14 TB of data written on those 4 hosts. Roughly 3,5 TB per host. If one of the servers is offline, others have to take over +3,5 TB, roughly +1,17 TB each.

So after one host failure, vsanDatastore will be 15 TB and 3* (3,5TB + 1,17 TB) = 14,01 TB will be physically written. Our Health Check will report 14,01/15 = 93% of a datastore usage.
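The same reasoning can be written as a short Python sketch. It is a simplification under stated assumptions (identical hosts, data spread evenly, everything from the failed host gets rebuilt on the survivors); the function name is hypothetical, and this is not how vSAN itself implements the check.

```python
def usage_after_rebuild(hosts, datastore_tb, written_tb):
    """Predicted datastore usage (0..1) after one host fails and its data is rebuilt."""
    # One of `hosts` identical hosts disappears, taking its share of capacity with it.
    remaining_capacity_tb = datastore_tb * (hosts - 1) / hosts
    # After the rebuild, all written data lives on the surviving hosts.
    return written_tb / remaining_capacity_tb


# Values from the 4-host example: 20 TB datastore, 14 TB written.
usage = usage_after_rebuild(hosts=4, datastore_tb=20, written_tb=14)
print(f"{usage:.0%}")  # prints "93%"
```

The result matches the 14,01/15 figure above; the small difference in the numerator comes only from rounding 3,5/3 to 1,17.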

This Health Check will definitely be predicting a potential issue. Any (even hypothetical) datastore usage above 90% always requires admin’s attention.

Playing with some non-obvious vSAN storage policy changes.

Having a small vSAN cluster might be a challenge during maintenance work. It might be an even bigger challenge when we face a host failure, especially when we do not have a +1 host to migrate data to.

Imagine a situation when you have a 2-node vSAN cluster (esx-05 and esx-06) and one host (esx-05) is down. Our VMs will work because they will be reading and writing to the other copy of VMDK on esx-06. VMs that run on esx-05 will be restarted by HA on esx-06.

The question is will we be able to create NEW VMs?

With the default SPBM policy (vSAN Default Storage Policy), which requires 2 copies of the VMDK, we would not have sufficient Fault Domains to create a vSAN object.

That is why with 2-Node clusters we should consider enabling one of the Advanced Policy Rules: Force Provisioning. With this rule, vSAN will create an object on the vSAN datastore even if it does not have enough Fault Domains available, that is, just one host when there are 2 required (+ Witness).

With Force Provisioning enabled, we can create a new VM (VM5), even if a host (esx-05) is not accessible. Both the Witness node and the second host (esx-06) are available and form a vSAN cluster together. Creating VMs is possible because vSAN creates the objects with an FTT=0 policy. It means that temporarily we have just one copy of the data.

As soon as the second host (esx-05) is up, we can either repair the objects manually right away (Repair Object applies the actions needed to become compliant with the SPBM policy) or wait until the vSAN timer kicks in (the default setting is 60 min).

Here we can see that all of our objects are healthy because vSAN object health has a green status.

This screen shows that objects are compliant with their original policy of having two copies of the data.

Another question is: will it work for Erasure Coding, e.g. RAID-5?

If Force Provisioning is enabled, we can for example create a VM when we have just 3 nodes out of 4 available. But the objects will again have only an FTT=0 policy.

Storage Policy Compliance is Noncompliant in this case, but VM runs fine on just one copy (but we cannot afford any other failures).

When our host is back, objects will be re-written to match Storage Policies = to be compliant to the policies.
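The object creation behaviour described in this post can be summarised in a toy Python model. It is only a sketch of the rules as observed here, not vSAN's placement logic; the function names are hypothetical. It counts the witness as a fault domain and uses the usual minimums: 2×FTT+1 fault domains for mirroring, 4 for RAID-5 and 6 for RAID-6. It models object creation only, not policy changes on existing objects.

```python
def required_fault_domains(ftt, scheme):
    """Minimum number of fault domains needed to place an object with this rule."""
    if scheme == "mirror":
        return 2 * ftt + 1   # FTT=1: two data copies plus a witness component
    if scheme == "erasure" and ftt in (1, 2):
        return 2 * ftt + 2   # RAID-5 (FTT=1) needs 4, RAID-6 (FTT=2) needs 6
    raise ValueError("unsupported policy")


def effective_ftt(available, ftt, scheme, force_provisioning):
    """FTT a new object gets at creation time, or None if creation is refused."""
    if available >= required_fault_domains(ftt, scheme):
        return ftt
    # With Force Provisioning the object is still created, but with a single copy.
    return 0 if force_provisioning and available >= 1 else None


# 2-node cluster with esx-05 down: only esx-06 and the witness are left.
print(effective_ftt(2, 1, "mirror", force_provisioning=False))   # None
print(effective_ftt(2, 1, "mirror", force_provisioning=True))    # 0
# RAID-5 policy with 3 out of 4 hosts available.
print(effective_ftt(3, 1, "erasure", force_provisioning=True))   # 0
```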

It is worth mentioning that there are some exceptions where Force Provisioning will not help us. When a VMDK has an FTT=1 mirror policy and we have 3 out of 4 hosts available, we will not be able to change the policy to FTT=1 RAID-5, even when the new policy has the Force Provisioning setting enabled.

Will we be able to change the policy from FTT=1 RAID-5 to FTT=1 mirror when we have 3 out of 4 hosts running?

Yes we will.

If you want to play with vSAN Storage Policies, you can use VMware Hands On Labs like I did. There is a vSAN lab where we can work on 8 ESXi hosts and test our scenarios without touching our production environments.

Which gateway is used by vSAN kernel by default?

vSAN, unlike vMotion, does not have a dedicated TCP/IP stack. This means it uses the default gateway.

In clusters where vSAN uses a single L2 domain it is not a problem. In cases where there are multiple L2 domains within a cluster (stretched cluster, dedicated L2 domain per site, clusters that span racks in a Leaf and Spine topology) we need to define static routes to reach other L2 domains.

It is important to know that when you enter a dedicated gateway address for the vSAN network (Override default gateway for this adapter) it does not override the routing table on the ESXi host:

ESXi attempts to route all traffic through the default gateway of the default TCP/IP stack (Management) instead.

So far, the only option to route traffic via dedicated gateway for vSAN is to create a static route using this command:

esxcli network ip route ipv4 add --gateway IPv4_address_of_router --network IPv4_address 

Other useful commands:

esxcli network ip route ipv4 list

esxcli network ip route ipv4 remove -n network_ip/mask -g gateway_ip
esxcfg-route -l
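When the same route has to be added on many hosts, it can help to validate the addresses before pasting the command. Below is a minimal Python sketch that only builds the command string shown above; the helper name and the example addresses (192.168.20.0/24 via 192.168.10.1) are made up for illustration.

```python
import ipaddress


def build_route_add_command(network_cidr, gateway):
    """Return the esxcli command that adds a static route to network_cidr via gateway."""
    # strict=True rejects a network with host bits set, e.g. 192.168.20.5/24.
    network = ipaddress.ip_network(network_cidr, strict=True)
    gw = ipaddress.ip_address(gateway)
    if network.version != 4 or gw.version != 4:
        raise ValueError("this sketch only handles IPv4")
    return f"esxcli network ip route ipv4 add --gateway {gw} --network {network}"


# Hypothetical remote vSAN network reached through a hypothetical local gateway.
print(build_route_add_command("192.168.20.0/24", "192.168.10.1"))
```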

Let’s not forget about the most important command after every change in the network: vmkping.

If you have Jumbo Frames configured in your environment, run vmkping with -d (disable fragmentation) and -s (size).
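The value passed to -s is the ICMP payload, not the MTU, so the IPv4 and ICMP headers have to be subtracted first. A small Python sketch of that arithmetic follows; the helper name, the vmk2 interface and the target address are assumptions for illustration, so use the vmknic and peer IP from your own vSAN network.

```python
def vmkping_command(vmk, target_ip, mtu=9000):
    """Build a vmkping command that tests the full MTU without fragmentation."""
    # Payload = MTU minus the IPv4 header (20 bytes) and the ICMP header (8 bytes).
    payload = mtu - 20 - 8
    return f"vmkping -I {vmk} -d -s {payload} {target_ip}"


# Hypothetical vSAN vmknic and remote host address.
print(vmkping_command("vmk2", "192.168.20.11"))
# prints "vmkping -I vmk2 -d -s 8972 192.168.20.11"
```

If this ping fails while a ping with the default packet size works, some device on the path is probably not configured for Jumbo Frames.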

vCenter Server Appliance on a one-host vSAN

Usually we deploy vCenter when we have a datastore with enough free space available. In case of a brand new vSAN cluster installation we need (or at least we should have) a vCenter to activate vSAN but we need a vsanDatastore to install a vCenter. Classic chicken and egg situation.

Starting from vSAN 6.6 we do not have this issue any more. You can find more details in the Release Notes for this version:

"You can create a vSAN cluster as you deploy a vCenter Server Appliance, and host the appliance on that cluster. The vCenter Server Appliance Installer enables you to create a one-host vSAN cluster, with disks claimed from the host. vCenter Server Appliance is deployed on the vSAN cluster"

How does it work? When you run the vCenter Server Appliance Installer, in step 7 you are asked to select a datastore. There you can pick the option “Install on a new vSAN cluster containing the target host”.

This option will create one-host vSAN Cluster and install vCenter on it – with a storage policy SPBM: FTT=0.

Step 8 will also allow us to claim disks for vSAN for this particular host.

After vCenter is deployed on one-host vSAN there are some things that we can check to confirm vSAN is running fine:

esxcli vsan cluster get

Sub-Cluster Member Count: 1 indicates one-host cluster.
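If you collect this output from several hosts, the member count can be pulled out with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the sample text contains only the single line quoted above, not the full command output.

```python
def sub_cluster_member_count(output):
    """Return the Sub-Cluster Member Count from 'esxcli vsan cluster get' text, or None."""
    for line in output.splitlines():
        # Split on the first colon only, so values that contain colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Sub-Cluster Member Count":
            return int(value.strip())
    return None


# Trimmed sample: only the one line discussed in the text.
sample = "   Sub-Cluster Member Count: 1"
print(sub_cluster_member_count(sample))  # prints 1
```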

Using esxcli vsan debug object list we can verify that the VMDKs that belong to vCenter have an FTT=1 SPBM policy with Force Provisioning enabled. That means we have just one copy of each VMDK on this one-host cluster, which is not compliant with FTT=1, so the health state of the object is: reduced-availability-with-no-rebuild. There is no other ESXi host in the cluster to hold a second copy, and no third one to hold a witness component, so there is no way to satisfy FTT=1 for now.

Logging into web GUI of ESXi we see that vsanDatastore is created next to other VMFS datastore.

Looking into the Health tab we can verify the state of our object which matches the output of esxcli vsan commands.

It might be interesting to know that a one-host vSAN does not have a vmknic configured.

After vCenter is deployed we definitely need to finish our vSAN configuration. vCenter should not run on FTT=0 longer than necessary.

Next steps include:

  • adding vSAN vmknics
  • adding remaining hosts to vSAN cluster
  • configuring disk groups
  • configuring SPBM policies
  • applying licenses
  • configuring HA

and we are reminded to do so at the end of our installation: