esxcli vsan cluster unicastagent list

This happens often in nested vSAN environments in our home labs. We play with networking, remove vSAN kernel ports, shut vCenter down, remove hosts from the vSAN cluster… and then there is this one step too far that leaves all our objects inaccessible, including vCenter. To be able to access the data again (it is still stored safely on the disk groups) we need to re-create our cluster.

How can this be done without vCenter? vSAN keeps working when vCenter is down, but what happens when vCenter IS actually down and the cluster is broken or needs to be reconfigured?

vSAN Health in the ESXi host web interface is a good place to start assessing the “damage”. If all of the hosts are isolated, each of them will be the master of its own single-node vSAN cluster. If we do not see any other hosts in the Hosts tab, the host does not see any of its neighbors on the vSAN network.

What we can do next is SSH to all ESXi hosts and check the cluster status with the command: esxcli vsan cluster get.

This will confirm that the hosts are isolated or help us determine how the cluster is partitioned.
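On an isolated host the output looks roughly like this (trimmed to the key fields, with an illustrative UUID); Local Node State and Sub-Cluster Member Count tell the story:

esxcli vsan cluster get
Cluster Information
   Enabled: true
   Local Node UUID: 5e4b1c10-aaaa-bbbb-cccc-001122334410
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 5e4b1c10-aaaa-bbbb-cccc-001122334410
   Sub-Cluster Member Count: 1
   Sub-Cluster Member UUIDs: 5e4b1c10-aaaa-bbbb-cccc-001122334410

A healthy 4-node cluster would report Sub-Cluster Member Count: 4 and list all four UUIDs.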

vmkping -I vmkX x.x.x.x will always help us check whether this is a network problem of the nested host. In this scenario we assume the network works fine and pings are successful, but the nested hosts still cannot form the cluster.
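For example, from esxi-10 towards esxi-11 (vmk2 and the addresses below are just placeholders from my lab):

vmkping -I vmk2 192.168.100.11
PING 192.168.100.11 (192.168.100.11): 56 data bytes
64 bytes from 192.168.100.11: icmp_seq=0 ttl=64 time=0.482 ms

If you run jumbo frames on the vSAN network, vmkping -I vmk2 -d -s 8972 192.168.100.11 also verifies that the full MTU gets through without fragmentation.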

It is vCenter’s role to inform hosts about their vSAN neighbors when we form the cluster, but in this case we need to do it manually.

We need to “inform” the hosts about their neighborhood (vSAN uses unicast). On the screen below we see 4 vSAN 7.0 hosts with vmk2 tagged for vSAN traffic.
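If you are not sure which vmkernel port is tagged for vSAN on a given host, esxcli vsan network list will tell you; a trimmed example (the exact fields vary a bit between versions):

esxcli vsan network list
Interface
   VmkNic Name: vmk2
   Traffic Type: vsan
   Host Unicast Channel Bound Port: 12321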

Every host should have a list of the other hosts in the cluster. We can check it using esxcli vsan cluster unicastagent list.

If the cluster runs fine, this command shows the complete list of neighbors from the single host’s perspective. Here we can see esxi-13 seeing all three other hosts on the vSAN network on vmk2.
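On a healthy host the list looks more or less like this (the UUIDs and IPs are made up for illustration and some columns are trimmed); note that the host itself is never on its own list:

esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port
------------------------------------  ---------  ----------------  --------------  -----
5e4b1c10-aaaa-bbbb-cccc-001122334410          0              true  192.168.100.10  12321
5e4b1c11-aaaa-bbbb-cccc-001122334411          0              true  192.168.100.11  12321
5e4b1c12-aaaa-bbbb-cccc-001122334412          0              true  192.168.100.12  12321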

On the screen below we can see that the host esxi-10 sees only esxi-11 and esxi-12.

Assuming the network is fine and vCenter is down and won’t help us with this issue, we need to fill the gaps in the unicastagent lists manually. Just remember: never add the IP of the host whose table is being configured. Here is the command we have to use:

esxcli vsan cluster unicastagent add -t node -u <Host_UUID> -U true -a <Host_VSAN_IP> -p 12321

For esxi-10 we have to have esxi-11, 12 and 13 on the list, for esxi-11 it will be esxi-10, 12 and 13, and so on.
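Each host’s UUID can be read from the Local Node UUID field of esxcli vsan cluster get run on that host. In our scenario esxi-10 was only missing esxi-13, so the fix on esxi-10 is a single command (a host with an empty list would simply need one command per missing neighbor):

esxcli vsan cluster unicastagent add -t node -u <esxi-13_UUID> -U true -a <esxi-13_vSAN_IP> -p 12321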

If the lists are complete, the cluster should instantly be re-formed and the objects accessible again. Check out the Sub-Cluster Member Count – it was 1 and now it is 4.
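A quick re-check on any of the hosts confirms it:

esxcli vsan cluster get | grep "Sub-Cluster Member Count"
   Sub-Cluster Member Count: 4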

The cluster is formed again and vCenter should be starting.

vSphere and vSAN 7.0 are GA!

Brand new vSphere and vSAN 7.0 binaries are available to download on my.vmware.com.

Check out this small sneak peek of 7.0, freshly installed on 4 all-flash vSAN hosts. We say goodbye to the old Flash-based web client and welcome VM hardware version 17 with a watchdog timer (resetting the VM if the guest OS is no longer responding) and support for Precision Time Protocol, the newly rewritten workload-centric DRS with scalable shares, vSAN memory consumption dashboards and much more…

“Why vSAN?” in 60 seconds

“If you can’t explain it simply, you don’t understand it well enough.”

Albert Einstein

I have the impression that for Pre-sales Engineers it is harder than for others ;-), especially when you need to explain the benefits of a complex solution in a way that is both simple and short.

Keep it Sesame Street Simple, they say.

I am practicing my SSS skills on vSAN…

Usually, in a traditional 3-tier architecture, under every vCenter we have a long list of datastores backed by different LUNs created on storage arrays from different vendors with various settings: mirrored LUNs, RAID-10/5/6, deduplication ON/OFF, synchronous and asynchronous replication. They are thin-provisioned and have various used/free ratios. An admin can identify a suitable datastore for a VM by its name or a tag, but that is not always sufficient, especially when storage and compute resources are managed by different teams. VMs can have many VMDKs with different performance and resiliency requirements. How do we keep it all in order?

With vSAN there is always one datastore per cluster. It uses storage policies that can be assigned on a per-VMDK basis. This allows us to allocate storage resources granularly and with application requirements in mind.

vSAN makes storage management much easier. It also simplifies troubleshooting. You do not have to figure out which datastore is used by a particular VM and troubleshoot it individually. The storage path from a VMDK to a physical disk can be tracked and analyzed in detail in vCenter.

The power of Stripe Width Policy

Number of disk stripes per object is a vSAN SPBM policy that I don’t see implemented often. It’s probably because vSAN performs great and doesn’t need additional tuning in most cases.

But if you are running vSAN on a hybrid cluster (striping improves destaging from cache to capacity) or have a particular VMDK that would benefit from access to the IOPS available across a disk group or several disk groups, you might want to take a look at this SPBM rule.

Let’s see some examples for a simple 3-node cluster.

A VMDK with STRIPE=1 will have two copies on two hosts and witness metadata on a third host. I am assuming the disk is smaller than 255 GB; if it is larger than 255 GB, vSAN will stripe the object for us anyway.

[By the way, the default striping of objects larger than 255 GB differs from striping defined by SPBM: in the former case the components can sometimes be placed on the same disks, while with SPBM they always land on different disks.]

A VMDK with STRIPE=3 will also have two copies on two hosts and witness metadata on a third host, BUT each copy will reside on 3 disks instead of 1. On ESXi 2 one part of the stripe is on disk group DG1 and the second part is on DG2 – so there is an extra boost because two cache disks are used for this replica.

A VMDK with STRIPE=4 will again have two copies on two hosts and witness metadata on a third host, BUT each copy will reside on 4 disks instead of 1. On ESXi 2 and ESXi 3 one part of the stripe is on disk group DG1 and the second on DG2 – so again there is an extra boost because two cache disks are used for these replicas.

vSAN places the components of the object automatically; we don’t have to configure the placement. The only thing we configure is the number of disks we want the stripe to span. We also need to have enough disks available for the striping to happen.

Remember, objects can be striped across disks within a disk group, but also across different disk groups in the same host or even in different hosts.
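If you want to double-check the placement yourself, even without vCenter, esxcli vsan debug object list prints the component tree of every object. Below is a heavily trimmed and simplified sketch of what the STRIPE=3 case above could look like; I have collapsed the per-component details into a host/disk group annotation, and the object UUID is a placeholder:

esxcli vsan debug object list
Object UUID: <vmdk object uuid>
   Policy: hostFailuresToTolerate = 1, stripeWidth = 3
   Configuration:
      RAID_1
         RAID_0
            Component   (ESXi 1, DG1, capacity disk 1)
            Component   (ESXi 1, DG1, capacity disk 2)
            Component   (ESXi 1, DG1, capacity disk 3)
         RAID_0
            Component   (ESXi 2, DG1, capacity disk 1)
            Component   (ESXi 2, DG1, capacity disk 2)
            Component   (ESXi 2, DG2, capacity disk 1)
      Witness   (ESXi 3)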

Now, let’s see a different example; this time we have more hosts and erasure coding. Striping also works with RAID-5 and RAID-6 policies.

In this example we have 8 hosts. The VMDK uses a RAID-6 policy (a minimum of 6 hosts is required for this). Imagine we have just one VM with only one VMDK that uses these 8 hosts exclusively. With STRIPE=1 the components of the VMDK would be placed on 6 hosts only and 2 hosts would stay “empty”. STRIPE > 1 increases the probability that components will also be placed on the other hosts and that more disks (cache and capacity) will be used, for better disk utilization.

I marked the hosts in the middle as FD=2 and FD=5 just to emphasize that they hold components of the same object for this particular VMDK.

Is witness metadata component always required for FTT=1 mirror SPBM?

Usually, when we use a basic SPBM FTT=1 mirror policy and there is no striping involved, we end up with 2 copies of the data on 2 different hosts and, additionally, a witness metadata component on a third host for each object, to avoid a split-brain scenario.

Like in the example below – the two replicas of the VMDK on ESX1 and ESX2 and the witness metadata on ESX3. For other objects the placement will probably be different.

But this is not always the case. When objects are striped, the witness metadata may not be needed. Here is an example of an FTT=1, stripe=3 policy. One of the VMDK objects is striped into a RAID-0 spanning ESX1 and ESX4 (on 3 different disks) and a RAID-0 on ESX3 (on 3 different disks).

Here is how it looks from the vote perspective: on ESX3 all three components have V=1, the component on ESX4 has V=3, and the components on ESX1 have V=1 and V=2.
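In the esxcli vsan debug object list output the votes show up as a Votes value on every component. A simplified sketch of just the Configuration section, matching the placement above (everything else collapsed):

Configuration:
   RAID_1
      RAID_0
         Component   Votes: 1   (ESX3)
         Component   Votes: 1   (ESX3)
         Component   Votes: 1   (ESX3)
      RAID_0
         Component   Votes: 1   (ESX1)
         Component   Votes: 2   (ESX1)
         Component   Votes: 3   (ESX4)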

In this case no witness metadata component is required, because this component distribution and the vote assignment already prevent a split-brain scenario: each host holds exactly 3 of the 9 votes, so no single host can reach a majority on its own, while any two hosts together still have one (6 out of 9).

By the way, the component placement is done by vSAN automatically; we do not have to worry about votes and metadata components. Still, it is good to know how it works.