vSAN 7.0 U1 – new features spotted (part 1)

While browsing the familiar vCenter UI right after an upgrade from 6.7 U3 to 7.0 U1, I noticed some small but nice changes I wanted to share.

RAID_D for integrity

When we put a host in Maintenance Mode with the Ensure Accessibility option, we will see a new RAID type -> RAID_D (the “D” stands for durability). It guarantees that all new write operations have two copies of the data while the host is in maintenance, preserving data integrity. In the example below esxi-59 is in MM, so all new writes go to esxi-74 (as a backup for esxi-59) and to esxi-79. Note that the Witness object in this example sits on the esxi-74 node together with the RAID_D component; that is because this is a 3-node vSAN cluster.
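You can also spot these components from the host CLI with the object debug command. The grep filter below is just an illustration and assumes the CLI labels durability components RAID_D, the same way the UI does:

    # list every vSAN object with its component tree (output can be long)
    esxcli vsan debug object list | more
    # assumption: durability components carry the RAID_D label in the CLI as in the UI
    esxcli vsan debug object list | grep -B 5 "RAID_D"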

Compression ONLY

If the workload is not dedupe-friendly but can benefit from compression, there is now an option to enable compression only on a vSAN all-flash datastore. The bonus (in comparison to dedupe & compression) is that when a capacity disk fails, the whole disk group will NOT be unmounted; only the failed disk is affected.

Datastore Sharing aka HCI Mesh

If you have two or more vSAN clusters under the same vCenter, you can mount a vSAN datastore from a remote cluster and use it as spare capacity for VMs.

IOINSIGHT

The famous VMware fling IOInsight is now integrated into the vCenter UI Performance tab. We can run an instance of it on selected target hosts and monitor the I/O performance of VMs in detail.

I/O Performance comparison between VMs

We can now pick up to 10 VMs and compare their I/O performance for a more detailed analysis of how a certain VM behaves in comparison to others, either on a shared chart or on separate charts.

vSAN upgrade to 7.0 U1 – format change

When you upgrade your cluster to 7.0 U1 you get a new disk format version: 13.0. As always with an upgrade from recent vSAN versions like 6.7, no data evacuation is required; it is a metadata-only change.
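To verify the on-disk format from the CLI, the storage listing is enough; a minimal sketch, assuming the field carries the same name on your build:

    # each vSAN device reports its on-disk format; expect 13 after the 7.0 U1 upgrade
    esxcli vsan storage list | grep -i "format version"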

I was sure my update would be as quick as the previous ones, but not this time ;-). I was surprised by a new activity in the Resyncing Objects dashboard called “Format change”.

It turns out that for 7.0 U1, objects larger than 255 GB have to be rewritten to a new format. My cluster had many such objects (like VMs with 1 TB VMDKs), so it took some time to change their format.

There is also a new Skyline Health test that shows how many objects need a new format:

Why is the format change needed? It is an internal optimisation that allows vSAN to change policies or rebuild data with less than the 25-30% free space that used to be required. Slack space can be smaller from now on, which leaves more capacity for workloads.

Is there always just one witness component for a vSAN object with mirror policy?

Some time ago we looked into a rare case where a vSAN object (VMDK) with FTT-1 mirror SPBM policy didn’t require a witness component.

So usually with the FTT-1 mirror policy we have two copies of the object plus a witness component. What do you think happens with the witness component when the FTT-2 SPBM policy is assigned to an object instead of FTT-1? FTT-2 means our object can survive the failure of two of its components. For a VMDK object it means there will be three copies of this VMDK, and two can be inaccessible without affecting the VM’s I/O traffic.

For FTT-2 there will be more than just one witness component…

A VMDK will have three copies + 2 witness components. That is why the FTT-2 policy requires a minimum of 2n+1 = 2*2+1 = 5 nodes / ESXi hosts (n being the number of failures to tolerate). And any two out of the 5 can be inaccessible without affecting the service.

VMDK object with SPBM: FTT-2 mirror

How about FTT-3? FTT-3 means our object can survive the failure of three of its components. For a VMDK object it means there will be four copies of this VMDK, and three can be inaccessible without affecting the VM’s I/O traffic.

A VMDK will have four copies + 3 witness components. That is why the FTT-3 policy requires a minimum of 2n+1 = 2*3+1 = 7 nodes / ESXi hosts. And any three out of the 7 can be inaccessible without affecting the service.

VMDK object with SPBM: FTT-3 mirror
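To summarize the mirror cases above (the exact witness count can occasionally differ depending on component placement, as the earlier post showed):

    FTT   Copies   Witnesses   Minimum hosts (2*FTT+1)
    1     2        1           3
    2     3        2           5
    3     4        3           7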

esxcli vsan debug object list is a command that can be used to list the components of an object; here is how it looks for a VMDK with the FTT-3 SPBM policy:

esxcli vsan debug object list command
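If you want to count the witnesses of a single object yourself, something like the line below should work. The UUID is a placeholder, and the --uuid filter is an assumption; if your build lacks it, scan the unfiltered list instead:

    # layout of one object; take the UUID from the VM's physical disk placement view
    esxcli vsan debug object list --uuid <object-uuid> | grep -c "Witness"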

vSAN disk removal & evacuation throughput

How quickly can vSAN evacuate data from my disk? Great question, and of course the answer is “it depends”: on disk performance, on network performance, but also on the current I/O load of the cluster.

It seems there are a lot of dependencies, but vSAN has it covered. Resync dashboards, resync throttling (use with care), the fairness scheduler, performance dashboards, pre-checks and esxtop – all of those are there to make data evacuation easier, regardless of the reason for the evacuation. It can be a migration to a larger disk, a disk group rebuild or a move from a hybrid to an all-flash cluster.

Let’s check those tools.

The fairness scheduler is a mechanism that keeps a healthy balance between frontend I/O (VM I/O) and backend I/O (policy changes, evacuations, rebalancing, repairs). It is very important because the disk group throughput has to be shared between many types of I/O traffic: on one hand we want the resync to complete as fast as possible, on the other, it cannot affect VM I/O.

vSAN can distinguish traffic types, prioritize the traffic and adapt to the current condition of the cluster.

It works like in the picture below. If there is no contention, resync and VM I/O run at “full speed”. If VMs need more of the disk throughput, resync I/O is throttled for that period of time to a maximum of 20% of the disk throughput.

The pre-check evacuation option will help you determine which components of an object can potentially be affected by the disk evacuation. You can select the preferred ‘vSAN data migration’ option: a full data migration, ensuring the accessibility of the objects, or no data migration at all.

The potential status of the components can vary, and the pre-check will also let you know if you have enough space to move the data. A Non-Compliant status means the object will remain accessible but will not be compliant with its vSAN Storage Policy, at least for some time until the rebuild is finished.

You can use the Resyncing Objects dashboard to monitor the status of vSAN background activities. For each activity you can check the reason behind it. It can be decommissioning, like on the screen below, where we remove a disk and evacuate its data.

The full list of the reasons (from the Resyncing Objects tab); a CLI alternative is sketched after the list:

  • Decommissioning: The component is created and is resyncing after the evacuation of a disk group or host, to ensure accessibility and full data evacuation.
  • Compliance: The component is created and is resyncing to repair a bad component, or after a vSAN object was resized or its policy was changed.
  • Disk evacuation: The component is being moved out because a disk is going to fail.
  • Rebalance: The component is created and is resyncing for cluster rebalancing.
  • Stale: Repairing components which are out of sync with their replicas.
  • Concatenation merge: Cleaning up concatenations by merging them. A concatenation can be added to support the increased size requirement when an object grows.
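If you prefer the CLI to the dashboard, there is a debug namespace for resync as well; a minimal sketch, assuming your build ships these subcommands (output columns vary between versions):

    # overall bytes left to resync and the number of resyncing objects
    esxcli vsan debug resync summary get
    # per-object resync details, including the reason
    esxcli vsan debug resync list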

You can also set Resync Throttling if you wish to manually control the bandwidth. Usually, with the vSAN fairness scheduler, we do not need to modify anything, but it is good to know we can control it in case the need arises.
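Besides the UI, recent builds also expose the throttle through esxcli. Treat the sketch below as an assumption to verify on your version, and check whether your build expects the value in Mbps or MB/s (100 is purely illustrative):

    # show the current resync throttle; 0 usually means no manual throttling
    esxcli vsan resync throttle get
    # cap resync traffic per disk group (illustrative value)
    esxcli vsan resync throttle set -t 100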

Frontend and backend traffic can be monitored with the Performance dashboard. It is very detailed: we can observe the traffic at the host level, the disk group level or even the individual disk level. All the traffic types are presented there.

If 5-minute granularity (which is what vCenter offers) is not enough, we can always SSH to an ESXi host. esxtop with the ‘x’ key will show near real-time vSAN stats per host. To be able to observe rebuild traffic we need to press ‘f’ and select ‘D’ for ‘recovery write stats’, as recapped below.
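Keystroke recap (the field letters are the ones from my build; they can differ between ESXi versions):

    esxtop    # start the tool on the ESXi host
    x         # switch to the vSAN panel
    f         # open the field selector
    D         # toggle 'recovery write stats' to watch resync traffic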

vSAN Health can also be used when we evacuate data from a disk. The vSAN Object Health check will show us how many objects are being rebuilt. And vSAN Health is always a good place to check the overall status of the cluster.
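The same health checks can be queried from a host; a sketch assuming the esxcli health plugin is present on your version (the test name passed to -t is illustrative, take the exact one from the list output):

    # list all health checks with their status
    esxcli vsan health cluster list
    # drill into a single check; the test name here is illustrative
    esxcli vsan health cluster get -t "Object Health"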

Traffic Filtering as another great tool for remote POC

Traffic Filtering and Marking is a vSphere Distributed Switch feature that protects virtualized systems from unwanted traffic and allows you to apply QoS tags to certain types of traffic.

A Distributed Switch normally requires a vSphere Enterprise Plus license. But if you have vSAN, vDS is included in the vSAN license, so it will work on a vSphere Standard license as well.

This tool is also great for a remote POC or your home lab. It will help you invoke a controlled vSAN host partition or isolation and learn the behavior of an HA or vSAN cluster, or verify that your HA configuration is correct. It is especially useful for testing vSAN stretched cluster behaviour.

Sometimes creating a full site partition requires involving a networking team that may not be available on demand. This tool can help a virtualization team test various scenarios on their own.

In the example below I used two DROP rules to discard the traffic going to esx12 and esx13 in my 4-host cluster.

This way I created three cluster partitions: esx12, esx13 and the rest of the cluster (esx10 and esx11).
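For reference, the two rules had roughly this shape; the IP addresses are hypothetical placeholders for the vSAN vmkernel addresses of esx12 and esx13:

    Rule "drop-esx12": Action = Drop, Traffic direction = Ingress/Egress,
                       IP qualifier: destination address = 192.168.100.12
    Rule "drop-esx13": Action = Drop, Traffic direction = Ingress/Egress,
                       IP qualifier: destination address = 192.168.100.13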

Right after I disabled the rules, the cluster got back to normal.