vSAN Disk Fault Injection

Remote proof-of-concept (PoC) testing seems to be gaining in popularity recently. The major difference between on-site and remote testing is access to the hardware: on site you can pull a drive or break a physical network link by hand. For disk failure testing in a vSAN cluster I use the vSAN Disk Fault Injection script that ships with ESXi. There is nothing to download; it is there by default under /usr/lib/vmware/vsan/bin. Use the script for PoC or homelab environments only.

We need a device ID to run the script. We can test either a cache or a capacity drive in the chosen disk group. In the example below I picked mpx.vmhba2:C0:T0:L0, which was a cache drive (Is Capacity Tier: false).

You can use esxcli vsan storage list for that:
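If the list is long, a few lines of Python can pick out which devices are cache and which are capacity. This is only a sketch: the sample text is hand-written and abbreviated (not captured from a host), and the parser assumes the output is made of unindented device names followed by indented "Key: Value" lines. The function names are mine, not part of any VMware tool.

```python
# Hand-written, abbreviated sample in the shape of `esxcli vsan storage list`
# output. It is NOT captured from a real host. On ESXi you could pass the real
# text instead, for example from
# subprocess.check_output(["esxcli", "vsan", "storage", "list"]).decode()
SAMPLE = """\
mpx.vmhba2:C0:T0:L0
   Device: mpx.vmhba2:C0:T0:L0
   Is SSD: true
   Is Capacity Tier: false

mpx.vmhba2:C0:T1:L0
   Device: mpx.vmhba2:C0:T1:L0
   Is SSD: true
   Is Capacity Tier: true
"""


def parse_vsan_storage(text):
    """Return one dict per device block (assumed layout, see lead-in)."""
    devices = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line between device blocks
        if not line[0].isspace():
            # An unindented line starts a new device block.
            current = {"Name": line.strip()}
            devices.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only; device ids contain colons too.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


def tier_of(device):
    """Map the 'Is Capacity Tier' field to a readable tier name."""
    return "capacity" if device.get("Is Capacity Tier") == "true" else "cache"


for dev in parse_vsan_storage(SAMPLE):
    print(dev["Name"], tier_of(dev))
```

Splitting on the first colon only matters here, because device IDs such as mpx.vmhba2:C0:T0:L0 contain colons themselves.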

Or check in the vCenter console under Storage Devices:

Or under Disk Management:

python vsanDiskFaultInjection.pyc has the following options:

I am using -u for injecting a hot unplug.
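For repeatable tests it can help to build the command in one place. The sketch below only assembles and prints the command unless dry_run is switched off. The -u flag is the one mentioned above; passing the device with -d is my assumption about the script's interface, so check it against the option list the script prints before running anything. The helper names are hypothetical.

```python
import subprocess

# Location and file name as given earlier in this post.
SCRIPT = "/usr/lib/vmware/vsan/bin/vsanDiskFaultInjection.pyc"


def hot_unplug_command(device_id):
    """Build the argument list for a hot-unplug injection.

    -u is taken from the post; -d <device> is an ASSUMPTION, verify it
    against the script's own option list.
    """
    return ["python", SCRIPT, "-u", "-d", device_id]


def inject_hot_unplug(device_id, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = hot_unplug_command(device_id)
    print(" ".join(cmd))
    if not dry_run:
        # Raises CalledProcessError if the script exits with a non-zero code.
        subprocess.check_call(cmd)
    return cmd


# Dry run: prints the command without touching the disk.
inject_hot_unplug("mpx.vmhba2:C0:T0:L0")
```

Keeping the dry run as the default makes it harder to fail a disk on the wrong host by accident.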

/var/log/vmkernel.log is the place you can verify the disk status:
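The log is noisy, so filtering it for the device ID helps. Here is a minimal sketch that does a plain substring match on the device ID and assumes nothing about the wording of the log messages. The function names and the default of 20 lines are my own choices.

```python
def lines_mentioning(lines, device_id):
    """Yield every line that contains the device id, without its newline."""
    for line in lines:
        if device_id in line:
            yield line.rstrip("\n")


def show_device_events(device_id, log_path="/var/log/vmkernel.log", last=20):
    """Print and return the most recent log lines that mention the device."""
    with open(log_path, errors="replace") as log:
        matches = list(lines_mentioning(log, device_id))
    recent = matches[-last:]
    for line in recent:
        print(line)
    return recent
```

On the host, calling show_device_events("mpx.vmhba2:C0:T0:L0") right after the injection should show what the kernel logged for that device.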

vSAN > Disk Management will also show what is going on with the disk group that suffered the drive failure.

Now we can observe the status of the data and watch objects being resynced to restore “compliance”.

After we are done with the testing, a simple rescan for new storage devices on the host will solve the issue.