VM snapshots on a vSAN datastore and their SPBM policy

When we create a snapshot of a VM on a vSAN datastore, delta disks (where all new writes go) inherit the storage policy from the base disk.

Our VM Test_VM_123 uses a vSAN RAID-1 (mirror) policy, which means there are at least two copies of the VMDK plus a witness. After a snapshot is taken, we see the same policy applied to the delta disk.

But if we want to change a storage policy for Test_VM_123, we can change it for the VM Home object and the base disk only. There is no option to change the policy for a “snapshot”/delta disk.

After the policy on the base disk was changed to FTT-0/RAID-0 with a stripe width of 4, we can see that the delta disk retained its FTT-1 policy.
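To see which policy each object is associated with, we can query SPBM per entity. A minimal PowerCLI sketch (Test_VM_123 as in this example; cmdlets come from the VMware.VimAutomation.Storage module, and the per-object view on the host, e.g. esxcli vsan debug object list, remains the most detailed check):

# Policy assigned to the VM Home object
$vm = Get-VM -Name "Test_VM_123"
Get-SpbmEntityConfiguration -VM $vm

# Policy assigned to each virtual disk
Get-HardDisk -VM $vm | Get-SpbmEntityConfiguration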

This behaviour is described in VMware KB 70797 “Modifying storage policy rules on Virtual Machine running on snapshot in vSAN Data-store”. In order to keep the storage policies consistent across all VM disks, it is recommended to consolidate all snapshots before making SPBM policy changes to a VM.
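As a rough PowerCLI sketch of that recommendation (the policy name is illustrative): consolidate by removing the snapshots first, then apply the new policy to the VM Home object and the disks.

$vm = Get-VM -Name "Test_VM_123"

# Removing the snapshots consolidates the delta disks back into the base disks
Get-Snapshot -VM $vm | Remove-Snapshot -RemoveChildren -Confirm:$false

# Apply the new policy to the VM Home object and to every disk
$policy = Get-SpbmStoragePolicy -Name "FTT-0 RAID-0 Stripe 4"   # assumed policy name
Get-SpbmEntityConfiguration -VM $vm | Set-SpbmEntityConfiguration -StoragePolicy $policy
Get-HardDisk -VM $vm | Get-SpbmEntityConfiguration | Set-SpbmEntityConfiguration -StoragePolicy $policy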

Direct file (.ova or .iso) upload to a vSAN datastore

Uploading files directly to a vSAN datastore is rarely the best option, because there are many better ways to move data to vSAN clusters. For .iso files you can use the native vSAN File Service (NFS shares). For moving VMs there are Storage vMotion, HCX, and the Cross vCenter Workload Migration Utility fling (which is also included in the vSphere 7.0 Update 1c release). The Move-VM PowerCLI cmdlet also works between vCenters that don’t share SSO, and vSphere Replication or a restore from backup are further options. All of these methods are well documented and use supported APIs.
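For reference, a minimal Move-VM sketch for a cross-vCenter move (server, VM, host, datastore and portgroup names are placeholders; both vCenters have to be connected in the same PowerCLI session):

# Connect to both vCenters (placeholder names; PowerCLI cross-vCenter vMotion does not require shared SSO)
$src = Connect-VIServer -Server "vcsa-source.example.com"
$dst = Connect-VIServer -Server "vcsa-destination.example.com"

$vm            = Get-VM -Name "my-vm" -Server $src
$destHost      = Get-VMHost -Name "esx01.destination.example.com" -Server $dst
$destDatastore = Get-Datastore -Name "vsanDatastore" -Server $dst
$destPortGroup = Get-VDPortgroup -Name "vm-network" -Server $dst

# Network adapter to portgroup mapping is needed when moving between vCenters
Move-VM -VM $vm -Destination $destHost -Datastore $destDatastore `
        -NetworkAdapter (Get-NetworkAdapter -VM $vm) -PortGroup $destPortGroup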

But if there is a corner case and a file really needs to be placed on the vsanDatastore, this is also possible, although not for all file sizes. The system will not allow us to upload to the root path directly; we have to create a folder on the datastore first.

The first issue you will probably see when trying to upload a file is the following:

Opening the recommended URL and logging in to ESXi directly should be sufficient to authorise us to upload files to the vsanDatastore.

I tested it with some smaller files and up to 255 GB on a vSAN 7.0 U1 cluster, and the uploads were successful:

But adding another file to this folder failed:

Uploading a file larger than 255 GB to a new folder on the vsanDatastore also failed. What I could find in the ESXi vmkernel.log was the following:

write to large_file.ova (...) 1048576 bytes failed: No space left on device 

'cb954c60-5416-7dfa-6d87-1c34da607660': [rt 1] No Space - did not find enough resources after second pass! (needed: 1, found: 0)

It looks like uploading files to the vsanDatastore directly bypasses the logic that splits objects larger than 255 GB into smaller components. Why is that?

Looking at the file path, you can determine its object UUID, which in my case was cb954c60-5416-7dfa-6d87-1c34da607660.

You can use the following command to query this object directly on an ESXi host:

esxcli vsan debug object list -u cb954c60-5416-7dfa-6d87-1c34da607660

Now we have the answer. The object type of our direct upload is vmnamespace, which behaves like a container with a fixed size of 255.00 GB, and that is the maximum amount of data we can place there. By default it uses the vSAN Default Storage Policy (FTT=1, mirror in this cluster).
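The default policy that such a namespace object picks up can also be checked from PowerCLI; a minimal sketch, assuming the datastore is named vsanDatastore:

# Default storage policy associated with the vSAN datastore
# (objects created without an explicit policy, like this namespace object, fall back to it)
Get-SpbmEntityConfiguration -Datastore (Get-Datastore -Name "vsanDatastore")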

How to run basic performance tests for the HCX uplink interface

I believe the built-in HCX perftest tool should be run for every freshly deployed HCX Service Mesh before we start migrating VMs between sites. Although the test is just a benchmark (it uses iperf3 and is single-threaded), it gives us an idea of how fast VM migration will be and what can be expected in production. Testing with the HCX perftest tool is easier than with native iperf3 because we don’t have to provide or remember the IP addresses of the appliances on-prem and in the cloud ;-).

To start the test, we SSH to the HCX Manager as admin and select the IX appliance we want to test:

> ccli

> list

> go x -> select your Service Mesh appliance (x is its index from the list)

> perftest -> check the available options:

Available Commands:
  all           perftest uplink, ipsec, wanopt and site in one command
  ipsec         iperf3 perf testing against ipsec tunnels
  perf          iperf3 perf testing
  reachability  Ping remote peers to test reachability.
  site          iperf3 perf testing between sites
  status        Query the test status.
  uplink        iperf3 perf testing against uplink
  wanopt        tcpperf testing against WANOPT tunnels

Available flags are:

Flags:
  -h, --help               help for uplink
  -i, --interval uint32    Interval in second to report. Default is 1 second. (default 1)
  -m, --msgsize uint32     TCP maximum segment size to send.
  -P, --parallel uint32    Number of parallel streams. Default is 1. (default 1)
  -p, --port uint32        Listen port on server side. Default is 4500. -p 22 also allowed. (default 4500)
  -T, --runtimeout uint32  Individual test duration in second. Default is 1 minute. (default 60)
  -t, --timeout uint32     Total timeout in seconds. Default 10 min. (default 600)
  -v, --verbose            Show details during testing if set

PERFTEST SITE: GENERAL TUNNEL CHECK

>perftest site
++++++++++ StartTest ++++++++++

---------- Site-0 [192.0.2.33 >>> 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-30.00 sec 13.8 GBytes 3.96 Gbits/sec 365 sender
[ 4] 0.00-30.00 sec 13.8 GBytes 3.95 Gbits/sec receiver
Done

---------- Site-0 [192.0.2.33 <<< 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-30.00 sec 14.8 GBytes 4.24 Gbits/sec 167 sender
[ 4] 0.00-30.00 sec 14.8 GBytes 4.23 Gbits/sec receiver
Done

The native iperf3 commands used for this test with the default values:

iperf3 -c 192.0.2.34 -i 1 -p 9000 -P 1 -t 30

iperf3 -s -p 9000 -B 192.0.2.33

PERFTEST IPSEC: TEST INSIDE IPSEC

> perftest ipsec
++++++++++ StartTest ++++++++++

---------- Ipsec-0 [t_0, 192.0.2.37 >>> 192.0.2.45] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-30.00 sec 3.40 GBytes 973 Mbits/sec 0 sender
[ 4] 0.00-30.00 sec 3.39 GBytes 972 Mbits/sec receiver
Done

---------- Ipsec-0 [t_0, 192.0.2.37 <<< 192.0.2.45] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-30.00 sec 3.40 GBytes 974 Mbits/sec 0 sender
[ 4] 0.00-30.00 sec 3.40 GBytes 973 Mbits/sec receiver
Done

---------- Ipsec-1 [t_1, 192.0.2.38 >>> 192.0.2.46] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-30.00 sec 3.40 GBytes 973 Mbits/sec 0 sender
[ 4] 0.00-30.00 sec 3.40 GBytes 973 Mbits/sec receiver
Done

---------- Ipsec-1 [t_1, 192.0.2.38 <<< 192.0.2.46] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-30.00 sec 3.40 GBytes 974 Mbits/sec 0 sender
[ 4] 0.00-30.00 sec 3.40 GBytes 973 Mbits/sec receiver
Done

---------- Ipsec-2 [t_2, 192.0.2.39 >>> 192.0.2.47] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-30.00 sec 3.39 GBytes 971 Mbits/sec 0 sender
[ 4] 0.00-30.00 sec 3.39 GBytes 970 Mbits/sec receiver
Done

---------- Ipsec-2 [t_2, 192.0.2.39 <<< 192.0.2.47] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-30.00 sec 3.39 GBytes 971 Mbits/sec 1181 sender
[ 4] 0.00-30.00 sec 3.39 GBytes 970 Mbits/sec receiver
Done

The native iperf3 commands used for this test with the default values:

iperf3 -c 192.0.2.45 -i 1 -p 9000 -P 1 -t 30

iperf3 -s -p 9000 -B 192.0.2.37

PERFTEST UPLINK: UPLINK INTERFACE CHECK

> perftest uplink

Testing uplink reachability…
Uplink-0 round trip time:
rtt min/avg/max/mdev = 66.734/67.081/68.135/0.578 ms

Uplink native throughput test is initiated from LOCAL site.
++++++++++ StartTest ++++++++++

---------- Uplink-0 [te_0, a.a.a.a >>> b.b.b.b] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-60.00 sec 5.20 GBytes 745 Mbits/sec 5116 sender
[ 4] 0.00-60.00 sec 5.20 GBytes 744 Mbits/sec receiver
Done
---------- Uplink-0 [te_0, a.a.a.a <<< b.b.b.b] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-60.00 sec 4.55 GBytes 652 Mbits/sec 6961 sender
[ 4] 0.00-60.00 sec 4.55 GBytes 651 Mbits/sec receiver
Done

The native iperf3 commands used for this test with the default values:

iperf3 -c a.a.a.a -i 1 -p 4500 -P 1 -B b.b.b.b -t 60

Keep in mind that this is the only test that uses TCP port 4500 by default. If you only have UDP port 4500 open (which is the standard HCX uplink requirement), the test will fail and you will probably see something like this:

"Command error occurs: Error calling peer [a.a.a.a.a:9445]: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp b.b.b.b:9445: connect: connection refused"

PERFTEST ALL: ALL TESTS COMBINED

This test runs the iperf3 tests for uplink, ipsec, wanopt and site.

>perftest all
========== PERFTEST ALL STARTING ==========
== WanOpt is Present ==
== TOTAL # of TESTs : 11 ==
== ESTIMATED TEST DURATION : 12 minutes ==
-T option to change individual test duration [default 60 sec]
-k option to skip 'perftest uplink' if tcp port 4500 or 22 not opened
== Are you ready to start ?? [y/n]:

USEFUL FLAGS

You can use more parallel streams to try to saturate the pipe (-P), but keep in mind that the test still runs in a single thread.

>perftest site -P 2
++++++++++ StartTest ++++++++++

---------- Site-0 [ 192.0.2.33 >>> 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-60.00 sec 16.8 GBytes 2.40 Gbits/sec 1498 sender
[ 4] 0.00-60.00 sec 16.8 GBytes 2.40 Gbits/sec receiver
[ 6] 0.00-60.00 sec 16.4 GBytes 2.35 Gbits/sec 1815 sender
[ 6] 0.00-60.00 sec 16.4 GBytes 2.35 Gbits/sec receiver
[SUM] 0.00-60.00 sec 33.2 GBytes 4.76 Gbits/sec 3313 sender
[SUM] 0.00-60.00 sec 33.2 GBytes 4.75 Gbits/sec receiver
Done
---------- Site-0 [ 192.0.2.33 <<< 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-60.00 sec 19.0 GBytes 2.72 Gbits/sec 937 sender
[ 4] 0.00-60.00 sec 19.0 GBytes 2.72 Gbits/sec receiver
[ 6] 0.00-60.00 sec 19.5 GBytes 2.80 Gbits/sec 806 sender
[ 6] 0.00-60.00 sec 19.5 GBytes 2.79 Gbits/sec receiver
[SUM] 0.00-60.00 sec 38.5 GBytes 5.52 Gbits/sec 1743 sender
[SUM] 0.00-60.00 sec 38.5 GBytes 5.51 Gbits/sec receiver
Done

>perftest site -P 4
++++++++++ StartTest ++++++++++

---------- Site-0 [ 192.0.2.33 >>> 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-60.00 sec 9.22 GBytes 1.32 Gbits/sec 2108 sender
[ 4] 0.00-60.00 sec 9.21 GBytes 1.32 Gbits/sec receiver
[ 6] 0.00-60.00 sec 9.13 GBytes 1.31 Gbits/sec 2194 sender
[ 6] 0.00-60.00 sec 9.12 GBytes 1.31 Gbits/sec receiver
[ 8] 0.00-60.00 sec 9.20 GBytes 1.32 Gbits/sec 2288 sender
[ 8] 0.00-60.00 sec 9.19 GBytes 1.32 Gbits/sec receiver
[ 10] 0.00-60.00 sec 8.71 GBytes 1.25 Gbits/sec 2396 sender
[ 10] 0.00-60.00 sec 8.70 GBytes 1.25 Gbits/sec receiver
[SUM] 0.00-60.00 sec 36.3 GBytes 5.19 Gbits/sec 8986 sender
[SUM] 0.00-60.00 sec 36.2 GBytes 5.19 Gbits/sec receiver
Done
---------- Site-0 [ 192.0.2.33 <<< 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-60.00 sec 10.2 GBytes 1.45 Gbits/sec 2071 sender
[ 4] 0.00-60.00 sec 10.1 GBytes 1.45 Gbits/sec receiver
[ 6] 0.00-60.00 sec 10.0 GBytes 1.43 Gbits/sec 1932 sender
[ 6] 0.00-60.00 sec 10.0 GBytes 1.43 Gbits/sec receiver
[ 8] 0.00-60.00 sec 10.2 GBytes 1.47 Gbits/sec 2149 sender
[ 8] 0.00-60.00 sec 10.2 GBytes 1.47 Gbits/sec receiver
[ 10] 0.00-60.00 sec 10.3 GBytes 1.47 Gbits/sec 2366 sender
[ 10] 0.00-60.00 sec 10.3 GBytes 1.47 Gbits/sec receiver
[SUM] 0.00-60.00 sec 40.7 GBytes 5.83 Gbits/sec 8518 sender
[SUM] 0.00-60.00 sec 40.7 GBytes 5.82 Gbits/sec receiver
Done

You can change the TCP maximum segment size (-m) to find the best setting and to identify MTU mismatch issues. The MTU itself can also be modified in the HCX Network Profile used for the Uplink profile.

> perftest site -m 1390
++++++++++ StartTest ++++++++++

---------- Site-0 [ 192.0.2.33 >>> 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-60.00 sec 30.6 GBytes 4.37 Gbits/sec 518 sender
[ 4] 0.00-60.00 sec 30.5 GBytes 4.37 Gbits/sec receiver
Done
---------- Site-0 [192.0.2.33 <<< 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-60.00 sec 31.1 GBytes 4.46 Gbits/sec 270 sender
[ 4] 0.00-60.00 sec 31.1 GBytes 4.45 Gbits/sec receiver
Done

> perftest site -m 9000
++++++++++ StartTest ++++++++++

---------- Site-0 [ 192.0.2.33 >>> 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
server workload started
[ 4] 0.00-60.00 sec 29.4 GBytes 4.21 Gbits/sec 341 sender
[ 4] 0.00-60.00 sec 29.4 GBytes 4.20 Gbits/sec receiver
Done
---------- Site-0 [ 192.0.2.33 <<< 192.0.2.34] ----------
Duration Transfer Bandwidth Retransmit
[ 4] 0.00-60.00 sec 29.3 GBytes 4.19 Gbits/sec 307 sender
[ 4] 0.00-60.00 sec 29.2 GBytes 4.19 Gbits/sec receiver
Done

How does vSphere Replication 8.3 work with vSAN 7.0 U1?

VMware vSphere Replication Appliances (VRA) installed at the Protected and Recovery sites enable replication of VMs between those locations. VRA is very popular and well documented. The question is how its latest edition integrates with vSAN 7.0 U1.

The best way to check is to install and test both solutions together. The configuration process is simple: we download the OVA, deploy it at both sites, make sure each appliance is reachable by its local vCenter (DNS and NTP are required), and make sure the service account used for registering VRA in vCenter has sufficient privileges. During installation, we get a Site Recovery plugin in the local vCenter and a VR agent on each local ESXi host.

Configuring VRA in the VAMI portal at https://fqdn_vra:5480

VRA can work as a standalone solution, providing a replication service between clusters under the same vCenter. Paired with a remote VRA, it offers protection from a site failure. It integrates with vCenter, so we can set up VM protection directly from the vCenter UI.

vcsa-5 and vcsa-16 pairing

After site pairing is done, we can start replication tasks. In this example both sites are vSAN 7.0 U1 clusters. To check how vSAN SPBM integrates with VRA, I will create a replication task that applies a different vSAN SPBM policy on the destination site.

VM aga_2 has a 400 GB VMDK on the source side, with an FTT-1 mirror SPBM policy and Object Space Reservation set to 100%. This means there are two copies of the VMDK on the vSAN datastore and it occupies around 800 GB of space.

From the source vCenter I can now configure replication for this VM.

The steps are simple: first we select the target site (vcsa-16).

VRA checks if this VM can be configured for replication.

For a vSAN datastore at the target site we can select (even per disk) a different vSAN storage policy than the one used at the source site. In this scenario I select an FTT-0 SPBM policy, which means I will have just one copy of the VMDK (not recommended in production!). Sometimes the budget for a remote site is limited and the storage space there may not be sufficient to store additional copies of the data. I want to check whether it is possible to replicate from FTT-1 to FTT-0. If so, in real-life scenarios we could save some space and have, for example, a RAID-1 mirror on the source side and RAID-5 on the destination.
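Before configuring the replication, we can double-check from PowerCLI (connected to the target vCenter) which storage policies exist there and whether the target vsanDatastore is compatible with the chosen policy; a small sketch, with "FTT-0" as an assumed policy name:

# List the storage policies available in the target vCenter
Get-SpbmStoragePolicy | Select-Object Name, Description

# Check which datastores are compatible with the assumed "FTT-0" policy (the target vsanDatastore should be listed)
Get-SpbmCompatibleStorage -StoragePolicy (Get-SpbmStoragePolicy -Name "FTT-0")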

vSAN SPBM at Destination site

The most important part of the replication setup is setting the Recovery Point Objective (RPO) and the number of point-in-time copies (snapshots) we can revert to in case of a failure at the source site.

Configuring replication of aga2 to use the vSAN storage policy ftt-0.
After the first replication sync, on the target vCenter (vcsa-16) the replica of aga2.vmdk occupies around 400 GB, which means only one copy of the object is stored on the vSAN datastore.

After a successful sync between the source and target sites, we can RECOVER the aga2 VM. This means it will be registered at the target site; it can also be powered on right away.

The VM was replicated with point-in-time copies (snapshots) available, so we can now revert to a selected snapshot from the target site.
The VM is powered on at the target site, and the change to the FTT-0 vSAN policy was successful.

vSAN storage policies integrate with VRA. This is not a feature newly introduced in 7.0, but I still remember that some time ago replicated VMs always received the vSAN Default Storage Policy (two copies) and a manual SPBM refresh was required after recovering a VM. Back then there was no way to save space at the target site with an SPBM policy change, as the policy had to be re-applied later.

What is new in 7.0 is that the vSAN Capacity report in the vCenter UI shows how much data is used by vSphere Replication. In previous versions we were not able to see how much vSAN capacity was consumed by disk replicas.

A small piece of advice at the end: VM replicas are not registered in the target vCenter, so it is easy to miss how much space they actually use. We will not see many VMs in vCenter, but the replicas will be there, so check the usage breakdown regularly. If you keep many point-in-time copies of a VM, the number of vSAN components will also grow (with an FTT-1 policy each object has 3+ components). It is also worth checking capacity utilisation in vSAN Skyline Health: “What if the most consumed host fails”. Component utilisation in vSAN environments that are a target for vSphere Replication is usually high.
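A quick way to keep an eye on the overall numbers from PowerCLI is the vSAN space usage cmdlet; a small sketch, with "target-cluster" as a placeholder cluster name (the per-type usage breakdown itself is easiest to read in the vCenter Capacity view):

# Overall vSAN capacity and free space for the replication target cluster
Get-VsanSpaceUsage -Cluster (Get-Cluster -Name "target-cluster") | Format-List *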

Extending a network with HCX Network Extension

Network Extension (NE) is an HCX Service Mesh appliance that extends an L2 network between two sites. It is used to provide network reachability when migrating VMs between sites. The most popular use case is to use NE when migrating VMs (via HCX or other methods) from an on-prem site to the cloud and back. It is also a little overused: because the configuration is so easy and fast, we may be tempted to keep the extension there forever ;-). If that is the case, it is worth mentioning that the Mobility Optimized Networking (MON) feature of NE would be needed for latency-sensitive production workloads. MON provides routing based on the locality of the source and destination VMs and prevents L2 extension tromboning: with MON, a VM in site B (remote) can communicate with VMs in other segments without traffic hairpinning back to site A, where its gateway is located.

For my step-by-step demo I am using two locations: site A (on-prem), where the network segment aga_test 10.99.99.1/24 is originally configured, and site B (cloud), where the aga_test network will be extended. Site A uses NSX-T and DHCP is configured for my segment, but NSX-T is not required; the source can be any VLAN-tagged vSphere Distributed Switch network.

HCX-5 (site A, connector role) and HCX-1 (site B, manager role) are paired, and NE Service Mesh appliances are deployed at both locations. The NEs create an unmanaged encrypted transport tunnel between the sites over the network link defined in the uplink Network Profile.

The goal is to enable L2 communication between vm1 in site A and vm2 in site B, with bonus points for getting DHCP to work on the extended network.

aga_test is an NSX-T 3.0 segment (10.99.99.1/24) with DHCP enabled.
An HCX Service Mesh with a Network Extension appliance is deployed between hcx-5 (site A) and hcx-1 (site B).
When the NE appliance is deployed, we can create a Network Extension. Take a look at the description, “the default gateway for the network extension only exist at the origin site”; that is why MON may be useful.
We pick a network to extend from the list: aga_test.
This is the moment when we can enable MON (it is included in the HCX Enterprise license). We provide the gateway address and select the NE appliance we want to use.
The network extension is ready in just a few minutes.
The Service Mesh view provides more details on the extended network: L2E_aga_test.
vCenter at site B shows the extended network L2E_aga_test in the Network tab.
The extended segment is visible in the Segments view in NSX-T at site B. The default segment security profile doesn’t allow DHCP, so it has to be allowed for L2E_aga_test.
Creating a DHCP_Allow_Sec segment security profile that allows VMs on the extended network to receive DHCP traffic.
vm1 is deployed at site A in the aga_test network and has the address 10.99.99.107.
vm2 is deployed at site B in the extended L2E_aga_test network and got the address 10.99.99.131.
vm1 pinging vm2
vm2 pinging vm1
The connectivity between vm1 and vm2 can also be verified using the NSX-T Traceflow feature.