Enable volume snapshots (vSphere CSI)

You’ll enable Velero to capture persistent volume data using vSphere CSI snapshots on an RKE2 cluster, so your backups include both Kubernetes resources and on-disk application state.

By default, UDS Core backs up Kubernetes resources only. Volume snapshots are disabled:

Setting                                        Default
snapshotsEnabled                               false
schedules.udsbackup.template.snapshotVolumes   false

  1. Install and configure the vSphere CSI driver

    On your RKE2 cluster, set the cloud provider in your RKE2 configuration:

    config.yaml
    cloud-provider-name: rancher-vsphere

    Provide HelmChartConfig overrides for the CPI and CSI drivers. Three CSI overrides are critical: blockVolumeSnapshot must be enabled, configTemplate must be overridden to include the snapshot limit, and global-max-snapshots-per-block-volume must be set high enough for your retention policy.

    helmchartconfig.yaml
    ---
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rancher-vsphere-cpi
      namespace: kube-system
    spec:
      valuesContent: |-
        vCenter:
          host: "<vsphere-server>"
          port: 443
          insecureFlag: true
          datacenters: "<vsphere-datacenter-name>"
          username: "<vsphere-csi-username>"
          password: "<vsphere-csi-password>"
          credentialsSecret:
            name: "vsphere-cpi-creds"
            generate: true
    ---
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rancher-vsphere-csi
      namespace: kube-system
    spec:
      valuesContent: |-
        vCenter:
          datacenters: "<vsphere-datacenter-name>"
          username: "<vsphere-csi-username>"
          password: "<vsphere-csi-password>"
          configSecret:
            configTemplate: |
              [Global]
              cluster-id = "<rke2-cluster-id>"
              user = "<vsphere-csi-username>"
              password = "<vsphere-csi-password>"
              port = 443
              insecure-flag = "1"
              [VirtualCenter "<vsphere-server>"]
              datacenters = "<vsphere-datacenter-name>"
              [Snapshot]
              global-max-snapshots-per-block-volume = 12
        csiNode:
          tolerations:
            - operator: "Exists"
              effect: "NoSchedule"
        blockVolumeSnapshot:
          enabled: true
        storageClass:
          reclaimPolicy: Retain
  2. Create a VolumeSnapshotClass

    Define a VolumeSnapshotClass that tells Velero how to create snapshots using the vSphere CSI driver. Deploy this as a manifest in a Zarf package included in your bundle:

    volumesnapshotclass.yaml
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: vsphere-csi-snapshot-class
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    driver: csi.vsphere.vmware.com
    deletionPolicy: Retain
  3. Enable CSI snapshots in Velero

    Add the following overrides to enable CSI-based volume snapshots:

    uds-bundle.yaml
    packages:
      - name: core
        repository: registry.defenseunicorns.com/public/core
        ref: x.x.x-upstream
        overrides:
          velero:
            velero:
              values:
                - path: configuration.features
                  value: EnableCSI
                - path: snapshotsEnabled
                  value: true
                - path: configuration.volumeSnapshotLocation
                  value:
                    - name: default
                      provider: velero.io/csi
                - path: schedules.udsbackup.template.snapshotVolumes
                  value: true
  4. Create and deploy your bundle

    uds create <path-to-bundle-dir>
    uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst
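
For step 2, the VolumeSnapshotClass manifest needs a Zarf package to carry it. The following is a minimal sketch of such a package definition, written from the shell so it can be pasted as-is. The package name `vsphere-snapshot-class`, the component name, and the version are placeholders chosen for illustration, and the sketch assumes `volumesnapshotclass.yaml` sits in the same directory as the generated `zarf.yaml`.

```shell
# Sketch: write a minimal zarf.yaml next to volumesnapshotclass.yaml.
# Package, component, and manifest names are placeholders, not UDS Core names.
cat > zarf.yaml <<'EOF'
kind: ZarfPackageConfig
metadata:
  name: vsphere-snapshot-class
  version: 0.1.0
components:
  - name: volumesnapshotclass
    required: true
    manifests:
      - name: vsphere-csi-snapshot-class
        files:
          - volumesnapshotclass.yaml
EOF
```

You would then build this package and add it to the packages list in uds-bundle.yaml alongside core, adjusting names and versions to match your own conventions.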

Verify the configuration after deployment:

# Verify snapshots are enabled on the schedule
uds zarf tools kubectl get schedule -n velero velero-udsbackup -o jsonpath='{.spec.template.snapshotVolumes}'
# Verify the VolumeSnapshotLocation exists
uds zarf tools kubectl get volumesnapshotlocation -n velero
# After a backup completes, check for volume snapshots
uds zarf tools kubectl get volumesnapshot -A

Success criteria:

  • snapshotVolumes is true on the schedule
  • A VolumeSnapshotLocation with provider velero.io/csi exists in the velero namespace
  • After a backup completes, VolumeSnapshot resources are created for each PVC
  • Snapshot count matches the number of PVCs in backed-up namespaces
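
The last criterion can be checked with a small shell helper. This is a sketch only: the function name `check_snapshot_coverage` and the namespace `my-app` are hypothetical, and it assumes you compare counts for one backed-up namespace shortly after a single backup completes.

```shell
# Hypothetical helper: succeed when the VolumeSnapshot count equals the PVC count.
check_snapshot_coverage() {
  pvc_count=$1
  snapshot_count=$2
  if [ "$snapshot_count" -eq "$pvc_count" ]; then
    echo "ok: $snapshot_count snapshots for $pvc_count PVCs"
  else
    echo "mismatch: $snapshot_count snapshots for $pvc_count PVCs"
    return 1
  fi
}

# Example against a live cluster ("my-app" is a placeholder namespace):
#   pvcs=$(uds zarf tools kubectl get pvc -n my-app --no-headers | wc -l | tr -d ' ')
#   snaps=$(uds zarf tools kubectl get volumesnapshot -n my-app --no-headers | wc -l | tr -d ' ')
#   check_snapshot_coverage "$pvcs" "$snaps"
```

A non-zero exit status makes the helper usable in a post-deploy script, though the counts only line up if no earlier snapshots remain in the namespace.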

To trigger a manual backup for testing, see Perform a manual backup.

Symptoms: Backups fail with a FailedPrecondition error in the Velero logs:

error executing custom action: rpc error: code = FailedPrecondition desc =
the number of snapshots on the source volume reaches the configured maximum (3)

Solution: Increase global-max-snapshots-per-block-volume in the [Snapshot] section of the CSI configTemplate in the vSphere CSI HelmChartConfig (step 1). The number in parentheses in the error message, 3 in this example, is the limit currently in effect. The default 10-day retention requires a value of at least 10, and 12 is recommended to leave headroom. See the snapshot limit guidance in Before you begin.
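
The sizing rule can be sketched as a small shell helper. It is an illustration, not part of UDS Core or the vSphere CSI driver: the function name `required_snapshot_limit` and its parameters are hypothetical, and it assumes each scheduled backup takes one snapshot per volume that is kept until the backup expires.

```shell
# Hypothetical helper: estimate global-max-snapshots-per-block-volume.
# Arguments: retention in days, backups per day (default 1), buffer (default 2).
# Assumes one snapshot per volume per backup, kept until the backup expires.
required_snapshot_limit() {
  retention_days=$1
  backups_per_day=${2:-1}
  buffer=${3:-2}
  echo $(( retention_days * backups_per_day + buffer ))
}

# Default 10-day retention, one backup per day, buffer of 2
required_snapshot_limit 10   # prints 12
```

If you change the backup schedule or retention period, recompute the value and update the [Snapshot] section accordingly.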

Problem: VolumeSnapshotContents remain after backup deletion

Symptoms: Deleting a backup does not clean up the associated VolumeSnapshotContents in Kubernetes or in vSphere.

Solution: Be cautious when deleting backups that have been used for restores, because Velero may attempt to delete VolumeSnapshotContents that are still in use by restored volumes. Velero’s garbage collection runs hourly by default, so snapshots belonging to expired backups may not be cleaned up immediately.

These guides and concepts may be useful to explore next: