
Nutanix Disaster Recovery Guide

Disaster Recovery (Formerly Leap) pc.2022.6

Product Release Date: 2022-07-25

Last updated: 2022-11-22

Nutanix Disaster Recovery Overview

Legacy disaster recovery (DR) configurations use protection domains (PDs) and third-party integrations to protect your applications. These DR configurations replicate data between on-prem Nutanix clusters. Protection domains provide limited flexibility for complex operations (for example, VM boot order and network mapping), and you must perform manual tasks to protect new guest VMs as your application scales up.

Nutanix Disaster Recovery offers an entity-centric automated approach to protect and recover applications. It uses categories to group the guest VMs and automate the protection of the guest VMs as the application scales. Application recovery is more flexible with network mappings, an enforceable VM start sequence, and inter-stage delays. Application recovery can also be validated and tested without affecting your production workloads. Asynchronous, NearSync, and Synchronous replication schedules ensure that an application and its configuration details synchronize to one or more recovery locations for a smoother recovery.

Note: You can protect a guest VM either with the legacy DR solution (protection domain-based) or with Nutanix Disaster Recovery. To see the various Nutanix DR solutions, refer to Nutanix Disaster Recovery Solutions.

Nutanix Disaster Recovery works with sets of physically isolated locations called availability zones. An instance of Prism Central represents an availability zone. One availability zone serves as the primary AZ for an application while one or more paired availability zones serve as the recovery AZs.

Note: Nutanix Disaster Recovery supports the use of Flow Virtual Networking enabled Virtual Private Clouds.
Figure. A primary on-prem AZ and one recovery on-prem AZ

Figure. A primary on-prem AZ and two recovery on-prem AZs

Figure. A primary on-prem AZ and two recovery AZs: one on-prem recovery AZ and one recovery AZ in Cloud (Xi Cloud Services)

Figure. A primary on-prem AZ and one recovery AZ at Xi Cloud Services

Figure. A primary Nutanix cluster and at most two recovery Nutanix clusters at the same on-prem AZ

Figure. A primary AZ at Xi Cloud Services and recovery on-prem AZ

When paired, the primary AZ replicates the entities (protection policies, recovery plans, and recovery points) to the recovery AZs at the specified time intervals (RPO). This approach enables application recovery at any of the recovery AZs when there is a service disruption at the primary AZ (for example, a natural disaster or scheduled maintenance). When the primary AZ is up and running again, the entities replicate back to it to ensure high availability of applications. The entities you create or update synchronize continuously between the primary and recovery AZs. This reverse synchronization enables you to create or update entities (protection policies, recovery plans, or guest VMs) at either the primary or the recovery AZs.

This guide is primarily divided into the following two parts.

  • Protection and DR between On-Prem AZs (Nutanix Disaster Recovery)

    This section walks you through application protection and DR to other Nutanix clusters at the same or different on-prem AZs. The procedure also applies to protection and DR to other Nutanix clusters in supported public clouds.

  • Protection and DR between On-Prem AZ and Xi Cloud Service (Xi Leap)

    Xi Leap is essentially an extension of Leap to Xi Cloud Services. You can protect applications and perform DR to Xi Cloud Services or from Xi Cloud Services to an on-prem availability zone. This section describes application protection and DR from Xi Cloud Services to an on-prem Nutanix cluster. For application protection and DR to Xi Cloud Services, refer to the supported capabilities in Protection and DR between On-Prem AZs (Nutanix Disaster Recovery) because the protection procedure remains the same when the primary AZ is an on-prem availability zone.

Configuration tasks and DR workflows are largely the same regardless of the type of recovery AZ. For more information about the protection and DR workflow, see Nutanix Disaster Recovery Deployment Workflow.

Nutanix Disaster Recovery Terminology

The following section describes the terms and concepts used throughout the guide. Nutanix recommends gaining familiarity with these terms before you begin configuring protection with Nutanix Disaster Recovery or Xi Leap disaster recovery.

Availability Zone (AZ)

A zone that can have one or more independent datacenters inter-connected by low latency links. An AZ can either be in your office premises (on-prem) or in Xi Cloud Services. AZs are physically isolated from each other to ensure that a disaster at one AZ does not affect another AZ. An instance of Prism Central represents an on-prem AZ.

On-Prem Availability Zone

An AZ in your premises.

Xi Cloud Services

An AZ in the Nutanix Enterprise Cloud Platform (Xi Cloud Services).

Primary Availability Zone

An AZ that initially hosts guest VMs you want to protect.

Recovery Availability Zone

An AZ where you can recover the protected guest VMs when a planned or an unplanned event occurs at the primary AZ causing its downtime. You can configure at most two recovery AZs for a guest VM.

Nutanix Cluster

A cluster running AHV or ESXi nodes on an on-prem AZ, Xi Cloud Services, or any supported public cloud. Leap does not support guest VMs from Hyper-V clusters.

Prism Element

The GUI that provides you the ability to configure, manage, and monitor a single Nutanix cluster. It is a service built into the platform for every Nutanix cluster deployed.

Prism Central

The GUI that allows you to monitor and manage many Nutanix clusters (Prism Element running on those clusters). Prism Starter, Prism Pro, and Prism Ultimate are the three flavors of Prism Central. For more information about the features available with these licenses, see Software Options.

Prism Central is essentially a VM that you deploy (host) in a Nutanix cluster (Prism Element). For more information about Prism Central, see the Prism Central Guide. You can set up the following configurations of the Prism Central VM.

Small Prism Central
A Prism Central VM with a configuration of 8 vCPUs and 32 GB memory or less. The VM hot-adds an extra 4 GB of memory when you enable Leap and an extra 1 GB when you enable Flow.
Small Prism Central (Single node)
A small Prism Central deployed in a single VM.
Small Prism Central (Scaleout)
Three small Prism Centrals deployed in three VMs in the same availability zone (AZ).
Large Prism Central
A Prism Central VM with a configuration of more than 8 vCPUs and 32 GB memory. The VM hot-adds an extra 8 GB of memory when you enable Leap and an extra 1 GB when you enable Flow.
Large Prism Central (Single node)
A large Prism Central deployed in a single VM.
Large Prism Central (Scaleout)
Three large Prism Centrals deployed in three VMs in the same availability zone (AZ).
Note: A scaleout Prism Central works like a single-node Prism Central in the availability zone (AZ). You can upgrade a single-node Prism Central to a scaleout Prism Central to increase the capacity, resiliency, and redundancy of the Prism Central VM. For detailed information about the available configurations of Prism Central, see Prism Central Scalability in the Prism Central Release Notes.
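As a rough illustration of the sizing rules above, the following Python sketch (not a Nutanix tool; function names are invented) classifies a Prism Central VM as small or large and computes the memory hot-added when Leap and Flow are enabled:

```python
def pc_size(vcpus: int, memory_gb: int) -> str:
    """Small: <= 8 vCPUs and <= 32 GB memory; otherwise large."""
    return "small" if vcpus <= 8 and memory_gb <= 32 else "large"

def memory_after_enabling(vcpus: int, memory_gb: int,
                          leap: bool = False, flow: bool = False) -> int:
    """Return total memory (GB) after the hot-add for Leap and/or Flow."""
    size = pc_size(vcpus, memory_gb)
    extra = 0
    if leap:
        extra += 4 if size == "small" else 8  # Leap: +4 GB small, +8 GB large
    if flow:
        extra += 1                            # Flow: +1 GB for either size
    return memory_gb + extra
```

For example, enabling both Leap and Flow on a small (8 vCPU, 32 GB) Prism Central VM hot-adds 5 GB, for a total of 37 GB.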

Virtual Private Cloud (VPC)

A logically isolated network service in Xi Cloud Services. A VPC provides the complete IP address space for hosting user-configured VPNs. A VPC allows you to create workloads manually or through failover from a paired primary AZ.

The following VPCs are available in each Xi Cloud Services account. You cannot create more VPCs in Xi Cloud Services.

Production VPC
Used to host production workloads.
Test VPC
Used to test failover from a paired AZ.

Source Virtual Network

The virtual network from which guest VMs migrate during a failover or failback.

Recovery Virtual Network

The virtual network to which guest VMs migrate during a failover or failback operation.

Network Mapping

A mapping between two virtual networks in paired AZs. A network mapping specifies a recovery network for all guest VMs of the source network. When you perform a failover or failback, the guest VMs in the source network recover in the corresponding (mapped) recovery network.
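The idea of a network mapping can be sketched as a simple lookup from source network to recovery network; the network names below are hypothetical:

```python
# Hypothetical network mapping: each source virtual network in the primary AZ
# maps to one recovery virtual network in the paired AZ.
network_mapping = {
    "primary-prod-vlan10": "recovery-prod-vlan10",
    "primary-db-vlan20": "recovery-db-vlan20",
}

def recovery_network(source_network: str) -> str:
    """Return the mapped recovery network for a guest VM's source network."""
    try:
        return network_mapping[source_network]
    except KeyError:
        raise ValueError(f"No network mapping defined for {source_network!r}")
```

During a failover, every guest VM in `primary-prod-vlan10` would recover in `recovery-prod-vlan10`; a VM on an unmapped source network has no defined recovery network.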

Category

A VM category is a key-value pair that groups similar guest VMs. Associating a protection policy with a VM category ensures that the protection policy applies to all the guest VMs in the group regardless of how the group scales with time. For example, you can associate a group of guest VMs with the Department: Marketing category, where Department is a category key that includes the value Marketing along with other values such as Engineering and Sales.

VM categories work the same way on on-prem AZs and Xi Cloud Services. For more information about VM categories, see Category Management in the Prism Central Guide.
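To make the grouping concrete, the following hypothetical sketch (VM names and values invented) selects every guest VM carrying the Department: Marketing pair, which is how a policy associated with that category finds its members even as new VMs are added:

```python
# Invented inventory: each VM carries category key-value pairs.
vms = [
    {"name": "web-01", "categories": {"Department": "Marketing"}},
    {"name": "db-01",  "categories": {"Department": "Engineering"}},
    {"name": "web-02", "categories": {"Department": "Marketing"}},
]

def vms_in_category(vms, key, value):
    """Return names of VMs whose categories contain the key: value pair."""
    return [vm["name"] for vm in vms if vm["categories"].get(key) == value]
```

Here `vms_in_category(vms, "Department", "Marketing")` returns `["web-01", "web-02"]`; adding a new Marketing VM to the inventory automatically includes it in the group.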

Recovery Point

A copy of the state of a system at a particular point in time.

Crash-consistent Snapshots
A snapshot is crash-consistent if it captures all of the data components (write-order consistent) at a single instant, as if the system had crashed at that moment. VM snapshots are crash-consistent by default, which means that the vDisks the snapshot captures are consistent with a single point in time. Crash-consistent snapshots suit operating systems and applications that may not support quiescence (freezing) and unquiescence (thawing), such as file servers, DHCP servers, and print servers.
Application-consistent Snapshots
A snapshot is application-consistent if, in addition to capturing all of the data components (write-order consistent), the running applications have completed all their operations and flushed their buffers to disk (in other words, the applications are quiesced). Application-consistent snapshots capture the same data as crash-consistent snapshots, with the addition of all data in memory and all transactions in process. Therefore, application-consistent snapshots may take longer to complete.

Application-consistent snapshots suit systems and applications that can be quiesced and unquiesced (thawed), such as databases and applications like SQL, Oracle, and Exchange.

Recoverable Entity

A guest VM that you can recover from a recovery point.

Protection Policy

A configurable policy that takes recovery points of the protected guest VMs at equal time intervals and replicates those recovery points to the recovery AZs.

Recovery Plan

A configurable policy that orchestrates the recovery of protected guest VMs at the recovery AZ.

Recovery Point Objective (RPO)

The time interval that represents the acceptable data loss if there is a failure. For example, if the RPO is 1 hour, the system creates a recovery point every hour. On recovery, you can recover the guest VMs with data as of up to 1 hour ago. Take Snapshot Every in the Create Protection Policy GUI represents the RPO.

Recovery Time Objective (RTO)

The time period from the failure event to the restoration of service. For example, an RTO of 30 minutes means that the protected guest VMs must be up and running within 30 minutes of the failure event.
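The two objectives can be illustrated with a small sketch: the RPO fixes how often recovery points are taken, and therefore the worst-case data loss, while the RTO bounds the time to restore service. This is an illustration of the definitions, not a Nutanix API:

```python
from datetime import timedelta

def recovery_points_per_day(rpo: timedelta) -> int:
    """Number of recovery points a policy creates per day for a given RPO."""
    return int(timedelta(days=1) / rpo)

def worst_case_data_loss(rpo: timedelta) -> timedelta:
    """On failure, data written after the last recovery point is lost,
    so the worst-case loss window equals the RPO itself."""
    return rpo
```

With an RPO of 1 hour, the policy creates 24 recovery points per day, and a failure just before the next scheduled recovery point loses up to 1 hour of data.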

Nutanix Disaster Recovery Solutions

The following flowchart provides a detailed representation of the disaster recovery (DR) solutions of Nutanix. This decision tree covers both DR solutions, protection domain-based DR and Nutanix Disaster Recovery, helping you quickly decide which DR strategy best suits your environment.

Figure. Decision Tree for Nutanix DR Solutions

For information about protection domain-based (legacy) DR, see Data Protection and Recovery with Prism Element guide. With Leap, you can protect your guest VMs and perform DR to on-prem availability zones (AZs) or to Xi Cloud Services. A Leap deployment for DR from Xi Cloud Services to an on-prem Nutanix cluster is Xi Leap. The detailed information about Leap and Xi Leap DR configuration is available in the following sections of this guide.

Protection and DR between On-Prem AZs (Nutanix Disaster Recovery)

  • For information about protection with Asynchronous replication schedule and DR, see Protection with Asynchronous Replication Schedule and DR (Nutanix Disaster Recovery).
  • For information about protection with NearSync replication schedule and DR, see Protection with NearSync Replication Schedule and DR (Nutanix Disaster Recovery).
  • For information about protection with Synchronous replication schedule and DR, see Protection with Synchronous Replication Schedule (0 RPO) and DR.

Protection and DR between On-Prem AZ and Xi Cloud Service (Xi Leap)

  • For information about protection with Asynchronous replication schedule and DR, see Protection with Asynchronous Replication and DR (Xi Leap).
  • For information about protection with NearSync replication schedule and DR, see Protection with NearSync Replication and DR (Xi Leap).

Nutanix Disaster Recovery Deployment Workflow

The workflow for entity-centric protection and disaster recovery (DR) configuration is as follows. The workflow is largely the same for both Nutanix Disaster Recovery and Xi Leap configurations, except for a few extra steps that you must perform while configuring Xi Leap.

Procedure

  1. Enable Leap at the primary and recovery on-prem AZs (Prism Central).
    Enable Leap at the on-prem AZ only. For more information about enabling Leap, see Enabling Nutanix Disaster Recovery for On-Prem AZ.
  2. Pair the primary and recovery AZs with each other.
    An AZ is listed as a recovery AZ option in protection policies and recovery plans (see step 6 and step 7) only after you pair it. For more information about pairing the AZs, see Pairing AZs (Nutanix Disaster Recovery).
  3. (only for Xi Leap configuration) Set up your environment to proceed with replicating to Xi Cloud Services.
    For more information about environment setup, see Xi Leap Environment Setup.
  4. (only for Xi Leap configuration) Reserve floating IP addresses.
    For more information about floating IP addresses, see Floating IP Address Management in the Xi Infrastructure Service Management Guide.
  5. Create production and test virtual networks at the primary and recovery AZs.
    Create production and test virtual networks only at the on-prem AZs. Xi Cloud Services creates production and test virtual networks dynamically for you. However, Xi Cloud Services also provides floating IP addresses (step 4), a feature that is not available for on-prem AZs. For more information about production and test virtual networks, see Nutanix Virtual Networks.
  6. Create a protection policy with replication schedules at the primary AZ.
    A protection policy can replicate recovery points to at most two other Nutanix clusters at the same or different AZs. To replicate the recovery points, add a replication schedule between the primary AZ and each recovery AZ.
    • To create a protection policy with an Asynchronous replication schedule, see:
      • Creating a Protection Policy with an Asynchronous Replication Schedule (Nutanix Disaster Recovery)
      • Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)
    • To create a protection policy with a NearSync replication schedule, see:
      • Creating a Protection Policy with a NearSync Replication Schedule (Nutanix Disaster Recovery)
      • Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)
      Note: To maintain the efficiency of minutely replication, protection policies allow you to add a NearSync replication schedule between the primary AZ and only one recovery AZ.
    • To create a protection policy with a Synchronous replication schedule, see Creating a Protection Policy with the Synchronous Replication Schedule.
      Note: To maintain the efficiency of synchronous replication, protection policies allow you to add only one recovery AZ when you add a Synchronous replication schedule. If you already have an Asynchronous or a NearSync replication schedule in the protection policy, you cannot add another recovery AZ to protect the guest VMs with a Synchronous replication schedule.
    You can also create a protection policy at a recovery AZ. Protection policies you create or update at a recovery AZ synchronize back to the primary AZ. The reverse synchronization helps when you protect more guest VMs in the same protection policy at the recovery AZ.
  7. Create a recovery plan at the primary AZ.
    A recovery plan orchestrates the failover of the protected guest VMs (step 6) to a recovery AZ. For two recovery AZs, create two discrete recovery plans at the primary AZ—one for DR to each recovery AZ.
    • To create a recovery plan for DR to another Nutanix cluster at the same or different on-prem AZs, see Creating a Recovery Plan (Nutanix Disaster Recovery).
    • To create a recovery plan for DR to Xi Cloud Services, see Creating a Recovery Plan (Xi Leap).
    You can also create a recovery plan at a recovery AZ. The recovery plan you create or update at a recovery AZ synchronizes back to the primary AZ. The reverse synchronization helps in scenarios where you add more guest VMs to the same recovery plan at the recovery AZ.
  8. Validate or test the recovery plan you create in step 7.
    To test a recovery plan, perform a test failover to a recovery AZ.
    • To perform test failover to another Nutanix cluster at the same or different on-prem AZs, see Performing a Test Failover (Leap).
    • To perform test failover to Xi Cloud Services, see Failover and Failback Operations (Xi Leap).
  9. (only for Xi Leap configuration) After the failover to recovery AZ, enable external connectivity. To enable external connectivity, perform the following.
      1. After a planned failover, shut down the VLAN interface on the on-prem Top-of-Rack (TOR) switch.
      2. To access the Internet from Xi Cloud Services, create both inbound and outbound policy-based routing (PBR) policies on the virtual private cloud (VPC). For more information, see Policy Configuration in Xi Infrastructure Service Administration Guide .
  10. (only for Xi Leap configuration) Perform the following procedure to access the recovered guest VMs through the Internet.
      1. Assign a floating IP address to the guest VMs failed over to Xi Cloud Services. For more information, see Floating IP Address Management in Xi Infrastructure Service Administration Guide
      2. Create PBR policies and specify the internal or private IP address of the guest VMs. For more information, see Policy Configuration in Xi Infrastructure Service Administration Guide .
    Note: If a guest VM (that hosts a publicly accessible website) fails over, update the authoritative DNS server (for example, Amazon Route 53, GoDaddy, DNS Made Easy) with the primary failover record (on-prem public IP address) and the secondary failover record (Xi floating IP address). For example, if your authoritative DNS server is Amazon Route 53, configure the primary and the secondary failover records. Amazon Route 53 performs health checks on the primary failover record and returns the secondary failover record when the primary is down.

On-Prem Hardware Resource Requirements

For DR solutions with Asynchronous, NearSync, and Synchronous replication schedules to succeed, the nodes in the on-prem AZs must have certain resources. This section provides information about the node, disk, and Foundation configurations necessary to support the RPO-based recovery point frequencies.

  • The conditions and configurations provided in this section apply to Local and Remote recovery points.

  • Any node configuration with two or more SSDs, each SSD being 1.2 TB or greater capacity, supports recovery point frequency for NearSync.

  • Any node configuration that supports recovery point frequency of six (6) hours also supports AHV-based Synchronous replication schedules because a protection policy with Synchronous replication schedule takes recovery points of the protected VMs every 6 hours. See Protection with Synchronous Replication Schedule (0 RPO) and DR for more details about Synchronous replication.

  • Both the primary cluster and replication target cluster must fulfill the same minimum resource requirements.

  • Ensure that any new node or disk additions made to the on-prem AZs (Availability Zones) meet the minimum requirements.

  • Features such as Deduplication and RF3 may require additional memory depending on the DR schedules and other workloads run on the cluster.

Note: For on-prem deployments, the minimum recovery point frequency with the default Foundation configuration is 6 hours. To increase the recovery point frequency, you must also modify the Foundation configuration (SSD and CVM) accordingly. For example, an all-flash setup with a capacity between 48 TB and 92 TB has a default recovery point frequency of 6 hours. To decrease the recovery point interval to one (1) hour, you must modify the default Foundation configuration to:
  • 14 vCPUs for the CVM
  • 40 GB of memory for the CVM

The table lists the supported frequency for the recovery points across various hardware configurations.

Table 1. Recovery Point Frequency

Hybrid
  • Total HDD tier capacity of 32 TB or lower; total capacity (HDD + SSD) of 40 TB or lower.
    Minimum recovery point frequency: NearSync, Async (hourly).
    Foundation configuration: no change required (default), with 2 x SSDs; each SSD must be 1.2 TB or more for NearSync.
  • Total HDD tier capacity between 32 TB and 64 TB; total capacity (HDD + SSD) of 92 TB or lower (up to 64 TB HDD and up to 32 TB SSD, for example 4 x 7.68 TB SSDs).
    Minimum recovery point frequency: NearSync, Async (hourly).
    Foundation configuration: modify to a minimum of 4 x SSDs (each SSD 1.2 TB or more for NearSync), 14 vCPUs for the CVM, and 40 GB of memory for the CVM.
  • Total HDD tier capacity between 32 TB and 64 TB; total capacity (HDD + SSD) of 92 TB or lower.
    Minimum recovery point frequency: Async (every 6 hours).
    Foundation configuration: no change required (default).
  • Total HDD tier capacity between 64 TB and 80 TB; total capacity (HDD + SSD) of 96 TB or lower.
    Minimum recovery point frequency: Async (every 6 hours).
    Foundation configuration: no change required (default).
  • Total HDD tier capacity greater than 80 TB; total capacity (HDD + SSD) of 136 TB or lower.
    Minimum recovery point frequency: Async (every 6 hours).
    Foundation configuration: modify to a minimum of 12 vCPUs for the CVM and 36 GB of memory for the CVM.

All Flash
  • Total capacity of 48 TB or lower.
    Minimum recovery point frequency: NearSync, Async (hourly).
    Foundation configuration: no change required (default).
  • Total capacity between 48 TB and 92 TB.
    Minimum recovery point frequency: NearSync, Async (hourly).
    Foundation configuration: modify to a minimum of 14 vCPUs for the CVM and 40 GB of memory for the CVM.
  • Total capacity between 48 TB and 92 TB.
    Minimum recovery point frequency: Async (every 6 hours).
    Foundation configuration: no change required (default).
  • Total capacity greater than 92 TB.
    Minimum recovery point frequency: Async (every 6 hours).
    Foundation configuration: modify to a minimum of 12 vCPUs for the CVM and 36 GB of memory for the CVM.
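As an illustration of how to read Table 1, the following sketch encodes only the all-flash rows (the hybrid rows follow the same pattern). It is an illustration of the table, not a sizing tool:

```python
# Default Foundation configuration (no change required).
DEFAULT = {"cvm_vcpus": "default", "cvm_memory_gb": "default"}

def all_flash_cvm_config(total_tb: float, frequency: str) -> dict:
    """Minimum CVM Foundation configuration for an all-flash node.

    frequency: "nearsync_or_hourly" or "async_6h".
    """
    if total_tb <= 48:
        return DEFAULT                      # default config suffices
    if total_tb <= 92:
        if frequency == "nearsync_or_hourly":
            return {"cvm_vcpus": 14, "cvm_memory_gb": 40}
        return DEFAULT                      # Async every 6 hours
    # Greater than 92 TB: only 6-hour Async appears in the table,
    # and it needs a larger CVM.
    if frequency != "async_6h":
        raise ValueError("Table 1 lists only 6-hour Async above 92 TB")
    return {"cvm_vcpus": 12, "cvm_memory_gb": 36}
```

For example, a 60 TB all-flash node needs a 14 vCPU / 40 GB CVM for NearSync or hourly Async, but keeps the default configuration for 6-hour Async.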

Native Encryption of Replication Traffic

You can enable or disable encryption of the replication traffic that replication schedules generate between the primary cluster and the replication (remote) cluster or AZ. This encryption feature does not cover the traffic flowing between Prism Element and Prism Central.
Note: This feature supports only native keys and certificates; it does not support custom, user-provided keys and certificates for encrypting the replication traffic.

For details about the ports and protocols used by encrypted replication traffic, see Ports and Protocols.

Enabling Encryption of Replication Traffic

Before you begin

For encryption of Leap-based DR replication traffic, ensure that Leap-based DR is set up and the two AZs are paired. This requirement does not apply to protection domain-based DR.

About this task

For details about the ports and protocols used by encrypted replication traffic, see Ports and Protocols.

To enable encryption of replication traffic, perform the following steps on the primary and replication (remote) clusters.
Important: For encryption of Leap-based DR replication traffic, you must also perform these steps on the clusters in the primary and replication AZs. Also, ensure that the Prism Central instances of the two AZs are paired.

Procedure

  1. SSH to the cluster Controller VM.
  2. Change the directory to bin.
    nutanix@CVM:$ cd bin
    nutanix@CVM:~/bin$ 
  3. Run the script with the enable option.
    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --enable <remote_cluster_vip>

    For example: If the IP address of the replication (remote) cluster is 10.xxx.xxx.xxx.

    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --enable 10.xxx.xxx.xxx
    Important: Enter the password for the nutanix user of the remote cluster CVM when prompted.
    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --enable 10.xxx.xxx.xxx
    Checking Source Cluster Compatibility
    Check Complete. Source is compatible
    Checking Remote cluster Compatibility
    nutanix@10.xxx.xxx.xxx's password:
    Check Complete. Remote cluster is compatible
    Importing root.crt from Remote Cluster: 10.xxx.xxx.xxx
    
    nutanix@10.xxx.xxx.xxx's password:
    Checking if Remote Cluster's root.crt file already exists
    nutanix@10.xxx.xxx.xxx's password:
    Encryption enabled Successfully. Please perform rolling restart of Cerebro and Stargate services for changes to take effect

What to do next

Ensure that the changes take effect by performing a rolling restart of Cerebro and Stargate services on the primary and replication clusters using the following command.
nutanix@CVM:$ allssh "source /etc/profile; genesis stop cerebro stargate && cluster start; sleep 180"

Verifying the Status of Encryption of Replication Traffic

You can check the enabled or disabled status of the encryption of replication traffic.

About this task

For details about the ports and protocols used by encrypted replication traffic, see Ports and Protocols.

To verify the status of encryption of replication traffic, perform the following step on the cluster.

Procedure

  1. Run the following command to identify the CVM that contains the trusted_certs.crt certificate:
    nutanix@CVM:~/bin$ allssh "ls -la /home/nutanix/tmp/ | grep -i trusted"
    A sample output is as follows:
    nutanix@CVM:~/$ allssh "ls -la /home/nutanix/tmp/ | grep -i trusted"
    ================== xx:xx:xx:2 =================
    ================== xx:xx:xx:3 =================
    ================== xx:xx:xx:4 =================
    -rw-------.  1 nutanix nutanix  1790 Dec xx 06:27 trusted_certs.crt.xx:xx:xx:xx
  2. SSH to the CVM identified in the previous step (for example, xx.xx.xx.2).
  3. Run the script with the verify option.
    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --verify <remote_cluster_vip>

    For example: If the IP address of the replication (remote) cluster is 10.xxx.xxx.xxx.

    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --verify 10.xxx.xxx.xxx
    Important: Enter the password for the nutanix user of the remote cluster CVM when prompted.
    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --verify 10.xxx.xxx.xxx
    Checking Source Cluster Compatibility
    Check Complete. Source is compatible
    Checking Remote cluster Compatibility
    nutanix@10.xxx.xxx.xxx's password:
    Check Complete. Remote cluster is compatible
    Verifying Encryption: Checking if Remote Cluster's root.crt file already exists
    nutanix@10.xxx.xxx.xxx's password:
    Encryption Verification Successful. Encryption is already enabled for this remote

Disabling Encryption of Replication Traffic

You can disable encryption of replication traffic.

About this task

To disable encryption of replication traffic, perform the following step on the cluster.

Procedure

  1. SSH to the cluster Controller VM.
  2. Change the directory to bin.
    nutanix@CVM:$ cd bin
    nutanix@CVM:~/bin$ 
  3. Run the script with the disable option.
    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --disable <remote_cluster_vip>

    For example: If the IP address of the replication (remote) cluster is 10.xxx.xxx.xxx.

    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --disable 10.xxx.xxx.xxx
    Important: Enter the password for the nutanix user of the remote cluster CVM when prompted.
    nutanix@CVM:~/bin$ python onwire_encryption_tool.py --leap --disable 10.xxx.xxx.xxx
    Checking Source Cluster Compatibility
    Stopping Cerebro on all nodes of the cluster
    
    Encryption disabled Successfully. Please perform rolling restart of Cerebro and Stargate services for changes to take effect.

What to do next

Ensure that the changes take effect by performing a rolling restart of Cerebro and Stargate services on the primary and replication clusters using the following command.
nutanix@CVM:$ allssh "source /etc/profile; genesis stop cerebro stargate && cluster start; sleep 180"

Protection and DR between On-Prem AZs (Nutanix Disaster Recovery)

Leap protects your guest VMs and orchestrates their disaster recovery (DR) to other Nutanix clusters when events causing service disruption occur at the primary AZ. For protection of your guest VMs, protection policies with Asynchronous, NearSync, or Synchronous replication schedules generate and replicate recovery points to other on-prem AZs. Recovery plans orchestrate DR from the replicated recovery points to other Nutanix clusters at the same or different on-prem AZs.

Protection policies create a recovery point and set its expiry time in every iteration of the specified time period (RPO). For example, the policy creates a recovery point every 1 hour for an RPO schedule of 1 hour. The recovery point expires at its designated expiry time based on the retention policy (see step 3 in Creating a Protection Policy with an Asynchronous Replication Schedule (Nutanix Disaster Recovery)). If there is a prolonged outage at an AZ, the Nutanix cluster retains the last recovery point to ensure that you do not lose all the recovery points. For NearSync replication (lightweight snapshots), the Nutanix cluster retains the last full hourly snapshot. During the outage, the Nutanix cluster does not clean up expired recovery points. When the Nutanix cluster comes back online, it immediately cleans up the recovery points that are past their expiry.
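The creation-and-expiry schedule described above can be sketched as follows; the dates and retention values are invented for illustration, and this is not a Nutanix API:

```python
from datetime import datetime, timedelta

def recovery_point_times(start: datetime, rpo: timedelta, count: int):
    """Creation times of the first `count` recovery points: one per RPO
    interval, starting at `start`."""
    return [start + i * rpo for i in range(count)]

def expiry_time(created: datetime, retention: timedelta) -> datetime:
    """A recovery point expires `retention` after it is created; expired
    points are cleaned up once the cluster is online."""
    return created + retention
```

For a 1-hour RPO starting at midnight, recovery points are created at 00:00, 01:00, 02:00, and so on, each stamped with an expiry derived from the retention period.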

For high availability of a guest VM, Nutanix Disaster Recovery enables replication of its recovery points to one or more on-prem AZs. A protection policy can replicate recovery points to a maximum of two on-prem AZs. For replication, you must add a replication schedule between the AZs. You can set up the on-prem AZs for protection and DR in the following arrangements.

Figure. The Primary and recovery Nutanix clusters at the different on-prem AZs
Click to enlarge The Primary and recovery Nutanix clusters at the different AZs

Figure. The Primary and recovery Nutanix clusters at the same on-prem AZ

Replication to multiple AZs enables DR to Nutanix clusters at all the AZs where the recovery points exist. To perform DR to a Nutanix cluster at the same or a different AZ (recovery AZ), you must create a recovery plan. To perform DR to two different Nutanix clusters at the same or different recovery AZs, you must create two discrete recovery plans, one for each recovery AZ. In addition to performing DR between Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR): DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

The protection policies and recovery plans you create or update synchronize continuously between the primary and recovery on-prem AZs. This two-way synchronization enables you to create or update entities (protection policies, recovery plans, and guest VMs) at either the primary or the recovery AZ.

The following section describes protection of your guest VMs and DR to a Nutanix cluster at the same or different on-prem AZs. The workflow is the same for protection and DR to a Nutanix cluster in supported public cloud platforms. For information about protection of your guest VMs and DR from Xi Cloud Services to an on-prem Nutanix cluster (Xi Leap), see Protection and DR between On-Prem AZ and Xi Cloud Service (Xi Leap).

Nutanix Disaster Recovery Requirements

The following are the general requirements of Nutanix Disaster Recovery. Along with the general requirements, there are specific requirements for protection with each of the supported replication schedules.

  • For information about the on-prem node, disk and Foundation configurations required to support Asynchronous, NearSync, and Synchronous replication schedules, see On-Prem Hardware Resource Requirements.
  • For specific requirements of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Requirements (Nutanix Disaster Recovery).
  • For specific requirements of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Requirements (Nutanix Disaster Recovery).
  • For specific requirements of protection with Synchronous replication schedule (0 RPO), see Synchronous Replication Requirements.

License Requirements

The AOS license required depends on the features that you want to use. For information about the features that are available with an AOS license, see Software Options.

Hypervisor Requirements

The supported hypervisors differ for each replication schedule. For more information about the hypervisor requirements for the supported replication schedules, see:

  • Asynchronous Replication Requirements (Nutanix Disaster Recovery)
  • NearSync Replication Requirements (Nutanix Disaster Recovery)
  • Synchronous Replication Requirements

Nutanix Software Requirements

  • Each on-prem availability zone (AZ) must have a Leap-enabled Prism Central instance. To enable Leap in Prism Central, see Enabling Nutanix Disaster Recovery for On-Prem AZ.
    Note: If you are using ESXi, register at least one vCenter Server to Prism Central. You can also register two vCenter Servers, each to the Prism Central at a different AZ. If you register both Prism Central instances to a single vCenter Server, ensure that each ESXi cluster is part of a different datacenter object in vCenter.

  • The primary and recovery Prism Central and Prism Element on the Nutanix clusters must be running on the supported AOS versions. For more information about the required versions for the supported replication schedules, see:
    • Asynchronous Replication Requirements (Nutanix Disaster Recovery)
    • NearSync Replication Requirements (Nutanix Disaster Recovery)
    • Synchronous Replication Requirements
    Tip:

    Nutanix supports replication between all the latest supported LTS and STS AOS releases. To check the list of the latest supported AOS versions, see KB-5505. To determine if the AOS versions currently running on your clusters are EOL, see the EOL document.

    Upgrade the AOS version to the next available supported LTS/STS release. To determine if an upgrade path is supported, check the Upgrade Paths page before you upgrade the AOS.

    Note: If both clusters have different AOS versions that are EOL, upgrade the cluster with the lower AOS version to match the cluster with the higher AOS version, and then perform the upgrade to the next supported LTS version.

    For example, suppose the clusters are running AOS versions 5.5.x and 5.10.x respectively. Upgrade the cluster on 5.5.x to 5.10.x. After both clusters are on 5.10.x, upgrade each cluster to 5.15.x (supported LTS). Once both clusters are on 5.15.x, you can upgrade them to 5.20.x or newer.

    Nutanix recommends that both the primary and the replication clusters or AZs run the same AOS version.

User Requirements

You must have one of the following roles in Prism Central.

  • User admin
  • Prism Central admin
  • Prism Self Service admin
  • Xi admin

To view the available roles or create a role, click the hamburger icon at the top-left corner of the window and go to Administration > Roles in the left pane.

Firewall Port Requirements

To allow two-way replication between Nutanix clusters at the same or different AZs, you must enable certain ports in your external firewall. To know about the required ports, see Disaster Recovery - Leap in Port Reference.

Networking Requirements

Requirements for static IP address preservation after failover
You can preserve one IP address of a guest VM (with a static IP address) when it fails over (DR) to an IPAM network. After the failover, you must manually reconfigure the other IP addresses of the guest VM. To preserve an IP address of a guest VM with a static IP address, ensure that:
Caution: By default, you cannot preserve statically assigned DNS IP addresses after failover (DR) of guest VMs. However, you can create custom in-guest scripts to preserve the statically assigned DNS IP addresses. For more information, see Creating a Recovery Plan (Nutanix Disaster Recovery).
  • Both the primary and the recovery Nutanix clusters run AOS 5.11 or newer.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery AZ.

  • The protected guest VMs can reach the Controller VM from both the AZs.
  • The protected guest VMs have the NetworkManager command-line tool (nmcli) version 0.9.10.0 or newer installed.
    NetworkManager must also manage the networks on Linux VMs. To enable NetworkManager on a Linux VM, set the value of the NM_CONTROLLED field to yes in the interface configuration file, and then restart the network service on the VM.
    Tip: In CentOS, the interface configuration file is /etc/sysconfig/network-scripts/ifcfg-eth0 .
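The NM_CONTROLLED change described above can be scripted. The following sketch is an illustrative helper (not a Nutanix tool) that rewrites the setting in an ifcfg-style file body; on the VM itself you would apply it to the interface configuration file from the Tip and then restart the network service.

```python
# Sketch: force NM_CONTROLLED=yes in a CentOS-style interface config file.
# Illustrative only; ensure_nm_controlled is a hypothetical helper name.
def ensure_nm_controlled(ifcfg_text: str) -> str:
    lines, found = [], False
    for line in ifcfg_text.splitlines():
        if line.startswith("NM_CONTROLLED="):
            lines.append("NM_CONTROLLED=yes")   # overwrite the existing value
            found = True
        else:
            lines.append(line)
    if not found:
        lines.append("NM_CONTROLLED=yes")       # append the key if absent
    return "\n".join(lines) + "\n"

sample = "DEVICE=eth0\nBOOTPROTO=static\nNM_CONTROLLED=no\n"
print(ensure_nm_controlled(sample))
```

After writing the file back, restart networking on the VM so NetworkManager takes over the interface.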
Requirements for static IP address mapping of guest VMs between source and target virtual networks
You can explicitly define IP addresses for guest VMs that have static IP addresses on the primary AZ. On recovery, such guest VMs retain the explicitly defined IP address. To map static IP addresses of guest VMs between source and target virtual networks, ensure that:
  • Both the primary and the recovery Nutanix clusters run AOS 5.17 or newer.
  • The protected guest VMs have static IP addresses at the primary AZ.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery AZ.

  • The protected guest VMs can reach the Controller VM from both the AZs.
  • The recovery plan selected for failover has VM-level IP address mapping configured.
Virtual network design requirements
You can design the virtual subnets that you plan to use for DR at the recovery AZ so that they can accommodate the guest VMs running in the source virtual network.
  • Maintain a uniform network configuration for all virtual LANs (VLANs) with the same VLAN ID and network range in all the Nutanix clusters at an AZ. All such VLANs must have the same subnet name, IP address range, and IP address prefix length (Gateway IP/Prefix Length).

    For example, if you have a VLAN with ID 0 and network 10.45.128.0/17, and three clusters PE1, PE2, and PE3 at the AZ AZ1, all the clusters must maintain the same name, IP address range, and IP address prefix length (Gateway IP/Prefix Length) for the VLAN with ID 0.

  • To use a virtual network as a recovery virtual network, ensure that the virtual network meets the following requirements.
    • The network prefix is the same as the network prefix of the source virtual network. For example, if the source network address is 192.0.2.0/24, the network prefix of the recovery virtual network must also be 24.
    • The gateway IP address is the same as the gateway IP address in the source network. For example, if the gateway IP address in the source virtual network 192.0.2.0/24 is 192.0.2.10, the last octet of the gateway IP address in the recovery virtual network must also be 10.
  • To use a single Nutanix cluster as a target for DR from multiple primary Nutanix clusters, ensure that the number of virtual networks on the recovery cluster is equal to the sum of the number of virtual networks on the individual primary Nutanix clusters. For example, if there are two primary Nutanix clusters, with one cluster having m networks and the other cluster having n networks, ensure that the recovery cluster has m + n networks. Such a design ensures that all recovered VMs attach to a network.
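The two recovery-network rules above (same network prefix, same gateway host part) can be checked with the standard Python ipaddress module. This is an illustrative sketch; the function name and values are examples, not part of any Nutanix API.

```python
# Sketch: validate a candidate recovery virtual network against its source
# network per the rules above. Illustrative helper only.
import ipaddress

def valid_recovery_network(src_cidr, src_gateway, rec_cidr, rec_gateway):
    src = ipaddress.ip_network(src_cidr, strict=False)
    rec = ipaddress.ip_network(rec_cidr, strict=False)
    if src.prefixlen != rec.prefixlen:          # rule 1: same network prefix
        return False
    # Rule 2: the gateway must sit at the same offset within its subnet
    # (the ".10" host part in the 192.0.2.0/24 example above).
    src_offset = int(ipaddress.ip_address(src_gateway)) - int(src.network_address)
    rec_offset = int(ipaddress.ip_address(rec_gateway)) - int(rec.network_address)
    return src_offset == rec_offset

print(valid_recovery_network("192.0.2.0/24", "192.0.2.10",
                             "198.51.100.0/24", "198.51.100.10"))  # True
```

A mismatched prefix length (for example, /25 against /24) or a gateway at a different host offset makes the candidate network unsuitable for recovery.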

Additional Requirements

  • Both the primary and recovery Nutanix clusters must have an external IP address.
  • Both the primary and recovery Prism Centrals and Nutanix clusters must have a data services IP address.
  • The Nutanix cluster that hosts the Prism Central must meet the following requirements.
    • The Nutanix cluster must be registered to the Prism Central instance.
    • The Nutanix cluster must have an iSCSI data services IP address configured on it.
    • The Nutanix cluster must also have sufficient memory to support a hot add of memory to all Prism Central nodes when you enable Leap. A small Prism Central instance (4 vCPUs, 16 GB memory) requires a hot add of 4 GB, and a large Prism Central instance (8 vCPUs, 32 GB memory) requires a hot add of 8 GB. If you enable Nutanix Flow, each Prism Central instance requires an extra hot-add of 1 GB.
  • Each node in a scaled-out Prism Central instance must have a minimum of 4 vCPUs and 16 GB memory.

    For more information about the scaled-out deployments of a Prism Central, see Nutanix Disaster Recovery Terminology.

  • The protected guest VMs must have Nutanix VM mobility drivers installed.

    Nutanix VM mobility drivers are required for accessing the guest VMs after failover. Without Nutanix VM mobility drivers, the guest VMs become inaccessible after a failover.


Nutanix Disaster Recovery Limitations

Consider the following general limitations before configuring Nutanix Disaster Recovery. Along with the general limitations, there are specific protection limitations with the following supported replication schedules.

  • For specific limitations of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Limitations (Nutanix Disaster Recovery).
  • For specific limitations of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Limitations (Nutanix Disaster Recovery).
  • For specific limitations of protection with Synchronous replication schedule (0 RPO), see Synchronous Replication Limitations.

Virtual Machine Limitations

You cannot perform or configure the following.

  • Deploy witness VMs.
  • Protect multiple guest VMs that use disk sharing (for example, multi-writer sharing, Microsoft Failover Clusters, Oracle RAC).

  • Protect VMware fault tolerance enabled guest VMs.

  • Recover vGPU console enabled guest VMs efficiently.

    When you perform DR of vGPU console-enabled guest VMs, the VMs recover with the default VGA console (without any alert) instead of vGPU console. The guest VMs fail to recover when you perform cross-hypervisor disaster recovery (CHDR). For more information about DR and backup behavior of guest VMs with vGPU, see vGPU Enabled Guest VMs.

  • Configure NICs for a guest VM across both the virtual private clouds (VPC).

    You can configure NICs for a guest VM associated with either production or test VPC.

Volume Groups Limitation

You cannot protect volume groups.

Network Segmentation Limitation

You cannot apply network segmentation for management traffic (any traffic not on the backplane network) in Leap.

You get an error when you try to enable network segmentation for management traffic on a Leap enabled Nutanix Cluster or enable Leap in a network segmentation enabled Nutanix cluster. For more information about network segmentation, see Securing Traffic Through Network Segmentation in the Security Guide .
Note: However, you can apply network segmentation for backplane traffic at the primary and recovery clusters. Nutanix does not recommend it, because when you perform a planned failover of guest VMs that have network segmentation enabled for the backplane, the guest VMs fail to recover and the guest VMs at the primary AZ are removed.

Virtual Network Limitation

Although there is no limit to the number of VLANs that you can create, only the first 500 VLANs are listed in the drop-down of Network Settings when you create a recovery plan. For more information about VLANs in recovery plans, see Nutanix Virtual Networks.

Nutanix to vSphere Cluster Mapping Limitation

Due to the way the Nutanix architecture distributes data, there is limited support for mapping a Nutanix cluster to multiple vSphere clusters. If a Nutanix cluster is split into multiple vSphere clusters, migrate and recovery operations fail.

Failover Limitation

After the failover, the recovered guest VMs do not retain their associated labels.
Tip: Assign categories to the guest VMs instead of labels because VM categories are retained after the failover.

vGPU Enabled Guest VMs

The following table lists the behavior of guest VMs with vGPU in disaster recovery (DR) and backup deployments.

Table 1. DR and Backup Behavior of vGPU Enabled Guest VMs
AHV to AHV (Nutanix Disaster Recovery)
  • Identical vGPU models. Supported: recovery point creation, replication, restore, migrate, VM start, and failover and failback.
  • Different vGPU models or no vGPU. Supported: recovery point creation, replication, restore, and migrate. Unsupported: VM start, and failover and failback.
    Note: Only for Synchronous replication, protection of the guest VMs fails.
AHV to AHV (Backup: HYCU)
  • Guest VMs with vGPU fail to recover, with either identical or different vGPU models.
AHV to AHV (Backup: Veeam)
  • Identical vGPU models. Guest VMs with vGPU fail to recover.
  • Different vGPU models or no vGPU. Guest VMs with vGPU recover but with the older vGPU, or recover but do not start.
    Tip: The VMs start when you disable vGPU on the guest VM.
ESXi to ESXi (Nutanix Disaster Recovery or backup)
  • Guest VMs with vGPU cannot be protected, with either identical or different vGPU models.
AHV to ESXi (Nutanix Disaster Recovery)
  • vGPU is disabled after failover of guest VMs with vGPU.
ESXi to AHV (Nutanix Disaster Recovery)
  • Guest VMs with vGPU cannot be protected.

Nutanix Disaster Recovery Configuration Maximums

For the maximum number of entities you can configure with different replication schedules and fail over (disaster recovery), see Nutanix Configuration Maximums. These limits have been tested for Leap production deployments. Nutanix does not guarantee that the system can operate beyond these limits.

Tip: Upgrade your NCC version to 3.10.1 to get configuration alerts.

Nutanix Disaster Recovery Recommendations

Nutanix recommends the following best practices for configuring Nutanix Disaster Recovery.

General Recommendations

  • Create all entities (protection policies, recovery plans, and VM categories) at the primary availability zone (AZ).
  • Upgrade Prism Central before upgrading Prism Element on the Nutanix clusters registered to it. For more information about upgrading Prism Central, see Upgrading Prism Central in the Acropolis Upgrade Guide .
  • Do not include the guest VMs protected with Asynchronous, NearSync, and Synchronous replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with the guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.
  • Disable Synchronous replication before unpairing the AZs.

    If you unpair the AZs while the guest VMs in the Nutanix clusters are still in synchronization, the Nutanix cluster becomes unstable. For more information about disabling Synchronous replication, see Synchronous Replication Management.

Recommendation for Migrating Protection Domains to Protection Policies

You can protect a guest VM either with the legacy DR solution (protection domain-based) or with Leap. To protect a legacy DR-protected guest VM with Leap, you must migrate the guest VM from the protection domain to a protection policy. During the migration, do not delete the guest VM snapshots in the protection domain. Nutanix recommends keeping the guest VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.

Recommendation for DR to Nutanix Clusters at the Same On-Prem AZ

If the single Prism Central that you use for protection and DR to Nutanix clusters at the same availability zone (AZ) becomes inactive, you cannot perform a failover when required. To avoid a single point of failure in such deployments, Nutanix recommends installing that Prism Central at a different AZ (a different fault domain).

Recommendation for Virtual Networks

  • Map the networks while creating a recovery plan in Prism Central.
  • Recovery plans do not support overlapping subnets in a network-mapping configuration. Do not create virtual networks that have the same name or overlapping IP address ranges.
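The overlap rule above can be checked with the standard Python ipaddress module before you configure network mappings. The helper name and subnet values below are illustrative.

```python
# Sketch: flag candidate virtual networks whose IP ranges overlap, which a
# recovery plan's network mapping does not support. Illustrative only.
import ipaddress

def overlapping_pairs(cidrs):
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b))
            for i, a in enumerate(nets)
            for b in nets[i + 1:]
            if a.overlaps(b)]

print(overlapping_pairs(["10.10.0.0/16", "10.10.1.0/24", "10.20.0.0/16"]))
```

Any pair reported by a check like this should be renumbered before the networks are used in a network-mapping configuration.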

Recommendation for Container Mapping

Create storage containers with the same name on both the primary and recovery Nutanix clusters.

Leap automatically maps the storage containers during the first replication (seeding) of a guest VM. If a storage container with the same name exists on both the primary and recovery Nutanix clusters, the recovery points replicate to the storage container with the same name only. For example, if your protected guest VMs are in the SelfServiceContainer on the primary Nutanix cluster, and the recovery Nutanix cluster also has SelfServiceContainer , the recovery points replicate to SelfServiceContainer only. If a storage container with the same name does not exist at the recovery AZ, the recovery points replicate to a random storage container at the recovery AZ. For more information about creating storage containers on the Nutanix clusters, see Creating a Storage Container in Prism Web Console Guide .

Nutanix Disaster Recovery Service-Level Agreements (SLAs)

Nutanix Disaster Recovery enables protection of your guest VMs and disaster recovery (DR) to one or more Nutanix clusters at the same or different on-prem AZs. A Nutanix cluster is essentially an AHV or an ESXi cluster running AOS. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

Leap supports DR (and CHDR) to a maximum of two different Nutanix clusters at the same or different AZs. You can protect your guest VMs with the following replication schedules.

  • Asynchronous replication schedule (1 hour or greater RPO). For information about protection with Asynchronous replication schedule, see Protection with Asynchronous Replication Schedule and DR (Nutanix Disaster Recovery).
  • NearSync replication schedule (1–15 minute RPO). For information about protection with NearSync replication schedule, see Protection with NearSync Replication Schedule and DR (Nutanix Disaster Recovery).
  • Synchronous replication schedule (0 RPO). For information about protection with Synchronous replication schedule, see Protection with Synchronous Replication Schedule (0 RPO) and DR.

    To maintain efficiency in protection and DR, Leap allows you to protect a guest VM with a Synchronous replication schedule to only one AHV cluster, at a different on-prem availability zone.

Nutanix Disaster Recovery Views

The disaster recovery (DR) views enable you to perform CRUD operations on the following types of Leap entities.

  • Configured entities (for example, AZs, protection policies, and recovery plans)
  • Created entities (for example, guest VMs, and recovery points)

This chapter describes the views of Prism Central (on-prem AZ).

AZs View

The AZs view under the hamburger icon > Administration lists all of your paired AZs.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. AZs View

Table 1. AZs View Fields
Field Description
Name Name of the AZ.
Region Region to which the AZ belongs.
Type Type of AZ. AZs that are backed by on-prem Prism Central instances are shown to be of type physical. The AZ that you are logged in to is shown as a local AZ.
Connectivity Status Status of connectivity between the local AZ and the paired AZ.
Table 2. Workflows Available in the AZs View
Workflow Description
Connect to AZ (on-prem Prism Central only) Connect to an on-prem Prism Central or to Xi Cloud Services for data replication.
Table 3. Actions Available in the Actions Menu
Action Description
Disconnect Disconnect the remote AZ. When you disconnect an availability zone, the pairing is removed.

Protection Summary View

The Protection Summary view under the hamburger icon > Data Protection shows detailed information about the Leap entities in an availability zone (AZ) and helps you generate DR reports for a specified time. The information in the Protection Summary view enables you to monitor the health of your DR deployments (Leap) and the activities performed on Leap entities. Select a topology in the left-hand pane; the protection and recovery information of the selected topology shows in DR widgets on the right-hand pane. The following figures are sample views, and the tables describe the fields and the actions that you can perform in the DR widgets.

Figure. Protection Summary: Protected Entities, Replication Tasks, and Recovery Readiness

Figure. Protection Summary: Configuration Alerts

Figure. Protection Summary: Reports and Recovery Events

Table 1. Protected Entities
Field Description
Total Number of guest VMs protected. Clicking the number shows the guest VMs protected in protection policies.
RPO Not Met Number of guest VMs that are protected but do not meet the specified RPO. Clicking the number shows the guest VMs that do not meet the specified RPO.
Table 2. Replication Tasks
Field Description
Ongoing Number of ongoing replication tasks.
Stuck Number of replication tasks that are stuck. Clicking the number shows the stuck alerts generated in Alerts .
Failed Number of replication tasks that failed. Clicking the number shows the alerts generated in Alerts .
Table 3. Recovery Readiness
Field Description
Measured by Failover operations to check the readiness of the recovery plans. You can use Validate , Test Failover , or Planned Failover from the drop-down list to check recovery readiness.
Succeeded Number of recovery plans on which the selected failover operation ran successfully. Clicking the number shows the recovery plans on which the selected failover operation ran successfully.
Succeeded With Warnings Number of recovery plans on which the selected failover operation ran successfully but with warnings. Clicking the number shows the recovery plans on which the selected failover operation ran successfully with warnings.
Failed Number of recovery plans on which the selected failover operation failed to run successfully. Clicking the number shows the recovery plans on which the selected failover operation failed to run successfully.
Not Executed Number of recovery plans on which no failover operation ran. Clicking the number shows the recovery plans on which no failover operation ran.
Table 4. Entities with RPO Not Met
Field Description
Name Names of guest VMs that do not meet the specified RPO. You can use the filters on the guest VMs to investigate why the RPO is not met.
Table 5. Configuration Alerts
Field Description
Alert Description Description of configuration alerts raised on protection policies and recovery plans.
Impacted Entity The entities impacted by the configuration alerts.
Table 6. Reports
Field Description
Report Name Name of the report.
Generated at Date and time when the report was generated.
Download Option to download the report as a PDF or a CSV document.

Recovery Events

This widget shows you a detailed view of the Recovery Readiness . You can view information about all the recovery plans that ran on the selected AZs in the last 3 months.

Protection Policies View

The Protection Policies view under the hamburger icon > Data Protection lists all the configured protection policies from all the paired availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Protection Policies View

Table 1. Protection Policies View Fields
Field Description
Policy Name Name of the protection policy.
Schedules Number of schedules configured in the protection policy. If the protection policy has multiple schedules, a drop-down icon is displayed. Click the drop-down icon to see the primary location:primary Nutanix cluster , recovery location:recovery Nutanix cluster , and RPO of the schedules in the protection policy.
Alerts Number of alerts issued for the protection policy.
Table 2. Workflows Available in the Protection Policies View
Workflow Description
Create protection policy Create a protection policy.
Table 3. Actions Available in the Actions Menu
Action Description
Update Update the protection policy.
Clone Clone the protection policy.
Delete Delete the protection policy.

Recovery Plans View

The Recovery Plans view under the hamburger icon > Data Protection lists all the configured recovery plans from all the paired availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Recovery Plans View

Table 1. Recovery Plans View Fields
Field Description
Name Name of the recovery plan.
Primary Location Replication source AZ for the recovery plan.
Recovery Location Replication target AZ for the recovery plan.
Entities Sum of the following VMs:
  • Number of local, live VMs that are specified in the recovery plan.
  • Number of remote VMs that the recovery plan can recover at this AZ.
Last Validation Status Status of the most recent validation of the recovery plan.
Last Test Status Status of the most recent test performed on the recovery plan.
Last Failover Status Status of the most recent failover performed on the recovery plan.
Table 2. Workflows Available in the Recovery Plans View
Workflow Description
Create Recovery Plan Create a recovery plan.
Table 3. Actions Available in the Actions Menu
Action Description
Validate Validates the recovery plan to ensure that the VMs in the recovery plan have a valid configuration and can be recovered.
Test Tests the recovery plan.
Clean-up test VMs Cleans up the VMs failed over as a result of testing recovery plan.
Update Updates the recovery plan.
Failover Performs a failover.
Delete Deletes the recovery plan.

VM Recovery Points

The VM Recovery Points view under the hamburger icon > Data Protection lists all the recovery points of all the protected guest VMs (generated over time).

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. VM Recovery Points

Table 1. VM Recovery Points View Fields
Field Description
Name Name of the recovery point.
Latest Recovery Point on Local AZ Time of the latest recovery point of the guest VM on the local AZ.
Oldest Recovery Point on Local AZ Time of the oldest recovery point of the guest VM on the local AZ.
Total Recovery Points Number of recovery points generated for the guest VM.
Owner Owner account of the recovery point.
Table 2. Actions Available in the Actions Menu
Action Description
Clone (Previously Restore) Clones the guest VM from the selected recovery points. The operation creates a copy of the guest VM in the same Nutanix cluster without overwriting the original guest VM (out-of-place restore). For more information, see Manual Recovery of Guest VMs.
Revert Reverts the guest VMs to the selected recovery points. The operation recreates the guest VM in the same Nutanix cluster by overwriting the original guest VM (in-place restore). For more information, see Manual Recovery of Guest VMs.
Replicate Manually replicates the selected recovery points to a different Nutanix cluster in the same or different AZs. For more information, see Replicating Recovery Points Manually.

Dashboard Widgets

The dashboard includes widgets that display the statuses of configured protection policies and recovery plans. If you have not configured these entities, the widgets display a summary of the steps required to get started with Leap.

To view these widgets, click the Dashboard tab.

The following figure is a sample view of the dashboard widgets.

Figure. Dashboard Widgets for Leap

Enabling Nutanix Disaster Recovery for On-Prem AZ

To perform disaster recovery (DR) to Nutanix clusters at different on-prem availability zones (AZs), enable Leap at both the primary and recovery AZs (Prism Central). Without enabling Leap, you can configure protection policies and recovery plans that synchronize to the paired AZs, but you cannot perform failover and failback operations. To perform DR to different Nutanix clusters at the same AZ, enable Leap in the single Prism Central.

About this task

To enable Nutanix Disaster Recovery , perform the following procedure.

Note: You cannot disable Nutanix Disaster Recovery once you have enabled it.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Enable Leap in the Setup section on the left pane.
    Figure. Enabling Leap

    The Enable Leap dialog box runs prechecks. If any precheck fails, resolve the issue that is causing the failure and click Check Again .
  4. Click Enable after all the prechecks pass.
    Leap takes a few seconds to get enabled.

Pairing AZs (Nutanix Disaster Recovery)

To replicate entities (protection policies, recovery plans, and recovery points) bidirectionally between different on-prem availability zones (AZs), pair the AZs with each other. To replicate entities bidirectionally between different Nutanix clusters at the same AZ, you need not pair the AZs because the primary and the recovery Nutanix clusters are registered to the same AZ (Prism Central). Without pairing the AZs, you cannot perform DR to a different AZ.

About this task

To pair an on-prem AZ with another on-prem AZ, perform the following procedure at either of the on-prem AZs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Administration > AZs in the left pane.
    Figure. Pairing AZ

  3. Click Connect to AZ .
    Specify the following information in the Connect to Availability Zone window.
    Figure. Connect to AZ

    1. AZ Type : Select Physical Location from the drop-down list.
      A physical location is an on-prem availability zone. To pair the on-prem AZ with Xi Cloud Services, select XI from the drop-down list, and enter the credentials of your Xi Cloud Services account in steps c and d.
    2. IP Address for Remote PC : Enter the IP address of the recovery AZ Prism Central.
    3. Username : Enter the username of your recovery AZ Prism Central.
    4. Password : Enter the password of your recovery AZ Prism Central.
  4. Click Connect .
    The two on-prem AZs are paired with each other.

Protection and Automated DR (Nutanix Disaster Recovery)

Automated disaster recovery (DR) configurations use protection policies to protect your guest VMs, and recovery plans to orchestrate the recovery of those guest VMs to different Nutanix clusters at the same or different availability zones (AZs). You can automate protection of your guest VMs with the following supported replication schedules in Nutanix Disaster Recovery .

  • Asynchronous replication schedule (1 hour or greater RPO). For information about protection with Asynchronous replication schedule, see Protection with Asynchronous Replication Schedule and DR (Nutanix Disaster Recovery).
  • NearSync replication schedule (1–15 minute RPO). For information about protection with NearSync replication schedule, see Protection with NearSync Replication Schedule and DR (Nutanix Disaster Recovery).
  • Synchronous replication schedule (0 RPO). For information about protection with Synchronous replication schedule, see Protection with Synchronous Replication Schedule (0 RPO) and DR.

    To maintain efficiency in protection and DR, Leap allows you to protect a guest VM with a Synchronous replication schedule to only one AHV cluster, at a different on-prem availability zone.

Protection with Asynchronous Replication Schedule and DR (Nutanix Disaster Recovery)

Asynchronous replication schedules enable you to protect your guest VMs with an RPO of 1 hour or greater. A protection policy with an Asynchronous replication schedule creates a recovery point at the specified interval (1 hour or longer) and replicates it to the recovery availability zones (AZs) for High Availability. For guest VMs protected with an Asynchronous replication schedule, you can perform disaster recovery (DR) to different Nutanix clusters at the same or different AZs. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

Note: Nutanix provides multiple DR solutions to protect your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

Asynchronous Replication Requirements (Nutanix Disaster Recovery)

The following are the specific requirements for protecting your guest VMs with Asynchronous replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Nutanix Disaster Recovery .

For information about the general requirements of Nutanix Disaster Recovery , see Nutanix Disaster Recovery Requirements.

For information about node, disk, and Foundation configurations required to support Asynchronous replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on AHV versions that come bundled with the supported version of AOS.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

Each on-prem AZ must have a Nutanix Disaster Recovery enabled Prism Central instance.

The primary and recovery Prism Central and Prism Element on the Nutanix clusters must be running the following versions of AOS.

  • AHV clusters
    • AOS 5.17 or newer for DR to different Nutanix clusters at the same AZ.
    • AOS 5.10 or newer for DR to Nutanix clusters at different AZs.
  • ESXi clusters
    • AOS 5.17 or newer for DR to different Nutanix clusters at the same AZ.
    • AOS 5.11 or newer for DR to Nutanix clusters at different AZs.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Guest VMs protected with an Asynchronous replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) to recover guest VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided you meet the following requirements.

  • Both the primary and the recovery Nutanix clusters must be running AOS 5.17 or newer for CHDR to Nutanix clusters at the same AZ.
  • Both the primary and the recovery Nutanix clusters must be running AOS 5.11.2 or newer for CHDR to Nutanix clusters at different AZs.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI and SATA disks only.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.

  • For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files.

    If you have delta disks attached to a guest VM and you proceed with failover, you get a validation warning and the guest VM does not recover. Contact Nutanix Support for assistance.

Table 1. Operating Systems Supported for CHDR (Asynchronous Replication)
Operating System Version Requirements and limitations
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer.
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • SLES operating system is not supported.

Additional Requirement

The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
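This requirement lends itself to a quick pre-flight check. The sketch below, a hypothetical helper and not a Nutanix tool, assumes you have already fetched the container name lists from both clusters (for example, through Prism) and simply reports primary containers with no same-named counterpart on the recovery cluster:

```python
# Sketch: verify that every storage container used by protected VMs on the
# primary cluster also exists (by name) on the recovery cluster.
# The container name lists here are illustrative inputs.

def missing_containers(primary_containers, recovery_containers):
    """Return primary container names that have no same-named container
    on the recovery cluster."""
    return sorted(set(primary_containers) - set(recovery_containers))

# SelfServiceContainer exists on both sides, so only the unmatched
# container is reported.
issues = missing_containers(
    ["SelfServiceContainer", "ProdContainer"],
    ["SelfServiceContainer", "default-container-1"],
)
print(issues)  # ['ProdContainer']
```

Any name the check reports must be created on the recovery cluster before failover of VMs in that container can succeed.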

Asynchronous Replication Limitations (Nutanix Disaster Recovery)

Consider the following specific limitations before protecting your guest VMs with Asynchronous replication schedule. These limitations are in addition to the general limitations of Nutanix Disaster Recovery .

For information about the general limitations of Leap, see Nutanix Disaster Recovery Limitations.

  • You cannot restore guest VMs with incompatible GPUs at the recovery Nutanix cluster.
  • You cannot protect guest VMs configured as part of a network function chain.
  • You cannot retain hypervisor-specific properties after cross hypervisor disaster recovery (CHDR).

    CHDR does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

Creating a Protection Policy with an Asynchronous Replication Schedule (Nutanix Disaster Recovery)

To protect the guest VMs with an hourly replication schedule, configure an Asynchronous replication schedule while creating the protection policy. The policy takes recovery points of those guest VMs at the specified time intervals (hourly) and replicates them to the recovery availability zones (AZs) for High Availability. To protect the guest VMs at the same or different recovery AZs, the protection policy allows you to configure Asynchronous replication schedules to at most two recovery AZs—a unique replication schedule to each recovery AZ. The policy synchronizes continuously to the recovery AZs in a bidirectional way.

Before you begin

See Asynchronous Replication Requirements (Nutanix Disaster Recovery) and Asynchronous Replication Limitations (Nutanix Disaster Recovery) before you start.

About this task

To create a protection policy with an Asynchronous replication schedule, do the following at the primary AZ. You can also create a protection policy at the recovery AZ. Protection policies you create or update at a recovery AZ synchronize back to the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies

  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Select Primary Location

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric, dot, dash, and underscore characters.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, check the AZ that hosts the guest VMs to protect.

          The drop-down lists all the AZs paired with the local AZ; Local AZ represents the local Prism Central instance. For your primary AZ, you can check either the local AZ or a non-local AZ.

        2. Cluster : From the drop-down list, check the Nutanix cluster that hosts the guest VMs to protect.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected AZ. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, check the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. After saving the primary AZ configuration, you can optionally add a local schedule (step iv) to retain the recovery points at the primary AZ.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule to retain 15-minute recovery points locally, and a replication schedule that retains recovery points and replicates them to a recovery AZ every 2 hours. The two schedules apply differently to the guest VMs.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local AZ. If you set the retention number to n, the local AZ retains the n most recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because recovery points taken at minute-level frequency do not support the Linear retention type.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local AZ.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on Local AZ:PE_A3_AHV : Specify the retention number for the local AZ.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the AZ where you want to replicate the recovery points.

          The drop-down lists all the AZs paired with the local AZ; Local AZ represents the local Prism Central instance. Select Local AZ if you want to configure DR to a different Nutanix cluster at the same AZ.

          If you do not select an AZ, local recovery points that the protection policy creates do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Protection and Manual DR (Nutanix Disaster Recovery).

        2. Cluster : From the drop-down list, select the Nutanix cluster where you want to replicate the recovery points.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected AZ. You can select one cluster at the recovery AZ. If you want to replicate the recovery points to more clusters at the same or different AZs, add another recovery AZ with a replication schedule. For more information about adding another recovery AZ with a replication schedule, see step e.

          Note: Selecting auto-select from the drop-down list replicates the recovery points to any available cluster at the recovery AZ. Select auto-select from the drop-down list only if all the clusters at the recovery AZ are up and running.
          Caution: If the primary Nutanix cluster contains an IBM Power Systems server, you can replicate recovery points to an on-prem AZ only if that on-prem AZ contains an IBM Power Systems server.
        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery AZ. After saving the recovery AZ configuration, you can optionally add a local schedule to retain the recovery points at the recovery AZ.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule that retains hourly recovery points locally to supplement the replication schedule. The two schedules apply differently to the guest VMs after failover, when the recovery points replicate back to the primary AZ.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local AZ. If you set the retention number to n, the local AZ retains the n most recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because recovery points taken at minute-level frequency do not support the Linear retention type.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local AZ.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the local AZ.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    4. Click + Add Schedule to add a replication schedule between the primary and the recovery AZ.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (Asynchronous)

        1. Protection Type : Click Asynchronous .
        2. Take Snapshot Every : Specify the frequency in hours , days , or weeks at which you want the recovery points to be taken.

          The specified frequency is the RPO. For more information about RPO, see Nutanix Disaster Recovery Terminology.

        3. Retention Type : Specify one of the following two types of retention policy.
          • Linear : Implements a simple retention scheme at both the primary (local) and the recovery (remote) AZ. If you set the retention number for a given AZ to n, that AZ retains the n most recent recovery points. For example, if the RPO is 1 hour, and the retention number for the local AZ is 48, the local AZ retains 48 hours (48 × 1 hour) of recovery points at any given time.
            Tip: Use linear retention policies for small RPO windows with shorter retention periods or in cases where you always want to recover to a specific RPO window.
          • Roll-up : Rolls up the recovery points as per the RPO and retention period into a single recovery point at an AZ. For example, if you set the RPO to 1 hour and the retention period to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) after every 24 hours. The system keeps one day (of rolled-up hourly recovery points) and 4 days of daily recovery points.
            Note:
            • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
            • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
            • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
            • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and the remaining period of monthly recovery points.
            Note: The recovery points that are used to create a rolled-up recovery point are discarded.
            Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
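The day- and week-based roll-up rules above can be sketched as a small calculator. This is an illustrative helper under the stated rules, not a product API; the function name and output shape are assumptions:

```python
# Sketch of the roll-up retention rules described above (days and weeks
# cases): the system keeps 1 day of RPO-frequency recovery points, then
# coarser rolled-up recovery points for the rest of the retention period.

def rollup_breakdown(n, unit):
    """Return the roll-up retention tiers for a retention period of n days or weeks."""
    if unit == "days":
        # 1 day of RPO recovery points, n-1 days of daily recovery points.
        return {"rpo_days": 1, "daily_days": n - 1}
    if unit == "weeks":
        # 1 day of RPO, 1 week of daily, n-1 weeks of weekly recovery points.
        return {"rpo_days": 1, "daily_weeks": 1, "weekly_weeks": n - 1}
    raise ValueError("this sketch covers only 'days' and 'weeks'")

# 5-day retention with a 1-hour RPO: one day of hourly recovery points and
# 4 days of daily recovery points, matching the example above.
print(rollup_breakdown(5, "days"))  # {'rpo_days': 1, 'daily_days': 4}
```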
        4. To specify the retention number for the primary and recovery AZs, do the following.
          • Retention on Local AZ: PE_A3_AHV : Specify the retention number for the primary AZ.

            This field is unavailable if you do not specify a recovery location.

          • Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the recovery AZ.

            If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.

        5. If you want to enable reverse retention of the recovery points, check Reverse retention for VMs on recovery location .
          Note: Reverse retention for VMs on recovery location is available only when the retention numbers on the primary and recovery AZs are different.

          Reverse retention maintains the guest VM's retention numbers even after failover to a recovery AZ in the same or a different AZ. For example, suppose you retain two recovery points at the primary AZ and three at the recovery AZ. If you enable reverse retention, a failover event does not change these numbers relative to the guest VM when the recovery points replicate back to the primary AZ: the recovery AZ (now running the guest VM) still retains two recovery points while the primary AZ retains three. If you do not enable reverse retention, a failover event swaps the numbers: the recovery AZ retains three recovery points while the primary AZ retains two.

          Maintaining the same retention numbers at a recovery AZ is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.
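The effect of the setting can be sketched as follows; the function and names are illustrative (not part of the product), and "local" means wherever the guest VM currently runs:

```python
# Sketch of how reverse retention affects retention numbers after failover.
# Numbers follow the example in the text: 2 recovery points at the primary
# AZ, 3 at the recovery AZ.

def after_failover(local_n, remote_n, reverse_retention):
    """Return (new_local_n, new_remote_n) once replication direction reverses."""
    if reverse_retention:
        # The guest VM keeps its original local/remote retention wherever it runs.
        return local_n, remote_n
    # Without reverse retention, the numbers stay tied to the AZs, so the
    # new local site inherits the old remote count and vice versa.
    return remote_n, local_n

print(after_failover(2, 3, reverse_retention=True))   # (2, 3)
print(after_failover(2, 3, reverse_retention=False))  # (3, 2)
```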

        6. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

          Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery AZs.

          Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          Caution: Application-consistent recovery points fail for EFI boot-enabled Windows 2019 VMs running on ESXi when NGT is not installed. Nutanix recommends installing NGT on guest VMs running on ESXi also.
        7. Click Save Schedule .
    5. Click + Add Recovery Location at the top-right if you want to add an additional recovery AZ for the guest VMs in the protection policy.
      • To add an on-prem AZ for recovery, see Protection and DR between On-Prem AZs (Nutanix Disaster Recovery)
      • To add Xi Cloud Services for recovery, see Protection and DR between On-Prem AZ and Xi Cloud Service (Xi Leap).
      Figure. Protection Policy Configuration: Additional Recovery Location

    6. Click + Add Schedule to add a replication schedule between the primary AZ and the additional recovery AZ you specified in step e.
      Perform step d again in the Add Schedule window to add the replication schedule. The window auto-populates the Primary Location and the additional Recovery Location that you selected in step b and step e.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.

        1. Click Start protection at specific point in time .
        2. Specify the time at which you want to start taking recovery points.
        3. Click Save .
    7. Click Next .
      Clicking Next shows a list of VM categories, where you can optionally check one or more VM categories to protect in the protection policy. DR configurations using Leap allow you to protect a guest VM with only one protection policy. Therefore, VM categories specified in another protection policy are not in the list. If you protect a guest VM in another protection policy by specifying the VM category of the guest VM (category-based inclusion), and you protect the guest VM from the VMs page in this policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protected the individual guest VM protects the guest VM.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .
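This precedence rule can be sketched as a small resolver. The policy, category, and VM names mirror the example above and are illustrative only:

```python
# Sketch of the inclusion-precedence rule: an individual (VMs-page)
# inclusion in one protection policy supersedes a category-based
# inclusion in another policy.

def effective_policy(vm, category_policies, individual_policies):
    """Return the single policy that actually protects the VM."""
    if vm in individual_policies:          # individual inclusion wins
        return individual_policies[vm]
    return category_policies.get(vm)       # otherwise fall back to category

category_policies = {"VM_SherlockH": "PP_AdminVMs"}   # via Department:Admin
individual_policies = {"VM_SherlockH": "PP_VMs_UK"}   # added from the VMs page

print(effective_policy("VM_SherlockH", category_policies, individual_policies))
# PP_VMs_UK
```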

    8. If you want to protect the guest VMs category-wise, check the VM categories that you want to protect from the list and click Add .
      Figure. Protection Policy Configuration: Add VM Categories

      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs category-wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs Individually to a Protection Policy).

    9. Click Create .
      The protection policy with an Asynchronous replication schedule is created. To verify the protection policy, see the Protection Policies page. If you checked VM categories in step h, the protection policy starts generating recovery points of the guest VMs in those VM categories. To see the generated recovery points, click the hamburger icon at the top-left corner of the window and go to VM Recovery Points . Click a recovery point to view its information, including the time estimated for the very first replication (seeding) to the recovery AZs.
      Figure. Recovery Points Overview

Application-consistent Recovery Point Conditions and Limitations

This topic describes the conditions and limitations for application-consistent recovery points that you can generate through a protection policy. For information about the operating systems that support the AOS version you have deployed, see the Compatibility Matrix.

  • Before taking an application-consistent recovery point, consider the workload type of your guest VM.

    Applications running in your guest VM must be able to quiesce I/O operations. For example, you can quiesce I/O operations for database applications and similar workload types.

  • Before taking an application-consistent recovery point, install and enable Nutanix Guest Tools (NGT) on your guest VM.

    For installing and enabling NGT, see Nutanix Guest Tools in the Prism Web Console Guide .

    For guest VMs running on ESXi, consider these points.

  • Install and enable NGT on guest VMs running on ESXi also. Application-consistent recovery points fail for EFI boot-enabled Windows 2019 VMs running on ESXi without installing NGT.

  • (vSphere) If you do not enable NGT and try to take an application-consistent recovery point, the system creates a Nutanix native recovery point from a single vSphere host-based recovery point, and then deletes the vSphere host-based recovery point. If you enable NGT and take an application-consistent recovery point, the system directly captures a Nutanix native recovery point.
  • Do not delete the .snapshot folder in vCenter.

  • The following table lists the operating systems that support application-consistent recovery points with NGT installed.
Table 1. Supported Operating Systems (NGT Installed)
Operating system Version
Windows
  • Windows 2008 R2 through Windows 2019
Linux
  • CentOS 6.5 through 6.9 and 7.0 through 7.3
  • Red Hat Enterprise Linux (RHEL) 6.5 through 6.9 and 7.0 through 7.3.
  • Oracle Linux 6.5 and 7.0
  • SUSE Linux Enterprise Server (SLES) 11 SP1 through 11 SP4 and 12 SP1 through 12 SP3
  • Ubuntu 14.04

Application-consistent Recovery Points with Microsoft Volume Shadow Copy Service (VSS)

  • To take application-consistent recovery points on Windows guest VMs, enable Microsoft VSS services.

    When you configure a protection policy and select Take App-Consistent Recovery Point , the Nutanix cluster transparently invokes the VSS (also known as Shadow copy or volume snapshot service).

    Note: This option is available for ESXi and AHV only. However, you can use third-party backup products to invoke VSS for Hyper-V.
  • To take application-consistent recovery points on guest VMs that use VSS, the system invokes the Nutanix native in-guest VmQuiesced Snapshot Service (VSS) agent. The VSS framework takes application-consistent recovery points without causing VM stuns (temporarily unresponsive VMs).
  • The VSS framework enables third-party backup providers like Commvault and Rubrik to take application-consistent snapshots on the Nutanix platform in a hypervisor-agnostic manner.

  • The default and only backup type for VSS snapshots is VSS_BT_COPY (copy backup).

    Third-party backup products can choose between the VSS_BT_FULL (full backup) and VSS_BT_COPY (copy backup) backup types.

  • Guest VMs with delta, SATA, and IDE disks do not support Nutanix VSS recovery points.
  • Guest VMs with iSCSI attachments (LUNs) do not support Nutanix VSS recovery points.

    Nutanix VSS recovery points fail for such guest VMs.

  • Do not take Nutanix-enabled application-consistent recovery points while using VSS snapshots enabled by a third-party backup provider (for example, Veeam).

Pre-freeze and Post-thaw scripts

  • You can take application-consistent recovery points on NGT and Volume Shadow Copy Service (VSS) enabled guest VMs. However, some applications require more steps before or after the VSS operations to fully quiesce the guest VMs to an appropriate restore point or state in which the system can capture a recovery point. Such applications need pre-freeze and post-thaw scripts to run the necessary extra steps.
  • Any operation that the system must perform on a guest VM before replication or a recovery point capture is a pre-freeze operation. For example, if a guest VM hosts a database, you can enable hot backup of the database before replication using a pre-freeze script. Similarly, any operation that the system must perform on a guest VM after replication or a recovery point capture is a post-thaw operation.
    Tip: Vendors such as CommVault provide pre-freeze and post-thaw scripts. You can also write your own pre-freeze and post-thaw scripts.
Script Requirements
  • For Windows VMs, you must be an administrator and have read, write, and execute permissions on the scripts.
  • For Linux VMs, you must have root ownership and root access with 700 permissions on the scripts.
  • For completion of any operation before or after replication or recovery point capture, you must have both the pre_freeze and post_thaw scripts for the operation.
  • Timeout for both the scripts is 60 seconds.
  • A script must return 0 to indicate a successful run. A non-zero return value implies that the script execution failed. The necessary log entries are available in the NGT logs.
    Tip: (AHV) For a non-zero return value from the pre-freeze script, the system captures a non application-consistent snapshot and raises an alert on the Prism web console. Similarly, for a non-zero return value from the post-thaw script, the system attempts to capture an application-consistent snapshot once again. If the attempt fails, the system captures a non application-consistent snapshot, and raises an alert on the Prism web console.
  • Irrespective of whether the pre-freeze script execution is successful, the corresponding post-thaw script runs.
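The requirements above can be seen in a minimal skeleton. The following sketch is illustrative only: the application name "myapp", the /tmp log path, and the placeholder quiesce command are assumptions to be replaced with your application's real quiesce step. It shows the exit-code convention that determines whether the recovery point is application-consistent.

```shell
# Minimal pre_freeze skeleton (hypothetical app "myapp"; the /tmp log path
# and the placeholder quiesce command are assumptions). The script must
# finish within the 60-second timeout.
pre_freeze() {
    log=/tmp/pre_freeze.log
    date >> "$log"
    if true; then                       # placeholder for the real quiesce command
        echo "myapp quiesced" >> "$log"
        return 0                        # 0 = success: app-consistent point is taken
    else
        echo "quiesce failed" >> "$log"
        return 1                        # non-zero = failure: alert raised, crash-consistent fallback (AHV)
    fi
}
pre_freeze
```

A matching post_thaw script follows the same skeleton and runs irrespective of the pre-freeze result, so it should tolerate an application that was never quiesced.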
Script Location
You can define Python or shell scripts or any executable or batch files at the following locations in Linux or Windows VMs. The scripts can contain commands and routines necessary to run specific operations on one or more applications.
  • In Windows VMs,
    • Batch script file path for pre_freeze scripts:
      C:\Program Files\Nutanix\Scripts\pre_freeze.bat
    • Batch script file path for post_thaw scripts:
      C:\Program Files\Nutanix\Scripts\post_thaw.bat
  • In Linux VMs,
    • Shell script file path for pre_freeze scripts:
      /usr/local/sbin/pre_freeze

      Replace pre_freeze with the script name (without extension).

    • Shell script file path for post_thaw scripts:
      /usr/local/sbin/post_thaw

      Replace post_thaw with the script name (without extension).

      Note: The scripts must have root ownership and root access with 700 permissions.
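On Linux, the ownership and permission requirements can be satisfied with `install`. The sketch below stages the files in a temporary directory purely for illustration; in production, run it as root against /usr/local/sbin (adding `-o root -g root`) so that the scripts are root-owned. The placeholder script bodies are assumptions.

```shell
# Stage pre_freeze/post_thaw scripts with the required mode 700.
# Temp directory and stub script bodies are illustrative; in production,
# install to /usr/local/sbin as root with -o root -g root.
tmp=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$tmp/pre_freeze.src"
printf '#!/bin/sh\nexit 0\n' > "$tmp/post_thaw.src"
install -m 700 "$tmp/pre_freeze.src" "$tmp/pre_freeze"
install -m 700 "$tmp/post_thaw.src" "$tmp/post_thaw"
ls -l "$tmp/pre_freeze" "$tmp/post_thaw"   # both should show -rwx------
```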
Script Sample
Note: The following are only sample scripts and therefore must be modified to fit your deployment.
  • For Linux VMs
    #!/bin/sh
    #pre_freeze-script
    date >> '/scripts/pre_root.log'
    echo -e "\n attempting to run pre_freeze script for MySQL as root user\n" >> '/scripts/pre_root.log'
    if [ "$(id -u)" -eq "0" ]; then
        # quiesce.py issues the query and holds the database lock while the
        # recovery point is captured, so it runs in the background
        python '/scripts/quiesce.py' &
        echo -e "\n executing query flush tables with read lock to quiesce the database\n" >> '/scripts/pre_freeze.log'
        echo -e "\n Database is in quiesce mode now\n" >> '/scripts/pre_freeze.log'
    else
        date >> '/scripts/pre_root.log'
        echo -e "not root user\n" >> '/scripts/pre_root.log'
    fi
    #!/bin/sh
    #post_thaw-script
    date >> '/scripts/post_root.log'
    echo -e "\n attempting to run post_thaw script for MySQL as root user\n" >> '/scripts/post_root.log'
    if [ "$(id -u)" -eq "0" ]; then
        python '/scripts/unquiesce.py'
    else
        date >> '/scripts/post_root.log'
        echo -e "not root user\n" >> '/scripts/post_root.log'
    fi
  • For Windows VMs
    @echo off 
    echo Running pre_freeze script >C:\Progra~1\Nutanix\script\pre_freeze_log.txt
    @echo off 
    echo Running post_thaw script >C:\Progra~1\Nutanix\script\post_thaw_log.txt
Note: If any of these scripts prints excessive output to the console session, the script freezes. To avoid script freeze, perform the following.
  • Add @echo off to your scripts.
  • Redirect the script output to a log file.
If you receive a non-zero return code from the pre-freeze script, the system captures a non application-consistent recovery point and raises an alert on the Prism web console. If you receive a non-zero return code from the post-thaw script, the system attempts to capture an application-consistent snapshot once again. If that attempt fails, the system captures a non application-consistent snapshot, and raises an alert on the Prism web console.
Applications supporting application-consistent recovery points without scripts
Only the following applications support application-consistent recovery points without pre-freeze and post-thaw scripts.
  • Microsoft SQL Server 2008, 2012, 2016, and 2019
  • Microsoft Exchange 2010
  • Microsoft Exchange 2013
  • Microsoft Exchange 2016

  • Nutanix does not support application-consistent recovery points on Windows VMs that have mounted VHDX disks.
  • The system captures hypervisor-based recovery points only when you have VMware Tools running on the guest VM and the guest VM does not have any independent disks attached to it.

    If these requirements are not met, the system captures crash-consistent snapshots.

  • The following table provides detailed information on whether a recovery point is application-consistent or not depending on the operating systems and hypervisors running in your environment.
    Note:
    • Installed and active means that the guest VM has the following.
      • NGT installed.
      • VSS capability enabled.
      • Powered on.
      • Actively communicating with the CVM.
Table 2. Application-consistent Recovery Points
Microsoft Windows Server edition
  • NGT installed and active, with pre-freeze and post-thaw scripts present: Nutanix script-based VSS snapshots (ESXi and AHV)
  • NGT installed and active: Nutanix VSS-enabled snapshots (ESXi and AHV)
  • NGT not enabled: hypervisor-based application-consistent or crash-consistent snapshots (ESXi); crash-consistent snapshots (AHV)
Microsoft Windows Client edition
  • NGT installed and active, with pre-freeze and post-thaw scripts present: Nutanix script-based VSS snapshots (ESXi and AHV)
  • NGT not enabled: hypervisor-based or crash-consistent snapshots (ESXi); crash-consistent snapshots (AHV)
Linux VMs
  • NGT installed and active, with pre-freeze and post-thaw scripts present: Nutanix script-based VSS snapshots (ESXi and AHV)
  • NGT not enabled: hypervisor-based or crash-consistent snapshots (ESXi); crash-consistent snapshots (AHV)

Creating a Recovery Plan (Nutanix Disaster Recovery)

To orchestrate the failover (disaster recovery) of the protected guest VMs to the recovery AZ, create a recovery plan. On failover, the recovery plan recovers the protected guest VMs at the recovery AZ. If you have configured two on-prem recovery AZs in a protection policy, create two recovery plans—one for recovery to each recovery AZ. The recovery plan synchronizes continuously to the recovery AZ in a bidirectional way.

About this task

To create a recovery plan, do the following at the primary AZ. You can also create a recovery plan at a recovery AZ. The recovery plan you create or update at a recovery AZ synchronizes back to the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
    Figure. Recovery Plan Configuration: Recovery Plans

  3. Click Create Recovery Plan .
    Specify the following information in the Create Recovery Plan window.
    Figure. Recovery Plan Configuration: General

  4. In the General tab, enter Recovery Plan Name , Recovery Plan Description , Primary Location , Recovery Location , and click Next .
    From the Primary Location and Recovery Location drop-down lists, you can select either the local AZ or a non-local AZ to serve as your primary and recovery AZs respectively. Local AZ represents the local Prism Central instance. If you are configuring the recovery plan to recover the protected guest VMs to another Nutanix cluster at the same AZ, select Local AZ from both the Primary Location and Recovery Location drop-down lists.
  5. In the Power On Sequence tab, click + Add Entities to add the guest VMs to the start sequence.
    Figure. Recovery Plan Configuration: Add Entities

    1. In the Search Entities by drop-down list, select VM Name to specify guest VMs by name.
    2. In the Search Entities by drop-down list, select Category to specify guest VMs by category.
    3. To add the guest VMs or VM categories to the stage, select the VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
    4. Click Add .
      The selected guest VMs are added to the start sequence in a single stage by default. You can also create multiple stages to add guest VMs and define the order of their power-on sequence. For more information about stages, see Stage Management.
      Caution: Do not combine guest VMs protected with Synchronous replication schedules and guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.
    5. To automate in-guest script execution on the guest VMs during recovery, select the individual guest VMs or VM categories in the stage and click Manage Scripts .
      Note: In-guest scripts allow you to automate various task executions upon recovery of the guest VMs. For example, in-guest scripts can help automate the tasks in the following scenarios.

      • After recovery, the guest VMs must use new DNS IP addresses and also connect to a new database server that is already running at the recovery AZ.

        Traditionally, to achieve this new configuration, you would manually log on to the recovered VM and modify the relevant files. With in-guest scripts, you have to write a script to automate the required steps and enable the script when you configure a recovery plan. The recovery plan execution automatically invokes the script and performs the reassigning of DNS IP address and reconnection to the database server at the recovery AZ.

      • If guest VMs are part of domain controller AZA.com at the primary AZ AZ1 , and after the guest VMs recover at the AZ AZ2 , you want to add the recovered guest VMs to the domain controller AZB.com .

        Traditionally, to reconfigure, you would manually log on to the VM, remove the VM from an existing domain controller, and then add the VM to a new domain controller. With in-guest scripts, you can automate the task of changing the domain controller.

      Note: In-guest script execution requires NGT version 1.9 or newer installed on the VM. The in-guest scripts run as a part of the recovery plan only if they have executable permissions for the following.
      • Administrator user (Windows)
      • Root user (Linux)
      Note: You can define a batch or shell script that executes automatically in the guest VMs after their disaster recovery. Place two scripts—one for production failover and the other for test failover—at the following locations in the guest VMs with the specified name.
      • In Windows VMs,
        • Batch script file path for production failover:
          C:\Program Files\Nutanix\scripts\production\vm_recovery.bat
        • Batch script file path for test failover:
          C:\Program Files\Nutanix\scripts\test\vm_recovery.bat
      • In Linux VMs,
        • Shell script file path for production failover:
          /usr/local/sbin/production_vm_recovery
        • Shell script file path for test failover:
          /usr/local/sbin/test_vm_recovery
      Note: When an in-guest script runs successfully, it returns code 0 . Any non-zero error code signifies that the execution of the in-guest script was unsuccessful.
      Figure. Recovery Plan Configuration: In-guest Script Execution

        1. To enable script execution, click Enable .

          A command prompt icon appears against the guest VMs or VM categories to indicate that in-guest script execution is enabled on those guest VMs or VM categories.

        2. To disable script execution, click Disable .
  6. In the Network Settings tab, map networks in the primary cluster to networks at the recovery cluster.
    Figure. Recovery Plan Configuration: Network Settings

    Network mapping enables replicating the network configurations of the primary Nutanix clusters to the recovery Nutanix clusters, and recovering guest VMs into the same subnet at the recovery Nutanix cluster. For example, if a guest VM is in the vlan0 subnet at the primary Nutanix cluster, you can configure the network mapping to recover that guest VM in the same vlan0 subnet at the recovery Nutanix cluster. To specify the source (primary Nutanix cluster) and destination (recovery Nutanix cluster) network information for network mapping, do the following in the Local AZ (Primary) and PC 10.xx.xx.xxx (Recovery) panes.
    1. Under Production in Virtual Network or Port Group drop-down list, select the production subnet that contains the protected guest VMs. (optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    2. Under Test Failback in Virtual Network or Port Group drop-down list, select the test subnet that you want to use for testing failback from the recovery Nutanix cluster. (optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    3. To add more network mappings, click Add Networks at the top-right corner of the page, and then repeat the steps 6.a-6.b.
      Note: The primary and recovery Nutanix clusters must have identical gateway IP addresses and prefix length. Therefore you cannot use a test failover network for two or more network mappings in the same recovery plan.
    4. Click Done .
    Note: For ESXi, you can configure network mapping for both standard and distributed (DVS) port groups. For more information about DVS, see VMware documentation.
    Caution: Leap does not support VMware NSX-T datacenters. For more information about NSX-T datacenters, see VMware documentation.
  7. To perform VM-level static IP address mapping between the primary and the recovery AZs, click Advanced Settings , click Custom IP Mapping , and then do the following.
    Note: The Custom IP Mapping shows all the guest VMs with static IP address configured, NGT installed, and VNIC in the source subnet specified in the network mapping.
    1. To locate the guest VM, type the name of the guest VM in the filter field.
      A guest VM that has multiple NICs is listed in multiple rows, allowing you to specify an IP address mapping for each VNIC. All the fields auto-populate with the IP addresses generated based on the offset IP address-mapping scheme.
    2. In the Test Failback field for the local AZ, Production field for the remote (recovery) AZ, and Test Failover for the remote AZ, edit the IP addresses.
      Perform this step for all the IP addresses that you want to map.
      Caution: Do not edit the IP address assigned to the VNIC in the local AZ. If you do not want to map static IP addresses for a particular VNIC, you can proceed with the default entries.
    3. Click Save .
    4. If you want to edit one or more VM-level static IP address mappings, click Edit , and then change the IP address mapping.
  8. If VM-level static IP address mapping is configured between the primary and the recovery Nutanix clusters and you want to use the default, offset-based IP address-mapping scheme, click Reset to Matching IP Offset .
  9. Click Done .
    The recovery plan is created. To verify the recovery plan, see the Recovery Plans page. You can modify the recovery plan to change the recovery location, add, or remove the protected guest VMs. For information about various operations that you can perform on a recovery plan, see Recovery Plan Management.
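The DNS reassignment scenario described in step 5 can be sketched as a Linux production_vm_recovery script. Everything specific below is an illustrative assumption: the 10.0.0.53 DNS address, the direct rewrite of a resolv.conf-style file, and the demo file path. VMs managed by NetworkManager or systemd-resolved should use the corresponding tool instead of editing the file directly.

```shell
# Sketch of an in-guest recovery script that points the recovered VM at a
# DNS server running at the recovery AZ. The recovery plan treats exit 0
# as success and any non-zero code as failure.
vm_recovery() {
    resolv=$1                          # file to rewrite (parameterized for the demo)
    new_dns=10.0.0.53                  # hypothetical DNS server at the recovery AZ
    cp "$resolv" "${resolv}.bak" || return 1   # keep a backup of the old config
    printf 'nameserver %s\n' "$new_dns" > "$resolv"
}

# Demo against a scratch file instead of the real /etc/resolv.conf.
demo=/tmp/resolv.conf.demo
printf 'nameserver 192.0.2.1\n' > "$demo"
vm_recovery "$demo"
```

In a real deployment, the function body would target /etc/resolv.conf (or the distribution's network manager) and the script would be installed as /usr/local/sbin/production_vm_recovery with root ownership.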
Stage Management

A stage defines the order in which the protected guest VMs start at the recovery cluster. You can create multiple stages to prioritize the start sequence of the guest VMs. In the Power On Sequence , the VMs in the preceding stage start before the VMs in the succeeding stages. On recovery, it is desirable to start some VMs before the others. For example, database VMs must start before the application VMs. Place all the database VMs in the stage before the stage containing the application VMs, in the Power On Sequence .

Figure. Recovery Plan Configuration: Power On Sequence

To Add a Stage in the Power-On Sequence and Add Guest VMs to It, Do the Following.

  1. Click +Add New Stage .
  2. Click +Add Entities .
  3. To add guest VMs to the current stage in the power-on sequence, do the following.
    1. In the Search Entities by drop-down list, select VM Name to specify guest VMs by name.
    2. In the Search Entities by drop-down list, select Category to specify guest VMs by category.
    3. To add the guest VMs or VM categories to the stage, select the guest VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
  4. Click Add .

To Remove a Stage from the Power-On Sequence, Do the Following.

Click Actions > Remove Stage
Note: You see Actions in a stage only when none of the VMs in the stage are selected. When one or more VMs in the stage are selected, you see More Actions .

To Change the Position of a Stage in the Power-On Sequence, Do the Following.

  • To move a stage up or down in the power-on sequence, click the up or down arrow respectively, at the top-right corner of the stage.
  • To expand or collapse a stage, click + or - respectively, at the top-right corner of the stage.
  • To move VMs to a different stage, select the VMs, and do the following.
    1. Click More Actions > Move .
    2. Select the target stage from the list.
    Note: You see Move in the More Actions only when you have defined two or more stages.

To Set a Delay Between the Power-On Sequence of Two Stages, Do the Following.

  1. Click +Add Delay .
  2. Enter the time in seconds.
  3. Click Add .

To Add Guest VMs to an Existing Stage, Do the Following.

  1. Click Actions > Add Entities .
    Note: You see Actions in a stage only when none of the VMs in the stage are selected. When one or more VMs in the stage are selected, you see More Actions .
  2. To add VMs to the current stage in the power-on sequence, do the following.
    1. In the Search Entities by drop-down list, select VM Name to specify guest VMs by name.
    2. In the Search Entities by drop-down list, select Category to specify guest VMs by category.
    3. To add the guest VMs or VM categories to the stage, select the guest VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
  3. Click Add .

To Remove Guest VMs from an Existing Stage, Do the Following.

  1. Select the VMs from the stage.
  2. Click More Actions > Remove .
    Note: You see More Actions in a stage only when one or more VMs in the stage are selected. When none of the VMs in the stage are selected, you see Actions .

To Move Guest VMs to a Different Stage, Do the Following.

  1. Select the VMs from the stage.
  2. Click More Actions > Move .
    Note: You see More Actions in a stage only when one or more VMs in the stage are selected. When none of the VMs in the stage are selected, you see Actions .
  3. Select the target stage from the list.

Failover and Failback Management

You perform failover of the protected guest VMs when unplanned failure events (for example, natural disasters) or planned events (for example, scheduled maintenance) happen at the primary availability zone (AZ) or the primary cluster. The protected guest VMs migrate to the recovery AZ where you perform the failover operations. On recovery, the protected guest VMs start in the Nutanix cluster you specify in the recovery plan that orchestrates the failover.

The following are the types of failover operations.

Test failover
To ensure that the protected guest VMs failover efficiently to the recovery AZ, you perform a test failover. When you perform a test failover, the guest VMs recover in the virtual network designated for testing purposes at the recovery AZ. However, the guest VMs at the primary AZ are not affected. Test failovers rely on the presence of VM recovery points at the recovery AZs.
Planned failover
To ensure VM availability when you foresee service disruption at the primary AZ, you perform a planned failover to the recovery AZ. For a planned failover to succeed, the guest VMs must be available at the primary AZ. When you perform a planned failover, the recovery plan first creates a recovery point of the protected guest VM, replicates the recovery point to the recovery AZ, and then starts the guest VM at the recovery AZ. The recovery point used for migration is retained indefinitely. After a planned failover, the guest VMs no longer run at the primary AZ.
Unplanned failover
To ensure VM availability when a disaster causing service disruption occurs at the primary AZ, you perform an unplanned failover to the recovery AZ. In an unplanned failover, you can expect some data loss to occur. The maximum data loss possible is equal to the least RPO you specify in the protection policy, or the data that was generated after the last manual recovery point for a given guest VM. In an unplanned failover, by default, the protected guest VMs recover from the most recent recovery point. However, you can recover from an earlier recovery point by selecting a date and time of the recovery point.

At the recovery AZ, the guest VMs can recover using the recovery points replicated from the primary AZ only. The guest VMs cannot recover using the local recovery points. For example, if you perform an unplanned failover from the primary AZ AZ1 to the recovery AZ AZ2 , the guest VMs recover at AZ2 using the recovery points replicated from AZ1 to AZ2 .

You can perform a planned or an unplanned failover in different scenarios of network failure. For more information about network failure scenarios, see Nutanix Disaster Recovery and Xi Leap Failover Scenarios.

At the recovery AZ after a failover, the recovery plan creates only the VM category that was used to include the guest VM in the recovery plan. Manually create the remaining VM categories at the recovery AZ and associate the guest VMs with those categories.

The recovered guest VMs generate recovery points as per the replication schedule that protects them, even after recovery. The recovery points replicate back to the primary AZ when the primary AZ starts functioning again. This reverse replication enables you to perform failover of the guest VMs from the recovery AZ back to the primary AZ (failback). The same recovery plan applies to both the failover and the failback operations; for failover, you perform the failover operations on the recovery plan at the recovery AZ, while for failback, you perform them on the recovery plan at the primary AZ. For example, if a guest VM fails over from AZ1 (Local) to AZ2 , the failback fails over the same VM from AZ2 (Local) back to AZ1 .

Nutanix Disaster Recovery and Xi Leap Failover Scenarios

You have the flexibility to perform a real or simulated failover for the full and partial workloads (with or without networking). The term virtual network is used differently on on-prem clusters and Xi Cloud Services. In Xi Cloud Services, the term virtual network is used to describe the two built-in virtual networks—production and test. Virtual networks on the on-prem clusters are virtual subnets bound to a single VLAN. Manually create these virtual subnets, and create separate virtual subnets for production and test purposes. Create these virtual subnets before you configure recovery plans. When configuring a recovery plan, you map the virtual subnets at the primary AZ to the virtual subnets at the recovery AZ.

Figure. Failover in Network Mapping

The following are the various scenarios that you can encounter in Leap configurations for disaster recovery (DR) to an on-prem availability zone (AZ) or to Xi Cloud (Xi Leap). Each scenario is explained with the required network-mapping configuration for Xi Leap. However, the configuration remains the same irrespective of disaster recovery (DR) using Leap or Xi Leap. You can either create a recovery plan with the following network mappings (see Creating a Recovery Plan (Nutanix Disaster Recovery)) or update an existing recovery plan with the following network mappings (see Updating a Recovery Plan).

Scenario 1: Leap Failover (Full Network Failover)

Full network failure is the most common scenario. In this case, it is desirable to bring up the whole primary AZ in the Xi Cloud. All the subnets must fail over, and the WAN IP address must change from the on-prem IP address to the Xi WAN IP address. Floating IP addresses can be assigned to individual guest VMs; otherwise, everything uses Xi network address translation (NAT) for external communication.

Perform the failover when the on-prem subnets are down and the jump host is available on the public Internet through the floating IP address of the Xi production network.

Figure. Full Network Failover

To set up the recovery plan that orchestrates the full network failover, perform the following.

  1. Open the Network Settings page to configure network mappings in a recovery plan.
  2. Select the Local AZ > Production > Virtual Network or Port Group .

    The selection auto-populates the Xi production and test failover subnets.

  3. Select the Outbound Internet Access switch to allow the guest VMs to use the Xi NAT for Internet access.
  4. Dynamically assign the floating IP addresses to the guest VMs you select in the recovery plan.

    Perform steps 1–4 for every subnet.

Figure. Recovery Plan Configuration: Network Settings

Scenario 2: Xi Network Failover (Partial Network Failover)

You want to failover one or more subnets from the primary AZ to Xi Cloud. The communications between the AZs happen through the VPN or using the external NAT or floating IP addresses. A use case of this type of scenario is that the primary AZ needs maintenance, but some of its subnets must see no downtime.

Perform partial failover when some subnets are active in the production networks at both on-prem and Xi Cloud, and the jump host is available on the public Internet through the floating IP address of the Xi production network.

On-prem guest VMs can connect to the guest VMs on the Xi Cloud Services.

Figure. Partial Network Failover

To set up the recovery plan that orchestrates the partial network failover, perform the following.

  1. Open the Network Settings page to configure network mappings in a recovery plan.
  2. Select the Local AZ > Production > Virtual Network or Port Group .

    The selection auto-populates the Xi production and test failover subnets.

  3. Select the Outbound Internet Access switch to allow the guest VMs to use the Xi NAT for Internet access.
  4. Dynamically assign the floating IP addresses to the guest VMs you select in the recovery plan.

    Perform steps 1–4 for one or more subnets based on the maintenance plan.

Figure. Recovery Plan Configuration: Network Settings

Scenario 3: Xi Network Failover (Partial Subnet Network Failover)

You want to failover some guest VMs to Xi Cloud, while keeping the other guest VMs up and running at the on-prem cluster (primary AZ). A use case of this type of scenario is that the primary AZ needs maintenance, but some of its guest VMs must see no downtime.

This scenario requires changing IP addresses for the guest VMs running at Xi Cloud. Since you cannot have the subnet active on both the AZs, create a subnet to host the failed-over guest VMs. The jump host is available on the public Internet through the floating IP address of the Xi production network.

On-prem guest VMs can connect to the guest VMs on the Xi Cloud Services.

Figure. Partial Subnet Network Failover

To set up the recovery plan that orchestrates the partial subnet network failover, perform the following.

  1. Open the Network Settings page to configure network mappings in a recovery plan.
  2. Select the Local AZ > Production > Virtual Network or Port Group .

    The selection auto-populates the Xi production and test failover subnets for a full subnet failover.

    Note: In this case, you have created subnets on the Xi Cloud Services also. Choose those subnets to avoid a full subnet failover (scenario 1).
  3. Select the Outbound Internet Access switch to allow the guest VMs to use the Xi NAT for Internet access.
  4. Dynamically assign the floating IP addresses to the guest VMs you select in the recovery plan.

    Perform steps 1–4 for one or more subnets based on the maintenance plan.

Figure. Recovery Plan Configuration: Network Settings

Scenario 4: Xi Network Failover (Test Failover and Failback)

You want to test all the preceding three scenarios by creating an isolated test network so that no routing or IP address conflicts happen. Clone all the guest VMs from a local recovery point and bring them up to test the failover operations. Perform the test failover when all on-prem subnets are active and on-prem guest VMs can connect to the guest VMs at the Xi Cloud. The jump host is available on the public Internet through the floating IP address of the Xi production network.

Figure. Test Failover & Failback

In this case, focus on the test failover section when creating the recovery plan. When you select a local AZ production subnet, it is copied to the test network. You can go one step further and create a test subnet at Xi Cloud.

Figure. Recovery Plan Configuration: Network Settings

After the guest VMs test failover to Xi Cloud, you can do a test failback to the primary AZ.
Note: Make a test subnet in advance for the failback to the on-prem cluster.
Figure. Recovery Plan Configuration: Network Settings

Failover and Failback Operations (Nutanix Disaster Recovery)

You can perform test, planned, and unplanned failovers of the guest VMs protected with an Asynchronous replication schedule across different Nutanix clusters at the same or different on-prem availability zones (AZs). The steps to perform test, planned, and unplanned failovers are largely the same irrespective of the replication schedules that protect the guest VMs.

Performing a Test Failover (Leap)

After you create a recovery plan, you can run a test failover periodically to ensure that the failover occurs smoothly when required. To perform a test failover, do the following procedure at the recovery AZ. If you have two recovery AZs for DR, perform the test at the AZ where you want to recover the guest VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select the recovery plan that you want to test.
  4. Click Test from the Actions drop-down menu.
    Figure. Test Failover (Drop-down)

    The Test Recovery Plan window appears. The window auto-populates the Failover From and Failover To locations from the recovery plan you selected in step 3. The Failover To location is the local AZ by default and is unavailable for editing.
    Figure. Test Recovery Plan

  5. Click + Add target clusters if you want to failover to specific Nutanix clusters at the recovery AZ.
    If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the recovery AZ.
  6. Click Test .
    The Test Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the test operation. If there are no errors or you resolve the errors in step 7, the guest VMs failover to the recovery cluster.
  7. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the test procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
Cleaning up Test VMs (Leap)

After testing a recovery plan, you can remove the test VMs that the recovery plan creates in the recovery test network. To clean up the test VMs, do the following at the recovery AZ where the test failover created the test VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select the recovery plans whose test VMs you want to remove.
  4. Click Clean Up Test VMs from the Actions drop-down menu.
    The Clean Up Test VMs dialog box appears with the name of the recovery plan you selected in step 3.
  5. Click Clean .
    Figure. Clean Up Test VMs

Performing a Planned Failover (Leap)

If there is a planned event (for example, scheduled maintenance of guest VMs) at the primary availability zone (AZ), perform a planned failover to the recovery AZ. To perform a planned failover, do the following procedure at the recovery AZ. If you have two recovery AZs for DR, perform the failover at the AZ where you want to recover the guest VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Planned Failover

    1. Failover Type : Click Planned Failover .
      Warning: Do not check Live Migrate VMs . Live migration works only for the planned failover of the guest VMs protected in Synchronous replication schedule. If you check Live Migrate VMs for the planned failover of the guest VMs protected in Asynchronous or NearSync replication schedule, the failover task fails.
    2. Click + Add target clusters if you want to failover to specific Nutanix clusters at the recovery AZ.
      Figure. Planned Failover: Select Recovery Cluster

      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the recovery AZ.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs failover to the recovery Nutanix cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Prism Element running on both the primary and the recovery Nutanix clusters is version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.
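The warning above about live migration can be enforced client side before submitting a planned failover. This is a minimal sketch of that precondition check; the function and schedule names are illustrative, not a Nutanix API.

```python
# Sketch of a guard reflecting the warning above: the Live Migrate VMs option
# is valid only for guest VMs protected by a Synchronous replication schedule.
# Names are illustrative assumptions, not a Nutanix API.

def validate_planned_failover(schedule_type, live_migrate):
    """Reject live migration for Asynchronous or NearSync schedules."""
    if live_migrate and schedule_type != "Synchronous":
        raise ValueError(
            f"Live migration is unsupported for {schedule_type} schedules; "
            "the failover task would fail."
        )
    return {"failover_type": "PLANNED", "live_migrate": live_migrate}

# A planned failover of Async-protected VMs must leave live migrate off.
spec = validate_planned_failover("Asynchronous", live_migrate=False)
```

Running the same check with `live_migrate=True` and an Asynchronous or NearSync schedule raises an error instead of letting the failover task fail later.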

Performing an Unplanned Failover (Leap)

If there is an unplanned event (for example, a natural disaster or network failure) at the primary availability zone (AZ), perform an unplanned failover to the recovery AZ. To perform an unplanned failover, do the following procedure at the recovery AZ. If you have two recovery AZs for DR, perform the failover at the AZ where you want to recover the guest VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
        Note: If you click Recover from specific point in time , select the Nutanix cluster that hosts that point-in-time recovery point (step 4.b). If you do not select a cluster, or if you select multiple clusters that hold the same recovery points, the guest VMs fail to recover because the system finds more than one copy of the recovery point at the recovery AZ. For example, if a primary AZ AZ1 replicates the same recovery points to two clusters CLA and CLB at AZ AZ2 , select either the cluster CLA or the cluster CLB as the target cluster when you recover from a specific point in time. If you select both CLA and CLB , the guest VMs fail to recover.

    2. Click + Add target clusters if you want to failover to specific Nutanix clusters at the recovery AZ.
      Figure. Unplanned Failover: Select Recovery Cluster

      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the recovery AZ.
    Note: If recovery plans contain VM categories, the VMs in those categories recover into the same category after an unplanned failover to the recovery AZ, and recovery points continue to be generated at the recovery AZ for the recovered VMs. Because the VM count represents the number of recoverable VMs (calculated from recovery points), the replicated recovery points and the newly created recovery points are added together, which doubles the count of the originally recovered VMs on the recovery plans page. If some VMs in the category are deleted at the primary or recovery AZ, the VM count at both AZs stays the same until the recovery points of the deleted VMs expire. For example, when two VMs have failed over, the recovery plans page at the recovery AZ shows four VMs (two replicated recovery points from the source and two newly generated recovery points), even if the VMs are deleted from the primary or recovery AZ. The VM count synchronizes and becomes consistent in a subsequent RPO cycle, when the recovery points expire according to the retention policy set in the protection policy.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs failover to the recovery Nutanix cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note: To avoid conflicts when the primary AZ becomes active after the failover, shut down the guest VMs associated with this recovery plan. Manually power off the guest VMs on either primary or recovery AZ after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Prism Element running on both the primary and the recovery Nutanix clusters is version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.
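The cluster-selection rule from the point-in-time note above can be expressed as a small helper: recovery from a specific point in time needs exactly one target cluster that hosts that recovery point. This is an illustrative sketch; the function name and data shapes are assumptions.

```python
# Sketch of the rule from the note above: recovering from a specific point in
# time requires exactly one selected cluster that hosts that recovery point.
# Names and data shapes are illustrative assumptions.

def pick_target_cluster(recovery_point_locations, selected_clusters):
    """recovery_point_locations: set of clusters holding replicas of the point."""
    candidates = [c for c in selected_clusters if c in recovery_point_locations]
    if len(candidates) != 1:
        raise ValueError(
            "Select exactly one cluster hosting the recovery point "
            f"(got {len(candidates)}: {candidates})"
        )
    return candidates[0]

# AZ2 holds replicas on CLA and CLB; choosing just one of them succeeds.
chosen = pick_target_cluster({"CLA", "CLB"}, ["CLA"])
```

Selecting both CLA and CLB (or neither) raises an error, mirroring the failed recovery described in the note.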

Performing Failback (Leap)

A failback is a failover of the guest VMs from the recovery availability zone (AZ) back to the primary AZ. The same recovery plan applies to both the failover and the failback operations. The difference is that for failover you run the recovery plan at the recovery AZ, while for failback you run it at the primary AZ.

About this task

To perform a failback, do the following procedure at the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      Tip: You can also click Planned Failover to perform planned failover procedure for a failback.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
    2. Click + Add target clusters if you want to failover to specific Nutanix clusters at the primary AZ.
      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the primary AZ.
    Note: If recovery plans contain VM categories, the VMs in those categories recover into the same category after an unplanned failover to the recovery AZ, and recovery points continue to be generated at the recovery AZ for the recovered VMs. Because the VM count represents the number of recoverable VMs (calculated from recovery points), the replicated recovery points and the newly created recovery points are added together, which doubles the count of the originally recovered VMs on the recovery plans page. If some VMs in the category are deleted at the primary or recovery AZ, the VM count at both AZs stays the same until the recovery points of the deleted VMs expire. For example, when two VMs have failed over, the recovery plans page at the recovery AZ shows four VMs (two replicated recovery points from the source and two newly generated recovery points), even if the VMs are deleted from the primary or recovery AZ. The VM count synchronizes and becomes consistent in a subsequent RPO cycle, when the recovery points expire according to the retention policy set in the protection policy.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs failover to the recovery cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note: To avoid conflicts when the primary AZ becomes active after the failover, shut down the guest VMs associated with this recovery plan. Manually power off the guest VMs on either primary or recovery AZ after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Prism Element running on both the primary and the recovery Nutanix clusters is version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

Monitoring a Failover Operation (Leap)

After you trigger a failover operation, you can monitor failover-related tasks. To monitor a failover, perform the following procedure at the recovery AZ. If you have two recovery AZs for DR, perform the procedure at the AZ where you trigger the failover.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Click the name of the recovery plan for which you triggered failover.
  4. Click the Tasks tab.
    The left pane displays the overall status. The table in the details pane lists all the running tasks and their individual statuses.
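The Tasks tab view above can also be automated as a simple polling loop. This is a generic sketch: `fetch_status` is a stand-in for whatever API or CLI call returns the recovery plan job state, and the state names are illustrative assumptions.

```python
# Sketch of polling the overall failover status shown on the Tasks tab.
# fetch_status is a caller-supplied stand-in for the real status query;
# the state names ("RUNNING", "SUCCEEDED", "FAILED") are assumptions.
import time

def wait_for_failover(fetch_status, interval_s=30, timeout_s=3600,
                      sleep=time.sleep):
    """Poll until the failover job reaches a terminal state or times out."""
    waited = 0
    while waited <= timeout_s:
        status = fetch_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError("failover did not finish within the timeout")

# Simulated status sequence; a real fetch_status would query Prism Central.
states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
result = wait_for_failover(lambda: next(states), sleep=lambda s: None)
```

Injecting `sleep` keeps the loop testable; in production the default `time.sleep` applies the polling interval.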
Leap Role-Based Access Control (RBAC)

You can configure RBAC policies allowing other Prism Central Active Directory users (non-administrator roles) to perform operations on recovery points and recovery plans. This section guides you to configure recovery plan RBAC policies. For information about RBAC policies for recovery points, see Controlling User Access (RBAC) in the Nutanix Security Guide . Perform the following steps to configure recovery plan RBAC policies.

Note: You can configure recovery plan RBAC policies only for Leap deployments with a single IAM v2-enabled Prism Central, version 2022.4 or newer.
  1. Create a custom role in Prism Central. See Creating a Custom Role.

    You must create a custom role because none of the in-built roles support recovery plan operations.

    Tip: To modify or delete a custom role, see Modifying a Custom Role. For more information about user role management, see Security and User Management in the Prism Central Guide .
    1. Assign permissions to the custom role. See Custom Role Permissions.
  2. Assign entities to the custom role. See Assigning a Role.
    The custom role is created. The Active Directory (AD) users or User Groups can now log on to Prism Central, view the assigned recovery plans, and perform recovery plan operations. For recovery plan operations, see Failover and Failback Operations (Nutanix Disaster Recovery).
    Note: After an entity failover, Access Control Policies (ACPs) where access is based on ownership, project, or category are retained at the recovery AZ. Access to an entity is revoked in the following scenarios.
    • When you perform an unplanned failover of the entity.

      The entity access is revoked because the entity UUID changes after the unplanned failover.

    • When the entity access is cluster based.
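Because none of the in-built roles support recovery plan operations, the custom role must be created explicitly. As a rough illustration, a custom role body in the Prism Central v3 API style might look like the following; the endpoint shape, field names, and permission placeholders are assumptions, so check the Nutanix API reference before relying on them.

```python
# Hypothetical sketch of a custom role body in a Prism Central v3 API style.
# Field names and permission UUIDs are placeholders, not verified schema.

def custom_role_spec(name, permission_uuids, description=""):
    """Build an (assumed) role creation body from permission references."""
    return {
        "spec": {
            "name": name,
            "description": description,
            "resources": {
                "permission_reference_list": [
                    {"kind": "permission", "uuid": u} for u in permission_uuids
                ]
            },
        }
    }

# Placeholder permission UUIDs for recovery plan view and test execution.
role = custom_role_spec(
    "DR-Operator",
    ["<view-recovery-plan-uuid>", "<test-execute-recovery-plan-uuid>"],
    "Test-execution access to recovery plans",
)
```

After the role exists, entities are assigned to it through the role assignment workflow described in the next step.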
Creating a Custom Role

About this task

To create a custom role, do the following:

Procedure

  1. Go to the roles dashboard (select Administration > Roles in the pull-down menu) and click the Create Role button.

    The Roles page appears. See Custom Role Permissions for a list of the permissions available for each custom role option.

  2. In the Roles page, do the following in the indicated fields:
    1. Role Name : Enter a name for the new role.
    2. Description (optional): Enter a description of the role.
      Note: All entity types are listed by default, but you can display just a subset by entering a string in the Filter Entities search field.
      Figure. Filter Entities

    3. Select an entity you want to add to this role and provide desired access permissions from the available options. The access permissions vary depending on the selected entity.

      For example, for the VM entity, click the radio button for the desired VM permissions:

      • No Access
      • View Access
      • Basic Access
      • Edit Access
      • Set Custom Permissions

      If you select Set Custom Permissions , click the Change link to display the Custom VM Permissions window, check all the permissions you want to enable, and then click the Save button. Optionally, check the Allow VM Creation box to allow this role to create VMs.

      Figure. Custom VM Permissions Window

    4. Recovery Plan : Click the radio button for the desired permissions for recovery plan operations ( No Access , View Access , Test Execution Access , Full Execution Access , or Set Custom Permissions ). If you specify custom permissions, click the Change link to display the Custom Recovery Plan Permissions window, check all the permissions you want to enable (see Custom Role Permissions), and then click the Save button.
      Figure. Custom Recovery Plan Permissions Window

    5. Blueprint : Click the radio button for the desired blueprint permissions ( No Access , View Access , Basic Access , or Set Custom Permissions ). Check the Allow Blueprint Creation box to allow this role to create blueprints. If you specify custom permissions, click the Change link to display the Custom Blueprint Permissions window, check all the permissions you want to enable, and then click the Save button.
      Figure. Custom Blueprint Permissions Window

    6. Marketplace Item : Click the radio button for the desired marketplace permissions ( No Access , View marketplace and published blueprints , View marketplace and publish new blueprints , or Set custom permissions ). If you specify custom permissions, click the Change link to display the Custom Marketplace Item Permissions window, check all the permissions you want to enable, and then click the Save button.
      Note: The permission you enable for a Marketplace Item implicitly applies to a Catalog Item entity. For example, if you select No Access permission for the Marketplace Item entity while creating the custom role, the custom role will not have access to the Catalog Item entity as well.

      Figure. Custom Marketplace Permissions Window

    7. Report : Click the radio button for the desired report permissions ( No Access , View Only , Edit Access , or Set Custom Permissions ). If you specify custom permissions, click the Change link to display the Custom Report Permissions window, check all the permissions you want to enable, and then click the Save button.
      Figure. Custom Report Permissions Window

    8. Cluster : Click the radio button for the desired cluster permissions ( No Access or Cluster Access ).
    9. Subnet : Click the radio button for the desired subnet permissions ( No Access or Subnet Access ).
    10. Image : Click the radio button for the desired image permissions ( No Access , View Only , or Set Custom Permissions ). If you specify custom permissions, click the Change link to display the Custom Image Permissions window, check all the permissions you want to enable, and then click the Save button.
      Figure. Custom Image Permissions Window

    11. OVA : Click the radio button for the desired OVA permissions ( No Access , View Only , Full Access or Set Custom Permissions ). If you specify custom permissions, click the Change link to display the Custom OVA Permissions window, check all the permissions you want to enable, and then click the Save button.
      Figure. Custom OVA Permissions Window

    12. Object Store : Click the radio button for the desired object store permissions ( No Access , View Access , Full Access or Set Custom Permissions ). If you specify custom permissions, click the Change link to display the Custom Object Store Permissions window, check the Allow Object Store creation to allow creation of an object store, check all the permissions you want to enable, and then click the Save button.
      Note: For information about objects store permissions and workflows, see Role-Based Access Control Workflows for Objects in the Objects User Guide .
      Figure. Custom Object Store Permissions

  3. Click Save to create the role. The page closes and the new role appears in the Roles view list.
Modifying a Custom Role

About this task

Perform the following procedure to modify or delete a custom role.

Procedure

  1. Go to the roles dashboard and select (check the box for) the desired role from the list.
  2. Do one of the following:
    • To modify the role, select Update Role from the Actions pull-down list. The Roles page for that role appears. Update the field values as desired and then click Save . See Creating a Custom Role for field descriptions.
    • To delete the role, select Delete from the Action pull-down list. A confirmation message is displayed. Click OK to delete and remove the role from the list.
Custom Role Permissions

A selection of permission options are available when creating a custom role.

The following table lists the permissions you can grant when creating or modifying a custom role. When you select an option for an entity, the permissions listed for that option are granted. If you select Set custom permissions , a complete list of available permissions for that entity appears. Select the desired permissions from that list.

Entity Option Permissions
App (application) No Access (none)
Basic Access Abort App Runlog, Access Console VM, Action Run App, Clone VM, Create AWS VM, Create Image, Create VM, Delete AWS VM, Delete VM, Download App Runlog, Update AWS VM, Update VM, View App, View AWS VM, View VM
Set Custom Permissions (select from list) Abort App Runlog, Access Console VM, Action Run App, Clone VM, Create App, Create AWS VM, Create Image, Create VM, Delete App, Delete AWS VM, Delete VM, Download App Runlog, Update App, Update AWS VM, Update VM, View App, View AWS VM, View VM
VM Recovery Point No Access (none)
View Only View VM Recovery Point
Full Access Delete VM Recovery Point, Restore VM Recovery Point, Snapshot VM, Update VM Recovery Point, View VM Recovery Point, Allow VM Recovery Point creation
Set Custom Permissions (Change) Abort App Runlog, Access Console VM, Action Run App, Clone VM, Create App, Create AWS VM, Create Image, Create VM, Delete App, Delete AWS VM, Delete VM, Download App Runlog, Update App, Update AWS VM, Update VM, View App, View AWS VM, View VM
Note:

You can assign permissions for the VM Recovery Point entity to users or user groups in the following two ways.

  • Manually assign permission for each VM where the recovery point is created.
  • Assign permission using Categories in the Role Assignment workflow.
Tip: When a recovery point is created, it is associated with the same category as the VM.
VM No Access (none)
View Access Access Console VM, View VM
Basic Access Access Console VM, Update VM Power State, View VM
Edit Access Access Console VM, Update VM, View Subnet, View VM
Full Access Access Console VM, Clone VM, Create VM, Delete VM, Export VM, Update VM, Update VM Boot Config, Update VM CPU, Update VM Categories, Update VM Description, Update VM Disk List, Update VM GPU List, Update VM Memory, Update VM NIC List, Update VM Owner, Update VM Power State, Update VM Project, View Cluster, View Subnet, View VM.
Set Custom Permissions (select from list) Access Console VM, Clone VM, Create VM, Delete VM, Update VM, Update VM Boot Config, Update VM CPU, Update VM Categories, Update VM Disk List, Update VM GPU List, Update VM Memory, Update VM NIC List, Update VM Owner, Update VM Power State, Update VM Project, View Cluster, View Subnet, View VM.

Granular permissions (applicable if IAM is enabled; see Granular Role-Based Access Control (RBAC) for details).

Allow VM Power Off, Allow VM Power On, Allow VM Reboot, Allow VM Reset, Expand VM Disk Size, Mount VM CDROM, Unmount VM CDROM, Update VM Memory Overcommit, Update VM NGT Config, Update VM Power State Mechanism

Allow VM creation (additional option) (n/a)
Blueprint No Access (none)
View Access View Account, View AWS AZ, View AWS Elastic IP, View AWS Image, View AWS Key Pair, View AWS Machine Type, View AWS Region, View AWS Role, View AWS Security Group, View AWS Subnet, View AWS Volume Type, View AWS VPC, View Blueprint, View Cluster, View Image, View Project, View Subnet
Basic Access Access Console VM, Clone VM, Create App, Create Image, Create VM, Delete VM, Launch Blueprint, Update VM, View Account, View App, View AWS AZ, View AWS Elastic IP, View AWS Image, View AWS Key Pair, View AWS Machine Type, View AWS Region, View AWS Role, View AWS Security Group, View AWS Subnet, View AWS Volume Type, View AWS VPC, View Blueprint, View Cluster, View Image, View Project, View Subnet, View VM
Full Access Access Console VM, Clone Blueprint, Clone VM, Create App, Create Blueprint, Create Image, Create VM, Delete Blueprint, Delete VM, Download Blueprint, Export Blueprint, Import Blueprint, Launch Blueprint, Render Blueprint, Update Blueprint, Update VM, Upload Blueprint, View Account, View App, View AWS AZ, View AWS Elastic IP, View AWS Image, View AWS Key Pair, View AWS Machine Type, View AWS Region, View AWS Role, View AWS Security Group, View AWS Subnet, View AWS Volume Type, View AWS VPC, View Blueprint, View Cluster, View Image, View Project, View Subnet, View VM
Set Custom Permissions (select from list) Access Console VM, Clone VM, Create App, Create Blueprint, Create Image, Create VM, Delete Blueprint, Delete VM, Download Blueprint, Export Blueprint, Import Blueprint, Launch Blueprint, Render Blueprint, Update Blueprint, Update VM, Upload Blueprint, View Account, View App, View AWS AZ, View AWS Elastic IP, View AWS Image, View AWS Key Pair, View AWS Machine Type, View AWS Region, View AWS Role, View AWS Security Group, View AWS Subnet, View AWS Volume Type, View AWS VPC, View Blueprint, View Cluster, View Image, View Project, View Subnet, View VM
Marketplace Item No Access (none)
View marketplace and published blueprints View Marketplace Item
View marketplace and publish new blueprints Update Marketplace Item, View Marketplace Item
Full Access Config Marketplace Item, Create Marketplace Item, Delete Marketplace Item, Render Marketplace Item, Update Marketplace Item, View Marketplace Item
Set Custom Permissions (select from list) Config Marketplace Item, Create Marketplace Item, Delete Marketplace Item, Render Marketplace Item, Update Marketplace Item, View Marketplace Item
Report No Access (none)
View Only Notify Report Instance, View Common Report Config, View Report Config, View Report Instance
Full Access Create Common Report Config, Create Report Config, Create Report Instance, Delete Common Report Config, Delete Report Config, Delete Report Instance, Notify Report Instance, Run Report Config, Share Report Config, Share Report Instance, Update Common Report Config, Update Report Config, View Common Report Config, View Report Config, View Report Instance, View User, View User Group
Cluster No Access (none)
View Access View Cluster
Update Access Update Cluster
Full Access Update Cluster, View Cluster
VLAN Subnet No Access (none)
View Access View Subnet, View Virtual Switch
Edit Access Update Subnet, View Cluster, View Subnet, View Virtual Switch
Full Access Create Subnet, Delete Subnet, Update Subnet, View Cluster, View Subnet, View Virtual Switch
Image No Access (none)
View Only View Image
Set Custom Permissions (select from list) Copy Image Remote, Create Image, Delete Image, Migrate Image, Update Image, View Image
OVA No Access (none)
View Access View OVA
Full Access View OVA, Create OVA, Update OVA, Delete OVA
Set Custom Permissions (select from list) View OVA, Create OVA, Update OVA, Delete OVA
Image Placement Policy No Access (none)
View Access View Image Placement Policy, View Name Category, View Value Category
Full Access Create Image Placement Policy, Delete Image Placement Policy, Update Image Placement Policy, View Image Placement Policy, View Name Category, View Value Category
Set Custom Permissions (select from list) Create Image Placement Policy, Delete Image Placement Policy, Update Image Placement Policy, View Image Placement Policy, View Name Category, View Value Category
File Server No Access (none)
Allow File Server creation
Note: The role has full access if you select Allow File Server creation .

The following table describes the permissions.

Note: By default, assigning certain permissions to a user role might implicitly assign more permissions to that role. However, the implicitly assigned permissions will not be displayed in the details page for that role. These permissions are displayed only if you manually assign them to that role.
Permission Description Assigned Implicitly By
Create App Allows to create an application.
Delete App Allows to delete an application.
View App Allows to view an application.
Action Run App Allows to run action on an application.
Download App Runlog Allows to download an application runlog.
Abort App Runlog Allows to abort an application runlog.
Access Console VM Allows to access the console of a virtual machine.
Create VM Allows to create a virtual machine.
View VM Allows to view a virtual machine.
Clone VM Allows to clone a virtual machine.
Delete VM Allows to delete a virtual machine.
Export VM Allows to export a virtual machine.
Snapshot VM Allows to snapshot a virtual machine.
View VM Recovery Point Allows to view a vm_recovery_point.
Update VM Recovery Point Allows to update a vm_recovery_point.
Delete VM Recovery Point Allows to delete a vm_recovery_point.
Restore VM Recovery Point Allows to restore a vm_recovery_point.
Update VM Allows to update a virtual machine.
Update VM Boot Config Allows to update a virtual machine's boot configuration. Update VM
Update VM CPU Allows to update a virtual machine's CPU configuration. Update VM
Update VM Categories Allows to update a virtual machine's categories. Update VM
Update VM Description Allows to update a virtual machine's description. Update VM
Update VM GPU List Allows to update a virtual machine's GPUs. Update VM
Update VM NIC List Allows to update a virtual machine's NICs. Update VM
Update VM Owner Allows to update a virtual machine's owner. Update VM
Update VM Project Allows to update a virtual machine's project. Update VM
Update VM NGT Config Allows updates to a virtual machine's Nutanix Guest Tools configuration. Update VM
Update VM Power State Allows updates to a virtual machine's power state. Update VM
Update VM Disk List Allows to update a virtual machine's disks. Update VM
Update VM Memory Allows to update a virtual machine's memory configuration. Update VM
Update VM Power State Mechanism Allows updates to a virtual machine's power state mechanism. Update VM or Update VM Power State
Allow VM Power Off Allows power off and shutdown operations on a virtual machine. Update VM or Update VM Power State
Allow VM Power On Allows power on operation on a virtual machine. Update VM or Update VM Power State
Allow VM Reboot Allows reboot operation on a virtual machine. Update VM or Update VM Power State
Expand VM Disk Size Allows to expand a virtual machine's disk size. Update VM or Update VM Disk List
Mount VM CDROM Allows to mount an ISO to virtual machine's CDROM. Update VM or Update VM Disk List
Unmount VM CDROM Allows to unmount ISO from virtual machine's CDROM. Update VM or Update VM Disk List
Update VM Memory Overcommit Allows to update a virtual machine's memory overcommit configuration. Update VM or Update VM Memory
Allow VM Reset Allows reset (hard reboot) operation on a virtual machine. Update VM, Update VM Power State, or Allow VM Reboot
View Cluster Allows to view a cluster.
Update Cluster Allows to update a cluster.
Create Image Allows to create an image.
View Image Allows to view an image.
Copy Image Remote Allows to copy an image from the local Prism Central to a remote Prism Central.
Delete Image Allows to delete an image.
Migrate Image Allows to migrate an image from Prism Element to Prism Central.
Update Image Allows to update an image.
Create Image Placement Policy Allows to create an image placement policy.
View Image Placement Policy Allows to view an image placement policy.
Delete Image Placement Policy Allows to delete an image placement policy.
Update Image Placement Policy Allows to update an image placement policy.
Create AWS VM Allows to create an AWS virtual machine.
View AWS VM Allows to view an AWS virtual machine.
Update AWS VM Allows to update an AWS virtual machine.
Delete AWS VM Allows to delete an AWS virtual machine.
View AWS AZ Allows to view AWS Availability Zones.
View AWS Elastic IP Allows to view an AWS Elastic IP.
View AWS Image Allows to view an AWS image.
View AWS Key Pair Allows to view AWS keypairs.
View AWS Machine Type Allows to view AWS machine types.
View AWS Region Allows to view AWS regions.
View AWS Role Allows to view AWS roles.
View AWS Security Group Allows to view an AWS security group.
View AWS Subnet Allows to view an AWS subnet.
View AWS Volume Type Allows to view AWS volume types.
View AWS VPC Allows to view an AWS VPC.
Create Subnet Allows to create a subnet.
View Subnet Allows to view a subnet.
Update Subnet Allows to update a subnet.
Delete Subnet Allows to delete a subnet.
Create Blueprint Allows to create the blueprint of an application.
View Blueprint Allows to view the blueprint of an application.
Launch Blueprint Allows to launch the blueprint of an application.
Clone Blueprint Allows to clone the blueprint of an application.
Delete Blueprint Allows to delete the blueprint of an application.
Download Blueprint Allows to download the blueprint of an application.
Export Blueprint Allows to export the blueprint of an application.
Import Blueprint Allows to import the blueprint of an application.
Render Blueprint Allows to render the blueprint of an application.
Update Blueprint Allows to update the blueprint of an application.
Upload Blueprint Allows to upload the blueprint of an application.
Create OVA Allows to create an OVA.
View OVA Allows to view an OVA.
Update OVA Allows to update an OVA.
Delete OVA Allows to delete an OVA.
Create Marketplace Item Allows to create a marketplace item.
View Marketplace Item Allows to view a marketplace item.
Update Marketplace Item Allows to update a marketplace item.
Config Marketplace Item Allows to configure a marketplace item.
Render Marketplace Item Allows to render a marketplace item.
Delete Marketplace Item Allows to delete a marketplace item.
Create Report Config Allows to create a report_config.
View Report Config Allows to view a report_config.
Run Report Config Allows to run a report_config.
Share Report Config Allows to share a report_config.
Update Report Config Allows to update a report_config.
Delete Report Config Allows to delete a report_config.
Create Common Report Config Allows to create a common report_config.
View Common Report Config Allows to view a common report_config.
Update Common Report Config Allows to update a common report_config.
Delete Common Report Config Allows to delete a common report_config.
Create Report Instance Allows to create a report_instance.
View Report Instance Allows to view a report_instance.
Notify Report Instance Allows to notify a report_instance.
Share Report Instance Allows to share a report_instance.
Delete Report Instance Allows to delete a report_instance.
View Account Allows to view an account.
View Project Allows to view a project.
View User Allows to view a user.
View User Group Allows to view a user group.
View Name Category Allows to view a category's name.
View Value Category Allows to view a category's value.
View Virtual Switch Allows to view a virtual switch.
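The implicit assignments listed in the Assigned Implicitly By column behave like a small dependency-resolution step: granting a parent permission implicitly grants its children. The following is a minimal sketch of that resolution (not Nutanix code; the mapping reproduces only a few rows of the table above):

```python
# Sketch (not Nutanix code): resolving implicitly assigned permissions.
# Each key is a permission implicitly granted when ANY of its listed parent
# permissions is assigned to a role; values come from a few rows of the
# "Assigned Implicitly By" column above.
IMPLIED_BY = {
    "Update VM Power State": {"Update VM"},
    "Update VM Disk List": {"Update VM"},
    "Allow VM Power Off": {"Update VM", "Update VM Power State"},
    "Allow VM Reboot": {"Update VM", "Update VM Power State"},
    "Allow VM Reset": {"Update VM", "Update VM Power State", "Allow VM Reboot"},
    "Mount VM CDROM": {"Update VM", "Update VM Disk List"},
}

def effective_permissions(assigned):
    """Expand a set of explicitly assigned permissions with implied ones."""
    effective = set(assigned)
    changed = True
    while changed:  # iterate until no new permission is implied
        changed = False
        for child, parents in IMPLIED_BY.items():
            if child not in effective and parents & effective:
                effective.add(child)
                changed = True
    return effective

perms = effective_permissions({"Update VM"})
```

For example, a role assigned only Update VM effectively also holds Allow VM Reset, because Update VM appears among that permission's implicit parents. As the note above states, such implied permissions do not appear on the role's details page unless assigned manually.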
Assigning a Role

About this task

In addition to configuring basic role maps (see Configuring Role Mapping), you can configure more precise role assignments (AHV only). To assign a role to selected users or groups that applies just to a specified set of entities, do the following:

Procedure

  1. Log on to Prism Central as the "admin" user or any user with "super admin" access.
  2. Configure Active Directory settings.
    Note: You can skip this step if an active directory is already configured.
    Go to Prism Central Settings > Authentication , click + New Directory and add your preferred active directory.
  3. Click the hamburger menu and go to Administration > Roles .
    The page displays system defined and custom roles.
  4. Select the desired role in the roles dashboard, then click Actions > Manage Assignment .
  5. Click Add New to add Active Directory based users or user groups, or IDP users or user groups (or OUs) to this role.
    Figure. Role Assignment Click to enlarge role assignment view

    You are adding users or user groups and assigning entities to the new role in the next steps.

  6. In the Select Users or User Groups or OUs field, do the following:
    1. Select the configured AD or IDP from the drop-down.
      The drop-down displays the available types of users and user groups, such as local users, AD-based users or user groups, and SAML-based users or user groups. Select Organizational Units (OUs) for AD or for directories that use a SAML-based IDP for authentication.
      Figure. User, User Group or OU selection Click to enlarge Displaying the User, User Group or OU selection drop-down list.

    2. Search and add the users or groups in the Search User field.

      Typing a few letters in the search field displays a list of users from which you can select; you can add multiple user names in this field.

  7. In the Select Entities field, you can provide access to various entities. The list of available entities depends on the role selected in Step 4.

This table lists the available entities for each role:

Table 1. Available Entities for a Role
Role Entities
Consumer AHV VM, Image, Image Placement Policy, OVA, Subnets: VLAN
Developer AHV VM, Cluster, Image, Image Placement Policy, OVA, Subnets:VLAN
Operator AHV VM, Subnets:VLAN
Prism Admin Individual entity (one or more clusters), All Clusters
Prism Viewer Individual entity (one or more clusters), All Clusters
Custom role (User defined role) Individual entity, In Category (only AHV VMs)

This table shows the description of each entity:

Table 2. Description of Entities
Entity Description
AHV VM Allows you to manage VMs including create and edit permission
Image Allows you to access and manage image details
Image Placement Policy Allows you to access and manage image placement policy details
OVA Allows you to view and manage OVA details
Subnets: VLAN Allows you to view subnet details
Cluster Allows you to view and manage details of assigned clusters (AHV and ESXi clusters)
All Clusters Allows you to view and manage details of all clusters
VM Recovery Points Allows you to perform recovery operations with recovery points.
Recovery Plan (Single PC only)

Allows you to view, validate, and test recovery plans. Also allows you to clean up VMs created after recovery plan test.
Individual entity Allows you to view and manage individual entities such as AHV VM, Clusters, and Subnets:VLAN
  8. Repeat Step 5 and Step 6 for any combination of users/entities you want to define.
    Note: To allow users to create certain entities like a VM, you may also need to grant them access to related entities like clusters, networks, and images that the VM requires.
  9. Click Save .

Self-Service Restore

The self-service restore (also known as file-level restore) feature allows you to perform self-service data recovery from Nutanix data protection recovery points with minimal intervention. You can perform self-service data recovery on both on-prem clusters and Xi Cloud Services.

You must deploy NGT 2.0 or newer on guest VMs to enable self-service restore from Prism Central. For more information about enabling and mounting NGT, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide . When self-service restore is enabled, you can log into the VM, attach a disk, and recover files within the guest OS. If you do not detach the disk from the VM, the disk is detached automatically after 24 hours.

Note:
  • You can enable self-service restore for a guest VM through a web interface or nCLI.
  • NGT performs the in-guest actions. For more information about in-guest actions, see Nutanix Guest Tools in the Prism Web Console Guide .
  • Self-service restore supports only full snapshots generated from Asynchronous and NearSync replication schedules.
Self-Service Restore Requirements

The requirements of self-service restore of Windows and Linux VMs are as follows.

Self-Service Restore General Requirements

The following are the general requirements of self-service restore. Ensure that you meet the requirements before configuring self-service restore for guest VMs.

License Requirements

The AOS license required depends on the features that you want to use. For information about the AOS license required for self-service restore, see Software Options.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on AHV versions that come bundled with the supported version of AOS.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

Prism Central instances and their registered on-prem clusters (Prism Elements) must meet the following software requirements.

  • AOS 5.18 or newer with AHV.
  • AOS 5.18 or newer with ESXi.
  • AOS 5.19 or newer with Xi Cloud Services.
  • You have installed NGT 2.0 or newer. For more information about enabling and mounting NGT, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide .
  • You have set disk.enableUUID=true in the .vmx file for the guest VMs running on ESXi.
  • You have configured Nutanix recovery points by adding guest VM to an Asynchronous protection policy.
  • You have attached an IDE/SCSI or SATA disk.
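For guest VMs running on ESXi, the disk.enableUUID setting listed above lives in the VM's .vmx configuration file. A fragment might look like the following (illustrative only; where possible, prefer setting this through the vSphere client's advanced VM options rather than editing the file by hand):

```
disk.enableUUID = "TRUE"
```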

Requirements for Guest VMs Running Windows OS

The following are the specific requirements of self-service restore for guest VMs running Windows OS. Ensure that you meet the requirements before proceeding.

  • You have enough logical drive letters to bring the disk online.
  • You have one of the following Windows OS as the guest OS.
    • Windows Server 2008 R2 or newer
    • Windows 7 through Windows 10

Requirements for Guest VMs Running Linux OS

The following are the specific requirements of self-service restore for guest VMs running Linux OS. Ensure that you meet the requirements before proceeding.

  • You have appropriate file systems to recover. Self-service restore supports only extended file systems (ext2, ext3, and ext4) and XFS file systems.
  • Logical Volume Manager (LVM) disks are mounted only if the volume group corresponds to a single physical disk.
  • You have one of the following Linux OS as the guest OS.
    • CentOS 6.5 through 6.9 and 7.0 through 7.3
    • Red Hat Enterprise Linux (RHEL) 6.5 through 6.9 and 7.0 through 7.3
    • Oracle Linux 6.5 and 7.0
    • SUSE Linux Enterprise Server (SLES) 11 SP1 through 11 SP4 and 12 SP1 through 12 SP3
    • Ubuntu 14.04 for both AHV and ESXi
    • Ubuntu 16.10 for AHV only
Self-Service Restore Limitations

The limitations of self-service restore of Windows and Linux VMs are as follows.

Self-Service Restore General Limitations

The following are the general limitations of self-service restore.

  • Volume groups are not supported.
  • Only snapshots created in AOS 4.5 or later releases are supported.
  • PCI and delta disks are not supported.

Limitations of Guest VMs Running Windows OS

The following are the specific limitations of self-service restore for guest VMs running Windows OS.

  • File systems. Self-service restore does not support dynamic disks consisting of NTFS on simple volumes, spanned volumes, striped volumes, mirrored volumes, and RAID-5 volumes.
  • Only 64-bit OSes are supported.
  • Self-service restore does not support disks created as Microsoft Storage Space devices by using Microsoft Windows Server 2016 or newer.

Limitations of Guest VMs Running Linux OS

Whenever the snapshot disk has an inconsistent file system (as indicated by an fsck check), the disk is only attached, not mounted.

Enabling Self-Service Restore

After enabling NGT for a guest VM, you can enable self-service restore for that guest VM. Alternatively, you can enable self-service restore while installing NGT on that guest VM.

Before you begin

Ensure that you have installed and enabled NGT 2.0 or newer on the guest VM. For more information, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide .

About this task

To enable self-service restore, perform the following procedure.

Procedure

  1. Log on to Prism Central.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs in the left pane.
  3. Select the guest VM where you want to enable self-service restore.
  4. Click Manage NGT Applications from the Actions drop-down menu.
    Figure. Enabling Self-Service Restore Click to enlarge Enable Self -Service Restore

    Note: If the guest VM does not have NGT installed, click Install NGT from the Actions drop-down menu and select to enable Self Service Restore (SSR) .
  5. Click Enable below the Self Service Restore (SSR) panel.
  6. Click Confirm .
    The self-service restore feature is now enabled on the guest VM. You can restore the desired files from the guest VM.
Self-Service Restore for Windows VMs

You can restore the desired files from the guest VM through the web interface or by using the ngtcli utility of self-service restore.

Restoring a File through Web Interface (Windows VM)

After you install NGT in the Windows guest VM, you can restore the desired files from the VM through the web interface.

Before you begin

Ensure that you have configured your Windows VM to use NGT. For more information, see Installing NGT on Windows Machines in the Prism Web Console Guide .

About this task

To restore a file in Windows guest VMs by using web interface, perform the following.

Procedure

  1. Log in to the guest Windows VM by using administrator credentials.
  2. Click the Nutanix SSR icon on the desktop.
  3. Type the administrator credentials of the VM.
    Note: If you use:
    • a NETBIOS domain name in the username field (for example, domain\username ), you can log on to SSR only if your account is explicitly added to the Administrators group on the server. If the username is added to a domain group that is in turn added to the Administrators group, the logon fails. Also, type the NETBIOS domain name in capital letters (the domain name must be written exactly as it appears in the output of the command net localgroup administrators ).
    • an FQDN in the username field (for example, domain.com\username ), you can log on only if the user is a member of the domain admins group.
    Note: The snapshots that are taken for that day are displayed. You also have an option to select the snapshots for the week, month, and the year. In addition, you can also define a custom range of dates and select the snapshot.
    Figure. Snapshot Selection Click to enlarge Select Snapshot

  4. Select the appropriate tab: This Week , This Month , or This Year .
    You can also customize the selection by clicking the Custom Range tab and selecting the date range in the From and To fields.
  5. Select the check box of the disks that you want to attach from the snapshot.
  6. Select Mount from the Disk Action drop-down menu.
    Figure. Mounting of Disks Click to enlarge disk mount

    The selected disk or disks are mounted and the relevant disk label is displayed.
  7. Go to the attached disk label drive in the VM and restore the desired files.
  8. To view the list of all the mounted snapshots, select Mounted Snapshots .
    This page displays the original snapshot drive letters and their corresponding current drive letters. The original drive letters were assigned to the disk at the time of the snapshot; the mounted drive letters are where the snapshotted disk is mounted now.
    Figure. List of Mounted Snapshots Click to enlarge mounted snapshots list

    1. To detach a disk, click the disk label and click Unmount .
      You can unmount all the disks at once by clicking Select All and then clicking Unmount .
  9. To detach a disk, select the check box of the disk that you want to unmount and then from the Disk Action drop-down menu, select Unmount .
Restoring a File through Ngtcli (Windows VM)

After you install NGT in the Windows guest VM, you can restore the desired files from the VM through the ngtcli utility.

Before you begin

Ensure that you have configured your Windows VM to use NGT. For more information, see Installing NGT on Windows Machines in the Prism Web Console Guide .

About this task

To restore a file in Windows guest VMs by using ngtcli, perform the following.

Procedure

  1. Log in to the guest Windows VM by using administrator credentials.
  2. Open the command prompt as an administrator.
  3. Go to the ngtcli directory in Program Files > Nutanix .
    > cd c:\Program Files\Nutanix\ngtcli
    Tip: Running
    > python ngtcli.py
    creates a terminal with auto-complete.
  4. Run the ngtcli.cmd command.
  5. List the snapshots and virtual disks that are present for the guest VM.
    ngtcli> ssr ls-snaps

    The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. Use this information to decide which snapshot contains the data that you want to restore.

    To list a specific number of snapshots, run the following command.
    ngtcli> ssr ls-snaps snapshot-count=count_value

    Replace count_value with the number of snapshots that you want to list.

  6. Attach the disk from the snapshots.
    ngtcli> ssr attach-disk disk-label=disk_label snapshot-id=snap_id

    Replace disk_label with the name of the disk that you want to attach.

    Replace snap_id with the snapshot ID of the disk that you want to attach.

    For example, to attach a disk with snapshot ID 16353 and disk label scsi0:1, type the following command.

    ngtcli> ssr attach-disk snapshot-id=16353 disk-label=scsi0:1
    After the command runs successfully, a new disk with a new label is attached to the guest VM.
    Note: If sufficient logical drive letters are not present, the action to bring disks online fails. In this case, detach the current disk, create enough free slots by detaching other self-service disks, and then reattach the disk.
  7. Go to the attached disk label drive and restore the desired files.
  8. Detach a disk.
    ngtcli> ssr detach-disk attached-disk-label=attached_disk_label

    Replace attached_disk_label with the name of the disk that you want to detach.

    Note: If the disk is not removed by the guest VM administrator, the disk is automatically removed after 24 hours.
  9. View all the attached disks to the VM.
    ngtcli> ssr list-attached-disks
Self-Service Restore for Linux VMs

The Linux guest VM user with sudo privileges can restore the desired files from the VM through the web interface or by using the ngtcli utility.

Restoring a File through Web Interface (Linux VM)

After you install NGT in the Linux guest VM, you can restore the desired files from the VM through the web interface.

Before you begin

Ensure that you have configured your Linux VM to use NGT.

About this task

To restore a file in Linux guest VMs by using web interface, perform the following.

Procedure

  1. Log in to the guest Linux VM as a user with sudo privileges.
  2. Click the Nutanix SSR icon on the desktop.
  3. Type the root or sudo user credentials of the VM.
    The snapshots that are taken for that day are displayed. You also have the option to select the snapshots for the week, month, and year. In addition, you can define a custom range of dates and select the snapshot. For example, in the following figure, a snapshot taken in the current month is displayed.
    Figure. Snapshot Selection Click to enlarge Select Snapshot

  4. Select the appropriate tab: This Week , This Month , or This Year .
    You can also customize the selection by clicking the Custom Range tab and selecting the date range in the From and To fields.
  5. Select the check box of the disks that you want to attach from the snapshot.
  6. Select Mount from the Disk Action drop-down menu.

    The selected disk or disks are mounted and the relevant disk label is displayed.

    Figure. Mounting of Disks Click to enlarge

  7. Go to the attached disk label partitions in the VM and restore the desired files.
    Note: If the disk gets updated between the snapshots, the restore process may not work as expected. If this scenario occurs, you need to contact support to help with the restore process.
  8. To view the list of all the mounted snapshots, select Mounted Snapshots .
    This page displays the original snapshot drive letters and their corresponding current drive letters. The original drive letters were assigned to the disk at the time of the snapshot; the mounted drive letters are where the snapshotted disk is mounted now.
    Figure. List of Mounted Snapshots Click to enlarge

    1. To detach a disk, click the disk label and click Unmount .
      You can unmount all the disks at once by clicking Select All and then clicking Unmount .
  9. To detach a disk, select the check box of the disk that you want to unmount and then from the Disk Action drop-down menu, select Unmount .
Restoring a File through Ngtcli (Linux VM)

After you install NGT in the Linux guest VM, you can restore the desired files from the VM through the ngtcli utility.

Before you begin

Ensure that you have configured your Linux VM to use NGT.

About this task

To restore a file in Linux guest VMs by using ngtcli, perform the following.

Procedure

  1. Log in to the guest Linux VM with sudo or root user credentials.
  2. Go to the ngtcli directory.
    > cd /usr/local/nutanix/ngt/ngtcli
  3. Run the python ngtcli.py command.
    Tip: This command creates a terminal with auto-complete.
  4. List the snapshots and virtual disks that are present for the guest VM.
    ngtcli> ssr ls-snaps

    The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. Use this information to decide which snapshot contains the data that you want to restore.

    To list a specific number of snapshots, run the following command.
    ngtcli> ssr ls-snaps snapshot-count=count_value

    Replace count_value with the number of snapshots that you want to list.

  5. Attach the disk from the snapshots.
    ngtcli> ssr attach-disk disk-label=disk_label snapshot-id=snap_id

    Replace disk_label with the name of the disk that you want to attach.

    Replace snap_id with the snapshot ID of the disk that you want to attach.

    For example, to attach a disk with snapshot ID 1343 and disk label scsi0:2, type the following command.

    ngtcli> ssr attach-disk snapshot-id=1343 disk-label=scsi0:2

    After the command runs successfully, a new disk with a new label is attached to the guest VM.

  6. Go to the attached disk label partition and restore the desired files.
    Note: If the disk gets updated between the snapshots, the restore process may not work as expected. If this scenario occurs, you need to contact support to help with the restore process.
  7. Detach a disk.
    ngtcli> ssr detach-disk attached-disk-label=attached_disk_label

    Replace attached_disk_label with the name of the disk that you want to detach.

    For example, to remove the disk with disk label scsi0:3, type the following command.

    ngtcli> ssr detach-disk attached-disk-label=scsi0:3
    Note: If the disk is not removed by the guest VM administrator, the disk is automatically removed after 24 hours.
  8. View all the attached disks to the VM.
    ngtcli> ssr list-attached-disks

Protection with NearSync Replication Schedule and DR (Nutanix Disaster Recovery)

NearSync replication enables you to protect your guest VMs with an RPO as low as 1 minute. A protection policy with a NearSync schedule creates a recovery point at a minutely interval (between 1–15 minutes) and replicates it to the recovery AZs for high availability. For guest VMs protected with a NearSync replication schedule, you can perform disaster recovery (DR) to a different Nutanix cluster at the same or a different AZ. In addition to DR to Nutanix clusters of the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR): disaster recovery from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

Note: Nutanix provides multiple DR solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

The following are the advantages of protecting your guest VMs with a NearSync replication schedule.

  • Protection for mission-critical applications, securing your data with minimal data loss if there is a disaster and providing more granular control during the recovery process.
  • No minimum network latency or distance requirements.
  • Low stun time for guest VMs with heavy I/O applications.

    Stun time is the time of application freeze when the recovery point is taken.

  • Allows resolution of a disaster event in minutes.

To implement the NearSync feature, Nutanix has introduced a technology called lightweight snapshots (LWSs). LWS recovery points are created at the metadata level only, and they continuously replicate incoming data generated by workloads running on the active clusters. LWS recovery points are stored in the LWS store, which is allocated on the SSD tier. When you configure a protection policy with a NearSync replication schedule, the system allocates the LWS store automatically.

Note: The maximum LWS store allocation for each node is 360 GB. For the hybrid systems, it is 7% of the SSD capacity on that node.
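As a rough illustration of the sizing note above, the per-node LWS store allocation can be sketched as follows. This is not Nutanix code, and the all-flash branch is an assumption: the note specifies only the hybrid rule (7% of SSD capacity) and the 360 GB per-node cap.

```python
# Sketch (assumptions labeled): per-node LWS store sizing from the note above.
MAX_LWS_STORE_GB = 360  # maximum LWS store allocation per node

def lws_store_gb(ssd_capacity_gb, hybrid=True):
    """Approximate LWS store allocation for one node, in GB."""
    if hybrid:
        # hybrid systems: 7% of the node's SSD capacity, capped at the maximum
        return min(0.07 * ssd_capacity_gb, MAX_LWS_STORE_GB)
    # all-flash behavior is an ASSUMPTION; the note only states the per-node cap
    return MAX_LWS_STORE_GB

size = lws_store_gb(4000)  # hybrid node with 4 TB of SSD capacity
```

Under these assumptions, a hybrid node with 4 TB of SSD gets a 280 GB LWS store, while any node with more than roughly 5.1 TB of SSD hits the 360 GB cap.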

Transitioning in and out of NearSync

When you create a NearSync replication schedule, the schedule remains an hourly schedule until its transition into a minutely schedule is complete.

To transition into a NearSync (minutely) replication schedule, the system first seeds the recovery AZ with data: recovery points are taken on an hourly basis and replicated to the recovery AZ. After the system determines that the recovery points containing the seeding data have replicated within a specified amount of time (the default is an hour), the system automatically transitions the schedule into NearSync, depending on the bandwidth and the change rate. After the transition into the NearSync replication schedule, you can see the configured minutely recovery points in the web interface.

The following are the characteristics of the process.

  • Until the transition into the NearSync replication schedule completes, you can see only the hourly recovery points in Prism Central.
  • If a guest VM transitions out of the NearSync replication schedule for any reason, the system raises alerts in the Alerts dashboard and the minutely schedule falls back to an hourly schedule. The system continuously tries to return to the minutely schedule that you have configured. If the transition succeeds, the schedule automatically transitions back into NearSync, and alerts specific to this condition are raised in the Alerts dashboard.

To transition out of the NearSync replication schedule, you can do one of the following.

  • Delete the NearSync replication schedule that you have configured.
  • Update the NearSync replication schedule to use an hourly RPO.
  • Unprotect the guest VMs.
    Note: Adding or deleting a guest VM does not transition the schedule out of NearSync.

Repeated transitioning in and out of NearSync replication schedule can occur because of the following reasons.

  • LWS store usage is high.
  • The change rate of data is high for the available bandwidth between the primary and the recovery AZs.
  • Internal processing of LWS recovery points takes longer because the system is overloaded.

Retention Policy

Depending on the RPO (1–15 minutes), the system retains the recovery points for a specific time period. For a NearSync replication schedule, you configure the retention policy in days, weeks, or months on both the primary and recovery AZs instead of defining the number of recovery points you want to retain. For example, if you set an RPO of 1 minute and want to retain the recovery points for 5 days, the retention policy works in the following way.

  • For every 1 minute, a recovery point is created and retained for a maximum of 15 minutes.
    Note: Only the most recent 15 minutely recovery points are visible in Prism Central and available for the recovery operation.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 5 days.

You can also define recovery point retention in weeks or months. For example, if you configure a 3-month schedule, the retention policy works in the following way.

  • For every 1 minute, a recovery point is created and retained for 15 minutes.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 7 days.
  • One weekly recovery point is created and retained for 4 weeks.
  • One monthly recovery point is created and retained for 3 months.
Note:
  • You can define different retention policies on the primary and recovery AZs.
  • The system retains subhourly and hourly recovery points for 15 minutes and 6 hours respectively. Maximum retention time for days, weeks, and months is 7 days, 4 weeks, and 12 months respectively.
  • If you change the replication schedule from an hourly schedule to a minutely schedule (Asynchronous to NearSync), the first recovery point is not created according to the new schedule. The recovery points are created according to the start time of the old hourly schedule (Asynchronous). If you want to get the maximum retention for the first recovery point after modifying the schedule, update the start time accordingly for NearSync.
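
The retention tiers described above can be sketched as a small helper. This is an illustrative model of the documented behavior, not a Nutanix API; the function name and tier labels are hypothetical.

```python
# Hypothetical model of the NearSync roll-up retention tiers described
# above (illustrative only, not a Nutanix API).

def nearsync_retention_tiers(unit, amount):
    """Return (tier, retained-for) pairs for a NearSync schedule whose
    retention period is `amount` days, weeks, or months."""
    tiers = [("minutely", "15 minutes"),  # most recent 15 minutely points
             ("hourly", "6 hours")]       # hourly points kept for 6 hours
    if unit == "days":
        tiers.append(("daily", f"{amount} days"))
    elif unit == "weeks":
        tiers += [("daily", "7 days"), ("weekly", f"{amount} weeks")]
    elif unit == "months":
        tiers += [("daily", "7 days"), ("weekly", "4 weeks"),
                  ("monthly", f"{amount} months")]
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return tiers

# The 1-minute RPO, 5-day retention example from the text:
print(nearsync_retention_tiers("days", 5))
```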

NearSync Replication Requirements (Nutanix Disaster Recovery)

The following are the specific requirements for protecting your guest VMs with a NearSync replication schedule. Ensure that you meet these requirements in addition to the general requirements of Nutanix Disaster Recovery.

For more information about the general requirements of Nutanix Disaster Recovery, see Nutanix Disaster Recovery Requirements.

For information about node, disk and Foundation configurations required to support NearSync replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on version 20190916.189 or newer.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

Each on-prem AZ must have a Leap-enabled Prism Central instance.

The primary and recovery Prism Centrals and their registered Nutanix clusters must be running the following versions of AOS.

  • AOS 5.17.1 or newer for DR to a different Nutanix cluster at the same AZ.
  • AOS 5.17 or newer for DR to Nutanix clusters at different AZs.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Guest VMs protected with a NearSync replication schedule support cross-hypervisor disaster recovery. You can perform a failover (DR) to recover guest VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided the following requirements are met.

  • Both the primary and the recovery Nutanix clusters must be running AOS 5.18 or newer.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI disks only.
    Tip: From AOS 5.19.1, CHDR supports SATA disks also.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see the Compatibility Matrix.

  • For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files. If you have delta disks attached to a VM and you proceed with failover, you get a validation warning and the VM does not recover. Contact Nutanix Support for assistance.
Table 1. Operating Systems Supported for CHDR (Asynchronous Replication)
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • SLES operating system is not supported.

Additional Requirements

  • Both the primary and the recovery Nutanix clusters must be of minimum three-nodes.
  • The recovery AZ container must have as much space as the working set size of the protected VMs at the primary AZ. For example, if a protected VM uses 30 GB of space on the container of the primary AZ, the same amount of space is required on the recovery AZ container.

NearSync Replication Limitations (Nutanix Disaster Recovery)

Consider the following specific limitations before protecting your guest VMs with a NearSync replication schedule. These limitations are in addition to the general limitations of Nutanix Disaster Recovery.

For information about the general limitations of Nutanix Disaster Recovery, see Nutanix Disaster Recovery Limitations.

  • All files associated with the VMs running on ESXi must be located in the same folder as the VMX configuration file. Files not located in the same folder as the VMX configuration file might not recover on a recovery cluster. On recovery, a guest VM with such files fails to start with the following error message: Operation failed: InternalTaskCreationFailure: Error creating host specific VM change power state task. Error: NoCompatibleHost: No host is compatible with the virtual machine
  • Deduplication enabled on storage containers that have guest VMs protected with a NearSync replication schedule lowers the replication speed.
  • Cross hypervisor disaster recovery (CHDR) does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

  • On CHDR, NearSync replication schedules do not support retrieving recovery points from the recovery AZs.

    For example, if you have 1 day of retention at the primary AZ and 5 days of retention at the recovery AZ, you cannot go back to a recovery point from 5 days ago at the primary AZ. The NearSync replication schedule does not support replicating the 5-day retention back from the recovery AZ to the primary AZ.

Creating a Protection Policy with a NearSync Replication Schedule (Nutanix Disaster Recovery)

To protect guest VMs in a minutely replication schedule, configure a NearSync replication schedule while creating the protection policy. The policy takes recovery points of the protected guest VMs at the specified time interval (1–15 minutes) and replicates them to the recovery AZ for High Availability. To maintain the efficiency of minutely replication, the protection policy allows you to configure a NearSync replication schedule to only one recovery AZ. When creating a protection policy, you can specify only VM categories. If you want to include VMs individually, first create the protection policy (which can also include VM categories), and then include the VMs individually in the protection policy from the VMs page.

Before you begin

Ensure that the primary and the recovery AHV or ESXi clusters at the same or different AZs are NearSync capable. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB.

See NearSync Replication Requirements (Nutanix Disaster Recovery) and NearSync Replication Limitations (Nutanix Disaster Recovery) before you start.

About this task

To create a protection policy with a NearSync replication schedule, do the following at the primary AZ. You can also create a protection policy at the recovery AZ. Protection policies you create or update at a recovery AZ synchronize back to the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies

  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Select Primary Location

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric characters, dots, dashes, and underscores.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, select the availability zone (AZ) that hosts the guest VMs to protect.

          The drop-down lists all the AZs paired with the local AZ; Local AZ represents the local Prism Central instance. For your primary AZ, you can select either the local AZ or a non-local AZ.

        2. Cluster : From the drop-down list, select the cluster that hosts the guest VMs to protect.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected AZ. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, select the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. After saving the primary AZ configuration, you can optionally add a local schedule to retain the recovery points at the primary AZ.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining them through a replication schedule (step d.iv). For example, you can create a local schedule that retains 15-minute recovery points locally and also a replication schedule that retains recovery points and replicates them to a recovery AZ every 2 hours. The two schedules apply to the guest VMs independently.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local AZ. If you set the retention number to n, the local AZ retains the n recent recovery points.

              When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local AZ.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on Local AZ:PE_A3_AHV : Specify the retention number for the local AZ.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the generated recovery points are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the AZ where you want to replicate the recovery points.

          The drop-down lists all the AZs paired with the local AZ; Local AZ represents the local Prism Central instance. Select Local AZ if you want to configure DR to a different Nutanix cluster at the same AZ.

          If you do not select an AZ, local recovery points that the protection policy creates do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Protection and Manual DR (Nutanix Disaster Recovery).

        2. Cluster : From the drop-down list, select the cluster where you want to replicate the recovery points.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected AZ. You can select one cluster at the recovery AZ. To maintain the efficiency of minutely replication, a protection policy allows you to configure only one recovery AZ for a NearSync replication schedule. However, you can add another Asynchronous replication schedule for replicating recovery points to the same or different AZs. For more information to add another recovery AZ with a replication schedule, see step e.

          Note: Selecting auto-select from the drop-down list replicates the recovery points to any available cluster at the recovery AZ. Select auto-select from the drop-down list only if all the clusters at the recovery AZ are NearSync capable and are up and running. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB. All-flash clusters do not have any specific SSD sizing requirements.

          Caution: If the primary Nutanix cluster contains an IBM POWER Systems server, you can replicate recovery points to an on-prem AZ only if that on-prem AZ contains an IBM Power Systems server.
        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery AZ. After saving the recovery AZ configuration, you can optionally add a local schedule to retain the recovery points at the recovery AZ.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining them through a replication schedule (step d.iv). For example, you can create a local schedule that retains hourly recovery points locally to supplement the hourly replication schedule. The two schedules apply differently to the guest VMs after failover, when the recovery points replicate back to the primary AZ.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local AZ. If you set the retention number to n, the local AZ retains the n recent recovery points.

              When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local AZ.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the local AZ.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the generated recovery points are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the single recovery point generated is application-consistent.

          5. Click Save Schedule .
    4. Click + Add Schedule to add a replication schedule between the primary and the recovery AZ.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (NearSync)

        1. Protection Type : Click Asynchronous .
        2. Take Snapshot Every : Specify the frequency in minutes (1–15 minutes) at which you want the recovery points to be taken.

          The specified frequency is the RPO. For more information about RPO, see Nutanix Disaster Recovery Terminology.

        3. Retention Type : When you enter the frequency in minutes in step ii, the system selects the Roll-up retention type by default because NearSync replication schedules do not support Linear retention types.
          The Roll-up retention type rolls up the recovery points, as per the RPO and retention period, into a single recovery point at an AZ. For example, if you set the RPO to 1 hour and the retention time to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) every 24 hours. The system keeps one day of rolled-up hourly recovery points and 4 days of daily recovery points.
          Note:
          • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
          • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
          • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
          • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
          Note: The recovery points that are used to create a rolled-up recovery point are discarded.
          Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
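
          The roll-up rules in the note above can be sketched as follows. This is an illustrative model, not a Nutanix API; the function name and tier strings are hypothetical.

```python
# Hypothetical model (not a Nutanix API) of the roll-up rules in the
# note above: for a retention period of n days/weeks/months, list what
# each tier retains.

def rollup_tiers(n, unit):
    rpo_tier = "1 day of RPO recovery points"
    if unit == "days":
        return [rpo_tier, f"{n - 1} days of daily recovery points"]
    if unit == "weeks":
        return [rpo_tier, "1 week of daily recovery points",
                f"{n - 1} weeks of weekly recovery points"]
    if unit in ("months", "years"):
        return [rpo_tier, "1 week of daily recovery points",
                "1 month of weekly recovery points",
                f"{n - 1} months of monthly recovery points"]
    raise ValueError(f"unsupported unit: {unit}")

# The example above: a 1-hour RPO retained for 5 days keeps 1 day of
# rolled-up hourly recovery points plus 4 days of daily recovery points.
print(rollup_tiers(5, "days"))
```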
        4. To specify the retention number for the primary and recovery AZs, do the following.
          • Retention on Local AZ: PE_A3_AHV : Specify the retention number for the primary AZ.

            This field is unavailable if you do not specify a recovery location.

          • Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the recovery AZ.
        5. If you want to enable reverse retention of the recovery points, check Reverse retention for VMs on recovery location .

          Reverse retention maintains the retention numbers of recovery points even after failover to a recovery AZ in the same or different AZs. For example, if you retain two recovery points at the primary AZ and three recovery points at the recovery AZ, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary AZ. The recovery AZ still retains two recovery points while the primary AZ retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary AZ. The recovery AZ retains three recovery points while the primary AZ retains two recovery points.

          Maintaining the same retention numbers at a recovery AZ is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.

        6. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

          Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery AZs.

          Caution: Application-consistent recovery points fail for EFI-boot enabled Windows 2019 VM running on ESXi when NGT is not installed. Nutanix recommends installing NGT on guest VMs running on ESXi also.
        7. Click Save Schedule .
    5. Click + Add Recovery Location if you want to add an additional recovery AZ for the guest VMs in the protection policy.
      • To add an on-prem AZ for recovery, see Protection and DR between On-Prem AZs (Nutanix Disaster Recovery)
      • To add Xi Cloud Services for recovery, see Protection and DR between On-Prem AZ and Xi Cloud Service (Xi Leap).
      Figure. Protection Policy Configuration: Additional Recovery Location

    6. Click + Add Schedule to add a replication schedule between the primary AZ and the additional recovery AZ you specified in step e.

      The Add Schedule window auto-populates the Primary Location and the additional Recovery Location . Perform step d again to add the replication schedule.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.

        1. Click Start protection at specific point in time .
        2. Specify the time at which you want to start taking recovery points.
        3. Click Save .
    7. Click Next .
      Clicking Next shows a list of VM categories, where you can optionally check one or more VM categories to protect in the protection policy. Nutanix Disaster Recovery allows you to protect a guest VM with only one protection policy; therefore, VM categories specified in another protection policy are not in the list. If a guest VM is protected in another protection policy through its VM category (category-based inclusion), and you protect the same guest VM from the VMs page in this policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protects the guest VM individually protects the guest VM.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    8. If you want to protect the guest VMs by category, check the VM categories that you want to protect from the list and click Add .
      Figure. Protection Policy Configuration: Add VM Categories

      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs by category, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs Individually to a Protection Policy).

    9. Click Create .
      The protection policy with a NearSync replication schedule is created. To verify the protection policy, see the Protection Policies page. If you checked VM categories in step h, the protection policy starts generating recovery points of the guest VMs in those VM categories. To see the generated recovery points, click the hamburger icon at the top-left corner of the window and go to VM Recovery Points . Click a recovery point for its information. You can see the estimated time for the very first replication (seeding) to the recovery AZs.
      Figure. Recovery Points Overview

    Tip: Nutanix Disaster Recovery with a NearSync replication schedule also allows you to recover the data of the minute just before an unplanned failover. For example, with a 10-minute protection policy, you can use the internal lightweight snapshots (LWS) to recover the data of the ninth minute when there is an unplanned failover.
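
The category-based versus individual inclusion precedence described in this procedure can be sketched as follows. This is a hypothetical helper that models the documented rule using the example names from the text; it is not a Nutanix API.

```python
# Sketch of the inclusion-precedence rule: an individual (VMs-page)
# inclusion supersedes a category-based one. Illustrative only.

def effective_policy(vm, category_policies, individual_policies):
    """Return the single protection policy that protects `vm`."""
    if vm in individual_policies:        # individual inclusion wins
        return individual_policies[vm]
    return category_policies.get(vm)     # else category-based, if any

# The example from the text: VM_SherlockH is in category Department:Admin
# (protected by PP_AdminVMs) and is also added individually to PP_VMs_UK.
category_policies = {"VM_SherlockH": "PP_AdminVMs"}
individual_policies = {"VM_SherlockH": "PP_VMs_UK"}

print(effective_policy("VM_SherlockH", category_policies, individual_policies))
# prints PP_VMs_UK: the individual inclusion supersedes the category one
```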

Creating a Recovery Plan (Nutanix Disaster Recovery)

To orchestrate the failover of the protected guest VMs to a recovery AZ, create a recovery plan. On failover, the recovery plan recovers the protected guest VMs at the recovery AZ. If you have configured two recovery AZs in a protection policy, create two recovery plans, one for recovery to each recovery AZ. The recovery plan synchronizes continuously to the recovery AZ bidirectionally.

For more information about creating a recovery plan, see Creating a Recovery Plan (Nutanix Disaster Recovery).

Failover and Failback Operations (Nutanix Disaster Recovery)

You can perform test, planned, and unplanned failovers of guest VMs protected with a NearSync replication schedule across different Nutanix clusters at the same or different on-prem AZs. The steps to perform test, planned, and unplanned failovers are largely the same irrespective of the replication schedule that protects the guest VMs.

Refer to Failover and Failback Management for test, planned, and unplanned failover procedures.

Protection with Synchronous Replication Schedule (0 RPO) and DR

Synchronous replication enables you to protect your guest VMs with a zero recovery point objective (0 RPO). A protection policy with a Synchronous replication schedule replicates all writes on the protected guest VMs synchronously to the recovery AZ for High Availability. The policy also takes recovery points of those protected VMs every 6 hours (the first recovery point is taken immediately) for raw node (HDD+SSD) sizes up to 120 TB. Since the replication is synchronous, the recovery points are crash-consistent only. For guest VMs (AHV) protected with a Synchronous replication schedule, you can perform DR only to an AHV cluster at the same or a different AZ. Replicating writes synchronously while also generating recovery points helps eliminate data loss due to:

  • Unplanned failure events (for example, natural disasters and network failure).
  • Planned failover events (for example, scheduled maintenance).

Nutanix recommends that the round-trip latency (RTT) between AHV clusters be less than 5 ms for optimal performance of Synchronous replication schedules. Maintain adequate bandwidth to accommodate peak writes and have a redundant physical network between the clusters.

To perform the replications synchronously yet efficiently, the protection policy limits you to one recovery AZ when you add a Synchronous replication schedule. If you configure a Synchronous replication schedule for a guest VM, you cannot add an Asynchronous or NearSync schedule to the same guest VM. Similarly, if you configure an Asynchronous or NearSync replication schedule, you cannot add a Synchronous schedule to the same guest VM.

If you unpair the AZs while the guest VMs in the Nutanix clusters are still synchronizing, the Nutanix clusters become unstable. Therefore, disable Synchronous replication and clear stale stretch parameters, if any, on both the primary and recovery Prism Element before unpairing the AZs. For more information about disabling Synchronous replication, see Synchronous Replication Management.

Note: Nutanix provides multiple disaster recovery (DR) solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

Synchronous Replication Requirements

The following are the specific requirements for protecting your AHV guest VMs with a Synchronous replication schedule. Ensure that you meet these requirements in addition to the general requirements of Nutanix Disaster Recovery.

For information about the general requirements of Nutanix Disaster Recovery , see Nutanix Disaster Recovery Requirements.

For information about node, disk and Foundation configurations required to support Synchronous replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV

The AHV clusters must be running on version 20190916.189 or newer.

Note: Synchronous replication schedules support only AHV.

Nutanix Software Requirements

  • Each on-prem availability zone (AZ) must have a Leap enabled Prism Central instance.

    The primary and recovery Nutanix Clusters can be registered with a single Prism Central instance or each can be registered with different Prism Central instances.

  • The primary and recovery Prism Central and Prism Element on the registered Nutanix clusters must be running on the same AOS version.
    • AOS 5.17 or newer.
    • AOS 5.17.1 or newer to support Synchronous replications of UEFI secure boot enabled guest VMs.

    • AOS 5.19.2 or newer for DR to an AHV cluster in the same AZ (registered to the same Prism Central). For DR to an AHV cluster in the same AZ, Prism Central must be running version 2021.3 or newer.

Additional Requirements

  • For optimal performance, keep the round-trip time (RTT) between Nutanix clusters below 5 ms. Also, maintain adequate bandwidth to accommodate peak writes, and provide a redundant physical network between the clusters.
  • The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected guest VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
  • For hardware and Foundation configurations required to support Synchronous replication schedules, see On-Prem Hardware Resource Requirements.

  • The clusters on the primary AZ and the recovery AZ communicate over the ports 2030, 2036, 2073, and 2090. Ensure that these ports have open access between both the primary and the recovery clusters (Prism Element). For the complete list of required ports, see Port Reference.
  • If the primary and the recovery clusters (Prism Element) are in different subnets, open the ports manually for communication.
    Tip: If the primary and the recovery clusters (Prism Element) are in the same subnet, you need not open the ports manually.
    • To open the ports for communication to the recovery cluster, run the following command on all CVMs of the primary cluster.

      nutanix@cvm$ allssh 'modify_firewall -f -r remote_cvm_ip,remote_virtual_ip -p 2030,2036,2073,2090 -i eth0'

      Replace remote_cvm_ip with the IP address of the recovery cluster CVM. If there are multiple CVMs, replace remote_cvm_ip with the comma-separated IP addresses of the CVMs.

      Replace remote_virtual_ip with the virtual IP address of the recovery cluster.

    • To open the ports for communication to the primary cluster, run the following command on all CVMs of the recovery cluster.

      nutanix@cvm$ allssh 'modify_firewall -f -r source_cvm_ip,source_virtual_ip -p 2030,2036,2073,2090 -i eth0'

      Replace source_cvm_ip with the IP address of the primary cluster CVM. If there are multiple CVMs, replace source_cvm_ip with the comma-separated IP addresses of the CVMs.

      Replace source_virtual_ip with the virtual IP address of the primary cluster.

    Note: Use the eth0 interface only. eth0 is the default CVM interface created when you install AOS.
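
After opening the ports, you can confirm that they are reachable from the peer cluster. The following Python sketch is illustrative, not a Nutanix utility; the IP address in the commented example is a placeholder for one of your CVM or virtual IPs.

```python
# Sketch: verify that the stretch-replication ports are reachable from a peer
# host. The IP in the commented example is a placeholder, not a real address.
import socket

REPLICATION_PORTS = (2030, 2036, 2073, 2090)

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder IP): check all four ports on one recovery-cluster CVM.
# for p in REPLICATION_PORTS:
#     print(p, port_open("10.0.0.50", p))
```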

Synchronous Replication Limitations

Consider the following specific limitations before protecting your guest VMs with a Synchronous replication schedule. These limitations are in addition to the general limitations of Nutanix Disaster Recovery.

For information about the general limitations of Leap, see Nutanix Disaster Recovery Limitations.

  • You cannot restore guest VMs with incompatible GPUs at the recovery cluster.
  • You cannot protect guest VMs configured as part of a network function chain.
  • You cannot protect guest VMs with affinity policies.
  • You cannot resize a guest VM disk while the guest VM is in replication. See KB-9986 for more information.

Creating a Protection Policy with the Synchronous Replication Schedule

To replicate the writes on the protected guest VMs instantly, configure a Synchronous replication schedule while creating the protection policy. The policy replicates all the writes on the protected guest VMs synchronously to the recovery AZ for high availability. For a raw node (HDD+SSD) size of up to 120 TB, the policy also takes crash-consistent recovery points of those guest VMs every 6 hours and replicates them to the recovery AZ; the first recovery point is taken immediately. To maintain the efficiency of synchronous replication, the protection policy allows you to add only one recovery AZ for the protected VMs. When creating a protection policy, you can specify only VM categories. If you want to protect guest VMs individually, first create the protection policy (which can also include VM categories), and then include the guest VMs individually in the protection policy from the VMs page.

Before you begin

See Synchronous Replication Requirements and Synchronous Replication Limitations before you start.

About this task

To create a protection policy with the Synchronous replication schedule, do the following at the primary AZ. You can also create a protection policy at the recovery AZ. Protection policies you create or update at a recovery AZ synchronize back to the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies
    Click to enlarge Protection Policy Configuration: Protection Policies

  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Select Primary Location
    Click to enlarge Protection Policy Configuration: Select Primary Location

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric characters, dots, dashes, and underscores.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, select the availability zone (AZ) that hosts the guest VMs to protect.

          The drop-down lists all the AZs paired with the local AZ. Local AZ represents the local Prism Central instance. For your primary AZ, you can select either the local AZ or a non-local AZ.

        2. Cluster : From the drop-down list, select the AHV cluster that hosts the VMs to protect.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected AZ. If you want to protect the guest VMs from multiple AHV clusters in the same protection policy, select the AHV clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central. Select All Clusters only if all the clusters are running AHV.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. Do not add a local schedule for retaining the recovery points locally; to maintain replication efficiency, Synchronous replication allows only the replication schedule between the AZs. If you add a local schedule, you cannot click Synchronous in step d.

    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location
      Click to enlarge Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the AZ (AZ) where you want to replicate the recovery points.

          The drop-down lists all the AZs paired with the local AZ. Local AZ represents the local Prism Central instance. Select Local AZ if you want to configure DR to a different AHV cluster at the same AZ.

          If you do not select an AZ, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Protection and Manual DR (Nutanix Disaster Recovery).

        2. Cluster : From the drop-down list, select the AHV cluster where you want to replicate the guest VM writes synchronously and recovery points.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected AZ. You can select only one AHV cluster at the recovery AZ. Do not select an ESXi cluster; Synchronous replication schedules support only AHV clusters. If you select an ESXi cluster and configure a Synchronous replication schedule, replications fail.

          Note: Selecting auto-select from the drop-down list replicates the recovery points to any available cluster at the recovery AZ. Select auto-select only if all the clusters at the recovery AZ run AHV and are up and running.
        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery AZ. Do not add a local schedule for retaining the recovery points locally; to maintain replication efficiency, Synchronous replication allows only the replication schedule between the AZs. If you add a local schedule, you cannot click Synchronous in step d.

    4. Click + Add Schedule to add a replication schedule between the primary and the recovery AZ.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (Synchronous)
      Click to enlarge Protection Policy Configuration: Add Schedule (Synchronous)

        1. Protection Type : Click Synchronous .
        2. Failure Handling : Select one of the following options to handle a failure (for example, when the connection between the primary and the recovery AZ breaks and VM writes on the primary cluster stop replicating).
          • Manual : Select this option if you want to resume the VM writes on the primary AZ only when you manually disable Synchronous replication.
          • Automatic : Select this option to resume VM writes on the primary AZ automatically after the period you specify in Timeout after (seconds) elapses.
            Note: The minimum timeout period is 10 seconds. For a guest VM that is configured for AHV Metro recovery, the timeout period specified in the Witness-configured recovery plan takes precedence. For example, if the timeout period in the protection policy is 10 seconds and you specify 40 seconds in the Witness recovery plan (automatic), the effective timeout period for failure handling is 40 seconds.
        3. Click Save Schedule .

          Clicking Save Schedule disables the + Add Recovery Location button at the top-right corner because, to maintain the efficiency of synchronous replication, the policy allows you to add only one recovery AZ.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.

        1. Click Start protection at specific point in time .
        2. Specify the time at which you want to start taking recovery points.
        3. Click Save .
    5. Click Next .
      Clicking Next shows a list of VM categories, where you can optionally check one or more VM categories to protect in the protection policy. Nutanix Disaster Recovery allows you to protect a guest VM by using only one protection policy; therefore, VM categories specified in another protection policy are not in the list. If you protect a guest VM in another protection policy by specifying the VM category of the guest VM (category-based inclusion), and you then protect the same guest VM from the VMs page in this policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the policy in which you individually included the guest VM protects it.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    6. If you want to protect the guest VMs by category, check the VM categories that you want to protect in the list and click Add .
      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs by category, proceed to the next step without checking any VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs Individually to a Protection Policy).

    7. Click Create .
      The protection policy with the Synchronous replication schedule is created. To verify the protection policy, see the Protection Policies page. If you checked VM categories in step f, the protection policy starts generating recovery points of the guest VMs in those VM categories. To see the generated recovery points, click the hamburger icon at the top-left corner of the window and go to VM Recovery Points . Click a recovery point to see its details. You can see the time estimated for the very first replication (seeding) to the recovery AZ.
      Figure. Recovery Points Overview Click to enlarge Recovery Points Overview
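
The precedence rule described in step e (individual inclusion supersedes category-based inclusion) can be sketched as follows. This is an illustrative example that reuses the VM and policy names from the text; it is not a Nutanix API.

```python
# Illustrative sketch of protection-policy precedence; not a Nutanix API.
# Individual inclusion (from the VMs page) supersedes category-based inclusion.
def effective_policy(vm_name, vm_category, category_policies, individual_policies):
    """Return the protection policy that actually protects the VM."""
    if vm_name in individual_policies:
        return individual_policies[vm_name]          # individual inclusion wins
    return category_policies.get(vm_category)        # else category-based, if any

category_policies = {"Department:Admin": "PP_AdminVMs"}
individual_policies = {"VM_SherlockH": "PP_VMs_UK"}

print(effective_policy("VM_SherlockH", "Department:Admin",
                       category_policies, individual_policies))  # PP_VMs_UK
```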

Creating a Recovery Plan (Nutanix Disaster Recovery)

To orchestrate the failover of the protected guest VMs to the recovery AZ, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs at the recovery AZ. If you have configured two recovery AZs in a protection policy, create two recovery plans for DR, one for recovery to each recovery AZ. The recovery plan synchronizes continuously and bidirectionally between the primary and the recovery AZ.

For more information about creating a recovery plan, see Creating a Recovery Plan (Nutanix Disaster Recovery).

Synchronous Replication Management

Synchronous replication instantly replicates all writes on the protected guest VMs to the recovery cluster. Replication starts when you configure a protection policy and add the guest VMs to protect. You can manage replication by enabling, disabling, pausing, or resuming Synchronous replication on the protected guest VMs from Prism Central.

Enabling Synchronous Replication

When you configure a protection policy with a Synchronous replication schedule and add guest VMs to protect, replication is enabled by default. However, if you have disabled Synchronous replication on a guest VM, you must enable it to restart replication.

About this task

To enable Synchronous replication on a guest VM, perform the following procedure at the primary availability zone (AZ). You can also perform the following procedure at the recovery AZ. The operations you perform at a recovery AZ synchronize back to the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs on which you want to enable Synchronous replication.
  4. Click Protect from the Actions drop-down menu.
  5. Select the protection policy in the table to include the guest VMs in the protection policy.
  6. Click Protect .
Pausing Synchronous Replication

The protected guest VMs on the primary cluster stop responding when the recovery cluster is disconnected abruptly (for example, due to network outage or internal service crash). To come out of the unresponsive state, you can pause Synchronous replication on the guest VMs. Pausing Synchronous replication temporarily suspends the replication state of the guest VMs without completely disabling the replication relationship.

About this task

To pause Synchronous replication on a guest VM, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs on which you want to pause Synchronous replication.
  4. Click Pause Synchronous Replication from the Actions drop-down menu.
Resuming Synchronous Replication

You can resume the Synchronous replication that you had paused to come out of the unresponsive state of the primary cluster. Resuming Synchronous replication restores the replication status and reconciles the state of the guest VMs. To resume Synchronous replication on a guest VM, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs on which you want to resume Synchronous replication.
  4. Click Resume Synchronous Replication from the Actions drop-down menu.

Failover and Failback Operations (Nutanix Disaster Recovery)

You can perform test, planned, and unplanned failovers of the guest VMs protected with a Synchronous replication schedule across AHV clusters at different on-prem availability zones (AZs). The steps to perform test, planned, and unplanned failovers are largely the same irrespective of the replication schedule that protects the guest VMs. Additionally, a planned failover of guest VMs protected with a Synchronous replication schedule also allows live migration of the protected guest VMs.

Refer to Failover and Failback Management for the test, planned, and unplanned failover procedures.

Cross-Cluster Live Migration

Planned failover of the guest VMs protected with Synchronous replication schedule supports live migration to another AHV cluster. Live migration offers zero downtime for your applications during a planned failover event to the recovery cluster (for example, during scheduled maintenance).

Cross-Cluster Live Migration Requirements

The following are the specific requirements for successfully live migrating your guest VMs.

Ensure that you meet the following requirements in addition to the requirements of the Synchronous replication schedule (Synchronous Replication Requirements) and the general requirements of Nutanix Disaster Recovery (Nutanix Disaster Recovery Requirements).

  • Stretch L2 networks across the primary and recovery AZs.

    Network stretch spans your network across different AZs. A stretched L2 network retains the IP addresses of guest VMs after their Live Migration to the recovery AZ.

  • Both the primary and recovery Nutanix clusters must have identical CPU types.

    The primary and recovery Nutanix clusters must have identical CPU feature sets. If the CPU feature sets (sets of CPU flags) are not identical, live migration fails.

  • Both the primary and recovery Nutanix clusters must run on the same AHV version.
  • If the primary and the recovery Nutanix clusters (Prism Element) are in different subnets, open the ports 49250–49260 for communication. For the complete list of required ports, see Port Reference.
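
The CPU feature-set requirement above amounts to comparing the CPU flag lists of the two clusters (for example, as read from the flags line of /proc/cpuinfo on an AHV host). The following sketch is illustrative only, not a Nutanix tool, and the flag strings in the example are made up.

```python
# Sketch: compare CPU feature sets (flag lists) from two hosts. The flag
# strings below are illustrative, not real host output.
def flags_match(flags_a: str, flags_b: str):
    """Return (True, set()) if the flag sets are identical, else (False, diff),
    where diff contains flags present on only one side."""
    a, b = set(flags_a.split()), set(flags_b.split())
    return a == b, a ^ b   # symmetric difference

primary = "fpu vme sse sse2 avx avx2"
recovery = "fpu vme sse sse2 avx"
ok, diff = flags_match(primary, recovery)
print(ok, diff)   # False {'avx2'}
```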
Cross-Cluster Live Migration Limitations

Consider the following limitation, in addition to the limitations of the Synchronous replication schedule (Synchronous Replication Limitations) and the general limitations of Nutanix Disaster Recovery (Nutanix Disaster Recovery Limitations), before performing live migration of your guest VMs.

  • Live migration of guest VMs fails if the guest VMs are part of Flow security policies.
    Tip: To enable the guest VMs to retain the Flow security policies after the failover (live migration), revoke the policies on the guest VMs and Export them to the recovery AZ. At the recovery AZ, Import the policies. The guest VMs read the policies automatically after recovery.

Performing Cross-Cluster Live Migration

If, due to a planned event (for example, scheduled maintenance) at the primary availability zone (AZ), you want to migrate your applications to another AHV cluster without downtime, perform a planned failover with live migration to the recovery AZ.

About this task

To live migrate the guest VMs, perform the following procedure at the recovery AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
    Caution: The Recovery Plans page displays many recovery plans. Select the recovery plan that has Stretch Networks . If you select a recovery plan having Non-stretch networks , the migration fails. For more information about selection of stretch and non-stretch networks, see Creating a Recovery Plan (Nutanix Disaster Recovery).
  4. Click Failover from the Actions drop-down menu.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Figure. Planned Failover
    Click to enlarge Planned Failover

    1. Failover Type : Click Planned Failover and check Live Migrate VMs .
    2. Click + Add target clusters if you want to failover to specific clusters at the recovery AZ.
      If you do not add target clusters, the recovery plan migrates the guest VMs to any AHV cluster at the recovery AZ.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors, or if you resolve the errors in step 6, the guest VMs migrate to and start on the recovery cluster. The migration might show a network latency of 300-600 ms. You cannot see the migrated guest VMs on the primary cluster because those VMs come up on the recovery cluster after the migration.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.

AHV Metro (Witness Option)

The Witness option extends the capability of metro availability to AHV clusters using Nutanix Disaster Recovery. To automate the recovery process, the recovery plan configuration is enhanced to handle failure execution automatically. The Witness is a service within Prism Central (including scale-out deployments) that monitors communication between the metro pair of clusters (primary and recovery AHV clusters). When the communication between the clusters is interrupted for a configurable time interval, the service executes an unplanned failover depending on the failure type and the actions you have configured in the associated recovery plan.

Figure. AHV Metro Workflow Click to enlarge AHV Metro

Witness continuously reads the health status from the metro pair of AHV clusters. If the communication between the two clusters is unavailable for a set time period, the service pauses Synchronous replication between the clusters. If the primary cluster is unavailable, the service can also trigger an unplanned failover automatically to start the guest VMs specified in the Witness-configured recovery plan on the recovery cluster.

You can configure AHV metro for the clusters registered to the same Prism Central (same AZ). If one of your clusters is running ESXi and you want the witness capability, see the Witness Option in Data Protection and Recovery with Prism Element . Nutanix recommends deploying the single Prism Central on a different AHV or ESXi cluster (that is, a different fault domain) to avoid a single point of failure (see AHV Metro Requirements and Recommendations). Prism Central VMs residing in a separate fault domain provide an outside view that can distinguish a cluster failure from a network interruption between the metro availability clusters.

The following table describes the cluster failure scenarios and the witness response behavior. For more information about recovery workflows in each scenario, see AHV Metro Recovery Workflows.

Table 1. Witness Response Behaviors
Columns: Failure scenario; Recovery plan Failure Execution Mode: Automatic (Witness), subdivided by protection policy failure handling (Manual or Automatic); Recovery plan Failure Execution Mode: Manual or no recovery plan, subdivided by protection policy failure handling (Manual or Automatic)
Primary cluster outage
  • Guest VMs fail over automatically (unplanned failover) to the recovery cluster after the timeout period set in the recovery plan. See Performing an Unplanned Failover (Leap) for more information.
    Note: The timeout period set in the recovery plan supersedes the automatic failure-handling value in the protection policy.
  • Administrative intervention is required to delete the guest VMs on the primary cluster when the primary cluster becomes functional.
  • Guest VM I/O operations freeze (read-only state).
  • Administrative intervention is required to manually perform an unplanned failover of the guest VMs. See Performing an Unplanned Failover (Leap) for more information.
  • The recovery plan must be configured for the efficient recovery of guest VMs. For information about creating recovery plans, see Creating a Recovery Plan (Nutanix Disaster Recovery).
Complete network failure in the primary cluster (connection loss between the primary cluster and Prism Central and recovery cluster)
  • Guest VMs fail over automatically (unplanned failover) to the recovery cluster after the timeout period set in the recovery plan. See Performing an Unplanned Failover (Leap) for more information.
    Note: The timeout period set in the recovery plan supersedes the automatic failure-handling value in the protection policy.
  • Administrative intervention is required to delete the guest VMs on the primary cluster when the primary cluster becomes functional.
  • Guest VM I/O operations freeze (read-only state).
  • Administrative intervention is required to pause the Synchronous replication.
    Note: Synchronous replication pauses only when there is network connectivity between Prism Central and the metro pair of clusters. Contact Nutanix Support to recover the guest VMs when there is no network connectivity.
  • Guest VM I/O operations continue to run on the primary cluster after Synchronous replication is paused.
  • Synchronous replication pauses after the timeout period set in the protection policy.
    Note: The synchronization status does not update until there is network connectivity between the primary cluster and Prism Central.
  • Guest VMs continue to run on the primary cluster.
Recovery cluster outage or complete network failure in the recovery cluster
  • Synchronous replication pauses on all the guest VMs after the timeout period set in the recovery plan.
  • Guest VMs continue to run on the primary cluster.
  • The ability to automatically recover the guest VMs (unplanned failover) ceases because there is no communication with the recovery cluster.
  • Guest VM I/O operations freeze (read-only state).
  • Administrative intervention is required to pause the Synchronous replication.
  • Guest VM I/O operations continue to run on the primary cluster after Synchronous replication is paused.
  • Synchronous replication pauses after the timeout period set in the protection policy.
  • Guest VM I/O operations continue to run on the primary cluster.
Connection loss between the primary and recovery clusters
  • Synchronous replication pauses on all the guest VMs after the timeout period set in the recovery plan.
  • Guest VMs continue to run on the primary cluster.
  • The ability to automatically recover the guest VMs (unplanned failover) ceases because there is no communication with the recovery cluster.
Guest VM I/O operations freeze until either of the following.
  • Connectivity between the primary and recovery clusters is functional
  • Administrative intervention to manually pause Synchronous replication
  • Synchronous replication pauses after the timeout period set in the protection policy.
  • Guest VM I/O operations continue to run on the primary cluster.
Failure at both primary and recovery clusters Administrative intervention is required to recover the clusters. Refer to the relevant failure scenario in this table to manage the guest VMs.
Witness failure No impact on Synchronous replication. However, the ability to automatically recover the guest VMs (unplanned failover) from cluster, storage, and network failures cease. Not applicable Not applicable
Connection loss between Prism Central (Witness) and the primary cluster
  • No impact on Synchronous replication.
  • Guest VMs continue to run on the primary cluster.
Not applicable Not applicable
Connection loss between Prism Central (Witness) and the recovery cluster
  • No impact on Synchronous replication.
  • Guest VMs continue to run on the primary cluster.
Not applicable Not applicable
Connection loss between Prism Central (Witness) and both the primary and recovery clusters. The primary and recovery clusters are connected.
  • No impact on Synchronous replication.
  • The ability to automatically recover the guest VMs (unplanned failover) ceases because there is no communication between Prism Central (witness) and both the primary and recovery clusters.
  • Guest VMs continue to run on the primary cluster.
Not applicable Not applicable
Connection loss between Prism Central (witness) and both the primary and recovery clusters, and between primary cluster and recovery cluster
  • Guest VMs fail over automatically (unplanned failover) to the recovery cluster after the timeout period set in the recovery plan. See Performing an Unplanned Failover (Leap) for more information.
    Note: The timeout period set in the recovery plan supersedes the automatic failure-handling value in the protection policy.
  • Administrative intervention is required to delete the guest VMs on the primary cluster when the primary cluster becomes functional.
Not applicable Not applicable
Connection loss between both the metro pair of clusters, and between the primary cluster and Prism Central. Prism Central and the recovery cluster are connected.
  • Guest VMs fail over automatically (unplanned failover) to the recovery cluster after the timeout period set in the recovery plan. See Performing an Unplanned Failover (Leap) for more information.
    Note: The timeout period set in the recovery plan supersedes the automatic failure-handling value in the protection policy.
  • Administrative intervention is required to delete the guest VMs on the primary cluster when the primary cluster becomes functional.
Guest VM I/O operations freeze until either of the following.
  • Connectivity between the primary cluster and Prism Central is functional
  • Administrative intervention to manually pause Synchronous replication
  • Synchronous replication pauses after the timeout period set in the protection policy.
    Note: The synchronization status does not update until there is network connectivity between the primary cluster and Prism Central.
  • Guest VMs continue to run on the primary cluster.
Connection loss between both the metro pair of clusters, and between the recovery cluster and Prism Central. Prism Central and the primary cluster are connected.
  • Synchronous replication pauses on all the guest VMs after the timeout period set in the recovery plan.
  • Guest VMs continue to run on the primary cluster.
  • The ability to automatically recover the guest VMs (unplanned failover) ceases because there is no communication with the recovery cluster.
Guest VM I/O operations freeze until either of the following.
  • Connectivity between the recovery cluster and Prism Central is functional
  • Administrative intervention to manually pause Synchronous replication
  • Synchronous replication pauses after the timeout period set in the protection policy.
  • Guest VM I/O operations continue on the primary cluster.
Storage-only outage on the primary cluster
  • Guest VMs fail over automatically (unplanned failover) to the recovery cluster after the timeout period set in the recovery plan. See Performing an Unplanned Failover (Leap) for more information.
    Note: The timeout period set in the recovery plan supersedes the automatic failure-handling value in the protection policy.
  • Administrative intervention is required to delete the guest VMs on the primary cluster when the primary cluster becomes functional.
  • Guest VMs remain inaccessible until either of the following.
    • Storage-only outage is fixed on the primary cluster
    • Administrative intervention to manually perform an unplanned failover to the recovery cluster. See Performing an Unplanned Failover (Leap) for more information.
  • Recovery plan must be configured for the efficient recovery of guest VMs. For information about creating recovery plans, see Creating a Recovery Plan (Nutanix Disaster Recovery)
Storage-only outage on the recovery cluster
  • Synchronous replication pauses on all the guest VMs after the timeout period set in the recovery plan.
  • Guest VMs continue to run on the primary cluster.
  • The ability to automatically recover the guest VMs (unplanned failover) ceases because there is no communication with the recovery cluster.
Guest VM I/O operations freeze until either of the following.
  • Storage-only outage on the recovery cluster is fixed
  • Administrative intervention to manually pause the Synchronous replication.
  • Synchronous replication pauses after the timeout period set in the protection policy.
  • Guest VMs continue to run on the primary cluster.
AHV Metro Requirements and Recommendations

These requirements and recommendations help you to successfully configure a witness to automate recovery (failover) when the primary AHV cluster is unavailable.

AHV Metro Requirements

Ensure that you meet these requirements in addition to Synchronous Replication Requirements and the applicable Nutanix Disaster Recovery Requirements before you start.

Nutanix Software Requirements

  • Prism Central must run version pc.2021.7 or newer.

    For Prism Central and AOS version compatibility, see Software Interoperability.

  • The primary and recovery AHV clusters must be registered to the same Prism Central deployment.
  • The primary and recovery AHV clusters registered to Prism Central must be running a minimum AOS version of 6.0.
  • The AHV or ESXi cluster hosting Prism Central must be running a minimum AOS version of 6.0.

Recovery Plan Requirements

  • A guest VM can be part of only one recovery plan with witness configuration.

    The cluster generates an alert when a guest VM is in multiple recovery plans with witness configuration.

  • You can add only guest VMs protected with Synchronous replication to recovery plans with witness configuration.

    You cannot add guest VMs protected with Asynchronous or NearSync replication to recovery plans with witness configuration. The cluster generates an alert when such guest VMs are in a recovery plan with witness configuration.

Additional Requirements

  • When the recovery AHV cluster becomes available after the break in the synchronous replication between the AHV clusters, manually resume the synchronous replication of the guest VMs from the primary cluster. For more information about resuming synchronous replication, see Resuming Synchronous Replication.
  • When the primary AHV cluster becomes available after the unplanned failover of guest VMs, delete the guest VMs on the primary cluster if the unplanned failover was successful.

AHV Metro Recommendations

See Nutanix Disaster Recovery Recommendations.

  • If the Prism Central instance that you use for protection and disaster recovery of AHV clusters in the same availability zone (AZ) becomes unavailable, the witness service also becomes unavailable. To avoid a single point of failure, Nutanix recommends deploying the single Prism Central in a different AZ (that is, a different fault domain).
  • To avoid conflicts after an unplanned failover of the guest VMs, shut down the guest VMs associated with the recovery plan when the primary AZ becomes active again. Manually power off the guest VMs on either the primary or the recovery AZ after the failover is complete.
AHV Metro Limitations

Consider the following limitations before configuring AHV Metro to automate recovery (failover) when the primary AHV cluster is unavailable. These limitations are in addition to the Synchronous replication limitations and the applicable general limitations of Nutanix Disaster Recovery.

See Synchronous Replication Limitations and Nutanix Disaster Recovery Limitations.

  • You cannot use witness in a configuration with two or more Prism Central deployments across availability zones (AZs).
  • You cannot use witness to automate the unplanned failover recovery of the guest VMs protected with Asynchronous or NearSync replication.
  • You cannot use witness to automate the unplanned failover recovery to Xi Cloud Services.
  • You can create a recovery plan with witness configuration only after creating a protection policy and after the entities reach the Synced status.
AHV Metro Recovery Workflows

When the Witness is enabled (automatic mode) in Prism Central, the recovery workflow of the guest VMs in Witness-configured recovery plans depends on the nature of the failure. This section describes the recovery workflows of such guest VMs running on the primary cluster in various failure scenarios.

Recovering from the primary cluster outage or complete network failure in the primary cluster

When the primary cluster (Prism Element A) becomes unavailable, the recovery cluster (Prism Element B) detects the outage and acquires the lock from the Witness. The system then performs an unplanned failover of the specified guest VMs (in Witness-configured recovery plans) to the recovery cluster (Prism Element B) after the specified timeout period.
Note: If the guest VMs also exist on the recovery cluster (Prism Element B), the system does the following:
  1. Pauses the Synchronous replication of those guest VMs to the primary cluster (Prism Element A).
  2. Performs an unplanned failover of the guest VMs to the recovery cluster (Prism Element B) after the specified timeout period.

When the primary cluster (Prism Element A) is operational again, you must manually resume the Synchronous replication back to Prism Element A.

Figure. Primary cluster outage or complete network failure in the primary cluster
Click to enlarge Primary Cluster Outage

Recovering from the recovery cluster outage or complete network failure in the recovery cluster

Figure. Recovery cluster outage or complete network failure in the recovery cluster
Click to enlarge Recover Cluster Outage

When the recovery cluster (Prism Element B) becomes unavailable, the primary cluster (Prism Element A) detects the outage and acquires the lock from the Witness. The system then pauses the Synchronous replication to Prism Element B. The guest VMs on Prism Element A remain up and running. However, any modifications to the guest VMs do not synchronize to the recovery cluster. If Prism Element A becomes unavailable now, an unplanned failover cannot happen automatically; if you attempt the unplanned failover manually, it fails. When Prism Element B is operational again, you must manually resume the Synchronous replication to Prism Element B.

Recovering from connection loss between the primary and recovery clusters

Figure. Connection loss between the primary and recovery clusters
Click to enlarge Connection loss

When there is a connection loss between the metro pair of clusters (Prism Element A and Prism Element B) but the network connections remain good between each cluster and Prism Central (Witness), both Prism Element A and Prism Element B attempt to acquire the Witness lock. Prism Element A acquires the lock first, and the system pauses the Synchronous replication to Prism Element B. The guest VMs on Prism Element A remain up and running. If Prism Element A becomes unavailable now, an unplanned failover cannot happen automatically; if you attempt the unplanned failover manually, it fails. When the connection between the clusters is restored, you must manually resume the Synchronous replication to Prism Element B.

Recovering from Witness failure

Figure. Witness Failure
Click to enlarge Witness Failure

When Prism Central (or the Witness) is unavailable, the cluster triggers an alert in both Prism Element A and Prism Element B but metro availability is otherwise unaffected. When Prism Central (or the Witness) becomes operational again, the Witness workflow resumes automatically without any intervention.

Recovering from complete network failure

Figure. Connection loss between Prism Central (witness) and both the primary and recovery clusters, and between primary cluster and recovery cluster
Click to enlarge All connection loss

When there is a complete network failure (the metro pair of clusters cannot connect to each other or to Prism Central), neither Prism Element A nor Prism Element B can acquire the Witness lock. The guest VMs on Prism Element A freeze and I/O operations fail. An unplanned failover cannot happen automatically; if you attempt the unplanned failover manually, it fails. When the connections are restored, you must manually resume the Synchronous replication.

Recovering from a double network failure (primary cluster isolated)

When there is a double network failure (the connection is lost both between the metro pair of clusters and between Prism Element A and Prism Central, but Prism Element B and Prism Central remain connected), both Prism Element A and Prism Element B attempt to acquire the Witness lock. Because Prism Element A cannot connect to Prism Central, Prism Element B acquires the lock after the built-in delay passes and performs an unplanned failover to bring up the guest VMs on Prism Element B. The guest VMs on Prism Element A freeze and I/O operations fail.

Recovering from a double network failure (recovery cluster isolated)

When there is a double network failure (the connection is lost both between the metro pair of clusters and between Prism Element B and Prism Central, but Prism Element A and Prism Central remain connected), Prism Element A acquires the Witness lock after the built-in delay passes and pauses the Synchronous replication to Prism Element B. The guest VMs on Prism Element A remain up and running. When the connections are restored, you must manually resume the Synchronous replication to Prism Element B.
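The failure scenarios above reduce to a small decision model: whoever can still reach the Witness wins the lock, and a primary that holds the lock pauses replication while a recovery cluster that holds it fails the guest VMs over. The following is a conceptual sketch only, not Nutanix code; the connectivity flags and returned strings are illustrative assumptions.

```python
# Conceptual model of Witness arbitration (illustrative only, not Nutanix code).
def witness_outcome(primary_sees_witness: bool,
                    recovery_sees_witness: bool,
                    clusters_see_each_other: bool) -> str:
    if clusters_see_each_other:
        # Replication link is intact; no arbitration is needed.
        return "synchronous replication continues"
    if primary_sees_witness:
        # The primary acquires the lock (first, or after the built-in delay
        # when the recovery cluster is isolated) and pauses replication.
        return "primary acquires lock; replication to recovery pauses"
    if recovery_sees_witness:
        # Only the recovery cluster can reach the Witness; it acquires the
        # lock after the built-in delay and performs an unplanned failover.
        return "recovery acquires lock; unplanned failover to recovery"
    # Complete network failure: neither side can acquire the lock.
    return "no lock holder; guest VM I/O freezes on the primary"

# Connection loss between the clusters only: the primary wins the lock.
assert witness_outcome(True, True, False).startswith("primary acquires")
# Primary cluster isolated: the recovery cluster fails the guest VMs over.
assert witness_outcome(False, True, False).startswith("recovery acquires")
```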

Recovering from a Storage-only outage on the primary cluster

A storage-only outage is a condition where the hosts are up and running but have lost connectivity to the storage on the cluster (the storage goes into an inaccessible state). In the event of a storage-only outage on Prism Element A, the system performs an unplanned failover of the guest VMs to Prism Element B.

Figure. Storage-only outage on the primary cluster
Click to enlarge Primary Storage Outage

Recovering from a Storage-only outage on the recovery cluster

In the event of a storage-only outage on Prism Element B (the storage goes into an inaccessible state), the system pauses the Synchronous replication but the I/O operations continue on Prism Element A.

Figure. Storage-only outage on the recovery cluster
Click to enlarge Recovery Storage Outage

Configuring AHV Metro

Before you begin

See AHV Metro Requirements and Recommendations and AHV Metro Limitations.

About this task

Procedure

  1. Log on to the Prism Central web console.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Witness in the General section on the left pane.
    Witness is enabled by default in Prism Central. The cluster information is visible if the cluster hosting Prism Central is also registered to Prism Central. Click View Usage History to see unplanned failover events triggered by the witness in the past.
  4. Create a recovery plan to automatically trigger an unplanned failover to recover the AHV cluster.
    For more information about creating a recovery plan, see Creating a Recovery Plan (Nutanix Disaster Recovery). The procedure to create a recovery plan with witness configuration is identical except as follows.
    1. In the General tab, select the Primary Location and the Recovery Location as Local AZ .
    2. Select Automatic in Failure Execution Mode to enable the witness to break the replication and trigger an unplanned failover to the recovery AHV cluster within the specified timeout period.
    Note: Set Execute failover after disconnectivity of to a value between 30 and 300 seconds. The timeout period you specify here supersedes the timeout period specified in the protection policy for automatic failure handling. For example, if the timeout period in the protection policy is 10 seconds and you specify 40 seconds here, the timeout period for failure handling is 40 seconds.
    Note: Failure Execution Mode appears only when the Primary Location and the Recovery Location are the same.
    A recovery plan with witness configuration is created. To confirm the witness configuration, click the settings button (gear icon), then click Witness in the General section on the left pane. The page shows the number of recovery plans using the witness under Monitoring (On this PC) . Click the number to see the list of recovery plans using the witness.
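The timeout precedence rule in the procedure above can be sketched as a small helper. This is an illustrative function, not a Nutanix API; the name, the parameters, and the 30-300 second bounds check are modeled on the note in the procedure.

```python
# Illustrative helper (not a Nutanix API) for the timeout precedence rule:
# the recovery plan timeout, when set, supersedes the automatic
# failure-handling timeout in the protection policy.
from typing import Optional

def effective_failover_timeout(policy_timeout_s: int,
                               recovery_plan_timeout_s: Optional[int]) -> int:
    if recovery_plan_timeout_s is not None:
        # The recovery plan value must be between 30 and 300 seconds.
        if not 30 <= recovery_plan_timeout_s <= 300:
            raise ValueError("recovery plan timeout must be 30-300 seconds")
        return recovery_plan_timeout_s
    return policy_timeout_s

# Example from the note: policy says 10 s, recovery plan says 40 s.
assert effective_failover_timeout(10, 40) == 40
```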

Converting a Multi-AZ Deployment to Single-AZ

To use disaster recovery (DR) features that support only single Prism Central (AZ) managed deployments, you can convert your multi-AZ deployment to a single-AZ deployment. For example, in a two-AZ deployment where each Prism Central instance (Prism Central A, Prism Central B) hosts one Prism Element cluster (Prism Element A, Prism Element B), you can perform the following procedure to convert to a single-AZ deployment (Prism Central A managing both Prism Element A and Prism Element B).

Before you begin

This procedure converts deployments protected with Synchronous replication schedules. See Synchronous Replication Requirements for the supported Prism Central and AOS versions. To avoid a single point of failure in such deployments, Nutanix recommends installing the single Prism Central in a different AZ (a different fault domain).

You can also perform this procedure to convert deployments protected with Asynchronous and NearSync replication schedules. The conversion procedure for such deployments is identical, except that the protection status (step 3 in the described procedure) of Asynchronous and NearSync replication schedules is available only in Focus > Data Protection .

Figure. Focus
Click to enlarge Focus

Procedure

  1. Log on to the web console of Prism Central A .
  2. Modify all the protection policies and recovery plans that refer to Prism Element B and Prism Central B .
    1. Modify the protection policies to either remove all the references to Prism Element B and Prism Central B or remove all the guest VMs from the policy.
      For more information about updating a protection policy, see Updating a Protection Policy.
    2. Modify the recovery plans to remove all the references to Prism Element B and Prism Central B .
      Note: If you do not modify the recovery plans, the recovery plans become invalid after Prism Element B is unregistered from Prism Central B in step 5. For more information about updating a recovery plan, see Updating a Recovery Plan.
    3. Ensure that there are no issues (in Alerts ) with the modified protection policies and recovery plans.
      Note: Before unregistering Prism Element B from Prism Central B in step 5, ensure that no guest VM is protected to or from Prism Element B .
  3. Unprotect all the guest VMs replicating to and from Prism Element B and Prism Central B .
    Note: If the guest VMs are protected by VM categories, update or delete the VM categories from the protection policies and recovery plans.
    To see the unprotect status of the guest VMs, click Focus > Data Protection
  4. Ensure that the guest VMs are completely unprotected.
    • To ensure all the stretch states are deleted, log on to Prism Element B through SSH as the nutanix user and run the following command.
      nutanix@cvm$ stretch_params_printer
      Empty response indicates that all stretch states are deleted.
    • To ensure all the stretch states between Prism Central B and Prism Element B are deleted, log on to Prism Central B through SSH as the nutanix user and run the following commands.
      pcvm$ mcli
      mcli> mcli dr_coordinator.list
      Empty response indicates that all stretch states are deleted.
  5. Unregister Prism Element B from Prism Central B .
    After unregistering Prism Element B from Prism Central B , the system deletes all Prism Central B attributes and policies applied to guest VMs on Prism Element B (for example, VM categories).
  6. Register Prism Element B to Prism Central A .
    After registering Prism Element B to Prism Central A , reconfigure all Prism Central B attributes and policies applied to entities on the Prism Element B (for example, VM categories).
  7. Modify the protection policies and recovery plans to refer to Prism Central A and Prism Element B .
  8. Unpair Prism Central B .
    To ensure all the stretch states between Prism Central A and Prism Central B are deleted, log on to both Prism Central A and Prism Central B through SSH and run the following command.
    pcvm$ mcli
    mcli> mcli dr_coordinator.list
    Empty response indicates that all stretch states are deleted.
    The multi-AZ deployment is now converted to a single-AZ deployment. Prism Element A and Prism Element B are registered to a single Prism Central ( Prism Central A ) managed deployment.
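The verification steps in this procedure hinge on one rule: an empty response from stretch_params_printer (on the CVM) or dr_coordinator.list (in mcli on the Prism Central VM) means all stretch states are deleted. A minimal sketch of that check follows; the helper name is hypothetical, and it only interprets captured command output.

```python
# Hypothetical helper that interprets the output of the documented
# verification commands. Per the procedure, an empty response means
# all stretch states are deleted.
def all_stretch_states_deleted(command_output: str) -> bool:
    # Treat whitespace-only output as empty.
    return command_output.strip() == ""

assert all_stretch_states_deleted("")          # safe to proceed to unregister
assert not all_stretch_states_deleted("stretch_params { vm: vm-01 }")
```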

Protection Policy Management

A protection policy automates the creation and replication of recovery points. When creating a protection policy, you specify replication schedules, retention policies for the recovery points, and the guest VMs you want to protect. You also specify up to two recovery availability zones (AZs) if you want to automate recovery point replication to them.

When you create, update, or delete a protection policy, it synchronizes to the recovery AZs and works bidirectionally. The recovery points generated at the recovery AZs replicate back to the primary AZ when the primary AZ starts functioning. For information about how Leap determines the list of AZs for synchronization, see Entity Synchronization Between Paired AZs.
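As a rough mental model of what a protection policy captures, consider the sketch below. The field names and schedule strings are assumptions for illustration, not the actual Nutanix data model; only the shape (schedule, retention, protected VMs, at most two recovery AZs) comes from the text above.

```python
# Minimal sketch of a protection policy's contents (field names are
# assumptions, not the Nutanix data model).
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProtectionPolicy:
    name: str
    schedule: str                   # e.g. "async-1h", "nearsync-1m", "sync"
    retention: int                  # number of recovery points to retain
    protected_vms: List[str] = field(default_factory=list)
    recovery_azs: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The text above specifies a maximum of two recovery AZs.
        if len(self.recovery_azs) > 2:
            raise ValueError("a protection policy supports at most 2 recovery AZs")

policy = ProtectionPolicy("gold", "async-1h", 24, ["vm-01"], ["az-west", "az-east"])
assert len(policy.recovery_azs) <= 2
```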

Adding Guest VMs Individually to a Protection Policy

You can also protect guest VMs individually in a protection policy from the VMs page, without the use of a VM category. To protect guest VMs individually in a protection policy, perform the following procedure.

About this task

Note: If you protect a guest VM individually, you can remove the guest VM from the protection policy only by using the procedure in Removing Guest VMs Individually from a Protection Policy.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs in the left pane.
    Figure. Virtual Infrastructure: VMs
    Click to enlarge Protect Guest VMs Individually: Virtual Infrastructure

  3. Select the guest VMs that you want to add to a protection policy.
  4. Click Protect from the Actions drop-down menu.
    Figure. Protect Guest VMs Individually: Actions
    Click to enlarge Protect Guest VMs Individually: Actions

  5. Select the protection policy in the table to protect the selected guest VMs.
    Figure. Protect Guest VMs Individually: Protection Policy Selection
    Click to enlarge Protect Guest VMs Individually: Protection Policy Selection

  6. Click Protect .

Removing Guest VMs Individually from a Protection Policy

You can remove guest VMs individually from a protection policy from the VMs page. To remove guest VMs individually from a protection policy, perform the following procedure.

About this task

Note: If a guest VM is protected under a VM category, you cannot remove the guest VM from the protection policy by the following procedure. You can remove the guest VM from the protection policy only by dissociating the guest VM from the VM category.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs in the left pane.
    Figure. Virtual Infrastructure: VMs
    Click to enlarge Protect Guest VMs Individually: Virtual Infrastructure

  3. Select the guest VMs that you want to remove from a protection policy.
  4. Click UnProtect from the Actions drop-down menu.

Cloning a Protection Policy

If the requirements of the protection policy that you want to create are similar to an existing protection policy, you can clone the existing protection policy and update the clone. To clone a protection policy, perform the following procedure.

About this task

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies
    Click to enlarge Protection Policy Configuration: Protection Policies

  3. Select the protection policy that you want to clone.
  4. Click Clone from the Actions drop-down menu.
  5. Make the required changes on the Clone Protection Policy page.
    For information about the fields on the page, see:
    • Creating a Protection Policy with an Asynchronous Replication Schedule (Nutanix Disaster Recovery)
    • Creating a Protection Policy with a NearSync Replication Schedule (Nutanix Disaster Recovery)
    • Creating a Protection Policy with the Synchronous Replication Schedule
  6. Click Save .

Updating a Protection Policy

You can modify an existing protection policy in Prism Central. To update an existing protection policy, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies
    Click to enlarge Protection Policy Configuration: Protection Policies

  3. Select the protection policy that you want to update.
  4. Click Update from the Actions drop-down menu.
  5. Make the required changes on the Update Protection Policy page.
    For information about the fields on the page, see:
    • Creating a Protection Policy with an Asynchronous Replication Schedule (Nutanix Disaster Recovery)
    • Creating a Protection Policy with a NearSync Replication Schedule (Nutanix Disaster Recovery)
    • Creating a Protection Policy with the Synchronous Replication Schedule
  6. Click Save .

Finding the Protection Policy of a Guest VM

You can use the data protection focus on the VMs page to determine the protection policies to which a guest VM belongs. To determine the protection policy, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Click Data Protection from the Focus menu at the top-right corner.
    The Protection Policy column that is displayed shows the protection policy to which the guest VMs belong.
    Figure. Focus
    Click to enlarge Focus

Recovery Plan Management

A recovery plan orchestrates the recovery of protected VMs at the recovery AZ. Recovery plans are predefined procedures (runbooks) that use stages to enforce VM power-on sequence. You can also specify the inter-stage delays to recover applications.

When you create, update, or delete a recovery plan, it synchronizes to the recovery AZs and works bidirectionally. For information about how Leap determines the list of AZs for synchronization, see Entity Synchronization Between Paired AZs. After a failover from the primary AZ to a recovery AZ, you can failback to the primary AZ by using the same recovery plan.

Recovery plans are independent of protection policies and do not reference protection policies in their configuration information. Also, they do not create recovery points. While the process of planned failover includes the creation of a recovery point so that the latest data can be used for recovery, unplanned and test failovers rely on the availability of the required recovery points at the recovery AZ. A recovery plan therefore requires the guest VMs in the recovery plan to also be associated with a protection policy.
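The association requirement described above (every guest VM in a recovery plan must also be covered by a protection policy, because unplanned and test failovers rely on recovery points that only a protection policy creates on a schedule) can be expressed as a simple set check. The function name and inputs are hypothetical, not part of any Nutanix API.

```python
# Conceptual check: which recovery plan VMs lack protection policy coverage?
# (Hypothetical helper; inputs are plain lists of VM names.)
def vms_missing_protection(recovery_plan_vms, protected_vms):
    """Return the recovery plan VMs that no protection policy covers."""
    return sorted(set(recovery_plan_vms) - set(protected_vms))

# vm-02 is in the recovery plan but not in any protection policy.
assert vms_missing_protection(["vm-01", "vm-02"], ["vm-01"]) == ["vm-02"]
```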

Adding Guest VMs Individually to a Recovery Plan

You can also add guest VMs individually to a recovery plan from the VMs page, without the use of a VM category. To add VMs individually to a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs in the left pane.
    Figure. Virtual Infrastructure: VMs
    Click to enlarge Protect Guest VMs Individually: Virtual Infrastructure

  3. Select the guest VMs that you want to add to a recovery plan.
  4. Click Add to Recovery Plan from the Actions drop-down menu.
  5. Select the recovery plan where you want to add the guest VMs in the Add to Recovery Plan page.
    Tip: Click +Create New if you want to create another recovery plan to add the selected guest VM. For more information about creating a recovery plan, see Creating a Recovery Plan (Nutanix Disaster Recovery).
  6. Click Add .
    The Update Recovery Plan page appears. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Nutanix Disaster Recovery).

Removing Guest VMs Individually from a Recovery Plan

You can also remove guest VMs individually from a recovery plan. To remove guest VMs individually from a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Recovery Plans in the left pane.
  3. Select the recovery plan from which you want to remove guest VMs.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan page appears. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Nutanix Disaster Recovery).

Updating a Recovery Plan

You can update an existing recovery plan. To update a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Recovery Plans in the left pane.
  3. Select the recovery plan that you want to update.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan dialog box appears. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Nutanix Disaster Recovery).

Validating a Recovery Plan

You can validate a recovery plan from the recovery AZ. Recovery plan validation does not perform a failover like the test failover does, but reports warnings and errors. To validate a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Recovery Plans in the left pane.
  3. Select the recovery plan that you want to validate.
  4. Click Validate from the Actions drop-down menu.
  5. In the Validate Recovery Plan page, do the following.
    1. In Primary Location , select the primary location.
    2. In Recovery Location , select the recovery location.
    3. Click Proceed .
    The validation process lists any warnings and errors.
  6. Click Back .
    A summary of the validation is displayed. You can close the dialog box.
  7. To return to the detailed results of the validation, click the link in the Validation Errors column.
    The selected recovery plan is validated for its correct configuration. The updated recovery plan starts synchronizing to the recovery Prism Central.

Protection and Manual DR (Nutanix Disaster Recovery)

Manual data protection involves manually creating recovery points, replicating recovery points, and recovering the guest VMs at the recovery AZ. You can also automate some of these tasks. For example, the last step—that of manually recovering guest VMs at the recovery AZ—can be performed by a recovery plan while the underlying recovery point creation and replication can be performed by protection policies. Conversely, you can configure protection policies to automate recovery point creation and replication and then recover guest VMs manually at the recovery AZ.

Creating Recovery Points Manually (Out-of-Band Snapshots)

About this task

To create recovery points manually, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs > List in the left pane.
  3. Select the guest VMs for which you want to create a recovery point.
  4. Click Create Recovery Point from the Actions drop-down menu.
  5. To verify that the recovery point is created, click the name of the VM, click the Recovery Points tab, and verify that a recovery point is created.

Replicating Recovery Points Manually

You can manually replicate recovery points only from the availability zone (AZ) where the recovery points exist.

About this task

To replicate recovery points manually, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs > List in the left pane.
  3. Click the guest VM whose recovery point you want to replicate, and then click Recovery Points in the left pane.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery points that you want to replicate.
  5. Click Replicate from the Actions drop-down menu.
    In the Replicate dialog box, do these steps.
    1. In Recovery Location , select the location where you want to replicate the recovery point.
    2. In Target Cluster , select the cluster where you want to replicate the recovery point.
    3. Click Replicate Recovery Point .

Manual Recovery of Guest VMs

After manually creating recovery points, and then replicating those recovery points, you can manually recover the guest VMs from those recovery points by either of the following methods.
Note: You can manually recover guest VMs from both manually and automatically created recovery points.
Cloning the guest VM from a recovery point (out-of-place restore)
Out-of-place restore enables you to create a guest VM clone (with a different UUID) from each selected recovery point of the guest VM. The operation creates a copy of the guest VM in the same Nutanix cluster without overwriting the original guest VM. The guest VM clone remains separate from the original guest VM. For example, if you clone a guest VM, a new VM with a different UUID is created in the specified AZ.
Reverting the guest VM to an older recovery point (in-place restore)
In-place restore enables you to revert the guest VM to a previous recovery point state. The operation recreates the guest VM in the same Nutanix cluster by overwriting the original guest VM. The new guest VM retains properties of the original guest VM such as the UUID, network configuration, MAC addresses, and hostname. For example, if you revert a guest VM, a new VM is created in the same AZ with the same properties.
Note: For a guest VM protected with Synchronous replication, you can revert (in-place restore) only from recovery points created manually in Prism Central. From snapshots in Prism Element, you can only clone the guest VM (out-of-place restore).
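The difference between the two restore modes can be sketched in a few lines. This is a hypothetical model for illustration only (the class and function names are invented, not a Nutanix API): a clone gets a new UUID alongside the original VM, while a revert overwrites the disks but keeps the VM's identity.

```python
import copy
import uuid
from dataclasses import dataclass, field

@dataclass
class GuestVM:
    name: str
    disks: dict  # disk contents at some point in time
    vm_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))

def clone_from_recovery_point(rp_vm: GuestVM, new_name: str) -> GuestVM:
    """Out-of-place restore: a new VM (new UUID) alongside the original."""
    return GuestVM(name=new_name, disks=copy.deepcopy(rp_vm.disks))

def revert_to_recovery_point(vm: GuestVM, rp_vm: GuestVM) -> GuestVM:
    """In-place restore: overwrite the disks but keep the VM's identity."""
    vm.disks = copy.deepcopy(rp_vm.disks)
    return vm

original = GuestVM(name="app-01", disks={"sda": "v2"})
recovery_point = GuestVM(name="app-01", disks={"sda": "v1"},
                         vm_uuid=original.vm_uuid)

cloned = clone_from_recovery_point(recovery_point, "app-01-clone")
reverted = revert_to_recovery_point(original, recovery_point)

assert cloned.vm_uuid != original.vm_uuid    # clone has a different UUID
assert reverted.vm_uuid == original.vm_uuid  # revert keeps the same UUID
```

Both operations restore the disk state of the recovery point; only the identity handling differs.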

Recovering a Guest VM Manually (Clone)

About this task

To recover a guest VM manually by cloning it from a recovery point, do the following.

Note: This method is available as Restore in the previous versions of AOS.

Before you begin

Consider the following limitations before performing the clone operation on guest VMs.
  • The guest VMs recover without a vNIC if the recovery is performed at the remote AZ.
  • The guest VMs recover without VM categories.
  • You must reconfigure Nutanix Guest Tools (NGT) on each recovered guest VM.

    For installing and configuring NGT and its requirements and limitations, see Nutanix Guest Tools in the Prism Web Console Guide .

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs > List in the left pane.
  3. Click the guest VM that you want to recover (clone), and then click Recovery Points .
    The Recovery Points view lists all the recovery points of the guest VM.
    Tip: You can also click the hamburger icon at the top-left corner and go to Data Protection > VM Recovery Points
  4. Select the recovery point from which you want to recover (clone) the guest VM.
    If you want to recover (clone) multiple states of the guest VM, select multiple recovery points.
    Figure. Select One or More Recovery Points
    Click to enlarge Select One or More Recovery Points

  5. Click Clone (Previously Restore) from the Actions drop-down menu.
    In the Clone dialog box, do the following.
    1. In the text box provided for specifying a name for the guest VM clone, specify a new name or do nothing to use the automatically generated name.
    2. Click Clone .
      The guest VM is cloned from each selected recovery point. The guest VM clones appear with the name you specified in step 5.a in VMs > List . The guest VM clones are turned off and you have to start them manually.

Recovering a Guest VM Manually (Revert)

About this task

To recover a guest VM manually by reverting it to an older recovery point, do the following.

Before you begin

Before you can perform an in-place restore (revert) of a guest VM, ensure that you meet the following Nutanix software requirements.

  • The Nutanix cluster must be running AOS 6.1 or newer.
  • Prism Central must be running pc.2022.1 or newer.

For successful and efficient in-place restore (revert) of the guest VM along with its configuration, refer to the following tables. Each of the following tables describes settings and the possible change scenarios that can fail the revert operation.

Table 1. NGT This table describes the NGT configuration on the original guest VM and the selected recovery point. It describes the scenarios where changes in NGT configuration fail the revert operation or restore the guest VM without NGT configuration.
Entity Original guest VM Selected recovery point Original guest VM after the recovery point (before revert) Revert operation Guest VM after revert
Guest VM NGT configured NGT configured. The NGT information is the same as that on the original guest VM. No change to the NGT configuration Succeeds Need not reconfigure NGT on the restored guest VM.

NGT configured NGT not configured, or the NGT information is different than that on the original guest VM. NGT configured Fails because there is a mismatch between NGT information on the original guest VM and the selected recovery point. Not applicable
NGT configured NGT configured NGT not configured Succeeds NGT on the restored guest VM is disabled. You must enable it manually. For enabling NGT, see Nutanix Guest Tools in the Prism Web Console Guide .
NGT not configured NGT not configured, or the NGT information is different than that on the original guest VM. No change to the NGT configuration Succeeds Not applicable
NGT configured NGT configured but the NGT information available is of an older NGT version. Newer NGT version configured Fails because there is a mismatch between NGT information on the original guest VM and the selected recovery point. Not applicable
Table 2. VM Categories This table describes the VM categories attached with the original guest VM and the selected recovery point. It describes the scenarios where changes in VM categories fail the revert operation or restore the guest VM without VM categories.
Entity Original guest VM Selected recovery point Revert operation Guest VM after revert
Guest VM VM category configured (for example, catX:valX) VM category configured but its information is different (for example, catY:valY) Succeeds Retains the VM category of the original guest VM (catX:valX).
VM category not configured. VM category configured (for example, catY:valY) Succeeds Does not retain VM category because the original guest VM did not have VM category configured. You must attach a VM category manually.
Table 3. Virtual Networks This table describes the virtual networks attached with the original guest VM and the selected recovery point. It describes the scenarios where changes in virtual networks fail the revert operation or restore the guest VM without virtual networks.
Entity Original guest VM Selected recovery point Network available during revert operation Revert operation Guest VM after revert
Guest VM Network configured (for example, vLANx) Network configured (for example, vLANx) The Nutanix cluster has vLANx configured. Succeeds Not applicable
Network configured (for example, vLANx) Network configured is different (for example, vLANy) The Nutanix cluster has both vLANx and vLANy configured. Succeeds Retains the network configured on the selected recovery point (vLANy).
No network configured Network configured (for example, vLANy) The Nutanix cluster has vLANy configured. Succeeds Not applicable
No network configured Network configured (for example, vLANy) The Nutanix cluster has a different network (for example, vLANx) configured. Fails Not applicable
Table 4. vNUMA nodes and vGPU AHV Features This table describes the AHV Features (vNUMA nodes and vGPU) attached with the original guest VM and the selected recovery point. It describes the scenarios where changes in vNUMA nodes and vGPU configuration fail the revert operation or restore the guest VM without vNUMA nodes and vGPU configuration.
Entity Original guest VM Selected recovery point Features available during revert operation Revert operation Guest VM after revert
Guest VM vNUMA nodes and vGPU configured vNUMA nodes and vGPU configured. The feature information is the same as that on the original guest VM. The Nutanix cluster has vNUMA nodes and vGPU configured. Succeeds Not applicable
vNUMA nodes and vGPU configured vNUMA nodes and vGPU configured. The feature information is different than that on the original guest VM. The Nutanix cluster has vNUMA nodes and vGPU configured. Succeeds Does not retain vNUMA nodes and vGPU configuration.

vNUMA nodes and vGPU not configured vNUMA nodes and vGPU configured Yes Succeeds Retains vNUMA nodes and vGPU configuration. If you encounter missing internal configuration, contact Nutanix Support.
vNUMA nodes and vGPU not configured vNUMA nodes and vGPU configured No Fails because the Nutanix cluster doesn't support GPU. Not applicable
Table 5. Role Based Access Control (RBAC) This table describes the user roles supporting in-place restore (revert) of a guest VM.
Entity Legacy Administrator Cluster Administrator (Prism Administrator)
Guest VM Supported Not Supported

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs > List in the left pane.
  3. Click the guest VM that you want to recover (revert) to a recovery point, and then click Recovery Points .
    The Recovery Points view lists all the recovery points of the guest VM.
    Tip: You can also click the hamburger icon at the top-left corner and go to Data Protection > VM Recovery Points
  4. Select the recovery point from which you want to recover (revert) the guest VM.
    Note: Do not select more than one recovery point. If you select two or more recovery points, you cannot perform the Revert operation.
  5. Click Revert from the Actions drop-down menu.
    Note: You can see the Revert option only if you have administrator role assigned to your local user or directory user through role mapping. For more information about role mapping, see Controlling User Access (RBAC) in the Nutanix Security Guide .
    In the Revert dialog box, verify the recovery point details and then click Revert .
    The selected guest VM turns off. It is unregistered from the inventory, updated with files copied from the selected recovery point, and then registered back to the inventory. The recovered (reverted) guest VM appears with the name (Revert-Recoverypoint) in Compute & Storage > VMs > List . If more than one guest VM has the name (Revert-Recoverypoint) , check the timestamps of the recovered (reverted) guest VMs to identify the correct one. The recovered (reverted) guest VM is turned off and you have to start it manually.

    After reverting the guest VM to its previous state, some settings of the original guest VM need reconfiguration for the recovered guest VM to work efficiently. The following table describes the settings of the original guest VM and whether or not they need reconfiguration after the revert.

Entity Synchronization Between Paired AZs

When paired with each other, availability zones (AZs) synchronize disaster recovery (DR) configuration entities. Paired AZs synchronize the following DR configuration entities.

Protection Policies
A protection policy is synchronized whenever you create, update, or delete the protection policy.
Recovery Plans
A recovery plan is synchronized whenever you create, update, or delete the recovery plan. The list of AZs to which the on-prem AZ must synchronize a recovery plan is derived from the guest VMs included in the recovery plan: both VM categories and individually added guest VMs.

If you specify VM categories in a recovery plan, Leap determines which protection policies use those VM categories, and then synchronizes the recovery plan to the AZs specified in those protection policies.

If you include guest VMs individually (without VM categories) in a recovery plan, Leap uses the recovery points of those guest VMs to determine which protection policies created those recovery points, and then synchronizes the recovery plan to the AZs specified in those protection policies. If you create a recovery plan for VM categories or guest VMs that are not associated with a protection policy, Leap cannot determine the AZ list and therefore cannot synchronize the recovery plan. If a recovery plan includes only individually added guest VMs and a protection policy associated with a guest VM has not yet created recovery points for it, Leap cannot synchronize the recovery plan to the AZ specified in that protection policy. However, recovery plans are monitored every 15 minutes for the availability of recovery points that can help derive AZ information. When recovery points become available, the paired on-prem AZ derives the AZ list by the process described earlier and synchronizes the recovery plan to the derived AZs.
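The derivation described above can be illustrated with a short sketch. This is a simplified model of the logic, not Nutanix code; the data structures and the `derive_sync_azs` helper are hypothetical. Categories map directly to the policies that use them, while individually added guest VMs map through the policy that created their recovery points.

```python
# Hypothetical model: protection policies with their categories and recovery AZs.
protection_policies = [
    {"name": "pp-gold", "categories": {"Env:Prod"}, "recovery_azs": {"az-west"}},
    {"name": "pp-silver", "categories": {"Env:Dev"}, "recovery_azs": {"az-east"}},
]
# Each recovery point records which policy created it.
recovery_points = {"vm-42": "pp-silver"}

def derive_sync_azs(plan_categories: set, plan_vms: set) -> set:
    """Derive the AZs to which a recovery plan should be synchronized."""
    azs = set()
    # VM categories map directly to the policies that use them.
    for pp in protection_policies:
        if plan_categories & pp["categories"]:
            azs |= pp["recovery_azs"]
    # Individually added VMs map through their recovery points' policies.
    for vm in plan_vms:
        pp_name = recovery_points.get(vm)  # None until a recovery point exists
        for pp in protection_policies:
            if pp["name"] == pp_name:
                azs |= pp["recovery_azs"]
    return azs

assert derive_sync_azs({"Env:Prod"}, set()) == {"az-west"}
assert derive_sync_azs(set(), {"vm-42"}) == {"az-east"}
# No recovery point yet: the AZ list cannot be derived for this VM.
assert derive_sync_azs(set(), {"vm-99"}) == set()
```

The last assertion mirrors the case in the text where synchronization must wait (up to the 15-minute monitoring interval) until a recovery point exists.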

VM Categories used in Protection Policies and Recovery Plans
A VM category is synchronized when you specify the VM category in a protection policy or recovery plan.
Issues such as a loss of network connectivity between paired AZs, or user actions such as unpairing AZs and then pairing them again, can affect entity synchronization.
Tip: Nutanix recommends unprotecting all the VMs on an AZ before unpairing it, to avoid a state where entities have stale configurations after the AZs are paired again.

If you update guest VMs in either or both AZs before such issues are resolved or before unpaired AZs are paired again, VM synchronization is not possible. Also, during VM synchronization, if a guest VM cannot be synchronized because of an update failure or conflict (for example, you updated the same VM in both AZs during a network connectivity issue), no further VMs are synchronized. Entity synchronization can resume only after you resolve the error or conflict. To resolve a conflict, use the Entity Sync option, which is available in the web console. Force synchronization from the AZ that has the desired configuration. Forced synchronization overwrites conflicting configurations in the paired AZ.
Note: Forced synchronization cannot resolve errors arising from conflicting values in guest VM specifications (for example, the paired AZ already has a VM with the same name).

If you do not update entities before a connectivity issue is resolved or before you pair the AZs again, the synchronization behavior described earlier resumes. Also, pairing previously unpaired AZs triggers an automatic synchronization event. For recommendations to avoid such issues, see Entity Synchronization Recommendations (Leap).

Entity Synchronization Recommendations (Leap)

Consider the following recommendations to avoid inconsistencies and the resulting synchronization issues.

  • During network connectivity issues, do not update entities at both the availability zones (AZs) in a pair. You can safely make updates at any one AZ. After the connectivity issue is resolved, force synchronization from the AZ in which you made updates. Failure to adhere to this recommendation results in synchronization failures.

    You can safely create entities at either or both the AZs as long as you do not assign the same name to entities at the two AZs. After the connectivity issue is resolved, force synchronization from the AZ where you created entities.

  • If one of the AZs becomes unavailable, or if any service in the paired AZ is down, force synchronization from the paired AZ after the issue is resolved.

Forcing Entity Synchronization (Leap)

Entity synchronization, when forced from an AZ, overwrites the corresponding entities in the paired AZs. Forced synchronization also creates, updates, and removes those entities in the paired AZs.

About this task

The AZ to which a particular entity is forcefully synchronized depends on which AZ requires the entity (see Entity Synchronization Between Paired AZs). To avoid inadvertently overwriting required entities, ensure that you force synchronization from the AZ in which the entities have the desired configuration.

If an AZ is paired with two or more AZs, you cannot select one or more specific AZs with which to synchronize entities.

To force entity synchronization, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Entity Sync in the left pane.
  4. In the Entity Sync dialog box, review the message at the top of the dialog box, and then do the following.
    1. To review the list of entities that will be synchronized to an AZ, click the number of ENTITIES adjacent to the AZ.
    2. After you review the list of entities, click Back .
  5. Click Sync Entities .

Protection and DR between On-Prem AZ and Xi Cloud Service (Xi Leap)

Xi Leap is essentially an extension of Leap to Xi Cloud Services. Xi Leap protects your guest VMs and orchestrates their disaster recovery (DR) to Xi Cloud Services when events causing service disruption occur at the primary AZ. For protection of your guest VMs, protection policies with Asynchronous and NearSync replication schedules generate and replicate recovery points to Xi Cloud Services. Recovery plans orchestrate DR from the replicated recovery points to Xi Cloud Services.

Protection policies create a recovery point, and set its expiry time, in every iteration of the specified time period (RPO). For example, a policy creates a recovery point every hour for an RPO schedule of 1 hour. The recovery point expires at its designated expiry time based on the retention policy (see step 3 in Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)). If there is a prolonged outage at an AZ, the Nutanix cluster retains the last recovery point to ensure that you do not lose all the recovery points. For NearSync replication (lightweight snapshots), the Nutanix cluster retains the last full hourly snapshot. During the outage, the Nutanix cluster does not clean up recovery points on expiry. When the Nutanix cluster comes back online, it immediately cleans up the recovery points that are past expiry.
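The expiry behavior above can be sketched in a few lines. This is a simplified illustration, not Nutanix code: retention is modeled here as "keep recovery points for N RPO intervals", and the last recovery point is always retained even when everything is past expiry, as during a prolonged outage.

```python
from datetime import datetime, timedelta

rpo = timedelta(hours=1)   # one recovery point per hour
retention = 3              # expire after 3 RPO intervals (illustrative choice)

def expire(points, now):
    """Drop recovery points past expiry, but never the most recent one."""
    live = [t for t in points if now - t < retention * rpo]
    if not live and points:
        live = [max(points)]  # prolonged outage: retain the last recovery point
    return live

start = datetime(2022, 7, 25, 0, 0)
points = [start + i * rpo for i in range(5)]  # recovery points at hours 0..4

# Normal operation: only points inside the retention window survive.
assert len(expire(points, start + 5 * rpo)) == 2   # hours 3 and 4
# Prolonged outage: all points are past expiry, but the last one is kept.
assert expire(points, start + 48 * rpo) == [points[-1]]
```

The actual expiry times come from the retention settings in the protection policy; this sketch only shows the "never delete the last recovery point" safeguard.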

If you remove a guest VM from a protection policy, delete all the recovery points associated with the guest VM. If you do not delete the recovery points explicitly, they adhere to the expiration period set in the protection policy and continue to incur charges until they expire. To stop the charges immediately, log on to Xi Cloud Services and delete all of these recovery points explicitly.

For High Availability of a guest VM, Leap can enable replication of recovery points to one or more AZs. A protection policy can replicate recovery points to a maximum of two AZs. One of the two AZs can be in the cloud (Xi Cloud Services). For replication to Xi Cloud Services, you must add a replication schedule between the on-prem AZ and Xi Cloud Services. You can set up the on-prem AZ and Xi Cloud Services in the following arrangements.

Figure. The Primary Nutanix Cluster at on-prem AZ and recovery Xi Cloud Services
Click to enlarge Disaster recovery to an on-prem AZ and Xi Cloud

Figure. The Primary Xi Cloud Services and recovery Nutanix Cluster at on-prem AZ
Click to enlarge Disaster recovery to an on-prem AZ and Xi Cloud

The replication schedule between an on-prem AZ and Xi Cloud Services enables DR to Xi Cloud Services. To perform DR to Xi Cloud Services, you must also create a recovery plan. In addition to performing DR from AHV clusters to Xi Cloud Services (AHV only), you can also perform cross-hypervisor disaster recovery (CHDR), that is, DR from ESXi clusters to Xi Cloud Services.

The protection policies and recovery plans you create or update synchronize continuously between the on-prem AZ and Xi Cloud Services. This two-way synchronization enables you to create or update entities (protection policies, recovery plans, and guest VMs) at either the primary or the recovery AZ.

This section describes protection of your guest VMs and DR from Xi Cloud Services to a Nutanix cluster at the on-prem AZ. In Xi Cloud Services, you can protect your guest VMs and DR to a Nutanix cluster at only one on-prem AZ. For information about protection of your guest VMs and DR to Xi Cloud Services, see Protection and DR between On-Prem AZs (Nutanix Disaster Recovery).

Xi Leap Requirements

The following are the general requirements of Xi Leap. Along with the general requirements, there are specific requirements for protection with the following supported replication schedules.

  • For information about the on-prem node, disk and Foundation configurations required to support Asynchronous and NearSync replication schedules, see On-Prem Hardware Resource Requirements.
  • For specific requirements of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Requirements (Xi Leap).
  • For specific requirements of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Requirements (Xi Leap).

License Requirements

The AOS license required depends on the features that you want to use. For information about the features that are available with an AOS license, see Software Options.

Hypervisor Requirements

The underlying hypervisors required differ in all the supported replication schedules. For more information about underlying hypervisor requirements for the supported replication schedules, see:

  • Asynchronous Replication Requirements (Xi Leap)
  • NearSync Replication Requirements (Xi Leap)

Nutanix Software Requirements

  • Each on-prem AZ must have a Leap-enabled Prism Central instance. To enable Leap in Prism Central, see Enabling Nutanix Disaster Recovery for On-Prem AZ.
    Note: If you are using ESXi, register at least one vCenter Server to Prism Central. You can also register two vCenter Servers, each to a Prism Central at a different AZ. If you register both Prism Central instances to a single vCenter Server, ensure that each ESXi cluster is part of a different datacenter object in vCenter.

  • The on-prem Prism Central and its registered Nutanix clusters (Prism Element) must be running on the supported AOS versions. For more information about the required versions for the supported replication schedules, see:
    • Asynchronous Replication Requirements (Xi Leap)
    • NearSync Replication Requirements (Xi Leap)
    Tip:

    Nutanix supports replication between all the latest supported LTS and STS AOS releases. To check the list of the latest supported AOS versions, see KB-5505. To determine whether the AOS versions currently running on your clusters are EOL, see the EOL document .

    Upgrade the AOS version to the next available supported LTS/STS release. To determine if an upgrade path is supported, check the Upgrade Paths page before you upgrade the AOS.

    Note: If both clusters are running different AOS versions that are EOL, upgrade the cluster with the lower AOS version to match the cluster with the higher AOS version, and then upgrade both clusters to the next supported LTS version.

    For example, the clusters are running AOS versions 5.5.x and 5.10.x respectively. Upgrade the cluster on 5.5.x to 5.10.x. After both the clusters are on 5.10.x, proceed to upgrade each cluster to 5.15.x (supported LTS). Once both clusters are on 5.15.x you can upgrade the clusters to 5.20.x or newer.

    Nutanix recommends that both the primary and the replication clusters or AZs run the same AOS version.

User Requirements

You must have one of the following roles in Xi Cloud Services.

  • User admin
  • Prism Central admin
  • Prism Self Service admin
  • Xi admin

Firewall Port Requirements

To allow two-way replication between an on-prem Nutanix cluster and Xi Cloud Services, you must enable certain ports in your external firewall. For the required ports, see Disaster Recovery - Leap in Port Reference.

Networking Requirements

Requirements for static IP address preservation after failover
You can preserve one IP address of a guest VM (with static IP address) for its failover (DR) to an IPAM network. After the failover, the other IP addresses of the guest VM have to be reconfigured manually. To preserve an IP address of a guest VM (with static IP address), ensure that:
Caution: By default, you cannot preserve statically assigned DNS IP addresses after failover (DR) of guest VMs. However, you can create custom in-guest scripts to preserve the statically assigned DNS IP addresses. For more information, see Creating a Recovery Plan (Xi Leap).
  • Both the primary and the recovery Nutanix clusters run AOS 5.11 or newer.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery AZ.

  • The protected guest VMs have NetworkManager command-line tool (nmcli) version 0.9.10.0 or newer installed.
    Also, NetworkManager must manage the networks on the Linux VMs. To enable NetworkManager on a Linux VM, set the value of the NM_CONTROLLED field to yes in the interface configuration file. After setting the field, restart the network service on the VM.
    Tip: In CentOS, the interface configuration file is /etc/sysconfig/network-scripts/ifcfg-eth0 .
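Setting NM_CONTROLLED=yes in an ifcfg-style file can be sketched as follows. The `set_nm_controlled` helper is hypothetical, shown only to make the configuration change concrete; restarting the network service afterwards remains a manual step.

```python
import re

def set_nm_controlled(text: str) -> str:
    """Set NM_CONTROLLED=yes in ifcfg-style interface configuration text."""
    if re.search(r"^NM_CONTROLLED=", text, flags=re.M):
        # Replace an existing NM_CONTROLLED line, whatever its value.
        return re.sub(r"^NM_CONTROLLED=.*$", "NM_CONTROLLED=yes",
                      text, flags=re.M)
    # Append the field if it is absent.
    return text.rstrip("\n") + "\nNM_CONTROLLED=yes\n"

cfg = "DEVICE=eth0\nBOOTPROTO=static\nNM_CONTROLLED=no\n"
assert "NM_CONTROLLED=yes" in set_nm_controlled(cfg)
assert "NM_CONTROLLED=no" not in set_nm_controlled(cfg)
```

In practice you would apply this to /etc/sysconfig/network-scripts/ifcfg-eth0 (per the tip above) and then restart the network service.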
Requirements for static IP address mapping of guest VMs between source and target virtual networks
You can explicitly define IP addresses for protected guest VMs that have static IP addresses at the primary AZ. On recovery, such guest VMs retain the explicitly defined IP address. To map static IP addresses of guest VMs between source and target virtual networks, ensure that:
  • Both the primary and the recovery Nutanix clusters run AOS 5.17 or newer.
  • The protected guest VMs have static IP addresses at the primary AZ.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery AZ.

  • The protected guest VMs can reach the Controller VM from both the AZs.
  • The recovery plan selected for failover has VM-level IP address mapping configured.
Virtual Network Design Requirements
Design the virtual subnets that you plan to use for DR at the recovery AZ so that they can accommodate the guest VMs running in the source virtual networks.
  • To use a virtual network as a recovery virtual network, ensure that the virtual network meets the following requirements.
    • The network prefix is the same as the network prefix of the source virtual network. For example, if the source network address is 192.0.2.0/24, the network prefix of the recovery virtual network must also be 24.
    • The gateway IP address is the same as the gateway IP address in the source network. For example, if the gateway IP address in the source virtual network 192.0.2.0/24 is 192.0.2.10, the last octet of the gateway IP address in the recovery virtual network must also be 10.
  • To use a single Nutanix cluster as a target for DR from multiple primary Nutanix clusters, ensure that the number of virtual networks on the recovery cluster is equal to the sum of the number of virtual networks on the individual primary Nutanix clusters. For example, if there are two primary Nutanix clusters, with one cluster having m networks and the other cluster having n networks, ensure that the recovery cluster has m + n networks. Such a design ensures that all recovered VMs attach to a network.
  • After the recovery of guest VMs to Xi Cloud Services, ensure that the router in your primary AZ stops advertising the subnet that hosted the guest VMs.
  • The protected guest VMs and Prism Central VM must be on different networks.

    If protected guest VMs and Prism Central VM are on the same network, the Prism Central VM becomes inaccessible when the route to the network is removed after failover.

  • Xi Cloud Services supports the following third-party VPN gateway solutions.
    • CheckPoint
    • Cisco ASA
    • PaloAlto
      Note: If you are using the Palo Alto VPN gateway solution, set the MTU value to 1356 in the Tunnel Interface settings. Replication fails with the default MTU value (1427).

    • Juniper SRX
    • Fortinet
    • SonicWall
    • VyOS
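The prefix and gateway checks in the virtual network design requirements above can be expressed as a short validation sketch. The `valid_recovery_network` helper is hypothetical (not a Nutanix tool): it checks that the recovery network uses the same prefix length as the source network and that the gateway's host portion (for example, the last octet of a /24) matches the source gateway.

```python
import ipaddress

def valid_recovery_network(src_net, src_gw, dst_net, dst_gw):
    """Check the recovery-network rules: same prefix length, matching
    gateway host portion."""
    src = ipaddress.ip_network(src_net)
    dst = ipaddress.ip_network(dst_net)
    if src.prefixlen != dst.prefixlen:
        return False
    host_bits = src.max_prefixlen - src.prefixlen
    mask = (1 << host_bits) - 1
    src_host = int(ipaddress.ip_address(src_gw)) & mask
    dst_host = int(ipaddress.ip_address(dst_gw)) & mask
    return src_host == dst_host

# Source 192.0.2.0/24 with gateway 192.0.2.10: the recovery network must
# also be a /24 with a gateway ending in .10 (per the examples above).
assert valid_recovery_network("192.0.2.0/24", "192.0.2.10",
                              "198.51.100.0/24", "198.51.100.10")
assert not valid_recovery_network("192.0.2.0/24", "192.0.2.10",
                                  "198.51.100.0/25", "198.51.100.10")
assert not valid_recovery_network("192.0.2.0/24", "192.0.2.10",
                                  "198.51.100.0/24", "198.51.100.1")
```

The m + n rule for consolidated recovery clusters is a simple count: a recovery cluster targeted by two primary clusters with m and n virtual networks needs at least m + n virtual networks.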

Additional Requirements

  • Both the primary and recovery Nutanix clusters must have an external IP address.
  • Both the primary and recovery Prism Centrals and Nutanix clusters must have a data services IP address.
  • The Nutanix cluster that hosts the Prism Centrals must meet the following requirements.
    • The Nutanix cluster must be registered to the Prism Central instance.
    • The Nutanix cluster must have an iSCSI data services IP address configured on it.
    • The Nutanix cluster must also have sufficient memory to support a hot add of memory to all Prism Central nodes when you enable Leap. A small Prism Central instance (4 vCPUs, 16 GB memory) requires a hot add of 4 GB, and a large Prism Central instance (8 vCPUs, 32 GB memory) requires a hot add of 8 GB. If you enable Nutanix Flow, each Prism Central instance requires an extra hot-add of 1 GB.
  • Each node in a scaled-out Prism Central instance must have a minimum of 4 vCPUs and 16 GB memory.

    For more information about the scaled-out deployments of a Prism Central, see Nutanix Disaster Recovery Terminology.

  • The protected guest VMs must have Nutanix VM mobility drivers installed.

    Nutanix VM mobility drivers are required for accessing the guest VMs after failover. Without Nutanix VM mobility drivers, the guest VMs become inaccessible after a failover.
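The Prism Central hot-add memory sizing in the requirements above can be captured in a small sketch. The `hot_add_gb` helper is hypothetical; the figures (4 GB for a small instance, 8 GB for a large instance, plus 1 GB per instance with Nutanix Flow) come from the requirements list.

```python
def hot_add_gb(pc_size: str, flow_enabled: bool = False) -> int:
    """Memory hot-add (GB) required per Prism Central instance when
    enabling Leap, per the sizing figures in the requirements above."""
    base = {"small": 4, "large": 8}[pc_size]  # small: 4 vCPU/16 GB; large: 8 vCPU/32 GB
    return base + (1 if flow_enabled else 0)  # Nutanix Flow adds 1 GB

assert hot_add_gb("small") == 4
assert hot_add_gb("large") == 8
assert hot_add_gb("small", flow_enabled=True) == 5
```

For a scaled-out Prism Central, apply the figure to each node; the hosting cluster must have that much free memory available per node.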

Xi Leap Limitations

Consider the following general limitations before configuring protection and disaster recovery (DR) with Xi Leap. Along with the general limitations, there are specific limitations of protection with the following supported replication schedules.

  • For specific limitations of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Limitations (Xi Leap).
  • For specific limitations of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Limitations (Xi Leap).

Virtual Machine Limitations

  • You cannot start or replicate the following guest VMs at Xi Cloud Services.

    • VMs configured with a GPU resource.
    • VMs configured with four or more vNUMA sockets.
    • VMs configured with more than 24 vCPUs.
    • VMs configured with more than 128 GB memory.
  • You cannot deploy witness VMs.
  • You cannot protect multiple guest VMs that use disk sharing (for example, multi-writer sharing, Microsoft Failover Clusters, Oracle RAC).

  • You cannot protect VMware fault tolerance enabled guest VMs.

  • You cannot recover vGPU console enabled guest VMs efficiently.

    When you perform DR of vGPU console-enabled guest VMs, the VMs recover with the default VGA console (without any alert) instead of vGPU console. The guest VMs fail to recover when you perform cross-hypervisor disaster recovery (CHDR).

  • You cannot recover guest VMs with vGPU.

    However, you can manually restore guest VMs with vGPU.

  • You cannot configure NICs for a guest VM across both the virtual private clouds (VPC).

    You can configure NICs for a guest VM associated with either production or test VPC.

Volume Groups Limitation

You cannot protect volume groups.

Network Segmentation Limitation

You cannot apply network segmentation for management traffic (any traffic not on the backplane network) in Xi Leap.

You get an error when you try to enable network segmentation for management traffic on a Leap enabled Nutanix Cluster or enable Leap in a network segmentation enabled Nutanix cluster. For more information about network segmentation, see Securing Traffic Through Network Segmentation in the Security Guide .
Note: However, you can apply network segmentation for backplane traffic at the primary and recovery clusters. Nutanix does not recommend this, because if you perform a planned failover of guest VMs that have backplane network segmentation enabled, the guest VMs fail to recover and the guest VMs at the primary AZ are removed.

Virtual Network Limitations

Although there is no limit to the number of VLANs that you can create, only the first 500 VLANs are listed in the Network Settings drop-down when you create a recovery plan. For more information about VLANs in the recovery plan, see Nutanix Virtual Networks.

Xi Leap Configuration Maximums

For the maximum number of entities you can configure with different replication schedules and perform failover (disaster recovery) on, see Nutanix Configuration Maximums. The limits have been tested for Xi Leap production deployments. Nutanix does not guarantee that the system can operate beyond these limits.

Tip: Upgrade your NCC version to 3.10.1 to get configuration alerts.

Xi Leap Recommendations

Nutanix recommends the following best practices for configuring protection and disaster recovery (DR) with Xi Leap.

General Recommendations

  • Create all entities (protection policies, recovery plans, and VM categories) at the primary AZ.
  • Upgrade Prism Central before upgrading the Nutanix clusters (Prism Elements) registered to it.

Recommendation for Migrating Protection Domains to Protection Policies

You can protect a guest VM either with the legacy DR solution (protection domain-based) or with Leap. To protect a legacy DR-protected guest VM with Leap, you must migrate the guest VM from the protection domain to a protection policy. During the migration, do not delete the guest VM snapshots in the protection domain. Nutanix recommends keeping the guest VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.

Recommendation for Virtual Networks

  • Map the networks while creating a recovery plan in Prism Central.
  • Recovery plans do not support overlapping subnets in a network-mapping configuration. Do not create virtual networks that have the same name or overlapping IP address ranges.
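Because recovery plans reject overlapping subnets in a network mapping, it can help to screen candidate networks before configuring the mapping. The sketch below uses only the Python standard library; the subnet values are illustrative.

```python
# Detect overlapping subnets among candidate networks for a network
# mapping; any pair returned here would be rejected by a recovery plan.
import ipaddress
from itertools import combinations

def overlapping_pairs(cidrs):
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]

# 10.10.20.0/24 falls inside 10.10.0.0/16, so that pair overlaps.
print(overlapping_pairs(["10.10.0.0/16", "10.10.20.0/24", "192.168.1.0/24"]))
```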

Xi Leap Service-Level Agreements (SLAs)

Xi Leap is essentially an extension of Leap to Xi Cloud Services. Xi Leap enables protection of your guest VMs and disaster recovery (DR) to Xi Cloud Services. With reverse synchronization, Xi Leap can protect your guest VMs and enable DR from Xi Cloud Services to a Nutanix cluster at an on-prem AZ. A Nutanix cluster is essentially an AHV or an ESXi cluster running AOS. In addition to performing DR from AHV clusters to Xi Cloud Services (AHV only), you can also perform cross-hypervisor disaster recovery (CHDR), that is, DR from ESXi clusters to Xi Cloud Services.

You can protect your guest VMs with the following replication schedules.

  • Asynchronous (1 hour or greater RPO). For information about protection with Asynchronous replication in Xi Leap, see Protection with Asynchronous Replication and DR (Xi Leap).
  • NearSync (1–15 minute RPO). For information about protection with NearSync replication in Xi Leap, see Protection with NearSync Replication and DR (Xi Leap).
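As a quick illustration, the helper below maps an RPO in minutes to the schedule type it falls under. The cutoffs restate the RPO ranges above; treating RPO values between 16 and 59 minutes as unsupported is an inference from those ranges, not a documented rule.

```python
# Maps an RPO (in minutes) to the Xi Leap schedule type; cutoffs restate
# the ranges above (1-15 minutes NearSync, 1 hour or greater Asynchronous).
def schedule_type(rpo_minutes):
    if 1 <= rpo_minutes <= 15:
        return "NearSync"
    if rpo_minutes >= 60:
        return "Asynchronous"
    return "unsupported RPO"  # inferred: no schedule covers 16-59 minutes

print(schedule_type(5))   # NearSync
print(schedule_type(60))  # Asynchronous
```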

Xi Leap Views

The disaster recovery views enable you to perform CRUD operations on the following types of Leap entities.

  • Configured entities (for example, AZs, protection policies, and recovery plans)
  • Created entities (for example, VMs, and recovery points)

Some views available in the Xi Cloud Services differ from the corresponding view in on-prem Prism Central. For example, the option to connect to an AZ is on the AZs page in an on-prem Prism Central, but not on the AZs page in Xi Cloud Services. However, the views of both user interfaces are largely the same. This chapter describes the views of Xi Cloud Services.

AZs View in Xi Cloud Services

The AZs view lists all of your paired AZs.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. AZs View Click to enlarge AZs View

Table 1. AZs View Fields
Field Description
Name Name of the AZ.
Region Region to which the AZ belongs.
Type Type of AZ. AZs in Xi Cloud Services are shown as being of type Xi. AZs that are backed by on-prem Prism Central instances are shown to be of type physical. The AZ that you are logged in to is shown as a local AZ.
Connectivity Status Status of connectivity between the local AZ and the paired AZ.
Table 2. Workflows Available in the AZs View
Workflow Description
Connect to AZ (on-prem Prism Central only) Connect to an on-prem Prism Central or to a Xi Cloud Services for data replication.
Table 3. Actions Available in the Actions Menu
Action Description
Disconnect Disconnect the remote AZ. When you disconnect an availability zone, the pairing is removed.

Protection Policies View in Xi Cloud Services

The Protection Policies view lists all configured protection policies from all AZs.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Protection Policies View Click to enlarge Protection Policies View

Table 1. Protection Policies View Fields
Field Description
Name Name of the protection policy.
Primary Location Replication source AZ for the protection policy.
Recovery Location Replication target AZ for the protection policy.
RPO Recovery point objective for the protection policy.
Remote Retention Number of retention points at the remote AZ.
Local Retention Number of retention points at the local AZ.
Table 2. Workflows Available in the Protection Policies View
Workflow Description
Create protection policy Create a protection policy.
Table 3. Actions Available in the Actions Menu
Action Description
Update Update the protection policy.
Clone Clone the protection policy.
Delete Delete the protection policy.

Recovery Plans View in Xi Cloud Services

The Recovery Plans view lists all configured recovery plans from all AZs.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Recovery Plans View Click to enlarge Recovery Plans View

Table 1. Recovery Plans View Fields
Field Description
Name Name of the recovery plan.
Source Replication source AZ for the recovery plan.
Destination Replication target AZ for the recovery plan.
Entities Sum of the following VMs:
  • Number of local, live VMs that are specified in the recovery plan.
  • Number of remote VMs that the recovery plan can recover at this AZ.
Last Validation Status Status of the most recent validation of the recovery plan.
Last Test Status Status of the most recent test performed on the recovery plan.
Table 2. Workflows Available in the Recovery Plans View
Workflow Description
Create Recovery Plan Create a recovery plan.
Table 3. Actions Available in the Actions Menu
Action Description
Validate Validates the recovery plan to ensure that the VMs in the recovery plan have a valid configuration and can be recovered.
Test Test the recovery plan.
Update Update the recovery plan.
Failover Perform a failover.
Delete Delete the recovery plan.

Dashboard Widgets in Xi Cloud Services

The Xi Cloud Services dashboard includes widgets that display the statuses of configured protection policies and recovery plans. If you have not configured these entities, the widgets display a summary of the steps required to get started with Leap.

To view these widgets, click the Dashboard tab.

The following figure is a sample view of the dashboard widgets.

Figure. Dashboard Widgets for Xi Leap Click to enlarge Dashboard Widgets

Enabling Leap for On-Prem AZ

To perform disaster recovery (DR) from Xi Cloud Services to a Nutanix cluster at an on-prem AZ, enable Leap at the on-prem AZ (Prism Central) only. You need not enable Leap in the Xi Cloud Services portal; Xi Cloud Services enables it by default. Without enabling Leap, you can configure protection policies and recovery plans that synchronize to the on-prem AZ, but you cannot perform failover and failback operations.

To enable Leap at the on-prem AZ, see Enabling Nutanix Disaster Recovery for On-Prem AZ.

Xi Leap Environment Setup

You can set up a secure environment to enable replication between an on-prem AZ and Xi Cloud Services with a virtual private network (VPN). To configure the required environment, perform the following steps.

  1. Pair your on-prem AZ with Xi Cloud Services. For more information about pairing, see Pairing AZs (Xi Leap).
  2. Set up an on-prem VPN solution.
  3. Enable VPN on the production virtual private cloud by using the Xi Cloud Services portal.
  4. Set up a VPN client as a VM in Xi Cloud Services to enable connectivity to the applications that have failed over to the Xi Cloud Services.
  5. Configure policy-based routing (PBR) rules for the VPN to successfully work with the Xi Cloud Services. If you have a firewall in the Xi Cloud Services and a floating IP address is assigned to the firewall, create a PBR policy in the Xi Cloud Services to configure the firewall as the gateway to the Internet. For example, specify 10.0.0.2/32 (private IP address of the firewall) in the Subnet IP . For more information, see Policy Configuration in Xi Infrastructure Service Administration Guide .
  6. Configure the custom DNS in your virtual private cloud in the Xi Cloud Services. For more information, see Virtual Private Cloud Management in Xi Infrastructure Service Administration Guide .
Note: For more information about Xi Cloud Services, see Xi Infrastructure Service Administration Guide.

Pairing AZs (Xi Leap)

To perform disaster recovery (DR) from Xi Cloud Services to a Nutanix cluster at an on-prem AZ, pair only the on-prem AZ (Prism Central) with Xi Cloud Services. For reverse synchronization, you need not pair again from the Xi Cloud Services portal; Xi Cloud Services captures the pairing configuration from the on-prem AZ that pairs with it.

To pair an on-prem AZ with Xi Cloud Services, see Pairing AZs (Nutanix Disaster Recovery).

VPN Configuration (On-prem and Xi Cloud Services)

Xi Cloud Services enables you to set up a secure VPN connection between your on-prem AZs and Xi Cloud Services to enable end-to-end disaster recovery services of Leap. A VPN solution between your on-prem AZ and Xi Cloud Services enables secure communication between your on-prem Prism Central instance and the production virtual private cloud (VPC) in Xi Cloud Services. If your workload fails over to Xi Cloud Services, the communication between the on-prem resources and failed over resources in Xi Cloud Services takes place over an IPSec tunnel established by the VPN solution.

Note: Set up the VPN connection before data replication begins.

You can connect multiple on-prem AZs to Xi Cloud Services. If you have multiple remote AZs, you can set up secure VPN connectivity between each of your remote AZs and Xi Cloud Services. With this configuration, you do not need to force the traffic from your remote AZ through your main AZ to Xi Cloud Services.

A VPN solution to connect to Xi Cloud Services includes a VPN gateway appliance in the Xi Cloud and a VPN gateway appliance (remote peer VPN appliance) in your on-prem AZ. A VPN gateway appliance learns about the local routes, establishes an IPSec tunnel with its remote peer, exchanges routes with its peer, and directs network traffic through the VPN tunnel.

After you complete the VPN configuration in the Xi Cloud Services portal, Nutanix creates a virtual VPN gateway appliance in the Xi Cloud. To set up a remote peer VPN gateway appliance in your on-prem AZ, you can either use the On Prem - Nutanix VPN solution (provided by Nutanix) or use a third-party VPN solution:

  • On Prem - Nutanix (recommended): If you select this option, Nutanix creates a VPN gateway VM (remote peer appliance) on your on-prem cluster, connects the appliance to your network, and establishes an IPsec tunnel with the VPN gateway that is running in the Xi Cloud.

    The Nutanix VPN controller runs as a service in the Xi Cloud and on the on-prem Nutanix cluster and is responsible for the creation, setup, and lifecycle maintenance of the VPN gateway appliance (in the Xi Cloud and on-prem). The VPN controller deploys the virtual VPN gateway appliance in the Xi Cloud after you complete the VPN configuration in the Xi Cloud Services portal. The on-prem VPN controller deploys the virtual VPN gateway appliance on the on-prem cluster in the subnet you specify when you configure a VPN gateway in the Xi Cloud Services portal.

    The virtual VPN gateway appliance in the Xi Cloud and VPN gateway VM (peer appliance) in your on-prem cluster each consume 1 physical core, 4 GB RAM, and 10 GB storage.

  • On Prem - Third Party : If you select this option, you must manually set up a VPN solution as an on-prem VPN gateway (peer appliance) that can establish an IPsec tunnel with the VPN gateway VM in the Xi Cloud. The on-prem VPN gateway (peer appliance) can be a virtual or hardware appliance. See On-Prem - Third-Party VPN Solution for a list of supported third-party VPN solutions.

VPN Configuration Entities

To set up a secure VPN connection between your on-prem AZs and Xi Cloud Services, configure the following entities in the Xi Cloud Services portal:

  • VPN Gateway : Represents the gateway of your VPN appliances.

    VPN gateways are of the following types:

    • Xi Gateway : Represents the Xi VPN gateway appliance
    • On Prem - Nutanix Gateway : Represents the VPN gateway appliance at your on-prem AZ if you are using the on-prem Nutanix VPN solution.
    • On Prem - Third Party Gateway : Represents the VPN gateway appliance at your on-prem AZ if you are using your own VPN solution (provided by a third-party vendor).
  • VPN Connection : Represents the VPN IPSec tunnel established between a VPN gateway in the Xi Cloud and VPN gateway in your on-prem AZ. When you create a VPN connection, you select a Xi gateway and on-prem gateway between which you want to create the VPN connection.

You configure a VPN gateway in the Xi Cloud and at each of the on-prem AZs you want to connect to the Xi Cloud. You then configure a VPN connection between a VPN gateway in the Xi Cloud and VPN gateway in your on-prem AZ.

Single-AZ Connection

If you want to connect only one on-prem AZ to the Xi Cloud, configure the following entities in the Xi Cloud Services portal:

  1. One Xi gateway to represent the Xi VPN gateway appliance
  2. One on-prem gateway (On-prem - Nutanix Gateway or on-prem - third-party Gateway) to represent the VPN gateway appliance at your on-prem AZ
  3. One VPN connection to connect the two VPN gateways
Figure. Single-AZ Connection Click to enlarge

Multi-AZ Connection

If you want to connect multiple on-prem AZs to the Xi Cloud, configure the following entities in the Xi Cloud Services portal:

  1. One Xi gateway to represent the Xi VPN gateway appliance
  2. On-prem gateways (On-prem - Nutanix Gateway or on-prem - third-party Gateway) for each on-prem AZ
  3. VPN connections to connect the Xi gateway and the on-prem gateway at each on-prem AZ

For example, if you want to connect two on-prem AZs to the Xi Cloud, configure the following:

  1. One Xi gateway
  2. Two on-prem gateways for the two on-prem AZs
  3. Two VPN connections
Figure. Multi-AZ Connection for Less Than 1 Gbps Bandwidth Click to enlarge

One Xi VPN gateway provides 1 Gbps of aggregate bandwidth for IPSec traffic. Therefore, connect only as many on-prem VPN gateways to one Xi VPN gateway as its 1 Gbps of aggregate bandwidth can accommodate.

If you require an aggregate bandwidth of more than 1 Gbps, configure multiple Xi VPN gateways.

Figure. Multi-AZ Connection for More Than 1 Gbps Bandwidth Click to enlarge
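The sizing rule above reduces to simple arithmetic: divide the total expected replication bandwidth across on-prem AZs by the 1 Gbps capacity of a Xi gateway and round up. The per-site figures in the sketch are illustrative.

```python
# Sizing sketch for the 1 Gbps rule above: each Xi VPN gateway provides
# 1 Gbps of aggregate IPSec bandwidth, so the Xi gateway count is the
# total expected bandwidth divided by 1 Gbps, rounded up.
import math

XI_GATEWAY_CAPACITY_MBPS = 1000  # 1 Gbps per Xi VPN gateway

def xi_gateways_needed(site_bandwidth_mbps):
    total = sum(site_bandwidth_mbps)
    return math.ceil(total / XI_GATEWAY_CAPACITY_MBPS)

# Three on-prem AZs needing 400 + 400 + 500 = 1300 Mbps need two Xi gateways.
print(xi_gateways_needed([400, 400, 500]))
```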

On-Prem - Nutanix VPN Solution

You can use the on-prem - Nutanix VPN solution to set up VPN between your on-prem AZ and Xi Cloud Services. If you select this option, you are using an end-to-end VPN solution provided by Nutanix and you do not need to use your own VPN solution to connect to Xi Cloud Services.

After you complete the VPN configuration in the Xi Cloud Services portal, Nutanix creates a virtual VPN gateway appliance in the Xi Cloud. The On Prem - Nutanix VPN solution creates a VPN gateway VM (remote peer appliance) on your on-prem cluster, connects the appliance to your network, and establishes an IPsec tunnel with the VPN gateway VM that is running in the Xi Cloud.

Following is the workflow if you choose the On Prem - Nutanix VPN solution to set up a VPN connection between your on-prem AZ and Xi Cloud Services.

  1. Create one or more Xi VPN gateways.
  2. The VPN controller running in Xi Cloud Services creates a VPN gateway VM in the Xi Cloud. The Xi VPN gateway VM runs in your (tenant) overlay network.
  3. Create one or more on-prem VPN gateways.

    Create a VPN gateway for each on-prem AZ that you want to connect to the Xi Cloud.

  4. Create one or more VPN connections.

    Create a VPN connection between each on-prem AZ (on-prem VPN gateway) and Xi Cloud (Xi gateway).

  5. The VPN controller creates a VPN gateway VM on the on-prem cluster in the subnet you specify when you create an on-prem VPN gateway. The VPN gateway VM becomes the peer appliance to the VPN gateway VM in the Xi Cloud.
  6. Both the VPN appliances are now configured, and the appliances now proceed to perform the following:
    1. An on-prem router communicates the on-prem routes to the on-prem VPN gateway by using iBGP or OSPF.
    2. The Xi VPN controller communicates the Xi subnets to the Xi VPN gateway VM.
    3. The on-prem VPN gateway VM then establishes a VPN IPsec tunnel with the Xi VPN gateway VM. Both appliances establish an eBGP peering session over the IPsec tunnel and exchange routes.
    4. The on-prem VPN gateway VM publishes the Xi subnet routes to the on-prem router by using iBGP or OSPF.
Nutanix VPN Solution Requirements

In your on-prem AZ, ensure the following before you configure VPN on Xi Cloud Services:

  1. The Prism Central instance and cluster are running AOS 5.11 or newer for AHV and AOS 5.19 or newer for ESXi.

  2. A router with iBGP, OSPF, or Static support to communicate the on-prem routes to the on-prem VPN gateway VM.
  3. Depending on whether you are using iBGP or OSPF, ensure that you have one of the following:
    • Peer IP (for iBGP): The IP address of the on-prem router to exchange routes with the VPN gateway VM.
    • Area ID (for OSPF): The OSPF area ID for the VPN gateway in the IP address format.
  4. Determine the following details for the deployment of the on-prem VPN gateway VM.
    • Subnet UUID : The UUID of the subnet of the on-prem cluster in which you want to install the on-prem VPN gateway VM. Log on to your on-prem Prism Central web console to determine the UUID of the subnet.
    • Public IP address of the VPN Gateway Device : A public WAN IP address that you want the on-prem gateway to use to communicate with the Xi VPN gateway appliance.
    • VPN VM IP Address : A static IP address that you want to allocate to the on-prem VPN gateway VM.
    • IP Prefix Length : The subnet mask in CIDR format of the subnet on which you want to install the on-prem VPN gateway VM.
    • Default Gateway IP : The gateway IP address for the on-prem VPN gateway appliance.
    • On Prem Gateway ASN : The ASN must not be the same as any of your on-prem BGP ASNs. If you already have a BGP environment in your on-prem AZ, use your organization's ASN as the customer gateway ASN. If you do not have a BGP environment in your on-prem AZ, you can choose any number; for example, a number in the 65000 range.
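The ASN constraints above can be expressed as a small validation step. Note that the private 16-bit ASN range (64512 to 65534) used here is a general BGP convention, not a Nutanix-specific requirement.

```python
# Validation sketch for the ASN rules above: the on-prem gateway ASN must
# differ from the Xi gateway ASN and from any BGP ASN already used on-prem.
# Restricting to 64512-65534 (the private 16-bit ASN range) is a general
# BGP convention, not a Nutanix requirement.
def valid_onprem_asn(candidate, xi_asn, existing_asns=()):
    if candidate == xi_asn or candidate in existing_asns:
        return False
    return 64512 <= candidate <= 65534

print(valid_onprem_asn(65001, xi_asn=65000))  # True
print(valid_onprem_asn(65000, xi_asn=65000))  # False: clashes with the Xi gateway
```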
Firewall Port Requirements for On-Prem AZ

Configure rules for ports in your on-prem firewall depending on your deployment scenario.

On-Prem Behind a Network Address Translation or Firewall Device

In this scenario, the IPSec tunnel terminates behind a network address translation (NAT) or firewall device. For NAT to work, open UDP ports 500 and 4500 in both directions. Ports 1024–1034 are ephemeral ports used by the CVMs.

Enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

IPSec Terminates on the Firewall Device

In this scenario, you do not need to open the ports for NAT (500 and 4500), but enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

In the following table, PC subnet refers to the subnet where your on-prem Prism Central is running. The Xi infrastructure load balancer route is where the traffic for Xi CVMs and PC is located. You receive this information when you begin using Xi Cloud Services.

Table 1. Port Rules
Source address Destination address Source port Destination port
PC subnet Load balancer route advertised Any 1024–1034
Xi infrastructure load balancer route PC and CVM subnet Any 2020, 2009, 9440

The following port requirements are applicable only if you are using the Nutanix VPN solution.
Nutanix VPN VM 8.8.8.8 and 8.8.4.4 IP addresses of the DNS server VPN VM DNS UDP port 53
Nutanix VPN VM time.google.com, 0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org of the NTP server VPN VM NTP UDP port 123
Nutanix VPN VM ICMP ping to NTP servers NA NA
CVM IP address in AHV clusters HTTPS request to the Internet AHV hosts HTTPS port 443
CVM IP address in ESXi clusters HTTPS and FTP requests to the Internet ESXi hosts HTTPS port 443 and FTP 21
Creating a Xi VPN Gateway

Create a VPN gateway to represent the Xi VPN gateway appliance.

About this task

Perform the following to create a Xi VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create a Xi Gateway Click to enlarge

    The Create VPN Gateway window appears.

  3. Do the following in the indicated fields.
    1. Name: Enter a name for the VPN gateway.
    2. VPC: Select the production VPC.
    3. Type: Select Xi Gateway.
    4. Routing Protocol : Select eBGP or Static to set up a routing protocol between the Xi and on-prem gateways.
    5. ( eBGP only): Select this option if you want to set up the eBGP routing protocol between the Xi and on-prem gateways. Do the following in the indicated fields.
      • In the ASN field, set an ASN for the Xi gateway. Ensure that the Xi gateway ASN is different from the on-prem gateway ASN.
      • In the eBGP Password field, set up a password for the eBGP session that is established between the on-prem VPN gateway and Xi VPN gateway. The eBGP password can be any string, preferably alphanumeric.
    6. ( Static only) If you select this option, manually set up static routes between the Xi and on-prem gateways.

      For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

  4. Click Save .
    The Xi gateway you create is displayed in the VPN Gateways page.
Creating an On-Prem VPN Gateway (Nutanix)

Create a VPN gateway to represent the on-prem VPN gateway appliance.

About this task

Perform the following to create an on-prem VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create an On-Prem Gateway Click to enlarge

    The Create VPN Gateway window appears.

  3. Do the following in the indicated fields.
    1. Name : Enter a name for the VPN gateway.
    2. Type : Select On Prem - Nutanix .
    3. Automatically add route in PC and PE CVMs to enable replication : Select this option to automatically enable traffic between the on-prem CVMs and the CVMs in Xi Cloud Services. If you select this option, a route to the CVMs in Xi Cloud Services is added with the on-prem VPN gateway as the next hop. Therefore, even if you choose to have static routes between your on-prem router and the on-prem gateway, you do not need to manually add those static routes (see step 7).

      A route to Xi CVMs is added with the on-prem VPN gateway as the next-hop.

      Note: This option is only for the CVM-to-CVM (on-prem CVM and Xi Cloud CVMs) traffic.
    4. Under Routing Protocol (between Xi Gateway and On Prem Nutanix Gateway) , do the following to set up the eBGP routing protocol between the Xi and on-prem gateways:
      • In the ASN field, enter the ASN for your on-prem gateway. If you do not have a BGP environment in your on-prem AZ, you can choose any number. For example, you can choose a number in the 65000 range. Ensure that the Xi gateway ASN and on-prem gateway ASN are not the same.
      • In the eBGP Password field, enter the same eBGP password as the Xi gateway.
    5. Subnet UUID : Enter the UUID of the subnet of the on-prem cluster in which you want to install the on-prem VPN gateway VM. Log on to your on-prem Prism Central web console to determine the UUID of the subnet.
    6. Under IP Address Information , do the following in the indicated fields:
      • Public IP Address of the On Premises VPN Gateway Device : Enter a public WAN IP address for the on-prem VPN gateway VM.
      • VPN VM IP Address : Enter a static IP address that you want to allocate to the on-prem VPN gateway VM.
      • IP Prefix Length : Enter the prefix length (for example, 24) of the subnet on which you want to install the on-prem VPN gateway VM.
      • Default Gateway IP : Enter the gateway IP address of the subnet on which you want to install the on-prem VPN gateway VM.
    7. Under Routing Protocol Configuration , do the following in the indicated fields:
      • In the Routing Protocol drop-down list, select the dynamic routing protocol ( OSPF , iBGP , or Static ) to set up the routing protocol between the on-prem router and on-prem gateway.
      • ( Static only) If you select Static , manually add these routes in Xi Cloud Services.

        For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

        Note: You do not need to add static routes for CVM-to-CVM traffic (see step 3).
      • ( OSPF only) If you select OSPF , in the Area ID field, type the OSPF area ID for the VPN gateway in the IP address format. In the Password Type field, select MD5 and type a password for the OSPF session.
      • ( iBGP only) If you select iBGP , in the Peer IP field, type the IP address of the on-prem router to exchange routes with the VPN gateway VM. In the Password field, type the password for the iBGP session.
  4. Click Save .
    The on-prem gateway you create is displayed in the VPN Gateways page.
Creating a VPN Connection

Create a VPN connection to establish a VPN IPSec tunnel between a VPN gateway in the Xi Cloud and a VPN gateway in your on-prem AZ. Select the Xi gateway and the on-prem gateway between which you want to create the VPN connection.

About this task

Perform the following to create a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Click Create VPN Connection .
    Figure. Create a VPN Connection Click to enlarge

    The Create VPN Connection window appears.

  3. Do the following in the indicated fields:
    1. Name : Enter a name for the VPN connection.
    2. Description : Enter a description for the VPN connection.
    3. IPSec Secret : Enter an alphanumeric string as the IPSec secret for the VPN connection.
    4. Xi Gateway : Select the Xi gateway for which you want to create this VPN connection.
    5. On Premises Gateway : Select the on-prem gateway for which you want to create this VPN connection.
    6. Dynamic Route Priority : This field is optional. Set it if you have multiple routes to the same destination. For example, if you have VPN connection 1 and VPN connection 2 and want VPN connection 1 to take precedence, set a higher priority for VPN connection 1 than for VPN connection 2. The higher the priority number, the higher the precedence of the connection. You can set a priority number from 10 through 1000.
      See the Routes Precedence section in Routes Management in Xi Infrastructure Service Administration Guide for more information.
  4. Click Save .
    The VPN connection you create is displayed in the VPN Connections page.
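The precedence rule for Dynamic Route Priority described in the procedure above can be sketched as a simple selection: among connections that reach the same destination, the highest priority number (10 through 1000) wins. The connection names and priorities below are illustrative.

```python
# Route-selection sketch for Dynamic Route Priority: among connections to
# the same destination, the one with the highest priority number takes
# precedence. Names and priorities are illustrative.
def preferred_connection(connections):
    return max(connections, key=lambda c: c["priority"])

conns = [
    {"name": "vpn-connection-1", "priority": 800},
    {"name": "vpn-connection-2", "priority": 200},
]
print(preferred_connection(conns)["name"])  # vpn-connection-1
```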

On-Prem - Third-Party VPN Solution

You can use your own VPN solution to connect your on-prem AZ to Xi Cloud Services. If you select this option, you must manually set up a VPN solution by using a supported third-party VPN solution as an on-prem VPN gateway (peer appliance) that can establish an IPsec tunnel with the VPN gateway VM in the Xi Cloud.

Following is the workflow if you want to use a third-party VPN solution to set up a VPN connection between your on-prem AZ and Xi Cloud Services.

  1. Create one or more Xi VPN gateways.
  2. The VPN controller running in Xi Cloud Services creates a VPN gateway VM in the Xi Cloud. The Xi VPN gateway VM runs in your (tenant) overlay network.
  3. Create one or more on-prem VPN gateways.

    Create a VPN gateway for each on-prem AZ that you want to connect to the Xi Cloud.

  4. Create one or more VPN connections.

    Create a VPN connection to create an IPSec tunnel between each on-prem AZ (on-prem VPN gateway) and Xi Cloud (Xi gateway).

  5. Configure a peer VPN gateway appliance (hardware or virtual) in your on-prem AZ. Depending upon your VPN solution, you can download detailed instructions about how to configure your on-prem VPN gateway appliance. For more information, see Downloading the On-Prem VPN Appliance Configuration.

    Xi Cloud Services supports the following third-party VPN gateway solutions.

    • CheckPoint
    • Cisco ASA
    • PaloAlto
      Note: If you are using the Palo Alto VPN gateway solution, set the MTU value to 1356 in the Tunnel Interface settings. The replication fails for the default MTU value (1427).
    • Juniper SRX
    • Fortinet
    • SonicWall
    • VyOS
Third-Party VPN Solution Requirements

Ensure the following in your on-prem AZ before you configure VPN in Xi Cloud Services.

  1. A third-party VPN solution in your on-prem AZ that functions as an on-prem VPN gateway (peer appliance).
  2. The on-prem VPN gateway appliance supports the following.
    • IPSec IKEv2
    • Tunnel interfaces
    • External Border Gateway Protocol (eBGP)
  3. Note the following details of the on-prem VPN gateway appliance.
    • On Prem Gateway ASN : Assign an ASN for your on-prem gateway. If you already have a BGP environment in your on-prem AZ, use your organization's ASN as the customer gateway ASN. If you do not have a BGP environment in your on-prem AZ, you can choose any number; for example, a number in the 65000 range.
    • Xi Gateway ASN : Assign an ASN for the Xi gateway. The Xi gateway ASN must not be the same as the on-prem gateway ASN.
    • eBGP Password : The eBGP password is the shared password between the Xi gateway and on-prem gateway. Set the same password for both the gateways.
    • Public IP address of the VPN Gateway Device : Ensure that the public IP address of the on-prem VPN gateway appliance can reach the public IP address of Xi Cloud Services.
  4. The on-prem VPN gateway appliance can route the traffic from the on-prem CVM subnets to the established VPN tunnel.
  5. Ensure that the following ports are open in your on-prem VPN gateway appliance.
    • IKEv2: Port number 500 of the payload type UDP.
    • IPSec: Port number 4500 of the payload type UDP.
    • BGP: Port number 179 of the payload type TCP.
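The port list above can be restated as structured rules that a configuration audit could check. The tuples and the `missing_rules` helper are an illustrative sketch, not a Nutanix tool.

```python
# Restates the required on-prem gateway ports above as (name, port, payload)
# tuples; the helper flags any required rule missing from a set of open
# (port, protocol) pairs. Illustrative audit only, not a Nutanix tool.
REQUIRED_RULES = [
    ("IKEv2", 500, "UDP"),
    ("IPSec", 4500, "UDP"),
    ("BGP", 179, "TCP"),
]

def missing_rules(open_ports):
    return [name for name, port, proto in REQUIRED_RULES
            if (port, proto) not in open_ports]

print(missing_rules({(500, "UDP"), (4500, "UDP")}))  # BGP (TCP 179) not yet open
```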
Firewall Port Requirements for On-Prem AZ

Configure rules for ports in your on-prem firewall depending on your deployment scenario.

On-Prem Behind a Network Address Translation or Firewall Device

In this scenario, the IPSec tunnel terminates behind a network address translation (NAT) or firewall device. For NAT to work, open UDP ports 500 and 4500 in both directions. Ports 1024–1034 are ephemeral ports used by the CVMs.

Enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

IPSec Terminates on the Firewall Device

In this scenario, you do not need to open the ports for NAT (500 and 4500), but enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

In the following table, PC subnet refers to the subnet where your on-prem Prism Central is running. The Xi infrastructure load balancer route is where the traffic for Xi CVMs and PC is located. You receive this information when you begin using Xi Cloud Services.

Table 1. Port Rules
Source address                          Destination address              Source port   Destination port
PC subnet                               Load balancer route advertised   Any           1024–1034
Xi infrastructure load balancer route   PC and CVM subnet                Any           2020, 2009, 9440

The following port requirements are applicable only if you are using the Nutanix VPN solution.
Source address                              Destination address                                                             Service       Destination port
Nutanix VPN VM                              DNS servers (8.8.8.8 and 8.8.4.4)                                               VPN VM DNS    UDP port 53
Nutanix VPN VM                              NTP servers (time.google.com, 0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org)  VPN VM NTP    UDP port 123
Nutanix VPN VM                              ICMP ping to NTP servers                                                        NA            NA
CVM IP addresses in AHV clusters and AHV hosts    HTTPS requests to the Internet                                            HTTPS         TCP port 443
CVM IP addresses in ESXi clusters and ESXi hosts  HTTPS and FTP requests to the Internet                                    HTTPS, FTP    TCP ports 443 and 21
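The outbound requirements above can be spot-checked from a test host. This is a minimal sketch, assuming nc, ping, and curl are available; a UDP "success" only proves the packet was not rejected locally, so treat it as a first-pass check, not proof of end-to-end reachability.

```shell
# check runs a command quietly and reports OK/FAIL for its first word.
check() { "$@" >/dev/null 2>&1 && echo "OK: $1" || echo "FAIL: $1"; }

# Probe the destinations from the Nutanix VPN port-requirements table.
check_vpn_egress() {
  check nc -zu -w 3 8.8.8.8 53              # DNS, UDP 53
  check nc -zu -w 3 time.google.com 123     # NTP, UDP 123
  check ping -c 1 -W 3 0.pool.ntp.org       # ICMP to an NTP server
  check curl -fsS -m 5 https://example.com  # outbound HTTPS, TCP 443
}
```

Run `check_vpn_egress` on the network segment where the VPN VM or CVMs live to see which rules are missing.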
Creating a Xi VPN Gateway

Create a VPN gateway to represent the Xi VPN gateway appliance.

About this task

Perform the following to create a Xi VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create a Xi Gateway

    The Create VPN Gateway window appears.

  3. Do the following in the indicated fields.
    1. Name: Enter a name for the VPN gateway.
    2. VPC: Select the production VPC.
    3. Type: Select Xi Gateway.
    4. Routing Protocol : Select eBGP or Static to set up a routing protocol between the Xi and on-prem gateways.
    5. ( eBGP only): Select this option if you want to set up the eBGP routing protocol between the Xi and on-prem gateways. Do the following in the indicated fields.
      • In the ASN field, set an ASN for the Xi gateway. Ensure that the Xi gateway ASN is different from the on-prem gateway ASN.
      • In the eBGP Password field, set up a password for the eBGP session that is established between the on-prem VPN gateway and Xi VPN gateway. The eBGP password can be any string, preferably alphanumeric.
    6. ( Static only) If you select this option, manually set up static routes between the Xi and on-prem gateways.

      For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

  4. Click Save .
    The Xi gateway you create is displayed in the VPN Gateways page.
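The eBGP password (and, later, the IPSec secret) can be any string, preferably alphanumeric. As a convenience sketch, you can generate one on any Linux host; the length of 24 characters is an arbitrary choice, not a product requirement.

```shell
# Generate a random 24-character alphanumeric secret from /dev/urandom.
gen_secret() { head -c 256 /dev/urandom | tr -dc 'A-Za-z0-9' | head -c 24; }
gen_secret; echo
```

Set the same generated value on both the Xi gateway and the on-prem gateway.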
Creating an On-Prem VPN Gateway (Third-Party)

Create a VPN gateway to represent the on-prem VPN gateway appliance.

Before you begin

Ensure that you have all the details about your on-prem VPN appliance as described in Third-Party VPN Solution Requirements.

About this task

Perform the following to create an on-prem VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create an On Prem Gateway

  3. Do the following in the indicated fields.
    1. Name : Enter a name for the VPN gateway.
    2. Type : Select On Prem - Third Party .
    3. IP Address of your Firewall or Router Device performing VPN : Enter the IP address of the on-prem VPN appliance.
    4. Routing Protocol : Select eBGP or Static to set up a routing protocol between the Xi and on-prem gateways.
    5. ( eBGP only) If you select eBGP, do the following:
      • In the ASN field, enter the ASN for your on-prem gateway. If you do not have a BGP environment in your on-prem AZ, you can choose any number. For example, you can choose a number in the 65000 range. Ensure that the Xi gateway ASN and on-prem gateway ASN are not the same.
      • In the eBGP Password field, enter the same eBGP password as the Xi gateway.
    6. ( Static only) If you select Static , manually set up static routes between the Xi and on-prem gateways.

      For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

Creating a VPN Connection

Create a VPN connection to establish a VPN IPSec tunnel between a VPN gateway in the Xi Cloud and a VPN gateway in your on-prem AZ. Select the Xi gateway and the on-prem gateway between which you want to create the VPN connection.

About this task

Perform the following to create a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Click Create VPN Connection .
    Figure. Create a VPN Connection

    The Create VPN Connection window appears.

  3. Do the following in the indicated fields:
    1. Name : Enter a name for the VPN connection.
    2. Description : Enter a description for the VPN connection.
    3. IPSec Secret : Enter an alphanumeric string as the IPSec secret for the VPN connection.
    4. Xi Gateway : Select the Xi gateway for which you want to create this VPN connection.
    5. On Premises Gateway : Select the on-prem gateway for which you want to create this VPN connection.
    6. Dynamic Route Priority : This field is optional. Set it if you have multiple routes to the same destination. For example, if you have VPN connection 1 and VPN connection 2 and want VPN connection 1 to take precedence, set a higher priority number for VPN connection 1 than for VPN connection 2. The higher the priority number, the higher the precedence of that connection. You can set a priority number from 10 through 1000.
      See the Routes Precedence section in Routes Management in Xi Infrastructure Service Administration Guide for more information.
  4. Click Save .
    The VPN connection you create is displayed in the VPN Connections page.
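The dynamic route priority rule described above (higher number wins, range 10 through 1000) can be sketched as a small selection function; the connection names here are invented examples.

```shell
# Pick the connection with the highest dynamic route priority.
# Input: "name:priority" pairs; output: the winning connection name.
pick_route() {
  printf '%s\n' "$@" | sort -t: -k2,2 -rn | head -n1 | cut -d: -f1
}
pick_route vpn1:200 vpn2:100   # vpn1 takes precedence
```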
Downloading the On-Prem VPN Appliance Configuration

Depending upon your VPN solution, you can download detailed instructions about how to configure your on-prem VPN gateway appliance.

About this task

Perform the following to download the instructions to configure your on-prem VPN gateway appliance.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click an on-prem VPN gateway.
  3. In the details page, click On Prem Gateway Configuration .
    Figure. On-prem VPN Gateway Appliance Configuration

  4. Select the type and version of your on-prem VPN gateway appliance and click Download .
  5. Follow the instructions in the downloaded file to configure the on-prem VPN gateway appliance.

VPN Gateway Management

You can see the details of each VPN gateway, update the gateway, or delete the gateway.

All your VPN gateways are displayed in the VPN Gateways page.

Displaying the Details of a VPN Gateway

You can display the details such as the type of gateway, VPC, IP addresses, protocols, and connections associated with the gateways.

About this task

Perform the following to display the details of a VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
    A list of all your VPN gateways is displayed. The VPN gateways table displays details such as the name, type, VPC, status, public IP address, and VPN connections associated with each VPN gateway.
    Figure. VPN Gateways List

  2. Click the name of a VPN gateway to display additional details of that VPN gateway.
  3. In the details page, click the name of a VPN connection to display the details of that VPN connection associated with the gateway.
Updating a VPN Gateway

The details that you can update in a VPN gateway depend on the type of gateway (Xi gateway or On Prem gateway).

About this task

Perform the following to update a VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN gateway and, in the Actions drop-down list, click Update .
      Figure. Use the Actions drop-down list

    • Click the name of the VPN gateway and, in the details page that appears, click Update .

    The Update VPN Gateway dialog box appears.

  3. Update the details as required.
    The fields are similar to the Create VPN Gateway dialog box. For more information, see Creating a Xi VPN Gateway, Creating an On-Prem VPN Gateway (Nutanix), or Creating an On-Prem VPN Gateway (Third-Party) depending on the type of gateway you are updating.
  4. Click Save .
Deleting a VPN Gateway

To delete a VPN gateway, you must first delete all the VPN connections associated with that gateway.

About this task

Perform the following to delete a VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN gateway and, in the Actions drop-down list, click Delete .
    • Click the name of the VPN gateway and, in the details page that appears, click Delete .
  3. Click OK in the confirmation message that appears to delete the VPN gateway.
VPN Gateway Upgrades

Nutanix deployments can detect and install upgrades for the on-prem Nutanix VPN Gateways and the Xi VPN Gateways in Xi Cloud Services based disaster recovery (DR) solutions (Xi Leap configurations).

For on-prem Nutanix VPN Gateways (deployments that do not use Xi Cloud Services), detect and install the upgrades on the respective PC on which each Nutanix VPN Gateway is installed.

For Xi Cloud Services based Xi VPN Gateways, the upgrades are detected automatically on the Xi PC. If you do not install the upgrades within a certain amount of time, the Xi PC installs them automatically to protect Xi Cloud Services. For on-prem Nutanix Gateways connected to the Xi PC, the on-prem PC detects the upgrades but the Xi PC installs them.

For more information, see Detecting Upgrades for VPN Gateways.

When PC (onprem PC or Xi PC) detects the upgrades, it displays a banner on the VPN Gateways tab of the VPN page. The banner notifies you that a VPN upgrade is available after you have run LCM inventory. The table on the VPN Gateways tab also displays an alert (exclamation mark) icon for the VPN gateways that the upgrade applies to. The hover message for the icon informs you that an upgrade is available for that VPN Gateway.

Figure. Upgrade Banner (sample VPN Gateways tab)

For more information, see Upgrading the Xi-based VPN Gateways and Upgrading the PC-managed Onprem Nutanix VPN Gateways.

For information about identifying the VPN Gateway, see Identifying the VPN Gateway Version.

Identifying the VPN Gateway Version

About this task

To identify the current VPN Gateway version, do the following:

Procedure

  • On the VPN Gateway tab, click the VPN Gateway name link to open the VPN Gateway details page.

    In the VPN Gateway table, the VPN Gateway name is a clickable link.

    Figure. VPN Gateway Details (sample details page with clickable version number)

  • Click the version number link to open the VPN Version dialog box.

    On the VPN Gateway details page, the version identifier or number is clickable.

    The VPN Version dialog box provides information about the current version. It informs you if the current version is the latest. If the current version is the latest, the Update button in the dialog box is unavailable (not clickable). If an upgrade is available, the Update button is available (clickable).

    Figure. VPN Version Dialog Box

Detecting Upgrades for VPN Gateways

About this task

Prism Central can detect whether new upgrades are available for Nutanix VPN Gateways. You can then install the upgrades.

Note:

Xi PC detects Xi VPN Gateway upgrades automatically. You can directly upgrade the Xi VPN Gateway when the notification banner appears on the VPN Gateway page.

If you do not upgrade the Xi VPN Gateway version when a new version is available, the Xi PC automatically installs the upgrades to protect Xi Cloud Services.

Procedure

For the Xi-managed on-prem Nutanix VPN Gateways, click Perform Inventory on the on-prem PC LCM page.

In an on-prem deployment of Leap where the primary AZs are on-prem and the recovery AZ is Xi Cloud Services, run Perform Inventory in LCM on the on-prem Prism Central to detect new versions or upgrades of the on-prem Nutanix VPN Gateway.

Note:

Nutanix recommends that you select Enable LCM Auto Inventory in the LCM page in on-prem Prism Central to continuously detect new VPN upgrades as soon as they are available.

The upgrade notification banner is displayed on the VPN Gateways page.

After you run the LCM inventory, the Xi PC user interface displays a banner on the VPN Gateways page notifying you that a VPN upgrade is available. The table on the VPN Gateways page also displays an icon for the VPN gateways that the upgrade applies to.

The VPN Gateways page in Xi displays the same banner and icon as soon as the upgrades are available in the LCM inventory.

Upgrading the Xi-based VPN Gateways

About this task

Each VPN gateway on either side of the VPN connection is upgraded independently. In a Xi Leap configuration, you must perform the VPN Gateway upgrades on both sides of the VPN connection using the Xi user interface.

Perform upgrades of Xi-managed Onprem Nutanix VPN Gateways from the Xi PC.

Note:

You cannot upgrade Xi-managed Nutanix VPN Gateways from the Onprem PC.

If you do not upgrade the Xi VPN Gateway version when a new version is available, the Xi PC automatically installs the upgrades to protect Xi Cloud Services.

To upgrade the Xi VPN Gateways and the Xi-managed on-prem Nutanix VPN Gateways, do the following:

Procedure

  1. Do one of the following to open the VPN Version dialog box:
    • On the VPN Gateway details page, the version identifier or number is clickable.

      Click the version number link text.
    • Click the alert icon for the VPN Gateway in the VPN Gateway table.

    The VPN Version dialog box opens.

    See Identifying the VPN Gateway Version for a sample view of the VPN Version dialog box.

  2. Click the Update button to upgrade the VPN Gateway version.
Upgrading the PC-managed Onprem Nutanix VPN Gateways

About this task

Perform upgrades of PC-managed Nutanix Gateways using the respective PC on which the Gateway is created.

To upgrade the on-prem Nutanix Gateways, do the following:

Procedure

  1. Log on to the Prism Central as the admin user and click the gear icon.
  2. Go to Administration > LCM > Inventory .
  3. Click Perform Inventory .

    When you click Perform Inventory , the system scans the registered Prism Central cluster for software versions that are running currently. Then it checks for any available upgrades and displays the information on the LCM page under Software .

    Note:

    Skip this step if you have enabled auto-inventory in the LCM page in Prism Central.

  4. Go to Updates > Software . Select the Gateway version you want to upgrade to and click Update .

    LCM upgrades the Gateway version. This process takes some time.

VPN Connection Management

You can see the details of each VPN connection, update the connection, or delete the connection.

All your VPN connections are displayed in the VPN Connections page.

Displaying the Details of a VPN Connection

You can display details such as the gateways associated with the connection, protocol details, Xi gateway routes, throughput of the connection, and logs of the IPSec and eBGP sessions for troubleshooting purposes.

About this task

Perform the following to display the details of a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
    A list of all your VPN connections is displayed. The VPN connections table displays details such as the name, IPSec and eBGP status, dynamic route priority, and the VPC and gateways associated with each VPN connection.
    Figure. VPN Connections List

  2. Click the name of a VPN connection to display more details of that VPN connection.
    The details page displays the following tabs:
    • Summary : Displays details of each gateway, protocol, and Xi gateway routes associated with the connection.
    • Throughput : Displays a graph for throughput of the VPN connection.
    • IPSec Logging : Displays logs of the IPSec sessions of the VPN connection. You can see these logs to troubleshoot any issues with the VPN connection.
    • EBGP Logging : Displays logs of the eBGP sessions of the VPN connection. You can see these logs to troubleshoot any issues with the VPN connection.

    Click the name of the tab to display the details in that tab. For example, click the Summary tab to display the details.

    Figure. VPN Connection Summary Tab

Updating a VPN Connection

You can update the name, description, IPSec secret, and dynamic route priority of the VPN connection.

About this task

Perform the following to update a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN connection and, in the Actions drop-down list, click Update .
      Figure. Use the Actions drop-down list

    • Click the name of the VPN connection and, in the details page that appears, click Update .
      Figure. Click the name of the VPN connection

    The Update VPN Connection dialog box appears.

  3. Update the details as required.
    The fields are similar to the Create VPN Connection dialog box. See Creating a VPN Connection for more information.
  4. Click Save .
Deleting a VPN Connection

About this task

Perform the following to delete a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN connection and, in the Actions drop-down list, click Delete .
    • Click the name of the VPN connection and, in the details page that appears, click Delete .
  3. Click OK in the confirmation message that appears to delete the VPN connection.

Upgrading the VPN Gateway Appliances

You can upgrade the VPN gateway VM in the Xi Cloud and on-prem VPN gateway VM in your on-prem AZ if you are using the On Prem - Nutanix VPN solution by using the Xi Cloud Services portal. If you are using a third-party VPN solution, you can upgrade only the VPN gateway VM running in the Xi Cloud by using the Xi Cloud Services portal. To upgrade the on-prem VPN gateway appliance provided by a third-party vendor, see the documentation of that vendor for instructions about how to upgrade the VPN appliance.

About this task

Note: The VPN gateway VM restarts after the upgrade is complete. Therefore, perform the upgrade during a scheduled maintenance window.

Perform the following to upgrade your VPN gateway appliances.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click the name of the VPN gateway.

    To upgrade the VPN gateway VM running in the Xi Cloud, select a Xi gateway.

    To upgrade the VPN gateway VM running in your on-prem AZ, select the on-prem gateway associated with that on-prem VPN gateway VM.

  3. In the details page of the gateway, click the link in the Version row.

    The VPN Version dialog box appears.

    If you are using the latest version of the VPN gateway VM, the VPN Version dialog box displays a message that your VPN gateway VM is up to date.

    If your VPN gateway VM is not up to date, the VPN Version dialog box displays the Upgrade option.

  4. In the VPN Version dialog box, click Upgrade to upgrade your VPN gateway VM to the latest version.
    The VPN gateway VM restarts after the upgrade is complete and starts with the latest version.

Nutanix Virtual Networks

A planned or an unplanned failover of production workloads requires production virtual networks in both the primary and the recovery AZ. To ensure that a failover operation, whenever necessary, goes as expected, you also need test virtual networks in both AZs for testing your recovery configuration in both directions (failover and failback). To isolate production and test workflows, a recovery plan in Leap uses four separate virtual networks, which are as follows.

Two Production Networks
A production virtual network in the primary AZ is mapped to a production network in the recovery AZ. Production failover and failback are confined to these virtual networks.
Two Test Networks
The production virtual network in each AZ is mapped to a test virtual network in the paired AZ. Test failover and failback are confined to these virtual networks.
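The four-network scheme above can be illustrated with a small lookup; the subnet names here (onprem-prod, recovery-prod, and so on) are invented examples, not product names.

```shell
# Map a source network to its target for a given failover type.
# Args: failover-type (production|test) and source-network name.
map_network() {
  case "$1:$2" in
    production:onprem-prod)   echo recovery-prod ;;  # failover
    production:recovery-prod) echo onprem-prod ;;    # failback
    test:onprem-prod)         echo recovery-test ;;  # test failover
    test:recovery-prod)       echo onprem-test ;;    # test failback
    *)                        echo unmapped ;;
  esac
}
map_network production onprem-prod   # recovery-prod
```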

The following figures show the source and target networks for planned, unplanned, and test failovers.

Figure. Virtual Network Mapping (On-Prem to On-Prem)

Figure. Virtual Network Mapping (On-Prem to Xi Cloud Services)

Virtual networks on on-prem Nutanix clusters are virtual subnets bound to a single VLAN. At on-prem AZs (including the recovery AZ), you must manually create the production and test virtual networks before you create your first recovery plan.

The virtual networks required in Xi Cloud Services are contained within virtual private clouds (VPCs). Virtual networks required for production workloads are contained within a virtual private cloud named production. Virtual networks required for testing failover from on-prem AZs are contained within a virtual private cloud named Test. The task of creating virtual networks in the VPCs in Xi Cloud Services is an optional one. If you do not create a virtual network in a VPC, Leap dynamically creates the virtual networks for you when a failover operation is in progress. Leap cleans up dynamically created virtual networks when they are no longer required (after failback).

Note: You cannot create more VPCs in Xi Cloud Services. However, you can update the VPCs to specify settings such as DNS and DHCP, and you can configure policies to secure the virtual networks.

Virtual Subnet Configuration in On-Prem AZ

You can use your on-prem Prism Central instance to create, modify, and remove virtual networks. For information about how to perform these procedures by using Prism Central, see the Prism Central Guide .

Virtual Subnet Configuration in Xi Cloud Services

You can create virtual subnets in the production and test virtual networks. This is an optional task. You must perform these procedures in Xi Cloud Services. For more information, see the Xi Infrastructure Services Guide .

Xi Leap RPO Sizer

Nutanix offers standard service level agreements (SLAs) for data replication from your on-prem AHV clusters to Xi Cloud Services based on RPO and RTO. The replication to Xi Cloud Services occurs over the public Internet (VPN or DirectConnect), so the network bandwidth available for replication cannot be controlled. Unstable network bandwidth and the lack of network information affect the amount of data that can be replicated in a given time frame. You can test your RPO objectives by setting up a real protection policy, or use the Xi Leap RPO Sizer utility to simulate the protection plan (without replicating data to Xi Cloud Services). Xi Leap RPO Sizer provides the information required to determine whether the RPO SLAs are achievable. The utility provides insights into your network bandwidth, estimates performance, calculates the actual change rate, and calculates the feasible RPO for your data protection plan.

About this task

See Xi Leap Service-Level Agreements (SLAs) for more information about Nutanix SLAs for data replication to Xi Cloud Services. To use the Xi Leap RPO Sizer utility, perform the following steps.

Procedure

  1. Log on to the My Nutanix portal with your account credentials.
  2. Click Launch in Xi Leap RPO Sizer widget.
  3. (optional) Download the bundle (rpo_sizer.tar) using the hyperlink given in the instructions.
    Tip: You can also download the bundle directly (using wget command in CLI) into the directory after step 4.a.
  4. Log on to any on-prem guest VM through an SSH session and do the following.
    Note: The guest VM must have connectivity to the Prism Central VM and CVMs.
    1. Create a separate directory to ensure that all the downloaded and extracted files inside the downloaded bundle remain in one place.
      $ mkdir dir_name

      Replace dir_name with an identifiable name. For example, rpo_sizer.

    2. (optional) Copy the downloaded bundle into the directory created in the previous step.
      $ cp download_bundle_path/rpo_sizer.tar ./dir_name/

      Replace download_bundle_path with the path to the downloaded bundle.

      Replace dir_name with the directory name created in the previous step.

      Tip: If you download the bundle directly (using wget command in CLI) from the directory, you can skip this step.
    3. Go to the directory where the bundle is stored and extract the bundle.
      $ cd ./dir_name

      Replace dir_name with the directory name created in the step 4.a.

      $ tar -xvf rpo_sizer.tar
      This command generates rpo_sizer.sh and rposizer.tar in the same directory.
    4. Change the permissions to make the extracted shell file executable.
      $ chmod +x rpo_sizer.sh
    5. Run the shell script in the bundle.
      $ ./rpo_sizer.sh
      Note: If you ran the Xi Leap RPO Sizer previously on the Prism Central VM, clean up the script before you run the shell script again. Run the command ./rpo_sizer.sh delete to clean up the script. If you do not clean up the script, you get an error similar to: The container name "/rpo_sizer" is already in use by container "xxxx". You have to remove (or rename) that container to be able to reuse that name (where xxxx is the container name).
  5. Open a web browser and go to http://Prism_Central_IP_address:8001/ to run the RPO test.

    Replace Prism_Central_IP_address with the virtual IP address of your Prism Central deployment.

    Note: If you have set up a firewall on Prism Central, ensure that the port 8001 is open.
    $ modify_firewall -p 8001 -o open -i eth0 -a
    Close the port after running the RPO test.
    $ modify_firewall -p 8001 -o close -i eth0 -a
  6. Click Configure and execute test and specify the following information in the Configuration Wizard .
    Note: If you are launching the Xi Leap RPO Sizer utility for the first time, generate an API key pair. To generate an API key pair, see Creating an API Key in the Nutanix Licensing Guide .
    1. In the API Key and PC Credentials tab, specify the following information.
        1. API Key : Enter the API key that you generated.
        2. Key ID : Enter the key ID that you generated.
        3. PC IP : Enter the IP address of Prism Central VM.
        4. Username : Enter the username of your Prism Central deployment.
        5. Password : Enter the password of your Prism Central deployment.
        6. Click Next .
    2. In the Select Desired RPO and Entities tab, select the desired RPO from the drop-down list, select the VM Categories or individual VMs, and click + . If you want to add more RPO and entities to the test, enter the information again and click Next .
      Note: You can select the VM Categories or individual VMs on which to test the RPO only after you select the desired RPO.
      The system discovers Prism Element automatically based on the VM Categories and the individual VMs you choose.
    3. In the Enter PE credentials tab, enter the SSH password or SSH key for Prism Element ("nutanix" user) running on AHV cluster and click Next .
    4. In the Network Configuration tab, specify the following information.
        1. Select region : From the drop-down list, select the Xi Leap region closest to your datacenter, to which the workloads are to be copied.
        2. Select AZ : Select an availability zone (AZ) from the drop-down list.
        3. NAT Gateway IPs : Enter the public facing IP address of Prism Element running on your AHV cluster.
          Note: To find the NAT gateway IP address of Prism Element running on your AHV cluster, log on to Prism Element through an SSH session (as the "nutanix" user) and run the curl ifconfig.me command.

          Note: Do not turn on the Configure Advanced Options switch unless advised by the Nutanix Support.
        4. Click Next .
    5. In the View Configuration tab, review the RPO, entity, and network configuration, the estimated test duration, and click Submit .
    The new window shows the ongoing test status in a progress bar.
  7. When the RPO test completes, click Upload result .
    The test result is uploaded and visible on the RPO Sizer portal. To view the detailed and intuitive report of the test, click View Report . To abort the test, click X .
    Note: If a test is in progress, you cannot trigger a new test.
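Steps 3 and 4 above (create a directory, copy the bundle into it, extract it, and make the script executable) can be combined into one small helper. This is a sketch; the bundle path argument is wherever you downloaded rpo_sizer.tar, and running ./rpo_sizer.sh itself is left to you.

```shell
# Consolidated sketch of the setup steps; does not run rpo_sizer.sh.
setup_rpo_sizer() {
  bundle="$1" dir="$2"
  mkdir -p "$dir" &&
  cp "$bundle" "$dir/" &&
  ( cd "$dir" && tar -xf "$(basename "$bundle")" && chmod +x rpo_sizer.sh )
}
# Usage: setup_rpo_sizer ~/Downloads/rpo_sizer.tar ./rpo_sizer
```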

Protection and Automated DR (Xi Leap)

Automated disaster recovery (DR) configurations use protection policies to protect the guest VMs, and recovery plans to orchestrate the recovery of those guest VMs to Xi Cloud Services. With reverse synchronization, you can protect guest VMs and enable DR from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (AZ). You can automate protection of your guest VMs with the following supported replication schedules in Xi Leap.

  • Asynchronous replication schedule (1 hour or greater RPO). For information about protection with Asynchronous replication schedule, see Protection with Asynchronous Replication and DR (Xi Leap).
  • NearSync replication schedule (1–15 minute RPO). For information about protection with NearSync replication schedule, see Protection with NearSync Replication and DR (Xi Leap).

Protection with Asynchronous Replication and DR (Xi Leap)

Asynchronous replication schedules enable you to protect your guest VMs with an RPO of 1 hour or more. A protection policy with an Asynchronous replication schedule creates a recovery point at the specified interval (1 hour or longer) and replicates it to Xi Cloud Services for high availability. For guest VMs protected with an Asynchronous replication schedule, you can perform disaster recovery (DR) to Xi Cloud Services. With reverse synchronization, you can perform DR from Xi Cloud Services to a Nutanix cluster at an on-prem AZ. In addition to performing DR from AHV clusters to Xi Cloud Services (only AHV), you can also perform cross-hypervisor disaster recovery (CHDR), that is, DR from ESXi clusters to Xi Cloud Services.

Note: Nutanix provides multiple disaster recovery (DR) solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

Asynchronous Replication Requirements (Xi Leap)

The following are the specific requirements for protecting your guest VMs with Asynchronous replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Xi Leap.

For information about the general requirements of Xi Leap, see Xi Leap Requirements.

For information about the on-prem node, disk and Foundation configurations required to support Asynchronous replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on AHV versions that come bundled with the latest version of AOS.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

The on-prem Prism Central and its registered clusters (Prism Elements) must be running the following versions of AOS.

  • AOS 5.10 or newer with AHV.
  • AOS 5.11 or newer with ESXi.

Xi Cloud Services runs the latest version of AOS.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Guest VMs protected with an Asynchronous replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) to recover guest VMs from ESXi clusters to AHV clusters (Xi Cloud Services), provided you meet the following requirements.

  • The on-prem Nutanix clusters must be running AOS 5.11 or newer.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI and SATA disks only.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.

    For operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files.

    If you have delta disks attached to a guest VM and you proceed with failover, you get a validation warning and the guest VM does not recover. Contact Nutanix Support for assistance.

Table 1. Operating Systems Supported for CHDR
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Limitation: Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • Limitation: SLES operating systems are not supported.

Additional Requirement

The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
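The storage container requirement above lends itself to a simple pre-check before you protect the VMs. The following is a minimal sketch, assuming you have already retrieved the container name lists from each cluster (for example, through the Prism APIs); the function name and sample container names are illustrative only, not a Nutanix API.

```python
def missing_on_recovery(primary_containers, recovery_containers):
    """Return names of storage containers that exist on the primary
    cluster but not on the recovery cluster. Guest VMs in these
    containers cannot recover until matching containers exist."""
    return sorted(set(primary_containers) - set(recovery_containers))

# Example: SelfServiceContainer exists only on the primary cluster,
# so it must be created on the recovery cluster before protection.
primary = ["default-container-1", "SelfServiceContainer"]
recovery = ["default-container-1"]
print(missing_on_recovery(primary, recovery))  # ['SelfServiceContainer']
```

Create any container this check reports on the recovery cluster, with the same name, before adding the guest VMs to a protection policy.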

Asynchronous Replication Limitations (Xi Leap)

Consider the following specific limitations before protecting your guest VMs with Asynchronous replication schedule. These limitations are in addition to the general limitations of Xi Leap.

For information about the general limitations of Leap, see Xi Leap Limitations.

  • You cannot restore guest VMs with incompatible GPUs at the recovery AZ.
  • You cannot protect guest VMs configured as part of a network function chain.
  • You cannot retain hypervisor-specific properties after cross hypervisor disaster recovery (CHDR).

    Cross hypervisor disaster recovery (CHDR) does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)

To protect the guest VMs on an hourly replication schedule, configure an Asynchronous replication schedule while creating the protection policy. The policy takes recovery points of those guest VMs at the specified intervals (hourly or longer) and replicates them to Xi Cloud Services for high availability. With reverse synchronization, you can create the policy at Xi Cloud Services and replicate to an on-prem AZ. For protection from Xi Cloud Services to an on-prem AZ, the protection policy allows you to add only one Asynchronous replication schedule.

Before you begin

See Asynchronous Replication Requirements (Xi Leap) and Asynchronous Replication Limitations (Xi Leap) before you start.

About this task

To create a protection policy with an Asynchronous replication schedule, perform the following procedure at Xi Cloud Services. You can also create a protection policy at the on-prem AZ. Protection policies you create or update at the on-prem AZ synchronize back to Xi Cloud Service.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection & Recovery > Protection Policies in the left pane.
  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Asynchronous
    Click to enlarge Protection Policy Configuration: Asynchronous

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric, dot, dash, and underscore characters.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, select the Xi Cloud Services AZ that hosts the guest VMs to protect.

          The drop-down lists all the AZs paired with the local AZ. Local AZ represents the local Prism Central. For your primary AZ, you can select either the local AZ or a non-local AZ.

        2. Cluster : Xi Cloud Services automatically selects the cluster for you; the only option available is Auto .

          For an on-prem AZ, the drop-down instead lists all the Nutanix clusters registered to the Prism Central representing the selected AZ. To protect guest VMs from multiple Nutanix clusters in the same protection policy, check the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to that Prism Central.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. After saving the primary AZ configuration, you can optionally add a local schedule (step iv) to retain the recovery points at the primary AZ.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule that retains 15-minute recovery points locally and a replication schedule that retains recovery points and replicates them to a recovery AZ every 2 hours. The two schedules apply independently to the guest VMs.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule
          Click to enlarge Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local AZ. If you set the retention number to n, the local AZ retains the n recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local AZ.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on XI-US-EAST-1A-PPD : Auto : Specify the retention number for the local AZ.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location
      Click to enlarge Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the AZ where you want to replicate the recovery points.

          The drop-down lists all the AZs paired with the Xi Cloud Services. XI-US-EAST-1A-PPD : Auto represents the local AZ (Prism Central). Do not select XI-US-EAST-1A-PPD : Auto because a duplicate location is not supported in Xi Cloud Services.

          If you do not select an AZ, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Protection and Manual DR (Nutanix Disaster Recovery).

        2. Cluster : Xi Cloud Services automatically selects the cluster for you; the only option available is Auto .

          For an on-prem recovery AZ, the drop-down instead lists all the Nutanix clusters registered to the Prism Central representing the selected AZ. Check the clusters where you want to replicate the recovery points. All Clusters protects the guest VMs of all Nutanix clusters registered to that Prism Central.

        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery AZ. After saving the recovery AZ configuration, you can optionally add a local schedule to retain the recovery points at the recovery AZ.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule that retains hourly recovery points locally to supplement the hourly replication schedule. The two schedules apply independently to the guest VMs after failover, when the recovery points replicate back to the primary AZ.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule
          Click to enlarge Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local AZ. If you set the retention number to n, the local AZ retains the n recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local AZ.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on PC_xx.xx.xxx:PE_yyy : Specify the retention number for the local AZ.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    4. Click + Add Schedule to add a replication schedule between the primary and the recovery AZ.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (Asynchronous)
      Click to enlarge Protection Policy Configuration: Add Schedule (Asynchronous)

        1. Protection Type : Click Asynchronous .
        2. Take Snapshot Every : Specify the frequency in hours , days , or weeks at which you want the recovery points to be taken.

          The specified frequency is the RPO. For more information about RPO, see Nutanix Disaster Recovery Terminology.

        3. Retention Type : Specify one of the following two types of retention policy.
          • Linear : Implements a simple retention scheme at both the primary (local) and the recovery (remote) AZ. If you set the retention number for a given AZ to n, that AZ retains the n recent recovery points. For example, if the RPO is 1 hour, and the retention number for the local AZ is 48, the local AZ retains 48 hours (48 X 1 hour) of recovery points at any given time.
            Tip: Use linear retention policies for small RPO windows with shorter retention periods or in cases where you always want to recover to a specific RPO window.
          • Roll-up : Rolls up the recovery points as per the RPO and retention period into a single recovery point at an AZ. For example, if you set the RPO to 1 hour and the retention period to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) every 24 hours. The system keeps one day of rolled-up hourly recovery points and 4 days of daily recovery points.
            Note:
            • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
            • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
            • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
            • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, 1 year of monthly, and n-1 years of yearly recovery points.
            Note: The recovery points that are used to create a rolled-up recovery point are discarded.
            Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
        4. To specify the retention number for the primary and recovery AZs, do the following.
          • Retention on XI-US-EAST-1A-PPD : Auto : Specify the retention number for the primary AZ.

            This field is unavailable if you do not specify a recovery location.

          • Retention on PC_xx.xx.xx.xxx:PE_yyy : Specify the retention number for the recovery AZ.

            If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.

        5. If you want to enable reverse retention of the recovery points, check Reverse retention for VMs on recovery location .
          Note: Reverse retention for VMs on recovery location is available only when the retention numbers on the primary and recovery AZs are different.

          Reverse retention maintains the retention numbers of recovery points even after failover to a recovery AZ in the same or different AZs. For example, if you retain two recovery points at the primary AZ and three recovery points at the recovery AZ, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary AZ. The recovery AZ still retains three recovery points while the primary AZ retains two. If you do not enable reverse retention, a failover event swaps the initial retention numbers: the recovery AZ retains two recovery points while the primary AZ retains three.

          Maintaining the same retention numbers at a recovery AZ is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.

        6. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

          Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and can lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery AZs.

          Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          Caution: Application-consistent recovery points fail for EFI-boot enabled Windows 2019 VM running on ESXi when NGT is not installed. Nutanix recommends installing NGT on guest VMs running on ESXi also.
        7. Click Save Schedule .
    5. Click Next .
      Clicking Next shows a list of VM categories where you can optionally check one or more VM categories to protect in the protection policy. DR configurations using Leap allow you to protect a guest VM with only one protection policy. Therefore, VM categories specified in another protection policy are not in the list. If you protect a guest VM in one protection policy by specifying its VM category (category-based inclusion), and then protect the same guest VM from the VMs page in another policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protects the individual guest VM applies.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    6. If you want to protect the guest VMs category-wise, check the VM categories that you want to protect from the list and click Add .
      Figure. Protection Policy Configuration: Add VM Categories Click to enlarge Protection Policy Configuration: Add VM Categories

      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs category-wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs Individually to a Protection Policy).

    7. Click Create .
      The protection policy with an Asynchronous replication schedule is created. To verify the protection policy, see the Protection Policies page. You can add VMs individually (without VM categories) to the protection policy or remove VMs from the protection policy. For information about the operations that you can perform on a protection policy, see Protection Policy Management.
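The linear and roll-up retention rules described in the replication schedule notes above can be sketched as follows. This is a hedged illustration of the stated counts (assuming an RPO that divides 24 hours evenly, and approximating "1 month of weekly" as 4 weekly points); `linear_retention` and `rollup_retention` are hypothetical helper names, not Nutanix APIs.

```python
def linear_retention(n):
    """Linear: the AZ simply keeps the n most recent recovery points."""
    return {"recovery_points": n}

def rollup_retention(rpo_hours, period, unit):
    """Roll-up, following the notes above:
    - n days   -> 1 day of RPO points + (n-1) daily points
    - n weeks  -> 1 day of RPO points + 1 week of daily + (n-1) weekly
    - n months -> 1 day of RPO, 1 week of daily, ~1 month of weekly (4),
                  and (n-1) monthly points
    """
    per_day = 24 // rpo_hours  # RPO-granularity points kept for one day
    if unit == "days":
        return {"rpo": per_day, "daily": period - 1}
    if unit == "weeks":
        return {"rpo": per_day, "daily": 7, "weekly": period - 1}
    if unit == "months":
        return {"rpo": per_day, "daily": 7, "weekly": 4, "monthly": period - 1}
    raise ValueError("unsupported retention unit: " + unit)

# Example from the text: an RPO of 1 hour with a 5-day retention period
# keeps one day of hourly points (24) plus 4 daily points.
print(rollup_retention(1, 5, "days"))   # {'rpo': 24, 'daily': 4}
print(linear_retention(48))             # {'recovery_points': 48}
```

As the tip above suggests, linear retention suits short retention windows with fixed RPO targets, while roll-up handles longer periods by pruning older points automatically.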

Creating a Recovery Plan (Xi Leap)

To orchestrate the failover (disaster recovery) of the protected guest VMs, create a recovery plan. After a failover, the recovery plan recovers the protected guest VMs to the recovery AZ. To create a recovery plan, perform the following procedure at Xi Cloud Services. You can also create a recovery plan at the on-prem AZ; recovery plans you create or update at the on-prem AZ synchronize back to Xi Cloud Services.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Click Create Recovery Plan .
    Specify the following information in the Create Recovery Plan window.
    1. Primary Location : Select the primary AZ that hosts the guest VMs to protect. This list displays the Local AZ by default and is unavailable for editing.
    2. Recovery Location : Select the on-prem AZ where you want to replicate the recovery points.
    3. Click Proceed .
    Tip: After you create the recovery plan, you cannot change the Recovery Location from the Recovery Plans page. To change the recovery location on an existing recovery plan, do the following.
    • Update the protection policy to point to the new recovery location. For more information, see Updating a Protection Policy.
    • Configure the network mapping. For more information, see Nutanix Virtual Networks.
    Caution: If all the VMs in the recovery plan do not point to the new recovery location, you get an AZ conflict alert.
  4. In the General tab, enter the Recovery Plan Name and Recovery Plan Description , and then click Next .
    Figure. Recovery Plan Configuration: General Click to enlarge Recovery Plan Configuration: General

  5. In the Power On Sequence tab, click + Add Entities to add VMs to the sequence and do the following.
    Figure. Recovery Plan Configuration: Add Entities
    Click to enlarge Recovery Plan Configuration: Adding Entities

    1. In the Search Entities by , select VM Name from the drop-down list to specify VMs by name.
    2. In the Search Entities by , select Category from the drop-down list to specify VMs by category.
    3. To add the VMs or VM categories to the stage, select the VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
    4. Click Add .
    The selected VMs are added to the sequence. You can also create multiple stages and add VMs to those stages to define their power-on sequence. For more information about stages, see Stage Management.
    Caution: Do not include the guest VMs protected with Asynchronous, NearSync, and Synchronous replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with the guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.
  6. To manage in-guest script execution on guest VMs during recovery, select the individual VMs or VM categories in the stage. Click Manage Scripts and then do the following.
    Note: In-guest scripts allow you to automate various task executions upon recovery of the VMs. For example, in-guest scripts can help automate the tasks in the following scenarios.

    • After recovery, the VMs must use new DNS IP addresses and also connect to a new database server that is already running at the recovery AZ.

      Traditionally, to achieve this new configuration, you would manually log on to the recovered VM and modify the relevant files. With in-guest scripts, you have to write a script to automate the required steps and enable the script when you configure a recovery plan. The recovery plan execution automatically invokes the script and performs the reassigning of DNS IP address and reconnection to the database server at the recovery AZ.

    • If VMs are part of domain controller AZA.com at the primary AZ AZ1 , and after the VMs recover on the AZ AZ2 , you want to add the recovered VMs to the domain controller AZB.com .

      Traditionally, to reconfigure, you would manually log on to the VM, remove the VM from an existing domain controller, and then add the VM to a new domain controller. With in-guest scripts, you can automate the task of changing the domain controller.

    Note: In-guest script execution requires NGT version 1.9 or newer installed on the VM. The in-guest scripts run as a part of the recovery plan only if they have executable permissions for the following.
    • Administrator user (Windows)
    • Root user (Linux)
    Note: You can have only two in-guest batch or shell scripts—one for production (planned and unplanned failover) while the other for test failover. One script, however, can invoke other scripts. Place the scripts at the following locations in the VMs.
    • In Windows VMs,
      • Batch script file path for production failover:
        C:\Program Files\Nutanix\scripts\production\vm_recovery.bat
      • Batch script file path for test failover:
        C:\Program Files\Nutanix\scripts\test\vm_recovery.bat
    • In Linux VMs,
      • Shell script file path for production failover:
        /usr/local/sbin/production_vm_recovery
      • Shell script file path for test failover:
        /usr/local/sbin/test_vm_recovery
    Note: When an in-guest script runs successfully, it returns code 0 . Error code 1 signifies that the execution of the in-guest script was unsuccessful.
    Figure. Recovery Plan Configuration: In-guest Script Execution
    Click to enlarge Recovery Plan Configuration: In-guest Script execution

    1. To enable script execution, click Enable .
      A command prompt icon appears against the VMs or VM categories to indicate that in-guest script execution is enabled on those VMs or VM categories.
    2. To disable script execution, click Disable .
  7. In the Network Settings tab, map networks in the primary cluster to networks at the recovery cluster.
    Figure. Recovery Plan Configuration: Network Settings
    Click to enlarge Recovery Plan Configuration: Network Mapping

    Network mapping enables replicating the network configurations of the primary clusters to the recovery clusters, and recover VMs into the same subnet at the recovery cluster. For example, if a VM is in the vlan0 subnet at the primary cluster, you can configure the network mapping to recover that VM in the same vlan0 subnet at the recovery cluster. To specify the source and destination network information for a network mapping, do the following in Local AZ (Primary) and PC 10.51.1xx.xxx (Recovery) .
    1. Under Production in Virtual Network or Port Group , select the production subnet that contains the protected VMs for which you are configuring a recovery plan. (Optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    2. Under Test Failback in Virtual Network or Port Group , select the test subnet that you want to use for testing failback from the recovery cluster. (Optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    3. To add a network mapping, click Add Networks at the top-right corner of the page, and then repeat the steps 7.a-7.b.
      Note: The primary and recovery Nutanix clusters must have identical gateway IP addresses and prefix length. Therefore you cannot use a test failover network for two or more network mappings in the same recovery plan.
    4. Click Done .
    Note: For ESXi, you can configure network mapping for both standard and distributed (DVS) port groups. For more information about DVS, see VMware documentation.
    Caution: Leap does not support VMware NSX-T datacenters. For more information about NSX-T datacenters, see VMware documentation.
  8. If you want to enable the VMs in the production VPC to access the Internet, enable Outbound Internet Access .
  9. To assign floating IP addresses to the VMs when they are running in Xi Cloud Services, click + Floating IPs in Floating IPs section and do the following.
    Figure. Recovery Plan Configuration: Assign Floating IP Address
    Click to enlarge Recovery Plan Configuration: Assign Floating IP Addresses

    1. In the NUMBER OF FLOATING IPS , enter the number of floating IP addresses you need for assigning to VMs.
    2. In the ASSIGN FLOATING IPS TO VMS (OPTIONAL) , enter the name of the VMs and select the IP address for it.
    3. In Actions , click Save .
    4. To assign a floating IP address to another VM, click + Assign Floating IP , and then repeat the steps for assigning a floating IP address.
  10. Click Done .

    The recovery plan is created. To verify the recovery plan, see the Recovery Plans page. You can modify the recovery plan to change the recovery location, add, or remove the protected guest VMs. For information about various operations that you can perform on a recovery plan, see Recovery Plan Management.
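The DNS-reassignment scenario in step 6 expects a batch or shell script at the documented in-guest path. The following Python sketch only illustrates the logic such a script would implement, including the documented exit-code convention (0 on success, 1 on failure); the function name, file format, and nameserver value are illustrative assumptions, not part of the product.

```python
import tempfile

def reassign_dns(resolv_conf_path, new_nameserver):
    """Rewrite nameserver entries in a resolv.conf-style file.
    Returns 0 on success and 1 on failure, matching the exit-code
    convention for in-guest recovery scripts."""
    try:
        with open(resolv_conf_path) as f:
            lines = f.readlines()
        rewritten = [
            "nameserver %s\n" % new_nameserver if line.startswith("nameserver") else line
            for line in lines
        ]
        with open(resolv_conf_path, "w") as f:
            f.writelines(rewritten)
        return 0
    except OSError:
        return 1

# Demo against a temporary file standing in for /etc/resolv.conf.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
    f.write("search example.com\nnameserver 10.0.0.2\n")
    demo_path = f.name
print(reassign_dns(demo_path, "10.0.0.53"))  # 0
```

A real production script must itself exit with these codes and live at the documented location, for example /usr/local/sbin/production_vm_recovery on Linux guests.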

Failover and Failback Operations (Xi Leap)

You perform failover of the protected guest VMs when unplanned failure events (for example, natural disasters) or planned events (for example, scheduled maintenance) occur at the primary AZ or the primary cluster. The protected guest VMs migrate to the recovery AZ where you perform the failover operations. On recovery, the protected guest VMs start in the Xi Cloud Services region you specify in the recovery plan that orchestrates the failover.

The following are the types of failover operations in Xi Leap.

Test Failover
To ensure that the protected guest VMs failover efficiently to the recovery AZ, you perform a test failover. When you perform a test failover, the guest VMs recover in the virtual network designated for testing purposes at the recovery AZ (a manually created virtual subnet in the test VPC in Xi Cloud Services). However, the guest VMs at the primary AZ are not affected. Test failovers rely on the presence of VM recovery points at the recovery AZs.
Planned Failover
To ensure VM availability when you foresee service disruption at the primary AZ, you perform a planned failover to the recovery AZ. For a planned failover to succeed, the guest VMs must be available at the primary AZ. When you perform a planned failover, the recovery plan first creates a recovery point of the protected guest VM, replicates it to the recovery AZ, and then starts the guest VM at the recovery AZ. The recovery point used for migration is retained indefinitely. After a planned failover, the guest VMs no longer run at the primary AZ.
Unplanned Failover
To ensure VM availability when a disaster causing service disruption occurs at the primary AZ, you perform an unplanned failover to the recovery AZ. In an unplanned failover, you can expect some data loss to occur. The maximum data loss possible is equal to the least RPO you specify in the protection policy, or the data that was generated after the last manual recovery point for a given guest VM. In an unplanned failover, by default, the protected guest VMs recover from the most recent recovery point. However, you can recover from an earlier recovery point by selecting a date and time of the recovery point.

After the failover, replication begins in the reverse direction. You can perform an unplanned failover operation only if recovery points have replicated to the recovery cluster. At the recovery AZ, failover operations cannot use recovery points that were created locally in the past. For example, if you perform an unplanned failover from the primary AZ (AZ1) to the recovery AZ (AZ2) in Xi Cloud Services and then attempt an unplanned failover (failback) from AZ2 to AZ1, the recovery succeeds at AZ1 only if recovery points have replicated from AZ2 to AZ1 after the unplanned failover operation. The unplanned failover operation cannot recover from the recovery points that were created locally when the VMs were still running in AZ1.
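The eligibility rule in this example can be restated as a tiny predicate (an illustrative sketch with hypothetical parameter names, not a Nutanix API):

```python
from datetime import datetime

def usable_for_unplanned_failback(replicated_to: str,
                                  replicated_at: datetime,
                                  failback_target: str,
                                  failover_completed_at: datetime) -> bool:
    """A recovery point can drive an unplanned failback to an AZ only if
    it was replicated to that AZ after the original failover completed;
    recovery points created locally at that AZ before the failover
    (when the VMs still ran there) are not eligible."""
    return (replicated_to == failback_target
            and replicated_at > failover_completed_at)

# Replicated from AZ2 back to AZ1 after the failover: eligible.
print(usable_for_unplanned_failback(
    "AZ1", datetime(2022, 7, 26), "AZ1", datetime(2022, 7, 25)))  # True
```
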

The procedure for performing a planned failover is the same as the procedure for performing an unplanned failover. You can perform a failover even in different scenarios of network failure. For more information about network failure scenarios, see Nutanix Disaster Recovery and Xi Leap Failover Scenarios.

You can also perform self-service restore on Xi Cloud Services. For more information, see Self-Service Restore.

Performing a Test Failover (Xi Leap)

After you create a recovery plan, you can run a test failover periodically to ensure that the failover occurs smoothly when required. You can perform the test failover from Xi Cloud Services.

About this task

To perform a test failover to Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan that you want to test.
  4. Click Test from the Actions drop-down menu.
  5. In the Test Recovery Plan dialog box, do the following.
    1. In Primary Location , select the primary AZ.
    2. In Recovery Location , select the recovery AZ.
    3. Click Test .
    If you get errors or warnings, see the failure report that is displayed. Click the report to review the errors and warnings. Resolve the error conditions and then restart the test procedure.
  6. Click Close .
Cleaning up Test VMs (Xi Leap)

After testing a recovery plan, you can remove the test VMs that the recovery plan created in the recovery test network on Xi Cloud Services. To clean up the test VMs created when you test a recovery plan, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Click the recovery plan whose test VMs you want to remove.
  4. Click Clean Up Test VMs from the Actions drop-down menu.
  5. In the Clean Up Test VMs dialog box, click Clean .
    Test VMs are deleted. If you get errors or warnings, see the failure report that is displayed. Click the report to review the errors and warnings. Resolve the error conditions and then restart the cleanup procedure.
Performing a Planned Failover (Xi Leap)

Perform a planned failover at the recovery AZ. To perform a planned failover to Xi Cloud Services, do the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
    Figure. Planned Failover
    Click to enlarge Planned Failover

  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu. Specify the following information in the Failover from Recovery Plan dialog box.
    Note: The Failover action is available only when all the selected recovery plans have the same primary and recovery locations.
    Figure. Planned Failover
    Click to enlarge Planned Failover

    1. Failover Type : Click Planned Failover .
    2. Failover From (Primary) : Select the protected primary cluster.
    3. Failover To (Recovery) : Select the recovery cluster where you want the VMs to failover. This list displays Local AZ by default and is unavailable for editing.
    Note: Click + to add more combinations of primary and recovery clusters. You can add as many primary clusters as there are in the selected recovery plan.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation.
  6. If you see errors, do the following.
    1. To review errors or warnings, click View Details in the description.
    2. Click Cancel to return to the Failover from Recovery Plan dialog box.
    3. Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
      You cannot continue the failover operation when the validation fails with errors.
    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Both the primary and the recovery clusters (Prism Elements) are running version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

Performing an Unplanned Failover (Xi Leap)

Perform an unplanned failover at the recovery AZ. To perform an unplanned failover to Xi Cloud Services, do the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu. Specify the following information in the Failover from Recovery Plan dialog box.
    Note: The Failover action is available only when all the selected recovery plans have the same primary and recovery locations.
    Figure. Unplanned Failover
    Click to enlarge Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
    2. Failover From (Primary) : Select the protected primary cluster.
    3. Failover To (Recovery) : Select the recovery cluster where you want the VMs to failover. This list displays Local AZ by default and is unavailable for editing.
    Note: Click + to add more combinations of primary and recovery clusters. You can add as many primary clusters as there are in the selected recovery plan.
    Note: If a recovery plan contains VM categories, the VMs in those categories recover into the same categories after an unplanned failover to the recovery AZ, and recovery points continue to generate at the recovery AZ for the recovered VMs. Because the VM count on the recovery plans page represents the number of recoverable VMs (calculated from recovery points), both the replicated recovery points and the newly generated recovery points are counted, which doubles the count of the originally recovered VMs. For example, when two VMs fail over, the recovery plans page at the recovery AZ shows four VMs (two from the recovery points replicated from the source and two from the newly generated recovery points). If VMs in the category are deleted at the primary or recovery AZ, the VM count at both AZs stays the same until the recovery points of the deleted VMs expire. The count synchronizes and becomes consistent in a subsequent RPO cycle, when the recovery points expire in accordance with the retention policy set in the protection policy.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation.
  6. If you see errors, do the following.
    1. To review errors or warnings, click View Details in the description.
    2. Click Cancel to return to the Failover from Recovery Plan dialog box.
    3. Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
      Note: You cannot continue the failover operation when the validation fails with errors.
      Note:

      The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

      However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

      • Both the primary and the recovery clusters (Prism Elements) are running version 5.17 or newer.
      • A path for the entity recovery is not defined while initiating the failover operation.
      • The protected entities do not have shared disks.

      If these conditions are not satisfied, the failover operation fails.

    Note: To avoid conflicts when the primary AZ becomes active again after the failover, manually power off the guest VMs associated with this recovery plan at either the primary or the recovery AZ after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

Performing Failback (Xi Leap)

A failback is similar to a failover, but in the reverse direction. The same recovery plan applies to both the failover and the failback operations; therefore, you perform a failback exactly as you perform a failover. Log on to the AZ where you want the VMs to fail back, and then perform a failover. For example, if you failed over VMs from an on-prem AZ to Xi Cloud Services, then to fail back to the on-prem AZ, perform the failover from the on-prem AZ.

About this task

To perform a failback, do the following procedure at the primary AZ.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Unplanned Failover
    Click to enlarge Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      Tip: You can also click Planned Failover to perform planned failover procedure for a failback.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
    2. Click + Add target clusters if you want to failover to specific Nutanix clusters at the primary AZ.
      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the primary AZ.
    Note: If a recovery plan contains VM categories, the VMs in those categories recover into the same categories after an unplanned failover to the recovery AZ, and recovery points continue to generate at the recovery AZ for the recovered VMs. Because the VM count on the recovery plans page represents the number of recoverable VMs (calculated from recovery points), both the replicated recovery points and the newly generated recovery points are counted, which doubles the count of the originally recovered VMs. For example, when two VMs fail over, the recovery plans page at the recovery AZ shows four VMs (two from the recovery points replicated from the source and two from the newly generated recovery points). If VMs in the category are deleted at the primary or recovery AZ, the VM count at both AZs stays the same until the recovery points of the deleted VMs expire. The count synchronizes and becomes consistent in a subsequent RPO cycle, when the recovery points expire in accordance with the retention policy set in the protection policy.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs failover to the recovery cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Prism Element on both the primary and the recovery Nutanix clusters is running version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

    Note: To avoid conflicts when the primary AZ becomes active again after the failover, manually power off the guest VMs associated with this recovery plan at either the primary or the recovery AZ after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

Monitoring a Failover Operation (Xi Leap)

After you trigger a failover operation, you can monitor failover-related tasks. To monitor a failover, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Click the name of the recovery plan for which you triggered failover.
  4. Click the Tasks tab.
    The left pane displays the overall status. The table in the details pane lists all the running tasks and their individual statuses.

UEFI and Secure Boot Support for CHDR

Nutanix supports CHDR migrations of guest VMs having UEFI and Secure Boot.

Table 1. Nutanix Software - Minimum Requirements
Nutanix Software Minimum Supported Version
Minimum AOS 5.19.1
Minimum PC pc.2021.1
Minimum NGT 2.1.1
Table 2. Applications and Operating Systems Requirements - UEFI
Operating Systems Versions
Microsoft Windows
  • Microsoft Windows 10
  • Microsoft Windows Server 2016
  • Microsoft Windows Server 2019
Linux
  • CentOS Linux 7.3
  • Ubuntu 18.04
  • Red Hat Enterprise Linux Server versions 7.1 and 7.7
Table 3. Applications and Operating Systems Requirements - Secure Boot
Operating Systems Versions
Microsoft Windows
  • Microsoft Windows Server 2016
  • Microsoft Windows Server 2019
Linux
  • CentOS Linux 7.3
  • Red Hat Enterprise Linux Server versions 7.7
Table 4. Recovery Limitations
System Configuration Limitation

Microsoft Windows Defender Credential Guard

VMs that have Credential Guard enabled cannot be recovered with the CHDR recovery solution.

IDE + Secure Boot

VMs on ESXi which have IDE Disks or CD-ROM and Secure Boot enabled cannot be recovered on AHV.

UEFI VMs on CentOS 7.4, CentOS 7.5 and Ubuntu 16.04 may fail to boot after CHDR migration.

CentOS 7.4, CentOS 7.5 and Ubuntu 16.04 UEFI VMs do not boot after cross-hypervisor disaster recovery migrations.

See KB-10633 for more information about this limitation. Contact Nutanix Support for assistance with this limitation.

UEFI VM may fail to boot after failback.

When a UEFI VM is booted on AHV for the first time, UEFI firmware settings of the VM are initialized. The next step is to perform a guest reboot or guest shutdown to fully flush the settings into persistent storage in the NVRAM.

If this UEFI VM is failed over to an ESXi host without performing the guest reboot/shutdown, the UEFI settings of the VM remain partial. Although the VM boots on ESXi, it fails to boot on AHV when a failback is performed.

See KB-10631 for more information about this limitation. Contact Nutanix Support for assistance with this limitation.

Protection with NearSync Replication and DR (Xi Leap)

NearSync replication enables you to protect your data with an RPO as low as 1 minute. You can configure a protection policy with NearSync replication by defining the VMs or VM categories to protect. The policy creates a recovery point of the VMs every few minutes (1–15 minutes) and replicates it to Xi Cloud Services. You can configure disaster recovery with NearSync replication between on-prem AHV or ESXi clusters and Xi Cloud Services. You can also perform cross-hypervisor disaster recovery (CHDR), that is, disaster recovery of VMs from AHV clusters to ESXi clusters or from ESXi clusters to AHV clusters.

Note: Nutanix provides multiple disaster recovery (DR) solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

The following are the advantages of NearSync replication.

  • Protection for mission-critical applications, securing your data with minimal data loss if there is a disaster and giving you more granular control during the recovery process.
  • No minimum network latency or distance requirements.
    Note: However, a maximum of 75 ms network latency is allowed for replication between an AHV cluster and Xi Cloud Services.
  • Low stun time for VMs with heavy I/O applications.

    Stun time is the duration for which an application freezes while the recovery point is taken.

  • Allows resolution of a disaster event within minutes.

To implement the NearSync feature, Nutanix has introduced a technology called lightweight snapshots (LWSs). LWS recovery points are created at the metadata level only, and they continuously replicate incoming data generated by workloads running on the active clusters. LWS recovery points are stored in the LWS store, which is allocated on the SSD tier. When you configure a protection policy with NearSync replication, the system allocates the LWS store automatically.

Note: The maximum LWS store allocation for each node is 360 GB. For the hybrid systems, it is 7% of the SSD capacity on that node.
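Expressed as a quick sketch (the 360 GB cap and the 7% figure come from the note above; the helper itself is illustrative, not product code):

```python
def max_lws_store_gb(ssd_capacity_gb: float, hybrid: bool) -> float:
    """Maximum per-node LWS store allocation.

    Hybrid nodes are allocated 7% of the node's SSD capacity;
    the allocation never exceeds 360 GB per node.
    """
    cap_gb = 360.0
    if hybrid:
        return min(ssd_capacity_gb * 7 / 100, cap_gb)
    return cap_gb

print(max_lws_store_gb(4000, hybrid=True))  # 280.0 (7% of a 4 TB SSD tier)
print(max_lws_store_gb(8000, hybrid=True))  # 360.0 (capped)
```
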

Transitioning in and out of NearSync

When you configure a protection policy with NearSync replication, the policy remains in an hourly schedule until its transition into NearSync is complete.

To transition into NearSync, initial seeding of the recovery AZ with the data is performed, the recovery points are taken on an hourly basis, and replicated to the recovery AZ. After the system determines that the recovery points containing the seeding data have replicated within a specified amount of time (default is an hour), the system automatically transitions the protection policy into NearSync depending on the bandwidth and the change rate. After you transition into NearSync, you can see the configured NearSync recovery points in the web interface.

The following are the characteristics of the process.

  • Until you are transitioned into NearSync, you can see only the hourly recovery points in Prism Central.
  • If a VM transitions out of NearSync for any reason, the system raises alerts in the Alerts dashboard, and the protection policy transitions to the hourly schedule. The system continuously tries to return to the NearSync schedule that you configured. If the transition is successful, the protection policy automatically transitions back into NearSync, and alerts specific to this condition are raised in the Alerts dashboard.

To transition out of NearSync, you can do one of the following.

  • Delete the protection policy with NearSync replication that you have configured.
  • Update the protection policy with NearSync replication to use an hourly RPO.
  • Unprotect the VMs.
    Note: Adding or deleting a VM does not transition the protection policy with NearSync replication out of NearSync.

Repeated transitioning in and out of NearSync can occur because of the following reasons.

  • LWS store usage is high.
  • The change rate of data is high for the available bandwidth between the primary and the recovery AZs.
  • Internal processing of LWS recovery points is taking more time because the system is overloaded.

Retention Policy

Depending on the RPO (1–15 minutes), the system retains the recovery points for a specific amount of time. For a protection policy with NearSync replication, you configure the retention policy in days, weeks, or months on both the primary and recovery AZs instead of defining the number of recovery points to retain. For example, if you want an RPO of 1 minute and want to retain the recovery points for 5 days, the following retention policy is applied.

  • For every 1 minute, a recovery point is created and retained for a maximum of 15 minutes.
    Note: Only the most recent 15 recovery points are visible in Prism Central and available for the restore operation.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 5 days.

You can also define recovery point retention in weeks or months. For example, if you configure a 3-month schedule, the following retention policy is applied.

  • For every 1 minute, a recovery point is created and retained for 15 minutes.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 7 days.
  • One weekly recovery point is created and retained for 4 weeks.
  • One monthly recovery point is created and retained for 3 months.
Note:
  • You can define different retention policies on the primary and recovery AZs.
  • The system retains subhourly and hourly recovery points for 15 minutes and 6 hours respectively. Maximum retention time for days, weeks, and months is 7 days, 4 weeks, and 12 months respectively.
  • If you change the protection policy configuration from hourly schedule to minutely schedule (Asynchronous to NearSync), the first recovery point is not created according to the new schedule. The recovery points are created according to the start time of the old hourly schedule (Asynchronous). If you want to get the maximum retention for the first recovery point after modifying the schedule, update the start time accordingly for NearSync.
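The NearSync retention tiers described above can be summarized in a small helper (an illustrative reading of the documented examples, not product code): subhourly recovery points live 15 minutes, hourly points 6 hours, and the longer tiers depend on the configured retention unit.

```python
def nearsync_retention_tiers(retention: int, unit: str) -> dict:
    """Retention tiers for a NearSync (1-15 minute RPO) schedule.

    Mirrors the documented examples: per-RPO points are kept for
    15 minutes, hourly points for 6 hours; daily/weekly/monthly
    tiers depend on whether retention is in days, weeks, or months.
    """
    tiers = {"per-RPO": "15 minutes", "hourly": "6 hours"}
    if unit == "days":
        tiers["daily"] = f"{retention} days"
    elif unit == "weeks":
        tiers["daily"] = "7 days"
        tiers["weekly"] = f"{retention} weeks"
    elif unit == "months":
        tiers["daily"] = "7 days"
        tiers["weekly"] = "4 weeks"
        tiers["monthly"] = f"{retention} months"
    return tiers

print(nearsync_retention_tiers(5, "days"))    # the 5-day example above
print(nearsync_retention_tiers(3, "months"))  # the 3-month example above
```
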

NearSync Replication Requirements (Xi Leap)

The following are the specific requirements of configuring protection policies with NearSync replication schedule in Xi Leap. Ensure that you meet the following requirements in addition to the general requirements of Xi Leap.

For more information about the general requirements of Xi Leap, see Xi Leap Requirements.

For information about the on-prem node, disk and Foundation configurations required to support NearSync replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi clusters running AOS 5.17 or newer, each registered to a different Prism Central

  • The on-prem AHV clusters must be running on version 20190916.189 or newer.
  • The on-prem ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

The on-prem Prism Central and its registered clusters (Prism Elements) must be running the following versions of AOS.

  • AOS 5.17 or newer with AHV.
  • AOS 5.17 or newer with ESXi.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Data protection with NearSync replication supports cross-hypervisor disaster recovery. You can configure disaster recovery to recover VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided you meet the following CHDR requirements.

  • The on-prem clusters are running AOS 5.18 or newer.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI disks only.
    Tip: From AOS 5.19.1, CHDR supports SATA disks also.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.

  • For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files. If you have delta disks attached to a VM and you proceed with failover, you get a validation warning and the VM does not recover. Contact Nutanix Support for assistance.
Note: CHDR does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

Table 1. Operating System Supported for CHDR
Operating System Version Requirements and Limitations
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer.
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • SLES operating system is not supported.

Additional Requirements

  • Both the primary and the recovery clusters must have a minimum of three nodes.
  • See On-Prem Hardware Resource Requirements for the on-prem hardware and Foundation configurations required to support NearSync replication schedules.
  • Set the virtual IP address and the data services IP address in the primary and the recovery clusters.
  • The recovery AZ container must have as much space as the working set size of the protected VMs at the primary AZ. For example, if you are protecting a VM that is using 30 GB of space on the container of the primary AZ, the same amount of space is required on the recovery AZ container.
  • The bandwidth between the two AZs must be approximately equal to or higher than the change rate of the protected VMs (maximum change rate is 20 MBps).
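A quick back-of-the-envelope check of the last two requirements can be sketched as follows (an illustrative helper with hypothetical numbers, not a Nutanix tool):

```python
def nearsync_feasible(change_rate_mbps: float,
                      link_bandwidth_mbps: float,
                      working_set_gb: float,
                      recovery_free_gb: float) -> list:
    """Return the list of unmet NearSync sizing requirements.

    Bandwidth between the AZs must be at least the protected VMs'
    change rate (which itself must stay within the 20 MBps maximum),
    and the recovery container needs space for the working set.
    """
    problems = []
    if change_rate_mbps > 20:
        problems.append("change rate exceeds the 20 MBps NearSync maximum")
    if link_bandwidth_mbps < change_rate_mbps:
        problems.append("inter-AZ bandwidth is below the change rate")
    if recovery_free_gb < working_set_gb:
        problems.append("recovery container lacks space for the working set")
    return problems

print(nearsync_feasible(15, 25, 30, 100))  # [] -> requirements met
```
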

NearSync Replication Limitations (Xi Leap)

The following are the specific limitations of data protection with NearSync replication in Xi Leap. These limitations are in addition to the general limitations of Leap.

For information about the general limitations of Leap, see Xi Leap Limitations.

  • Deduplication enabled on storage containers having VMs protected with NearSync lowers the replication speed.
  • All files associated with the VMs running on ESXi must be located in the same folder as the VMX configuration file. Files that are not in the same folder as the VMX configuration file might not recover on a recovery cluster. On recovery, a VM with such files fails to start with the following error message: Operation failed: InternalTaskCreationFailure: Error creating host specific VM change power state task. Error: NoCompatibleHost: No host is compatible with the virtual machine
  • In CHDR, NearSync replication does not support retrieving recovery points from the recovery AZs.

    For example, if you have 1 day retention at the primary AZ and 5 days retention at the recovery AZ, and you want to go back to a recovery point from 5 days ago, NearSync cannot replicate that recovery point back from the recovery AZ to the primary AZ.

Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)

Create a NearSync protection policy in the primary AZ Prism Central. The policy creates recovery points of the protected VMs at the configured RPO and replicates them to Xi Cloud Services for availability. When creating a protection policy, you can specify only VM categories. If you want to include VMs individually, first create the protection policy (which can also include VM categories), and then include the VMs individually in the protection policy from the VMs page.

Before you begin

Ensure that the AHV or ESXi clusters on both the primary and recovery AZ are NearSync capable. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB.
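The capability check described above is simple enough to sketch (an illustrative helper; the product performs this detection itself):

```python
def is_nearsync_capable(ssd_capacities_tb):
    """A cluster is NearSync capable if every SSD in the cluster
    has a capacity of at least 1.2 TB."""
    return all(cap >= 1.2 for cap in ssd_capacities_tb)

print(is_nearsync_capable([1.92, 1.92, 3.84]))  # True
print(is_nearsync_capable([0.96, 1.92]))        # False: one SSD is too small
```
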

See NearSync Replication Requirements (Xi Leap) and NearSync Replication Limitations (Xi Leap) before you start.

About this task

To create a protection policy with NearSync replication in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Protection Policies in the left pane.
  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: NearSync
    Click to enlarge Protection Policy Configuration: NearSync

    1. Name : Enter a name for the policy.
      Caution: The name can contain only alphanumeric characters, dots, dashes, and underscores.
    2. Primary Location : Select the primary AZ that hosts the VMs to protect. This list displays the Local AZ by default and is unavailable for editing.
    3. Primary Cluster(s) : Select the cluster that hosts the VMs to protect.
    4. Recovery Location : Select the recovery AZ where you want to replicate the recovery points.
      If you do not select a recovery location, the local recovery points that are created by this protection policy do not replicate automatically. You can, however, replicate recovery points manually and use recovery plans to recover the VMs. For more information, see Protection and Manual DR (Xi Leap).
    5. Target Cluster : Select the NearSync capable cluster where you want to replicate the recovery points.
      This field becomes available only if the recovery location is a physical remote AZ. If the specified recovery location is an AZ in Xi Cloud Services, the Target Cluster field becomes unavailable because Xi Cloud Services selects a cluster for you. If the specified recovery location is a physical location, you can select a cluster of your choice.
      Caution: If the primary cluster contains an IBM Power Systems server, you cannot replicate recovery points to Xi Cloud Services. However, you can replicate recovery points to the on-prem target cluster if the target on-prem cluster also contains an IBM Power Systems server.
      Caution: Select auto-select from the drop-down list only if all the clusters at the recovery AZ are NearSync capable.

    6. Policy Type : Click Asynchronous .
    7. Recovery Point Objective : Specify the frequency in minutes (between 1 and 15) at which you want recovery points to be taken.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Change , and then, in the Start Time dialog box, do the following.

      Click Start from specific point in time.

      In the time picker, specify the time at which you want to start taking recovery points.

      Click Save .

      Tip: NearSync also allows you to recover the data of the minute just before an unplanned failover. For example, with a protection policy that has a 10-minute RPO, you can use the internal lightweight snapshots (LWS) to recover the data of the 9th minute when there is an unplanned failover.
    8. Retention Policy : Specify the type of retention policy.
      Figure. Roll-up Retention Policy

      • Roll-up : Rolls up the recovery points as per the RPO and retention period into a single recovery point at an AZ. For example, if you set the RPO to 1 hour and the retention time to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) after every 24 hours. The system keeps one day of rolled-up hourly recovery points and 4 days of daily recovery points.
        Note:
        • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
        • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
        • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
        • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
        Note: The recovery points that are used to create a rolled-up recovery point are discarded.
        Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
        Note: NearSync does not support Linear retention policies. When you enter a minutely time unit in the Recovery Point Objective , the Roll-up retention policy is automatically selected.
  4. To specify the retention number for the AZs, do the following.
    1. Remote Retention : Specify the retention number for the remote AZ.
      This field is unavailable if you do not specify a recovery location.
    2. Local Retention : Specify the retention number for the local AZ.

      If you select linear retention, the remote and local retention counts represent the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.

  5. If you want to take application consistent recovery points, select Take App-Consistent Recovery Point .
    Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the VMs running on AHV. For VMs running on ESXi, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs).
    Caution: Application-consistent recovery points fail for EFI-boot enabled Windows 2019 VMs running on ESXi when NGT is not installed. Nutanix recommends installing NGT on VMs running on ESXi as well.
  6. Associated Categories : To protect categories of VMs, perform the following.
    Tip: Before associating VM categories to a protection policy, determine how you want to identify the VMs you want to protect. If they have a common characteristic (for example, the VMs belong to a specific application or location), check the Categories page to ensure that both the category and the required value are available. Prism Central includes built-in categories for frequently encountered applications such as MS Exchange and Oracle. You can also create your custom categories. If the category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values that you require. Doing so ensures that the categories and values are available for selection when creating the protection policy. You can add VMs to the category either before or after you configure the protection policy. For more information about VM categories, see Category Management in the Prism Central Guide .
    1. Click Add Categories .
    2. Select the VM categories from the list to add to the protection policy.
      Note:

      You cannot protect a VM by using two or more protection policies. Therefore, VM categories specified in another protection policy are not listed here. Also, if you included a VM in another protection policy by specifying the category to which it belongs (category-based inclusion), and if you add the VM to this policy by using its name (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, the VM is protected only by this protection policy and not by the protection policy in which its category is specified.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    3. Click Save .
    Tip: To add or remove categories from the existing protection policy, click Update .
  7. Click Save .
    You have successfully created a protection policy with NearSync replication in Xi Leap. You can add VMs individually (without VM categories) to the protection policy or remove VMs from the protection policy. For information about the operations that you can perform on a protection policy, see Protection Policy Management.
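The roll-up retention rules described in the procedure above reduce to simple arithmetic. The helper below is a hypothetical pure-Python illustration (not a Nutanix API) that mirrors the tier rules from the Retention Policy note verbatim, including the yearly case, which the text states keeps the same tiers as the monthly case.

```python
def rollup_retention_tiers(n, unit):
    """Illustrative mapping of a roll-up retention period to the tiers
    of recovery points kept, per the rules in the note above.
    n -- retention count; unit -- 'days', 'weeks', 'months', or 'years'."""
    if unit == "days":
        # 1 day of RPO-level (rolled-up hourly) points, n-1 daily points
        return {"rpo_days": 1, "daily_days": n - 1}
    if unit == "weeks":
        return {"rpo_days": 1, "daily_weeks": 1, "weekly_weeks": n - 1}
    if unit in ("months", "years"):
        # the text states the same tiers for n months and for n years
        return {"rpo_days": 1, "daily_weeks": 1, "weekly_months": 1,
                "monthly_months": n - 1}
    raise ValueError("unknown unit: " + unit)

# Example from the text: RPO 1 hour, retention 5 days -> one day of
# rolled-up hourly points plus 4 days of daily points.
print(rollup_retention_tiers(5, "days"))
# -> {'rpo_days': 1, 'daily_days': 4}
```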

Creating a Recovery Plan (Xi Leap)

Create a recovery plan in the primary Prism Central. The procedure for creating a recovery plan is the same for all the data protection strategies in Xi Leap.

For more information about creating a recovery plan in Xi Leap, see Creating a Recovery Plan (Xi Leap).

Protection Policy Management

A protection policy automates the creation and replication of recovery points. When configuring a protection policy for creating local recovery points, you specify the RPO, retention policy, and the VMs that you want to protect. You also specify the recovery location if you want to automate recovery point replication to Xi Cloud Services.

When you create, update, or delete a protection policy, it synchronizes to the paired Xi Cloud Services. The recovery points automatically start replicating in the reverse direction after you perform a failover at the recovery Xi Cloud Services. For information about how Xi Leap determines the list of AZs for synchronization, see Entity Synchronization Between Paired AZs.

Note: A VM cannot be simultaneously protected by a protection domain and a protection policy. If you want to use a protection policy to protect a VM that is part of a protection domain, first remove the VM from the protection domain, and then include it in the protection policy. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.

Adding Guest VMs Individually to a Protection Policy

You can also add VMs directly to a protection policy from the VMs page, without the use of a VM category. To add VMs directly to a protection policy in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the VMs that you want to protect.
  4. Click Protect from the Actions drop-down menu.
    Figure. Protect VMs Individually

  5. Select the protection policy in the table to include the VMs in the policy.
    Figure. Protection Policy Selection

  6. Click Protect .
    The VMs are added to the selected protection policy. The updated protection policy starts synchronizing to the recovery Prism Central.

Removing Guest VMs Individually from a Protection Policy

You can directly remove guest VMs from a protection policy from the VMs page. To remove guest VMs from a protection policy in Xi Cloud Services, perform the following procedure.

About this task

Note: If a guest VM is protected individually (not through VM categories), you can remove it from the protection policy only by using this individual removal procedure.
Note: If a guest VM is protected under a VM category, you cannot remove the guest VM from the protection policy with this procedure. You can remove the guest VM from the protection policy only by dissociating the guest VM from the category.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the guest VMs that you want to remove from a protection policy.
  4. Click UnProtect from the Actions drop-down menu.
    The selected guest VMs are removed from the protection policy. The updated protection policy starts synchronizing to the recovery Prism Central.
    Note: Delete all the recovery points associated with the guest VM to avoid incurring subscription charges. The recovery points adhere to the expiration period set in the protection policy and, unless deleted individually, continue to incur charges until they expire.

Cloning a Protection Policy

If the requirements of the protection policy that you want to create are similar to an existing protection policy in Xi Cloud Services, you can clone the existing protection policy and update the clone.

About this task

To clone a protection policy from Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Protection Policies in the left pane.
  3. Select the protection policy that you want to clone.
  4. Click Clone from the Actions drop-down menu.
  5. Make the required changes on the Clone Protection Policy page. For information about the fields on the page, see:
    • Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)
    • Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)
  6. Click Save .
    The selected protection policy is cloned. The updated protection policy starts synchronizing to the recovery Prism Central.

Updating a Protection Policy

You can modify an existing protection policy in the Xi Cloud Services. To update an existing protection policy in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Protection Policies in the left pane.
  3. Select the protection policy that you want to update.
  4. Click Update from the Actions drop-down menu.
  5. Make the required changes on the Update Protection Policy page. For information about the fields on the page, see:
    • Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)
    • Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)
  6. Click Save .
    The selected protection policy is updated. The updated protection policy starts synchronizing to the recovery Prism Central.

Finding the Protection Policy of a Guest VM

You can use the Data Protection focus on the VMs page to determine the protection policy to which a VM belongs. To find the protection policy of a VM in Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Click Data Protection from the Focus menu at the top-right corner.
    The Protection Policy column that is displayed shows the protection policy to which the VMs belong.
    Figure. Focus

  4. After you review the information, remove the Data Protection filter from the filter text box to return the VMs page to the previous view.

Recovery Plan Management

A recovery plan orchestrates the recovery of protected VMs at a recovery AZ. Recovery plans are predefined procedures (runbooks) that use stages to enforce a VM power-on sequence. You can also configure inter-stage delays to recover applications gracefully. Recovery plans that recover applications in Xi Cloud Services can also create the required networks during failover and assign public-facing IP addresses to VMs.
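The stage and inter-stage-delay mechanics described above can be sketched as pure logic. The helper below is a hypothetical illustration (not a Nutanix API): it computes the offset at which each stage's VMs power on, given an ordered list of stages with delays.

```python
def power_on_offsets(stages):
    """Illustrative schedule: compute the offset (seconds) at which each
    stage's VMs power on. `stages` is an ordered list of
    (vm_names, delay_after_stage_seconds) tuples; VMs in a stage power
    on together, and the next stage starts after the inter-stage delay."""
    t, schedule = 0, {}
    for vms, delay_after in stages:
        for vm in vms:
            schedule[vm] = t
        t += delay_after
    return schedule

# A database tier first, a 120 s delay, then the app tier, a 60 s delay,
# and finally the web tier (all names are hypothetical).
plan = [(["db-01"], 120), (["app-01", "app-02"], 60), (["web-01"], 0)]
print(power_on_offsets(plan))
# -> {'db-01': 0, 'app-01': 120, 'app-02': 120, 'web-01': 180}
```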

A recovery plan created in one availability zone (AZ) replicates to the paired AZ and works bidirectionally. After a failover from the primary AZ to a recovery AZ, you can fail back to the primary AZ by using the same recovery plan.

After you create a recovery plan, you can validate or test it to ensure that recovery goes through smoothly when failover becomes necessary. Xi Cloud Services includes a built-in VPC for validating or testing failover.

Recovery plans are independent of protection policies and do not reference protection policies in their configuration information. Also, they do not create recovery points. While the process of planned failover includes the creation of a recovery point so that the latest data can be used for recovery, unplanned and test failovers rely on the availability of the required recovery points at the designated recovery AZ. A recovery plan therefore requires the VMs in the recovery plan to also be associated with a protection policy.
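Because a recovery plan does not create recovery points itself, every VM in the plan must also be covered by a protection policy, as the paragraph above explains. The check below is a hypothetical pure-Python sketch (not a Nutanix API) of that prerequisite.

```python
def unprotected_plan_vms(plan_vms, policy_protected_vms):
    """Illustrative check: return the VMs in a recovery plan that no
    protection policy protects. Unplanned and test failovers would find
    no recovery points for these VMs at the recovery AZ."""
    return sorted(set(plan_vms) - set(policy_protected_vms))

# vm-web is in the recovery plan but in no protection policy
# (all names are hypothetical).
print(unprotected_plan_vms(["vm-db", "vm-web"], ["vm-db", "vm-app"]))
# -> ['vm-web']
```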

Recovery plans are synchronized to one or more paired AZs when they are created, updated, or deleted. For information about how Leap determines the list of AZs for synchronization, see Entity Synchronization Between Paired AZs.

Adding Guest VMs Individually to a Recovery Plan

You can also add VMs directly to a recovery plan from the VMs page, without the use of a category. To add VMs directly to a recovery plan in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the VMs that you want to add to a recovery plan.
  4. Click Add to Recovery Plan from the Actions drop-down menu.
    The Add to Recovery Plan dialog box is displayed.
  5. Select the recovery plan where you want to add the VMs in the Add to Recovery Plan dialog box.
  6. Click Add .
    The Update Recovery Plan dialog box appears.
  7. In the General tab, verify the Recovery Plan Name and Recovery Plan Description , and click Next .
  8. In the Power On Sequence tab, add VMs to the stage. For more information, see Stage Management.
  9. Click Next .
  10. In the Network Settings tab, update the network settings as required for the newly added VMs. For more information, see Creating a Recovery Plan (Xi Leap).
  11. Click Done .
    The VMs are added to the recovery plan.

Removing Guest VMs Individually from a Recovery Plan

You can also remove VMs directly from a recovery plan in Xi Cloud Services. To remove VMs directly from a recovery plan, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan from which you want to remove VMs.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan dialog box appears.
  5. In the General tab, verify the Recovery Plan Name and Recovery Plan Description , and click Next .
  6. In the Power On Sequence tab, select the VMs and click More Actions > Remove .
    Note: You see More Actions in a stage only when one or more VMs in the stage are selected. When none of the VMs in the stage are selected, you see Actions .
  7. Click Next .
  8. In the Network Settings tab, update the network settings as required. For more information, see Creating a Recovery Plan (Xi Leap).
  9. Click Done .
    The VMs are removed from the selected recovery plan.

Updating a Recovery Plan

You can update an existing recovery plan in Xi Cloud Services. To update a recovery plan, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan that you want to update.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan dialog box appears.
  5. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Xi Leap).
  6. Click Done .
    The selected recovery plan is updated.

Validating a Recovery Plan

You can validate a recovery plan from the recovery AZ. For example, if you perform the validation in Xi Cloud Services (the primary AZ being an on-prem AZ), Leap validates failover from the on-prem AZ to Xi Cloud Services. Recovery plan validation only reports warnings and errors; no failover is performed. In this procedure, you specify which of the two paired AZs you want to treat as the primary, and then select the other AZ as the secondary.

About this task

To validate a recovery plan, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan that you want to validate.
  4. Click Validate from the Actions drop-down menu.
  5. In the Validate Recovery Plan dialog box, do the following.
    1. In Primary Location , select the primary location.
    2. In Recovery Location , select the recovery location.
    3. Click Proceed .
    The validation process lists any warnings and errors.
  6. Click Back .
    A summary of the validation is displayed. You can close the dialog box.
  7. To return to the detailed results of the validation, click the link in the Validation Errors column.
    The selected recovery plan is validated for its correct configuration.

Protection and Manual DR (Xi Leap)

Manual data protection involves manually creating recovery points, replicating recovery points, and recovering guest VMs at the recovery AZ. You can also automate some of these tasks. For example, the last step—that of manually recovering guest VMs at the recovery AZ—can be performed by a recovery plan while the underlying recovery point creation and replication can be performed by protection policies. Conversely, you can configure protection policies to automate recovery point creation and replication, and then recover guest VMs manually at the recovery AZ.

Creating Recovery Points Manually (Out-of-Band Snapshots)

About this task

To create recovery points manually in Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the VMs for which you want to create a recovery point.
  4. Click Create Recovery Point from the Actions drop-down menu.
  5. To verify that the recovery point is created, click the name of the VM, and then check the Recovery Points tab.
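Out-of-band recovery points can also be scripted against the Prism v3 REST API. The sketch below only builds a request body for the `vm_recovery_points` endpoint; the endpoint path, field names, and the placeholder UUID are assumptions based on the public v3 API and should be checked against the API reference for your Prism Central version.

```python
import json

def recovery_point_body(vm_uuid, name):
    """Build an assumed request body for
    POST /api/nutanix/v3/vm_recovery_points (illustrative only)."""
    return {
        "spec": {
            "name": name,
            "resources": {
                # parent VM reference and recovery point type are assumed
                # field names from the v3 API schema
                "parent_vm_reference": {"kind": "vm", "uuid": vm_uuid},
                "recovery_point_type": "CRASH_CONSISTENT",
            },
        },
        "metadata": {"kind": "vm_recovery_point"},
    }

body = recovery_point_body("<vm-uuid>", "oob-rp-before-upgrade")
print(json.dumps(body, indent=2))
# To submit (untested sketch, placeholder host and credentials):
#   requests.post("https://<pc>:9440/api/nutanix/v3/vm_recovery_points",
#                 json=body, auth=("admin", "<password>"), verify=False)
```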

Replicating Recovery Points Manually

You can manually replicate recovery points only from the AZ where the recovery points exist.

About this task

To replicate recovery points manually from Xi Cloud Service, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Click the VM whose recovery point you want to replicate, and then click Recovery Points in the left.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery points that you want to replicate.
  5. Click Replicate from the Actions drop-down menu.
  6. In the Replicate dialog box, do the following.
    1. In Recovery Location , select the location where you want to replicate the recovery point.
    2. In Target Cluster , select the cluster where you want to replicate the recovery point.
    3. Click Replicate Recovery Point .

Recovering a Guest VM from a Recovery Point Manually

You can recover a VM by cloning a VM from a recovery point.

About this task

To recover a VM from a recovery point at Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Click the VM that you want to recover, and then click Recovery Points in the left pane.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery point from which you want to recover the VM.
  5. Click Restore from the Actions drop-down menu.
  6. In the Restore dialog box, do the following.
    1. In the text box provided for specifying a name for the VM, specify a new name or do nothing to use the automatically generated name.
    2. Click Restore .
    Warning: The following are the limitations of the manually recovered VMs (VMs recovered without the use of a recovery plan).
    • The VMs recover without a VNIC if the recovery is performed at the remote AZ.
    • VM categories are not applied.
    • NGT needs to be reconfigured.

Entity Synchronization Between Paired AZs

When paired with each other, availability zones (AZs) synchronize the following disaster recovery configuration entities.

Protection Policies
A protection policy is synchronized whenever you create, update, or delete the protection policy.
Recovery Plans
A recovery plan is synchronized whenever you create, update, or delete the recovery plan. The list of AZs to which Xi Leap must synchronize a recovery plan is derived from the VMs that are included in the recovery plan, both through VM categories and as individually added VMs.

If you specify VM categories in a recovery plan, Leap determines which protection policies use those VM categories, and then synchronizes the recovery plans to the AZs specified in those protection policies.

If you include VMs individually in a recovery plan, Leap uses the recovery points of those VMs to determine which protection policies created those recovery points, and then synchronizes the recovery plans to the AZs specified in those protection policies. If you create a recovery plan for VM categories or VMs that are not associated with a protection policy, Leap cannot determine the AZ list and therefore cannot synchronize the recovery plan.

If a recovery plan includes only individually added VMs and a protection policy associated with a VM has not yet created VM recovery points, Leap cannot synchronize the recovery plan to the AZ specified in that protection policy. However, recovery plans are monitored every 15 minutes for the availability of recovery points that can help derive AZ information. When recovery points become available, Xi Leap derives the AZ by the process described earlier and synchronizes the recovery plan to the AZ.

VM Categories used in Protection Policies and Recovery Plans
A VM category is synchronized when you specify the VM category in a protection policy or recovery plan.
Issues such as a loss of network connectivity between paired AZs, or user actions such as unpairing of AZs followed by re-pairing of those AZs, can affect entity synchronization.
Tip: Nutanix recommends unprotecting all the VMs on an AZ before unpairing it, to avoid a state where the entities have stale configurations after the AZs are re-paired.

If you update entities in either or both AZs before such issues are resolved, or before unpaired AZs are paired again, synchronization is not possible. Also, during entity synchronization, if an entity cannot be synchronized because of an update failure or conflict (for example, you updated the same entity in both AZs during a network connectivity issue), no further entities are synchronized. Entity synchronization can resume only after you resolve the error or conflict. To resolve a conflict, use the Entity Sync option, which is available in the web console. Force synchronization from the AZ that has the desired configuration. Forced synchronization overwrites conflicting configurations in the paired AZ.
Note: Forced synchronization cannot resolve errors arising from conflicting values in VM specifications (for example, the paired AZ already has a VM with the same name).

If you do not update entities before a connectivity issue is resolved or before you pair the AZs again, the synchronization behavior described earlier resumes. Also, pairing previously unpaired AZs triggers an automatic synchronization event. For recommendations to avoid facing such issues, see Entity Synchronization Recommendations (Xi Leap).
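The AZ-list derivation described under Recovery Plans above can be sketched as pure logic. The function and data shapes below are a hypothetical illustration (not a Nutanix API) of how the synchronization targets follow from categories and recovery points.

```python
def derive_sync_azs(plan, policies, recovery_points):
    """Illustrative derivation of the AZs a recovery plan synchronizes
    to, per the rules above.

    plan            -- {"categories": [...], "vms": [...]}
    policies        -- list of {"name", "categories", "azs"} dicts
    recovery_points -- dict: vm name -> name of the policy that
                       created its recovery points (empty if none yet)
    """
    azs = set()
    # Categories in the plan -> policies using those categories -> AZs.
    for pol in policies:
        if set(plan["categories"]) & set(pol["categories"]):
            azs |= set(pol["azs"])
    # Individually added VMs -> policy that created their recovery points.
    by_name = {p["name"]: p for p in policies}
    for vm in plan["vms"]:
        pol = by_name.get(recovery_points.get(vm, ""))
        if pol:                    # no recovery points yet: cannot derive
            azs |= set(pol["azs"])
    return azs

# All names below are hypothetical.
policies = [{"name": "PP1", "categories": ["Dept:Admin"], "azs": ["az-dr"]}]
plan = {"categories": ["Dept:Admin"], "vms": ["vm-42"]}
print(derive_sync_azs(plan, policies, {"vm-42": "PP1"}))
# -> {'az-dr'}
```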

Entity Synchronization Recommendations (Xi Leap)

Consider the following recommendations to avoid inconsistencies and the resulting synchronization issues.

  • During network connectivity issues, do not update entities at both the availability zones (AZs) in a pair. You can safely make updates at any one AZ. After the connectivity issue is resolved, force synchronization from the AZ in which you made updates. Failure to adhere to this recommendation results in synchronization failures.

    You can safely create entities at either or both the AZs as long as you do not assign the same name to entities at the two AZs. After the connectivity issue is resolved, force synchronization from the AZ where you created entities.

  • If one of the AZs becomes unavailable, or if any service in the paired AZ is down, perform force synchronization from the paired AZ after the issue is resolved.

Forcing Entity Synchronization (Xi Leap)

Entity synchronization, when forced from an availability zone (AZ), overwrites the corresponding entities in the paired AZs. Forced synchronization also creates, updates, and removes those entities in the paired AZs.

About this task

The AZ to which a particular entity is forcefully synchronized depends on which AZ requires the entity (see Entity Synchronization Between Paired AZs). To avoid inadvertently overwriting required entities, ensure that you force synchronization from the AZ in which the entities have the desired configuration.

If an AZ is paired with two or more AZs, you cannot select a subset of those AZs with which to synchronize entities.

To force entity synchronization from Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Entity Sync in the menu.
  4. In the Entity Sync dialog box, review the message at the top of the dialog box, and then do the following.
    1. To review the list of entities that will be synchronized to an AZ , click the number of ENTITIES adjacent to an availability zone.
    2. After you review the list of entities, click Back .
  5. Click Sync Entities .

Migrating Guest VMs from a Protection Domain to a Protection Policy

You can protect a guest VM either with a protection domain in Prism Element or with a protection policy in Prism Central. If you have guest VMs in protection domains, migrate those guest VMs to protection policies to orchestrate their disaster recovery using Leap.

Before you begin

Migration from protection domains to protection policies is a disruptive process. For successful migration,
  • Ensure that the guest VMs have no on-going replication.
  • Ensure that the guest VMs do not have volume groups.
  • Ensure that the guest VMs are not in consistency groups.

About this task

To migrate a guest VM from a protection domain to a protection policy manually, perform the following procedure.

Tip: To automate the migration using a script, see KB 10323 .

Procedure

  1. Unprotect the guest VM from the protection domain.
    Caution: Do not delete the guest VM snapshots in the protection domain. Prism Central reads those guest VM snapshots to generate new recovery points without full replication between the primary and recovery Nutanix clusters. If you delete the guest VM snapshots, the VM data replicates afresh (full replication). Nutanix recommends keeping the VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central.
    Caution: Use the automated script for migrating guest VMs from a large protection domain. A large protection domain consists of more than 500 guest VMs. If you migrate the guest VMs manually from a large protection domain, the VM data replicates afresh (full replication).

  2. Log on to Prism Central and protect the guest VMs with protection policies individually (see Adding Guest VMs Individually to a Protection Policy) or through VM categories.

Leap Administration Guide

Disaster Recovery (Formerly Leap) 5.20

Product Release Date: 2021-05-17

Last updated: 2022-10-12

Leap Overview

Legacy disaster recovery (DR) configurations use protection domains (PDs) and third-party integrations to protect your applications. These DR configurations replicate data between on-prem Nutanix clusters. Protection domains provide limited flexibility in terms of supporting complex operations (for example, VM boot order, network mapping). With protection domains, you have to perform manual tasks to protect new guest VMs as and when your application scales up.

Leap offers an entity-centric automated approach to protect and recover applications. It uses categories to group the guest VMs and automate the protection of the guest VMs as the application scales. Application recovery is more flexible with network mappings, an enforceable VM start sequence, and inter-stage delays. Application recovery can also be validated and tested without affecting your production workloads. Asynchronous, NearSync, and Synchronous replication schedules ensure that an application and its configuration details synchronize to one or more recovery locations for a smoother recovery.

Note: You can protect a guest VM either with the legacy DR solution (protection domain-based) or with the new Leap. To see the various Nutanix DR solutions, see Nutanix Disaster Recovery Solutions.

Leap works with sets of physically isolated locations called availability zones. An instance of Prism Central represents an availability zone. One availability zone serves as the primary site for an application while one or more paired availability zones serve as the recovery sites.

Figure. A primary on-prem AZ and one recovery on-prem AZ

Figure. A primary on-prem AZ and two recovery on-prem AZs

Figure. A primary on-prem AZ and two recovery AZs: one on-prem recovery AZ and one recovery AZ in Cloud (Xi Cloud Services)

Figure. A primary on-prem AZ and one recovery AZ at Xi Cloud Services

Figure. A primary Nutanix cluster and at most two recovery Nutanix clusters at the same on-prem AZ

Figure. A primary site at Xi Cloud Services and recovery on-prem site

When paired, the primary site replicates the entities (protection policies, recovery plans, and recovery points) to the recovery sites at the specified time intervals (RPO). This approach enables application recovery at any of the recovery sites when there is a service disruption at the primary site (for example, natural disasters or scheduled maintenance). The entities start replicating back to the primary site when the primary site is up and running, to ensure high availability of applications. The entities you create or update synchronize continuously between the primary and recovery sites. The reverse synchronization enables you to create or update entities (protection policies, recovery plans, or guest VMs) at either the primary or the recovery sites.

This guide is primarily divided into the following two parts.

  • Protection and DR between On-Prem Sites (Leap)

    This section walks you through the procedure of application protection and DR to other Nutanix clusters at the same or a different on-prem site. The procedure also applies to protection and DR to other Nutanix clusters in a supported public cloud.

  • Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap)

    Xi Leap is essentially an extension of Leap to Xi Cloud Services. You can protect applications and perform DR to Xi Cloud Services or from Xi Cloud Services to an on-prem availability zone. This section describes application protection and DR from Xi Cloud Services to an on-prem Nutanix cluster. For application protection and DR to Xi Cloud Services, see the supported capabilities in Protection and DR between On-Prem Sites (Leap) because the protection procedure remains the same when the primary site is an on-prem availability zone.

Configuration tasks and DR workflows are largely the same regardless of the type of recovery site. For more information about the protection and DR workflow, see Leap Deployment Workflow.

Leap Terminology

The following section describes the terms and concepts used throughout the guide. Nutanix recommends gaining familiarity with these terms before you begin configuring protection and Leap or Xi Leap disaster recovery (DR).

Availability Zone

A zone that can have one or more independent datacenters inter-connected by low latency links. An availability zone can either be in your office premises (on-prem) or in Xi Cloud Services. Availability zones are physically isolated from each other to ensure that a disaster at one availability zone does not affect another availability zone. An instance of Prism Central represents an on-prem availability zone.
Note: An availability zone is referred to as a site throughout this document.

On-Prem Availability Zone

An availability zone (site) in your premises.

Xi Cloud Services

A site in the Nutanix Enterprise Cloud Platform (Xi Cloud Services).

Primary Availability Zone

A site that initially hosts guest VMs you want to protect.

Recovery Availability Zone

A site where you can recover the protected guest VMs when a planned or an unplanned event occurs at the primary site causing its downtime. You can configure at most two recovery sites for a guest VM.

Nutanix Cluster

A cluster running AHV or ESXi nodes at an on-prem availability zone, in Xi Cloud Services, or in any supported public cloud. Leap does not support guest VMs on Hyper-V clusters.

Prism Element

The GUI that provides you the ability to configure, manage, and monitor a single Nutanix cluster. It is a service built into the platform for every Nutanix cluster deployed.

Prism Central

The GUI that allows you to monitor and manage many Nutanix clusters (Prism Element running on those clusters). Prism Starter, Prism Pro, and Prism Ultimate are the three flavors of Prism Central. For more information about the features available with these licenses, see Software Options.

Prism Central is essentially a VM that you deploy (host) in a Nutanix cluster (Prism Element). For more information about Prism Central, see the Prism Central Guide. You can set up the following configurations of the Prism Central VM.

Small Prism Central
A Prism Central VM with a configuration of 8 vCPUs and 32 GB memory or less. The VM hot-adds an extra 4 GB of memory when you enable Leap and an extra 1 GB when you enable Flow in a small Prism Central.
Small Prism Central (Single node)
A small Prism Central deployed in a single VM.
Small Prism Central (Scaleout)
Three small Prism Centrals deployed in three VMs in the same availability zone (site).
Large Prism Central
A Prism Central VM with a configuration of more than 8 vCPUs and 32 GB memory. The VM hot-adds an extra 8 GB of memory when you enable Leap and an extra 1 GB when you enable Flow in a large Prism Central.
Large Prism Central (Single node)
A large Prism Central deployed in a single VM.
Large Prism Central (Scaleout)
Three large Prism Centrals deployed in three VMs in the same availability zone (site).
Note: A scaleout Prism Central works like a single node Prism Central in the availability zone (AZ). You can upgrade a single node Prism Central to scaleout Prism Central to increase the capacity, resiliency, and redundancy of Prism Central VM. For detailed information about the available configurations of Prism Central, see Prism Central Scalability in Prism Central Release Notes.
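The sizing rules above can be summarized in a small sketch; the thresholds and hot-add amounts come from the small and large Prism Central definitions above, while the function names are illustrative only:

```python
def pc_size(vcpus: int, memory_gb: int) -> str:
    """Classify a Prism Central VM: 8 vCPUs / 32 GB memory or less is small."""
    return "small" if vcpus <= 8 and memory_gb <= 32 else "large"

def hot_add_memory_gb(vcpus: int, memory_gb: int,
                      enable_leap: bool, enable_flow: bool) -> int:
    """Extra memory hot-added to the Prism Central VM:
    4 GB (small) or 8 GB (large) for Leap, plus 1 GB for Flow."""
    extra = 0
    if enable_leap:
        extra += 4 if pc_size(vcpus, memory_gb) == "small" else 8
    if enable_flow:
        extra += 1
    return extra
```

For example, enabling both Leap and Flow on a small single-node Prism Central hot-adds 5 GB of memory in total.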

Virtual Private Cloud (VPC)

A logically isolated network service in Xi Cloud Services. A VPC provides the complete IP address space for hosting user-configured VPNs. You can create workloads in a VPC manually or through failover from a paired primary site.

The following VPCs are available in each Xi Cloud Services account. You cannot create more VPCs in Xi Cloud Services.

Production VPC
Used to host production workloads.
Test VPC
Used to test failover from a paired site.

Source Virtual Network

The virtual network from which guest VMs migrate during a failover or failback.

Recovery Virtual Network

The virtual network to which guest VMs migrate during a failover or failback operation.

Network Mapping

A mapping between two virtual networks in paired sites. A network mapping specifies a recovery network for all guest VMs of the source network. When you perform a failover or failback, the guest VMs in the source network recover in the corresponding (mapped) recovery network.
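Conceptually, a network mapping is a lookup from a source network to its recovery network. The sketch below uses hypothetical site and network names purely for illustration:

```python
# Hypothetical network mappings between paired sites:
# (source site, source network) -> (recovery site, recovery network)
NETWORK_MAPPINGS = {
    ("Primary-AZ", "vlan-app"): ("Recovery-AZ", "vlan-app-dr"),
    ("Primary-AZ", "vlan-db"):  ("Recovery-AZ", "vlan-db-dr"),
}

def recovery_network(site: str, source_network: str) -> tuple:
    """On failover or failback, guest VMs from the source network
    recover in the mapped recovery network."""
    return NETWORK_MAPPINGS[(site, source_network)]
```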

Category

A VM category is a key-value pair that groups similar guest VMs. Associating a protection policy with a VM category ensures that the protection policy applies to all the guest VMs in the group regardless of how the group scales with time. For example, you can associate a group of guest VMs with the Department: Marketing category, where Department is a category key that includes the value Marketing along with other values such as Engineering and Sales.

VM categories work the same way at on-prem sites and in Xi Cloud Services. For more information about VM categories, see Category Management in the Prism Central Guide.
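As a sketch (the VM and category names below are hypothetical), grouping by a category key-value pair looks like this; a protection policy associated with Department: Marketing automatically covers any guest VM later tagged with that same pair:

```python
# Hypothetical guest VMs tagged with category key-value pairs.
VMS = {
    "web-01": {"Department": "Marketing"},
    "web-02": {"Department": "Marketing", "Environment": "Prod"},
    "build-01": {"Department": "Engineering"},
}

def vms_in_category(vms: dict, key: str, value: str) -> list:
    """Return the VMs whose categories include the given key-value pair."""
    return sorted(name for name, cats in vms.items() if cats.get(key) == value)
```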

Recovery Point

A copy of the state of a system at a particular point in time.

Crash-consistent Snapshots
A snapshot is crash-consistent if it captures all of the data components (write-order consistent) at the instant of the crash. VM snapshots are crash-consistent by default, which means that the vDisks captured in the snapshot are consistent with a single point in time. Crash-consistent snapshots are more suited for non-database operating systems and applications that may not support quiescence (freezing) and un-quiescence (thawing), such as file servers, DHCP servers, and print servers.
Application-consistent Snapshots
A snapshot is application-consistent if, in addition to capturing all of the data components (write order consistent) at the instant of the crash, the running applications have completed all their operations and flushed their buffers to disk (in other words, the application is quiesced). Application-consistent snapshots capture the same data as crash-consistent snapshots, with the addition of all data in memory and all transactions in process. Therefore, application-consistent snapshots may take longer to complete.

Application-consistent snapshots are more suited for systems and applications that can be quiesced and un-quiesced or thawed, such as database operating systems and applications such as SQL, Oracle, and Exchange.

Recoverable Entity

A guest VM that you can recover from a recovery point.

Protection Policy

A configurable policy that takes recovery points of the protected guest VMs in equal time intervals, and replicates those recovery points to the recovery sites.

Recovery Plan

A configurable policy that orchestrates the recovery of protected guest VMs at the recovery site.

Recovery Point Objective (RPO)

The time interval that refers to the acceptable data loss if there is a failure. For example, if the RPO is 1 hour, the system creates a recovery point every 1 hour. On recovery, you can recover the guest VMs with data as of up to 1 hour ago. Take Snapshot Every in the Create Protection Policy GUI represents RPO.

Recovery Time Objective (RTO)

The time period from the failure event to the restoration of service. For example, an RTO of 30 minutes means that the protected guest VMs are back up and running within 30 minutes of the failure event.
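The RPO arithmetic above is straightforward; for example, a 1-hour RPO means 24 recovery points per day, and the worst-case data loss is bounded by one RPO interval:

```python
from datetime import timedelta

def recovery_points_per_day(rpo: timedelta) -> int:
    """With a recovery point taken every RPO interval, this many
    recovery points are created per day."""
    return int(timedelta(days=1) / rpo)

def max_data_loss(rpo: timedelta) -> timedelta:
    """Worst case, a failure happens just before the next recovery
    point is taken, losing up to one full RPO interval of data."""
    return rpo
```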

Nutanix Disaster Recovery Solutions

The following flowchart provides a detailed representation of the disaster recovery (DR) solutions of Nutanix. The decision tree covers both DR solutions (protection domain-based DR and Leap), helping you to quickly decide which DR strategy best suits your environment.

Figure. Decision Tree for Nutanix DR Solutions

For information about protection domain-based (legacy) DR, see Data Protection and Recovery with Prism Element guide. With Leap, you can protect your guest VMs and perform DR to on-prem availability zones (sites) or to Xi Cloud Services. A Leap deployment for DR from Xi Cloud Services to an on-prem Nutanix cluster is Xi Leap. The detailed information about Leap and Xi Leap DR configuration is available in the following sections of this guide.

Protection and DR between On-Prem Sites (Leap)

  • For information about protection with Asynchronous replication schedule and DR, see Protection with Asynchronous Replication Schedule and DR (Leap).
  • For information about protection with NearSync replication schedule and DR, see Protection with NearSync Replication Schedule and DR (Leap).
  • For information about protection with Synchronous replication schedule and DR, see Protection with Synchronous Replication Schedule (0 RPO) and DR.

Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap)

  • For information about protection with Asynchronous replication schedule and DR, see Protection with Asynchronous Replication and DR (Xi Leap).
  • For information about protection with NearSync replication schedule and DR, see Protection with NearSync Replication and DR (Xi Leap).

Leap Deployment Workflow

The workflow for entity-centric protection and disaster recovery (DR) configuration is as follows. The workflow is largely the same for both Leap and Xi Leap configurations, except for a few extra steps that you must perform while configuring Xi Leap.

Procedure

  1. Enable Leap at the primary and recovery on-prem availability zones (Prism Central).
    Enable Leap at the on-prem availability zones (sites) only. For more information about enabling Leap, see Enabling Leap for On-Prem Site.
  2. Pair the primary and recovery sites with each other.
    A site appears in the list of available recovery sites when you configure protection policies and recovery plans (see step 6 and step 7) only after you pair it. For more information about pairing the sites, see Pairing Availability Zones (Leap).
  3. (only for Xi Leap configuration) Set up your environment to proceed with replicating to Xi Cloud Services.
    For more information about environment setup, see Xi Leap Environment Setup.
  4. (only for Xi Leap configuration) Reserve floating IP addresses.
    For more information about floating IP addresses, see Floating IP Address Management in Xi Infrastructure Service Management Guide .
  5. Create production and test virtual networks at the primary and recovery sites.
    Create production and test virtual networks only at the on-prem sites. Xi Cloud Services creates production and test virtual networks dynamically for you. However, Xi Cloud Services provides floating IP addresses (step 4), a feature that is not available for on-prem sites. For more information about production and test virtual networks, see Nutanix Virtual Networks.
  6. Create a protection policy with replication schedules at the primary site.
    A protection policy can replicate recovery points to at most two other Nutanix clusters at the same or different sites. To replicate the recovery points, add a replication schedule between the primary site and each recovery site.
    • To create a protection policy with an Asynchronous replication schedule, see:
      • Creating a Protection Policy with an Asynchronous Replication Schedule (Leap)
      • Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)
    • To create a protection policy with a NearSync replication schedule, see:
      • Creating a Protection Policy with a NearSync Replication Schedule (Leap)
      • Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)
      Note: To maintain the efficiency of minutely replication, protection policies allow you to add NearSync replication schedule between the primary site and only one recovery site.
    • To create a protection policy with the Synchronous replication schedule, see Creating a Protection Policy with the Synchronous Replication Schedule (Leap).
      Note: To maintain the efficiency of synchronous replication, protection policies allow you to add only one recovery site when you add Synchronous replication schedule. If you already have an Asynchronous or a NearSync replication schedule in the protection policy, you cannot add another recovery site to protect the guest VMs with Synchronous replication schedule.
    You can also create a protection policy at a recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site. The reverse synchronization helps when you protect more guest VMs in the same protection policy at the recovery site.
  7. Create a recovery plan at the primary site.
    A recovery plan orchestrates the failover of the protected guest VMs (step 6) to a recovery site. For two recovery sites, create two discrete recovery plans at the primary site—one for DR to each recovery site.
    • To create a recovery plan for DR to another Nutanix cluster at the same or different on-prem sites, see Creating a Recovery Plan (Leap).
    • To create a recovery plan for DR to Xi Cloud Services, see Creating a Recovery Plan (Xi Leap).
    You can also create a recovery plan at a recovery site. The recovery plan you create or update at a recovery site synchronizes back to the primary site. The reverse synchronization helps in scenarios where you add more guest VMs to the same recovery plan at the recovery site.
  8. Validate or test the recovery plan you create in step 7.
    To test a recovery plan, perform a test failover to a recovery site.
    • To perform test failover to another Nutanix cluster at the same or different on-prem sites, see Performing a Test Failover (Leap).
    • To perform test failover to Xi Cloud Services, see Failover and Failback Operations (Xi Leap).
  9. (only for Xi Leap configuration) After the failover to recovery site, enable external connectivity. To enable external connectivity, perform the following.
      1. After a planned failover, shut down the VLAN interface on the on-prem Top-of-Rack (TOR) switch.
      2. To access the Internet from Xi Cloud Services, create both inbound and outbound policy-based routing (PBR) policies on the virtual private cloud (VPC). For more information, see Policy Configuration in Xi Infrastructure Service Administration Guide .
  10. (only for Xi Leap configuration) Perform the following procedure to access the recovered guest VMs through the Internet.
      1. Assign a floating IP address to the guest VMs failed over to Xi Cloud Services. For more information, see Floating IP Address Management in Xi Infrastructure Service Administration Guide
      2. Create PBR policies and specify the internal or private IP address of the guest VMs. For more information, see Policy Configuration in Xi Infrastructure Service Administration Guide .
    Note: If a guest VM (that hosts a publicly accessible website) fails over, update the authoritative DNS server (for example, Amazon Route 53, GoDaddy, DNSmadeEasy) with the primary failover record (on-prem public IP address) and the secondary failover record (Xi floating IP address). For example, if your authoritative DNS server is Amazon Route53, configure the primary and the secondary failover records. Amazon Route53 performs the health checks on the primary failover record and returns the secondary failover record when the primary is down.

On-Prem Hardware Resource Requirements

For DR solutions with Asynchronous, NearSync, and Synchronous replication schedules to succeed, the nodes in the on-prem Availability Zones (AZs or sites) must have certain resources. This section provides information about the node, disk, and Foundation configurations necessary to support the RPO-based recovery point frequencies.

  • The conditions and configurations provided in this section apply to Local and Remote recovery points.

  • Any node configuration with two or more SSDs, each SSD being 1.2 TB or greater capacity, supports recovery point frequency for NearSync.

  • Any node configuration that supports recovery point frequency of six (6) hours also supports AHV-based Synchronous replication schedules because a protection policy with Synchronous replication schedule takes recovery points of the protected VMs every 6 hours. See Protection with Synchronous Replication Schedule (0 RPO) and DR for more details about Synchronous replication.

  • Both the primary cluster and replication target cluster must fulfill the same minimum resource requirements.

  • Ensure that any new node or disk additions made to the on-prem sites (Availability Zones) meet the minimum requirements.

  • Features such as Deduplication and RF3 may require additional memory depending on the DR schedules and other workloads running on the cluster.

Note: For on-prem deployments, the minimum recovery point frequency with the default Foundation configuration is 6 hours. To increase the recovery point frequency, you must also modify the Foundation configuration (SSD and CVM) accordingly. For example, for an all-flash setup with a capacity between 48 TB and 92 TB, the default recovery point frequency is 6 hours. If you want to decrease the recovery point interval to one (1) hour, you must modify the default Foundation configuration to:
  • 14 vCPUs for CVM
  • 40 GB for CVM

The table lists the supported frequency for the recovery points across various hardware configurations.

Table 1. Recovery Point Frequency

Hybrid nodes

  • Total HDD tier capacity of 32 TB or lower; total capacity (HDD + SSD) of 40 TB or lower.
    • Minimum recovery point frequency: NearSync, Async (hourly).
    • Foundation configuration: No change required (default configuration with 2 x SSDs; each SSD must be 1.2 TB or more for NearSync).

  • Total HDD tier capacity between 32-64 TB; total capacity (HDD + SSD) of 92 TB or lower (up to 64 TB HDD and up to 32 TB SSD, for example 4 x 7.68 TB SSDs).
    • Minimum recovery point frequency: NearSync, Async (hourly).
    • Foundation configuration: Modify to a minimum of 4 x SSDs (each SSD must be 1.2 TB or more for NearSync), 14 vCPUs for the CVM, and 40 GB memory for the CVM.

  • Total HDD tier capacity between 32-64 TB; total capacity (HDD + SSD) of 92 TB or lower (up to 64 TB HDD and up to 32 TB SSD).
    • Minimum recovery point frequency: Async (every 6 hours).
    • Foundation configuration: No change required (default).

  • Total HDD tier capacity between 64-80 TB; total capacity (HDD + SSD) of 96 TB or lower.
    • Minimum recovery point frequency: Async (every 6 hours).
    • Foundation configuration: No change required (default).

  • Total HDD tier capacity greater than 80 TB; total capacity (HDD + SSD) of 136 TB or lower.
    • Minimum recovery point frequency: Async (every 6 hours).
    • Foundation configuration: Modify to a minimum of 12 vCPUs for the CVM and 36 GB memory for the CVM.

All-flash nodes

  • Total capacity of 48 TB or lower.
    • Minimum recovery point frequency: NearSync, Async (hourly).
    • Foundation configuration: No change required (default).

  • Total capacity between 48-92 TB.
    • Minimum recovery point frequency: NearSync, Async (hourly).
    • Foundation configuration: Modify to a minimum of 14 vCPUs for the CVM and 40 GB memory for the CVM.

  • Total capacity between 48-92 TB.
    • Minimum recovery point frequency: Async (every 6 hours).
    • Foundation configuration: No change required (default).

  • Total capacity greater than 92 TB.
    • Minimum recovery point frequency: Async (every 6 hours).
    • Foundation configuration: Modify to a minimum of 12 vCPUs for the CVM and 36 GB memory for the CVM.
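Table 1 can be read as a lookup. The sketch below encodes the rows above (capacities per node, in TB) and returns None when the default Foundation configuration suffices, or the minimum CVM changes otherwise. It is an illustration of how to read the table, not an official sizing tool:

```python
def foundation_cvm_requirements(disk_type: str, hdd_tb: float,
                                total_tb: float, frequency: str):
    """Return None if the default Foundation configuration suffices, or a
    dict of minimum changes; raise for combinations the table does not
    list. disk_type: 'hybrid' or 'all_flash'; frequency: 'nearsync',
    'hourly', or '6h'."""
    minutely = frequency in ("nearsync", "hourly")
    if disk_type == "hybrid":
        if minutely:
            if hdd_tb <= 32 and total_tb <= 40:
                return None  # default: 2 x SSDs, each >= 1.2 TB for NearSync
            if hdd_tb <= 64 and total_tb <= 92:
                return {"ssds": 4, "cvm_vcpus": 14, "cvm_memory_gb": 40}
        else:
            if hdd_tb <= 80 and total_tb <= 96:
                return None
            if total_tb <= 136:
                return {"cvm_vcpus": 12, "cvm_memory_gb": 36}
    elif disk_type == "all_flash":
        if minutely:
            if total_tb <= 48:
                return None
            if total_tb <= 92:
                return {"cvm_vcpus": 14, "cvm_memory_gb": 40}
        else:
            if total_tb <= 92:
                return None
            return {"cvm_vcpus": 12, "cvm_memory_gb": 36}
    raise ValueError("combination not listed in Table 1")
```

For example, an all-flash node with 60 TB total capacity needs the modified CVM configuration for hourly recovery points but only the default configuration for 6-hourly ones.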

Protection and DR between On-Prem Sites (Leap)

Leap protects your guest VMs and orchestrates their disaster recovery (DR) to other Nutanix clusters when events causing service disruption occur at the primary availability zone (site). For protection of your guest VMs, protection policies with Asynchronous, NearSync, or Synchronous replication schedules generate and replicate recovery points to other on-prem availability zones (sites). Recovery plans orchestrate DR from the replicated recovery points to other Nutanix clusters at the same or different on-prem sites.

Protection policies create a recovery point—and set its expiry time—at every interval of the specified time period (RPO). For example, the policy creates a recovery point every 1 hour for an RPO schedule of 1 hour. The recovery point expires at its designated expiry time based on the retention policy—see step 3 in Creating a Protection Policy with an Asynchronous Replication Schedule (Leap). If there is a prolonged outage at a site, the Nutanix cluster retains the last recovery point to ensure that you do not lose all the recovery points. For NearSync replication (lightweight snapshot), the Nutanix cluster retains the last full hourly snapshot. During the outage, the Nutanix cluster does not clean up the recovery points due to expiry. When the Nutanix cluster comes back online, it immediately cleans up the recovery points that are past expiry.
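The cleanup behavior described above (drop recovery points past expiry, but always retain the most recent one so a prolonged outage never removes the last recovery point) can be sketched as:

```python
from datetime import datetime, timedelta

def cleanup_recovery_points(points: list, now: datetime) -> list:
    """Drop recovery points past their expiry time, but always keep the
    most recent one. Each point is a dict with 'created' and 'expiry'
    datetimes. A sketch of the behavior, not the actual implementation."""
    if not points:
        return []
    latest = max(points, key=lambda p: p["created"])
    return [p for p in points if p["expiry"] > now or p is latest]
```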

For high availability of a guest VM, Leap enables replication of its recovery points to one or more on-prem sites. A protection policy can replicate recovery points to a maximum of two on-prem sites. For replication, you must add a replication schedule between the sites. You can set up the on-prem sites for protection and DR in the following arrangements.

Figure. The Primary and recovery Nutanix clusters at the different on-prem AZs

Figure. The Primary and recovery Nutanix clusters at the same on-prem AZ

The replication to multiple sites enables DR to Nutanix clusters at all the sites where the recovery points replicate or exist. To enable performing DR to a Nutanix cluster at the same or different site (recovery site), you must create a recovery plan. To enable performing DR to two different Nutanix clusters at the same or different recovery sites, you must create two discrete recovery plans—one for each recovery site. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

The protection policies and recovery plans you create or update synchronize continuously between the primary and recovery on-prem sites. The reverse synchronization enables you to create or update entities (protection policies, recovery plans, and guest VMs) at either the primary or the recovery sites.

The following section describes protection of your guest VMs and DR to a Nutanix cluster at the same or different on-prem sites. The workflow is the same for protection and DR to a Nutanix cluster in supported public cloud platforms. For information about protection of your guest VMs and DR from Xi Cloud Services to an on-prem Nutanix cluster (Xi Leap), see Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap).

Leap Requirements

The following are the general requirements of Leap. Along with the general requirements, there are specific requirements for protection with the following supported replication schedules.

  • For information about the on-prem node, disk and Foundation configurations required to support Asynchronous, NearSync, and Synchronous replication schedules, see On-Prem Hardware Resource Requirements.
  • For specific requirements of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Requirements (Leap).
  • For specific requirements of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Requirements (Leap).
  • For specific requirements of protection with Synchronous replication schedule (0 RPO), see Synchronous Replication Requirements.

License Requirements

The AOS license required depends on the features that you want to use. For information about the features that are available with an AOS license, see Software Options.

Hypervisor Requirements

The underlying hypervisors required differ in all the supported replication schedules. For more information about underlying hypervisor requirements for the supported replication schedules, see:

  • Asynchronous Replication Requirements (Leap)
  • NearSync Replication Requirements (Leap)
  • Synchronous Replication Requirements

Nutanix Software Requirements

  • Each on-prem availability zone (site) must have a Leap enabled Prism Central instance. To enable Leap in Prism Central, see Enabling Leap for On-Prem Site.
    Note: If you are using ESXi, register at least one vCenter Server to Prism Central. You can also register two vCenter Servers, each to a Prism Central at a different site. If you register both Prism Central instances to a single vCenter Server, ensure that each ESXi cluster is part of a different datacenter object in vCenter.

  • The primary and recovery Prism Central and Prism Element on the Nutanix clusters must be running on the supported AOS versions. For more information about the required versions for the supported replication schedules, see:
    • Asynchronous Replication Requirements (Leap)
    • NearSync Replication Requirements (Leap)
    • Synchronous Replication Requirements
    Tip:

    Nutanix supports replication between all the latest supported LTS and STS AOS releases. To check the list of the latest supported AOS versions, see KB-5505. To determine whether the AOS versions currently running on your clusters are EOL, see the EOL document.

    Upgrade the AOS version to the next available supported LTS/STS release. To determine if an upgrade path is supported, check the Upgrade Paths page before you upgrade the AOS.

    Note: If both clusters have different AOS versions that are EOL, upgrade the cluster with the lower AOS version to match the cluster with the higher AOS version, and then perform the upgrade to the next supported LTS version.

    For example, suppose the clusters are running AOS versions 5.5.x and 5.10.x respectively. Upgrade the cluster on 5.5.x to 5.10.x. After both clusters are on 5.10.x, upgrade each cluster to 5.15.x (supported LTS). Once both clusters are on 5.15.x, you can upgrade the clusters to 5.20.x or newer.

    Nutanix recommends that both the primary and the replication clusters or sites run the same AOS version.
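The example upgrade sequence above can be expressed as walking a chain of supported next releases (the chain below encodes only the versions from that example, for illustration):

```python
# Supported next-hop releases from the example above (illustrative only).
NEXT_SUPPORTED = {"5.5.x": "5.10.x", "5.10.x": "5.15.x", "5.15.x": "5.20.x"}

def upgrade_path(current: str, target: str) -> list:
    """Step a cluster through each supported release in order, rather
    than jumping straight from an EOL version to the target."""
    path = [current]
    while path[-1] != target:
        path.append(NEXT_SUPPORTED[path[-1]])
    return path
```

Always confirm each hop against the Upgrade Paths page before upgrading.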

User Requirements

You must have one of the following roles in Prism Central.

  • User admin
  • Prism Central admin
  • Prism Self Service admin
  • Xi admin

To view the available roles or create a role, click the hamburger icon at the top-left corner of the window and go to Administration > Roles in the left pane.

Firewall Port Requirements

To allow two-way replication between Nutanix clusters at the same or different sites, you must enable certain ports in your external firewall. To know about the required ports, see Disaster Recovery - Leap in Port Reference.

Networking Requirements

Requirements for static IP address preservation after failover
You can preserve one IP address of a guest VM (with static IP address) for its failover (DR) to an IPAM network. After the failover, the other IP addresses of the guest VM have to be reconfigured manually. To preserve an IP address of a guest VM (with static IP address), ensure that:
Caution: By default, you cannot preserve statically assigned DNS IP addresses after failover (DR) of guest VMs. However, you can create custom in-guest scripts to preserve the statically assigned DNS IP addresses. For more information, see Creating a Recovery Plan (Leap).
  • Both the primary and the recovery Nutanix clusters run AOS 5.11 or newer.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery site.

  • The protected guest VMs can reach the Controller VM from both the sites.
  • The protected guest VMs have NetworkManager command-line tool (nmcli) version 0.9.10.0 or newer installed.
    Also, NetworkManager must manage the networks on Linux VMs. To enable NetworkManager on a Linux VM, set the value of the NM_CONTROLLED field to yes in the interface configuration file. After setting the field, restart the network service on the VM.
    Tip: In CentOS, the interface configuration file is /etc/sysconfig/network-scripts/ifcfg-eth0.
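A minimal sketch of that edit on the contents of an ifcfg-style file (the helper function is hypothetical; back up the file and restart the network service afterwards as described above):

```python
def set_nm_controlled(ifcfg_text: str) -> str:
    """Set NM_CONTROLLED=yes in the body of an interface configuration
    file (for example ifcfg-eth0), adding the field if it is missing."""
    lines, found = [], False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("NM_CONTROLLED="):
            line, found = "NM_CONTROLLED=yes", True
        lines.append(line)
    if not found:
        lines.append("NM_CONTROLLED=yes")
    return "\n".join(lines) + "\n"
```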
Requirements for static IP address mapping of guest VMs between source and target virtual networks
You can explicitly define IP addresses for guest VMs that have static IP addresses on the primary site. On recovery, such guest VMs retain the explicitly defined IP address. To map static IP addresses of guest VMs between source and target virtual networks, ensure that:
  • Both the primary and the recovery Nutanix clusters run AOS 5.17 or newer.
  • The protected guest VMs have static IP addresses at the primary site.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery site.

  • The protected guest VMs can reach the Controller VM from both the sites.
  • The recovery plan selected for failover has VM-level IP address mapping configured.
Virtual network design requirements
You can design the virtual subnets that you plan to use for DR to the recovery site so that they can accommodate the guest VMs running in the source virtual network.
  • Maintain a uniform network configuration for all the virtual LANs (VLANs) with the same VLAN ID and network range in all the Nutanix clusters at a site. All such VLANs must have the same subnet name, IP address range, and IP address prefix length (Gateway IP/Prefix Length).

    For example, if you have a VLAN with ID 0 and network 10.45.128.0/17, and three clusters PE1, PE2, and PE3 at the site AZ1, all the clusters must maintain the same name, IP address range, and IP address prefix length (Gateway IP/Prefix Length) for the VLAN with ID 0.

  • To use a virtual network as a recovery virtual network, ensure that the virtual network meets the following requirements.
    • The network prefix is the same as the network prefix of the source virtual network. For example, if the source network address is 192.0.2.0/24, the network prefix of the recovery virtual network must also be 24.
    • The gateway IP address is the same as the gateway IP address in the source network. For example, if the gateway IP address in the source virtual network 192.0.2.0/24 is 192.0.2.10, the last octet of the gateway IP address in the recovery virtual network must also be 10.
  • To use a single Nutanix cluster as a target for DR from multiple primary Nutanix clusters, ensure that the number of virtual networks on the recovery cluster is equal to the sum of the number of virtual networks on the individual primary Nutanix clusters. For example, if there are two primary Nutanix clusters, with one cluster having m networks and the other cluster having n networks, ensure that the recovery cluster has m + n networks. Such a design ensures that all recovered VMs attach to a network.
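The two recovery-network rules above (same prefix length as the source network, and the same host part of the gateway IP address, for example the last octet for a /24) can be checked with a short sketch:

```python
import ipaddress

def valid_recovery_network(source_cidr: str, source_gateway: str,
                           recovery_cidr: str, recovery_gateway: str) -> bool:
    """Check that the recovery network keeps the source network's prefix
    length and that the gateway keeps the same host part (e.g. the last
    octet of a /24 network)."""
    src = ipaddress.ip_network(source_cidr)
    rec = ipaddress.ip_network(recovery_cidr)
    if src.prefixlen != rec.prefixlen:
        return False
    host_mask = (1 << (src.max_prefixlen - src.prefixlen)) - 1
    return (int(ipaddress.ip_address(source_gateway)) & host_mask) == \
           (int(ipaddress.ip_address(recovery_gateway)) & host_mask)
```

For example, a source network 192.0.2.0/24 with gateway 192.0.2.10 maps validly to a recovery network 198.51.100.0/24 with gateway 198.51.100.10.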

Additional Requirements

  • Both the primary and recovery Nutanix clusters must have an external IP address.
  • Both the primary and recovery Prism Centrals and Nutanix clusters must have a data services IP address.
  • The Nutanix cluster that hosts the Prism Central must meet the following requirements.
    • The Nutanix cluster must be registered to the Prism Central instance.
    • The Nutanix cluster must have an iSCSI data services IP address configured on it.
    • The Nutanix cluster must also have sufficient memory to support a hot add of memory to all Prism Central nodes when you enable Leap. A small Prism Central instance (4 vCPUs, 16 GB memory) requires a hot add of 4 GB, and a large Prism Central instance (8 vCPUs, 32 GB memory) requires a hot add of 8 GB. If you enable Nutanix Flow, each Prism Central instance requires an extra hot add of 1 GB.
  • Each node in a scaled-out Prism Central instance must have a minimum of 4 vCPUs and 16 GB memory.

    For more information about the scaled-out deployments of a Prism Central, see Leap Terminology.

  • The protected guest VMs must have Nutanix VM mobility drivers installed.

    Nutanix VM mobility drivers are required for accessing the guest VMs after failover. Without Nutanix VM mobility drivers, the guest VMs become inaccessible after a failover.

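The Prism Central memory sizing above can be turned into a quick headroom check. This is an illustrative sketch only; it assumes the hot-add figures apply per Prism Central node in a scaled-out deployment, so verify the totals against your release before relying on them.

```python
# Hot-add figures from this guide: a small PC (4 vCPUs, 16 GB) needs a
# 4 GB hot add, a large PC (8 vCPUs, 32 GB) needs 8 GB, plus 1 GB per
# instance when Nutanix Flow is enabled.
HOT_ADD_GB = {"small": 4, "large": 8}
FLOW_EXTRA_GB = 1

def required_hot_add_gb(size, num_pc_nodes=1, flow_enabled=False):
    """Total memory (GB) the hosting cluster must be able to hot-add when
    Leap is enabled (assumption: the figures apply to each PC node)."""
    per_node = HOT_ADD_GB[size] + (FLOW_EXTRA_GB if flow_enabled else 0)
    return per_node * num_pc_nodes

print(required_hot_add_gb("small"))                                     # 4
print(required_hot_add_gb("large", num_pc_nodes=3, flow_enabled=True))  # 27
```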

Leap Limitations

Consider the following general limitations before configuring protection and disaster recovery (DR) with Leap. Along with the general limitations, there are specific protection limitations with the following supported replication schedules.

  • For specific limitations of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Limitations (Leap).
  • For specific limitations of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Limitations (Leap).
  • For specific limitations of protection with Synchronous replication schedule (0 RPO), see Synchronous Replication Limitations.

Virtual Machine Limitations

You cannot do the following.

  • Deploy witness VMs.
  • Protect multiple guest VMs that use disk sharing (for example, multi-writer sharing, Microsoft Failover Clusters, Oracle RAC).

  • Protect VMware fault tolerance enabled guest VMs.

  • Recover vGPU console-enabled guest VMs with their vGPU console intact.

    When you perform DR of vGPU console-enabled guest VMs, the VMs recover with the default VGA console (without any alert) instead of vGPU console. The guest VMs fail to recover when you perform cross-hypervisor disaster recovery (CHDR). For more information about DR and backup behavior of guest VMs with vGPU, see vGPU Enabled Guest VMs.

  • Configure NICs for a guest VM across both virtual private clouds (VPCs).

    You can configure NICs for a guest VM in either the production VPC or the test VPC, but not both.

Volume Groups Limitation

You cannot protect volume groups.

Network Segmentation Limitation

You cannot apply network segmentation for management traffic (any traffic not on the backplane network) in Leap.

You get an error when you try to enable network segmentation for management traffic on a Leap-enabled Nutanix cluster, or when you try to enable Leap on a Nutanix cluster with network segmentation enabled. For more information about network segmentation, see Securing Traffic Through Network Segmentation in the Security Guide.
Note: However, you can apply network segmentation for backplane traffic at the primary and recovery clusters. Nutanix does not recommend this configuration because a planned failover of guest VMs with backplane network segmentation enabled fails to recover the guest VMs and removes them from the primary AZ.

Virtual Network Limitation

Although there is no limit to the number of VLANs that you can create, only the first 500 VLANs are listed in the Network Settings drop-down when you create a recovery plan. For more information about VLANs in the recovery plan, see Nutanix Virtual Networks.

Nutanix to vSphere Cluster Mapping Limitation

Due to the way the Nutanix architecture distributes data, there is limited support for mapping a Nutanix cluster to multiple vSphere clusters. If a Nutanix cluster is split into multiple vSphere clusters, migration and recovery operations fail.

Failover Limitation

After the failover, the recovered guest VMs do not retain their associated labels.
Tip: Assign categories to the guest VMs instead of labels because VM categories are retained after the failover.

vGPU Enabled Guest VMs

The following table lists the behavior of guest VMs with vGPU in disaster recovery (DR) and backup deployments.

Table 1. DR and Backup Behavior of vGPU Enabled Guest VMs

AHV to AHV, Nutanix Disaster Recovery:
  • Identical vGPU models: Supported for recovery point creation, replication, restore, migrate, VM start, and failover and failback.
  • Unidentical vGPU models or no vGPU: Supported for recovery point creation, replication, restore, and migrate. Unsupported for VM start, and failover and failback.
    Note: For Synchronous replication only, protection of the guest VMs fails.

AHV to AHV, Backup (HYCU):
  • Guest VMs with vGPU fail to recover (identical or unidentical vGPU models).

AHV to AHV, Backup (Veeam):
  • Identical vGPU models: Guest VMs with vGPU fail to recover.
  • Unidentical vGPU models or no vGPU: Guest VMs with vGPU recover but with the older vGPU, or recover but do not start.
    Tip: The VMs start when you disable vGPU on the guest VM.

ESXi to ESXi, Nutanix Disaster Recovery or Backup:
  • Guest VMs with vGPU cannot be protected.

AHV to ESXi, Nutanix Disaster Recovery:
  • vGPU is disabled after failover of guest VMs with vGPU.

ESXi to AHV, Nutanix Disaster Recovery:
  • Guest VMs with vGPU cannot be protected.

Leap Configuration Maximums

For the maximum number of entities you can configure with different replication schedules and perform failover (disaster recovery) on, see Nutanix Configuration Maximums. The limits have been tested for Leap production deployments. Nutanix does not guarantee that the system can operate beyond these limits.

Tip: Upgrade your NCC version to 3.10.1 to get configuration alerts.

Leap Recommendations

Nutanix recommends the following best practices for configuring protection and disaster recovery (DR) with Leap.

General Recommendations

  • Create all entities (protection policies, recovery plans, and VM categories) at the primary availability zone (AZ).
  • Upgrade Prism Central before upgrading Prism Element on the Nutanix clusters registered to it. For more information about upgrading Prism Central, see Upgrading Prism Central in the Acropolis Upgrade Guide .
  • Do not include the guest VMs protected with Asynchronous, NearSync, and Synchronous replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with the guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.
  • Disable Synchronous replication before unpairing the AZs.

    If you unpair the AZs while the guest VMs in the Nutanix clusters are still in synchronization, the Nutanix cluster becomes unstable. For more information about disabling Synchronous replication, see Synchronous Replication Management.

Recommendation for Migrating Protection Domains to Protection Policies

You can protect a guest VM either with the legacy DR solution (protection domain-based) or with Leap. To protect a legacy-DR-protected guest VM with Leap, you must migrate the guest VM from the protection domain to a protection policy. During the migration, do not delete the guest VM snapshots in the protection domain. Nutanix recommends keeping the guest VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.

Recommendation for DR to Nutanix Clusters at the Same On-Prem Availability Zone

If the single Prism Central that you use for protection and DR to Nutanix clusters at the same availability zone (site) becomes inactive, you cannot perform a failover when required. To avoid the single point of failure in such deployments, Nutanix recommends installing the single Prism Central at a different site (different fault domain).

Recommendation for Virtual Networks

  • Map the networks while creating a recovery plan in Prism Central.
  • Recovery plans do not support overlapping subnets in a network-mapping configuration. Do not create virtual networks that have the same name or overlapping IP address ranges.
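Because recovery plans reject overlapping subnets in a network-mapping configuration, it can help to screen candidate networks up front. The following is a small sketch using Python's standard `ipaddress` module; the function name and subnets are placeholders, not Nutanix tooling.

```python
import ipaddress
from itertools import combinations

def find_overlaps(subnets):
    """Return pairs of subnets whose address ranges overlap."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]

# 10.10.0.0/16 contains 10.10.20.0/24, so a mapping that uses both
# would be rejected by the recovery plan.
print(find_overlaps(["10.10.0.0/16", "10.10.20.0/24", "192.168.1.0/24"]))
# [('10.10.0.0/16', '10.10.20.0/24')]
```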

Recommendation for Container Mapping

Create storage containers with the same name on both the primary and recovery Nutanix clusters.

Leap automatically maps the storage containers during the first replication (seeding) of a guest VM. If a storage container with the same name exists on both the primary and recovery Nutanix clusters, the recovery points replicate only to the storage container with that name. For example, if your protected guest VMs are in the SelfServiceContainer storage container on the primary Nutanix cluster, and the recovery Nutanix cluster also has a SelfServiceContainer storage container, the recovery points replicate to SelfServiceContainer only. If a storage container with the same name does not exist at the recovery AZ, the recovery points replicate to a random storage container at the recovery AZ. For more information about creating storage containers on the Nutanix clusters, see Creating a Storage Container in the Prism Web Console Guide.
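Because recovery points fall back to an arbitrary storage container when no name matches, a pre-flight comparison of container names on the two clusters can catch gaps early. A minimal sketch (the container names are examples only):

```python
def unmapped_containers(primary, recovery):
    """Containers on the primary cluster with no same-name match on the
    recovery cluster; recovery points from these replicate to an
    arbitrary container instead of a predictable one."""
    return sorted(set(primary) - set(recovery))

primary = ["SelfServiceContainer", "default-container", "sql-data"]
recovery = ["SelfServiceContainer", "default-container"]
print(unmapped_containers(primary, recovery))  # ['sql-data']
```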


Leap Service-Level Agreements (SLAs)

Leap enables protection of your guest VMs and disaster recovery (DR) to one or more Nutanix clusters at the same or different on-prem sites. A Nutanix cluster is essentially an AHV or an ESXi cluster running AOS. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

Leap supports DR (and CHDR) to a maximum of two different Nutanix clusters at the same or different availability zones (sites). You can protect your guest VMs with the following replication schedules.

  • Asynchronous replication schedule (1 hour or greater RPO). For information about protection with Asynchronous replication schedule, see Protection with Asynchronous Replication Schedule and DR (Leap).
  • NearSync replication schedule (1–15 minute RPO). For information about protection with NearSync replication schedule, see Protection with NearSync Replication Schedule and DR (Leap).
  • Synchronous replication schedule (0 RPO). For information about protection with Synchronous replication schedule, see Protection with Synchronous Replication Schedule (0 RPO) and DR.

    To maintain efficiency in protection and DR, Leap allows you to protect a guest VM with a Synchronous replication schedule to only one AHV cluster, at a different on-prem availability zone.
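The three schedule tiers can be summarized as a simple RPO lookup. This sketch only encodes the tiers listed above (0 RPO Synchronous, 1–15 minutes NearSync, 1 hour or greater Asynchronous); the function is illustrative, and note that RPO targets between 16 and 59 minutes fall outside every tier.

```python
def schedule_for_rpo(rpo_minutes):
    """Map a target RPO (in minutes) to the Leap schedule tier above."""
    if rpo_minutes == 0:
        return "Synchronous"
    if 1 <= rpo_minutes <= 15:
        return "NearSync"
    if rpo_minutes >= 60:
        return "Asynchronous"
    raise ValueError("no schedule tier covers an RPO of %d minutes" % rpo_minutes)

print(schedule_for_rpo(0))    # Synchronous
print(schedule_for_rpo(15))   # NearSync
print(schedule_for_rpo(120))  # Asynchronous
```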

Leap Views

The disaster recovery (DR) views enable you to perform CRUD operations on the following types of Leap entities.

  • Configured entities (for example, availability zones, protection policies, and recovery plans)
  • Created entities (for example, guest VMs, and recovery points)

This chapter describes the views of Prism Central (on-prem site).

Availability Zones View

The Availability Zones view under the hamburger icon > Administration lists all of your paired availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. AZs View

Table 1. Availability Zones View Fields
Field Description
Name Name of the availability zone.
Region Region to which the availability zone belongs.
Type Type of availability zone. Availability zones that are backed by on-prem Prism Central instances are shown to be of type physical. The availability zone that you are logged in to is shown as a local availability zone.
Connectivity Status Status of connectivity between the local availability zone and the paired availability zone.
Table 2. Workflows Available in the Availability Zones View
Workflow Description
Connect to Availability Zone (on-prem Prism Central only) Connect to an on-prem Prism Central or to Xi Cloud Services for data replication.
Table 3. Actions Available in the Actions Menu
Action Description
Disconnect Disconnect the remote availability zone. When you disconnect an availability zone, the pairing is removed.

Protection Policies View

The Protection Policies view under the hamburger icon > Data Protection lists all configured protection policies from all the paired availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Protection Policies View

Table 1. Protection Policies View Fields
Field Description
Policy Name Name of the protection policy.
Schedules Number of schedules configured in the protection policy. If the protection policy has multiple schedules, a drop-down icon is displayed. Click the drop-down icon to see the primary location (primary Nutanix cluster), recovery location (recovery Nutanix cluster), and RPO of the schedules in the protection policy.
Alerts Number of alerts issued for the protection policy.
Table 2. Workflows Available in the Protection Policies View
Workflow Description
Create protection policy Create a protection policy.
Table 3. Actions Available in the Actions Menu
Action Description
Update Update the protection policy.
Clone Clone the protection policy.
Delete Delete the protection policy.

Recovery Plans View

The Recovery Plans view under the hamburger icon > Data Protection lists all configured recovery plans from all the paired availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Recovery Plans View

Table 1. Recovery Plans View Fields
Field Description
Name Name of the recovery plan.
Primary Location Replication source site for the recovery plan.
Recovery Location Replication target site for the recovery plan.
Entities Sum of the following VMs:
  • Number of local, live VMs that are specified in the recovery plan.
  • Number of remote VMs that the recovery plan can recover at this site.
Last Validation Status Status of the most recent validation of the recovery plan.
Last Test Status Status of the most recent test performed on the recovery plan.
Last Failover Status Status of the most recent failover performed on the recovery plan.
Table 2. Workflows Available in the Recovery Plans View
Workflow Description
Create Recovery Plan Create a recovery plan.
Table 3. Actions Available in the Actions Menu
Action Description
Validate Validates the recovery plan to ensure that the VMs in the recovery plan have a valid configuration and can be recovered.
Test Tests the recovery plan.
Clean-up test VMs Cleans up the VMs failed over as a result of testing the recovery plan.
Update Updates the recovery plan.
Failover Performs a failover.
Delete Deletes the recovery plan.

Dashboard Widgets

The dashboard includes widgets that display the statuses of configured protection policies and recovery plans. If you have not configured these entities, the widgets display a summary of the steps required to get started with Leap.

To view these widgets, click the Dashboard tab.

The following figure is a sample view of the dashboard widgets.

Figure. Dashboard Widgets for Leap

Enabling Leap for On-Prem Site

To perform disaster recovery (DR) to Nutanix clusters at different on-prem availability zones (sites), enable Leap at both the primary and recovery sites (Prism Central). Without enabling Leap, you can configure protection policies and recovery plans that synchronize to the paired sites, but you cannot perform failover and failback operations. To perform DR to different Nutanix clusters at the same site, enable Leap in the single Prism Central.

About this task

To enable Leap, perform the following procedure.

Note: You cannot disable Leap once you have enabled it.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Enable Leap in the Setup section on the left pane.
    Figure. Enabling Leap

    The Leap dialog box runs prechecks. If any precheck fails, resolve the issue that is causing the failure and click check again .
  4. Click Enable after all the prechecks pass.
    Leap is enabled after at least 10 seconds.

Pairing Availability Zones (Leap)

To replicate entities (protection policies, recovery plans, and recovery points) to different on-prem availability zones (sites) bidirectionally, pair the sites with each other. To replicate entities to different Nutanix clusters at the same site bidirectionally, you need not pair the sites because the primary and the recovery Nutanix clusters are registered to the same site (Prism Central). Without pairing the sites, you cannot perform DR to a different site.

About this task

To pair an on-prem AZ with another on-prem AZ, perform the following procedure at either of the on-prem AZs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Administration > Availability Zones in the left pane.
    Figure. Pairing Availability Zone

  3. Click Connect to Availability Zone .
    Specify the following information in the Connect to Availability Zone window.
    Figure. Connect to Availability Zone

    1. Availability Zone Type : Select Physical Location from the drop-down list.
      A physical location is an on-prem availability zone (site). To pair the on-prem site with Xi Cloud Services, select XI from the drop-down list, and enter the credentials of your Xi Cloud Services account in steps c and d.
    2. IP Address for Remote PC : Enter the IP address of the recovery site Prism Central.
    3. Username : Enter the username of your recovery site Prism Central.
    4. Password : Enter the password of your recovery site Prism Central.
  4. Click Connect .
    Both on-prem AZs are now paired with each other.

Protection and Automated DR

Automated disaster recovery (DR) configurations use protection policies to protect your guest VMs, and recovery plans to orchestrate the recovery of those guest VMs to different Nutanix clusters at the same or different availability zones (sites). You can automate protection of your guest VMs with the following supported replication schedules in Leap.

  • Asynchronous replication schedule (1 hour or greater RPO). For information about protection with Asynchronous replication schedule, see Protection with Asynchronous Replication Schedule and DR (Leap).
  • NearSync replication schedule (1–15 minute RPO). For information about protection with NearSync replication schedule, see Protection with NearSync Replication Schedule and DR (Leap).
  • Synchronous replication schedule (0 RPO). For information about protection with Synchronous replication schedule, see Protection with Synchronous Replication Schedule (0 RPO) and DR.

    To maintain efficiency in protection and DR, Leap allows you to protect a guest VM with a Synchronous replication schedule to only one AHV cluster, at a different on-prem availability zone.

Protection with Asynchronous Replication Schedule and DR (Leap)

Asynchronous replication schedules enable you to protect your guest VMs with an RPO of 1 hour or more. A protection policy with an Asynchronous replication schedule creates a recovery point at the specified hourly interval and replicates it to the recovery availability zones (sites) for high availability. For guest VMs protected with an Asynchronous replication schedule, you can perform disaster recovery (DR) to different Nutanix clusters at the same or different sites. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

Note: Nutanix provides multiple DR solutions to protect your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

Asynchronous Replication Requirements (Leap)

The following are the specific requirements for protecting your guest VMs with an Asynchronous replication schedule. Ensure that you meet these requirements in addition to the general requirements of Leap.

For information about the general requirements of Leap, see Leap Requirements.

For information about node, disk and Foundation configurations required to support Asynchronous replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on AHV versions that come bundled with the supported version of AOS.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

Each on-prem site must have a Leap enabled Prism Central instance.

The primary and recovery Prism Central and Prism Element on the Nutanix clusters must be running the following versions of AOS.

  • AHV clusters
    • AOS 5.17 or newer for DR to different Nutanix clusters at the same site.
    • AOS 5.10 or newer for DR to Nutanix clusters at the different sites.
  • ESXi clusters
    • AOS 5.17 or newer for DR to different Nutanix clusters at the same site.
    • AOS 5.11 or newer for DR to Nutanix clusters at the different sites.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Guest VMs protected with an Asynchronous replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) to recover guest VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided you meet the following requirements.

  • Both the primary and the recovery Nutanix clusters must be running AOS 5.17 or newer for CHDR to Nutanix clusters at the same availability zone (site).
  • Both the primary and the recovery Nutanix clusters must be running AOS 5.11.2 or newer for CHDR to Nutanix clusters at different availability zones (sites).
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI and SATA disks only.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.

  • For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files.

    If you have delta disks attached to a guest VM and you proceed with failover, you get a validation warning and the guest VM does not recover. Contact Nutanix Support for assistance.

Table 1. Operating Systems Supported for CHDR (Asynchronous Replication)
Operating System Version Requirements and limitations
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer.
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • SLES operating system is not supported.

Additional Requirement

The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.

Asynchronous Replication Limitations (Leap)

Consider the following specific limitations before protecting your guest VMs with Asynchronous replication schedule. These limitations are in addition to the general limitations of Leap.

For information about the general limitations of Leap, see Leap Limitations.

  • You cannot restore guest VMs with incompatible GPUs at the recovery Nutanix cluster.
  • You cannot protect guest VMs configured as part of a network function chain.
  • You cannot retain hypervisor-specific properties after cross hypervisor disaster recovery (CHDR).

    CHDR does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

Creating a Protection Policy with an Asynchronous Replication Schedule (Leap)

To protect the guest VMs in an hourly replication schedule, configure an Asynchronous replication schedule while creating the protection policy. The policy takes recovery points of those guest VMs at the specified time interval (hourly) and replicates them to the recovery availability zones (sites) for high availability. To protect the guest VMs at the same or different recovery sites, the protection policy allows you to configure Asynchronous replication schedules to at most two recovery sites—a unique replication schedule for each recovery site. The policy synchronizes continuously to the recovery sites bidirectionally.

Before you begin

See Asynchronous Replication Requirements (Leap) and Asynchronous Replication Limitations (Leap) before you start.

About this task

To create a protection policy with an Asynchronous replication schedule, do the following at the primary site. You can also create a protection policy at the recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies

  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Select Primary Location

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric characters, dots, dashes, and underscores.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, check an availability zone (site) that hosts the guest VMs to protect.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can check either the local site or a non-local site.

        2. Cluster : From the drop-down list, check the Nutanix cluster that hosts the guest VMs to protect.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, check the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. After saving the primary site configuration, you can optionally add a local schedule (step iv) to retain the recovery points at the primary site.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule to retain 15 minute recovery points locally and also an hourly replication schedule to retain recovery points and replicate them to a recovery site every 2 hours. The two schedules apply differently on the guest VMs.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local site. If you set the retention number to n, the local site retains the n most recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local site.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on Local AZ:PE_A3_AHV : Specify the retention number for the local site.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            The recovery points are of the specified type irrespective of whether a local or replication schedule generates them. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the system generates a single application-consistent recovery point.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshots.
          5. Click Save Schedule .
    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the availability zone (site) where you want to replicate the recovery points.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). Select Local AZ if you want to configure DR to a different Nutanix cluster at the same site.

          If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).

        2. Cluster : From the drop-down list, select the Nutanix cluster where you want to replicate the recovery points.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. You can select one cluster at the recovery site. If you want to replicate the recovery points to more clusters at the same or different sites, add another recovery site with a replication schedule. For more information about adding another recovery site with a replication schedule, see step e.

          Note: Selecting auto-select from the drop-down list replicates the recovery points to any available cluster at the recovery site. Select auto-select from the drop-down list only if all the clusters at the recovery site are up and running.
          Caution: If the primary Nutanix cluster contains an IBM Power Systems server, you can replicate recovery points to an on-prem site only if that site also contains an IBM Power Systems server.
        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery site. After saving the recovery site configuration, you can optionally add a local schedule to retain the recovery points at the recovery site.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule that retains hourly recovery points locally to supplement the hourly replication schedule. The two schedules apply to the guest VMs independently after failover, when the recovery points replicate back to the primary site.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local site. If you set the retention number to n, the local site retains the n most recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support the Linear retention type.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local site.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the local site.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            The recovery points are of the specified type irrespective of whether a local or replication schedule generates them. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the system generates a single application-consistent recovery point.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshots.
          5. Click Save Schedule .
    4. Click + Add Schedule to add a replication schedule between the primary and the recovery site.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (Asynchronous)

        1. Protection Type : Click Asynchronous .
        2. Take Snapshot Every : Specify the frequency in hours , days , or weeks at which you want the recovery points to be taken.

          The specified frequency is the RPO. For more information about RPO, see Leap Terminology.

        3. Retention Type : Specify one of the following two types of retention policy.
          • Linear : Implements a simple retention scheme at both the primary (local) and the recovery (remote) site. If you set the retention number for a given site to n, that site retains the n most recent recovery points. For example, if the RPO is 1 hour, and the retention number for the local site is 48, the local site retains 48 hours (48 X 1 hour) of recovery points at any given time.
            Tip: Use linear retention policies for small RPO windows with shorter retention periods or in cases where you always want to recover to a specific RPO window.
          • Roll-up : Rolls up the recovery points as per the RPO and retention period into a single recovery point at a site. For example, if you set the RPO to 1 hour, and the retention time to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) after every 24 hours. The system keeps one day (of rolled-up hourly recovery points) and 4 days of daily recovery points.
            Note:
            • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
            • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
            • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
            • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 years of monthly recovery points.
            Note: The recovery points that are used to create a rolled-up recovery point are discarded.
            Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
        4. To specify the retention number for the primary and recovery sites, do the following.
          • Retention on Local AZ: PE_A3_AHV : Specify the retention number for the primary site.

            This field is unavailable if you do not specify a recovery location.

          • Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the recovery site.

            If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.

        5. If you want to enable reverse retention of the recovery points, check Reverse retention for VMs on recovery location .
          Note: Reverse retention for VMs on recovery location is available only when the retention numbers on the primary and recovery sites are different.

          Reverse retention maintains the retention numbers of recovery points even after failover to a recovery site in the same or different availability zones. For example, if you retain two recovery points at the primary site and three recovery points at the recovery site, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary site. The recovery site still retains two recovery points while the primary site retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary site. The recovery site retains three recovery points while the primary site retains two recovery points.

          Maintaining the same retention numbers at a recovery site is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.

        6. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

          Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery sites.

          Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshots.
          Caution: Application-consistent recovery points fail for EFI boot-enabled Windows 2019 VMs running on ESXi when NGT is not installed. Nutanix recommends installing NGT on guest VMs running on ESXi as well.
        7. Click Save Schedule .
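      The roll-up arithmetic described above can be illustrated with a short calculation. This is only a sketch of the documented retention rules (the variable names and script are our own, not a Nutanix tool): with a 1-hour RPO and a 5-day retention period, the site keeps 1 day of hourly recovery points plus 4 daily recovery points.

      ```shell
      #!/bin/sh
      # Illustrative sketch only (not a Nutanix tool): approximate how many
      # recovery points a roll-up schedule retains for an hourly RPO and a
      # retention period given in days.
      rpo_hours=1        # RPO: one recovery point every hour
      retention_days=5   # roll-up retention period of 5 days

      hourly_points=$(( 24 / rpo_hours ))     # 1 day of RPO recovery points
      daily_points=$(( retention_days - 1 ))  # n-1 days of daily recovery points
      total=$(( hourly_points + daily_points ))

      echo "$total"      # 24 hourly + 4 daily = 28 recovery points retained
      ```

      Under these assumptions, the site holds at most 28 recovery points at any given time, with hour-level granularity only for the most recent day.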
    5. Click + Add Recovery Location at the top-right if you want to add an additional recovery site for the guest VMs in the protection policy.
      • To add an on-prem site for recovery, see Protection and DR between On-Prem Sites (Leap)
      • To add Xi Cloud Services for recovery, see Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap).
      Figure. Protection Policy Configuration: Additional Recovery Location

    6. Click + Add Schedule to add a replication schedule between the primary site and the additional recovery site you specified in step e.
      Perform step d again in the Add Schedule window to add the replication schedule. The window auto-populates the Primary Location and the additional Recovery Location that you have selected in step b and step c.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.

        1. Click Start protection at specific point in time .
        2. Specify the time at which you want to start taking recovery points.
        3. Click Save .
    7. Click Next .
      Clicking Next shows a list of VM categories, where you can optionally check one or more VM categories to protect in the protection policy. DR configurations using Leap allow you to protect a guest VM with only one protection policy; therefore, VM categories specified in another protection policy are not in the list. If a guest VM is protected in another protection policy through its VM category (category-based inclusion), and you protect that guest VM from the VMs page in this policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protects the individual guest VM protects that guest VM.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    8. If you want to protect the guest VMs by category, check the VM categories that you want to protect from the list and click Add .
      Figure. Protection Policy Configuration: Add VM Categories

      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs by category, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).

    9. Click Create .
      The protection policy with an Asynchronous replication schedule is created. To verify the protection policy, see the Protection Policies page. If you checked VM categories in step h, the protection policy starts generating recovery points of the guest VMs in those VM categories. To see the generated recovery points, click the hamburger icon at the top-left corner of the window and go to VM Recovery Points . Click a recovery point to view its information. You can see the estimated time for the very first replication (seeding) to the recovery sites.
      Figure. Recovery Points Overview

Application-consistent Recovery Point Conditions and Limitations

This topic describes the conditions and limitations for application-consistent recovery points that you can generate through a protection policy. For information about the operating systems that support the AOS version you have deployed, see the Compatibility Matrix.

  • Before taking an application-consistent recovery point, consider the workload type of your guest VM.

    Applications running in your guest VM must be able to quiesce I/O operations. For example, you can quiesce I/O operations for database applications and similar workload types.

  • Before taking an application-consistent recovery point, install and enable Nutanix Guest Tools (NGT) on your guest VM.

    For installing and enabling NGT, see Nutanix Guest Tools in the Prism Web Console Guide .

    For guest VMs running on ESXi, consider these points.

  • Install and enable NGT on guest VMs running on ESXi also. Application-consistent recovery points fail for EFI boot-enabled Windows 2019 VMs running on ESXi without installing NGT.

  • (vSphere) If you do not enable NGT and then try to take an application-consistent recovery point, the system creates a Nutanix native recovery point from a single vSphere host-based recovery point, and then deletes the vSphere host-based recovery point. If you enable NGT and then take an application-consistent recovery point, the system directly captures a Nutanix native recovery point.
  • Do not delete the .snapshot folder in the vCenter.

  • The following table lists the operating systems that support application-consistent recovery points with NGT installed.
Table 1. Supported Operating Systems (NGT Installed)
Operating system Version
Windows
  • Windows 2008 R2 through Windows 2019
Linux
  • CentOS 6.5 through 6.9 and 7.0 through 7.3
  • Red Hat Enterprise Linux (RHEL) 6.5 through 6.9 and 7.0 through 7.3.
  • Oracle Linux 6.5 and 7.0
  • SUSE Linux Enterprise Server (SLES) 11 SP1 through 11 SP4 and 12 SP1 through 12 SP3
  • Ubuntu 14.04

Application-consistent Recovery Points with Microsoft Volume Shadow Copy Service (VSS)

  • To take application-consistent recovery points on Windows guest VMs, enable Microsoft VSS services.

    When you configure a protection policy and select Take App-Consistent Recovery Point , the Nutanix cluster transparently invokes the VSS (also known as Shadow copy or volume snapshot service).

    Note: This option is available for ESXi and AHV only. However, you can use third-party backup products to invoke VSS for Hyper-V.
  • To take application-consistent recovery points on guest VMs that use VSS, the system invokes the Nutanix native in-guest VmQuiesced Snapshot Service (VSS) agent. The VSS framework takes application-consistent recovery points without causing VM stuns (temporarily unresponsive VMs).
  • The VSS framework enables third-party backup providers such as Commvault and Rubrik to take application-consistent snapshots on the Nutanix platform in a hypervisor-agnostic manner.

  • The default and only backup type for VSS snapshots is VSS_BT_COPY (copy backup).

    Third-party backup products can choose between the VSS_BT_FULL (full backup) and VSS_BT_COPY (copy backup) backup types.

  • Guest VMs with delta, SATA, and IDE disks do not support Nutanix VSS recovery points.
  • Guest VMs with iSCSI attachments (LUNs) do not support Nutanix VSS recovery points.

    Nutanix VSS recovery points fail for such guest VMs.

  • Do not take Nutanix enabled application-consistent recovery points while using any third-party backup provider enabled VSS snapshots (for example, Veeam).

Pre-freeze and Post-thaw scripts

  • You can take application-consistent recovery points on NGT and Volume Shadow Copy Service (VSS) enabled guest VMs. However, some applications require more steps before or after the VSS operations to fully quiesce the guest VMs to an appropriate restore point or state in which the system can capture a recovery point. Such applications need pre-freeze and post-thaw scripts to run the necessary extra steps.
  • Any operation that the system must perform on a guest VM before replication or a recovery point capture is a pre-freeze operation. For example, if a guest VM hosts a database, you can enable hot backup of the database before replication using a pre-freeze script. Similarly, any operation that the system must perform on guest VM after replication or a recovery point capture is a post-thaw operation.
    Tip: Vendors such as CommVault provide pre-freeze and post-thaw scripts. You can also write your own pre-freeze and post-thaw scripts.
Script Requirements
  • For Windows VMs, you must be an administrator and have read, write, and execute permissions on the scripts.
  • For Linux VMs, you must have root ownership and root access with 700 permissions on the scripts.
  • For completion of any operation before or after replication or recovery point capture, you must have both the pre_freeze and post_thaw scripts for the operation.
  • Timeout for both the scripts is 60 seconds.
  • A script must return 0 to indicate a successful run. A non-zero return value implies that the script execution failed. The necessary log entries are available in the NGT logs.
    Tip: (AHV) For a non-zero return value from the pre-freeze script, the system captures a non application-consistent snapshot and raises an alert on the Prism web console. Similarly, for a non-zero return value from the post-thaw script, the system attempts to capture an application-consistent snapshot once again. If the attempt fails, the system captures a non application-consistent snapshot, and raises an alert on the Prism web console.
  • Irrespective of whether the pre-freeze script execution is successful, the corresponding post-thaw script runs.
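The requirements above can be sketched as a short installation routine. This is only an illustration under stated assumptions (the staging path /tmp/pre_freeze and the script body are made up): it stages a minimal pre_freeze script, applies the required 700 permissions, and dry-runs it to confirm a 0 return value. In production, run the install step as root so the script is root-owned.

```shell
#!/bin/sh
# Illustrative sketch (not a Nutanix tool): stage a minimal pre_freeze
# script and give it the 700 permissions the requirements above call for.
# In production the script must also be root-owned and live under
# /usr/local/sbin (see Script Location).
set -e

cat > /tmp/pre_freeze <<'EOF'
#!/bin/sh
# quiesce work goes here; return 0 on success within the 60-second timeout
exit 0
EOF

chmod 700 /tmp/pre_freeze
# As root, on the guest VM:
#   install -o root -g root -m 700 /tmp/pre_freeze /usr/local/sbin/pre_freeze
/tmp/pre_freeze    # dry run: exit status 0 indicates success
```

A matching post_thaw script needs the same ownership and permissions, since NGT requires both scripts for an operation to run.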
Script Location
You can define Python or shell scripts or any executable or batch files at the following locations in Linux or Windows VMs. The scripts can contain commands and routines necessary to run specific operations on one or more applications.
  • In Windows VMs,
    • Batch script file path for pre_freeze scripts:
      C:\Program Files\Nutanix\Scripts\pre_freeze.bat
    • Batch script file path for post_thaw scripts:
      C:\Program Files\Nutanix\Scripts\post_thaw.bat
  • In Linux VMs,
    • Shell script file path for pre_freeze scripts:
      /usr/local/sbin/pre_freeze

      Name the script pre_freeze , without a file extension.

    • Shell script file path for post_thaw scripts:
      /usr/local/sbin/post_thaw

      Name the script post_thaw , without a file extension.

      Note: The scripts must have root ownership and root access with 700 permissions.
Script Sample
Note: The following are only sample scripts and therefore must be modified to fit your deployment.
  • For Linux VMs
    #!/bin/sh
    # pre_freeze script: quiesce a MySQL database before the recovery point is taken
    date >> /scripts/pre_root.log
    echo -e "\n attempting to run pre_freeze script for MySQL as root user\n" >> /scripts/pre_root.log
    if [ "$(id -u)" -eq "0" ]; then
      # quiesce.py issues FLUSH TABLES WITH READ LOCK; it runs in the background
      # so that the lock is held while the recovery point is captured
      python /scripts/quiesce.py &
      echo -e "\n executing query flush tables with read lock to quiesce the database\n" >> /scripts/pre_freeze.log
      echo -e "\n Database is in quiesce mode now\n" >> /scripts/pre_freeze.log
    else
      date >> /scripts/pre_root.log
      echo -e "\n not root user\n" >> /scripts/pre_root.log
    fi

    #!/bin/sh
    # post_thaw script: release the lock after the recovery point is taken
    date >> /scripts/post_root.log
    echo -e "\n attempting to run post_thaw script for MySQL as root user\n" >> /scripts/post_root.log
    if [ "$(id -u)" -eq "0" ]; then
      python /scripts/unquiesce.py
    else
      date >> /scripts/post_root.log
      echo -e "\n not root user\n" >> /scripts/post_root.log
    fi
  • For Windows VMs
    @echo off
    rem pre_freeze.bat
    echo Running pre_freeze script >C:\Progra~1\Nutanix\script\pre_freeze_log.txt

    @echo off
    rem post_thaw.bat
    echo Running post_thaw script >C:\Progra~1\Nutanix\script\post_thaw_log.txt
Note: If any of these scripts prints excessive output to the console session, the script freezes. To avoid script freeze, perform the following.
  • Add @echo off to your scripts.
  • Redirect the script output to a log file.
If you receive a non-zero return code from the pre-freeze script, the system captures a non application-consistent recovery point and raises an alert on the Prism web console. If you receive a non-zero return code from the post-thaw script, the system attempts to capture an application-consistent snapshot once again. If that attempt fails, the system captures a non application-consistent snapshot, and raises an alert on the Prism web console.
Applications supporting application-consistent recovery points without scripts
Only the following applications support application-consistent recovery points without pre-freeze and post-thaw scripts.
  • Microsoft SQL Server 2008, 2012, 2016, and 2019
  • Microsoft Exchange 2010
  • Microsoft Exchange 2013
  • Microsoft Exchange 2016

  • Nutanix does not support application-consistent recovery points on Windows VMs that have mounted VHDX disks.
  • The system captures hypervisor-based recovery points only when you have VMware Tools running on the guest VM and the guest VM does not have any independent disks attached to it.

    If these requirements are not met, the system captures crash-consistent snapshots.

  • The following table provides detailed information on whether a recovery point is application-consistent or not depending on the operating systems and hypervisors running in your environment.
    Note:
    • Installed and active means that the guest VM has the following.
      • NGT installed.
      • VSS capability enabled.
      • Powered on.
      • Actively communicating with the CVM.
Table 2. Application-consistent Recovery Points
Microsoft Windows Server edition
  • NGT installed and active, with pre-freeze and post-thaw scripts present: Nutanix script-based VSS snapshots (ESXi and AHV)
  • NGT installed and active: Nutanix VSS-enabled snapshots (ESXi and AHV)
  • NGT not enabled: hypervisor-based application-consistent or crash-consistent snapshots (ESXi); crash-consistent snapshots (AHV)
Microsoft Windows Client edition
  • NGT installed and active, with pre-freeze and post-thaw scripts present: Nutanix script-based VSS snapshots (ESXi and AHV)
  • NGT not enabled: hypervisor-based or crash-consistent snapshots (ESXi); crash-consistent snapshots (AHV)
Linux VMs
  • NGT installed and active, with pre-freeze and post-thaw scripts present: Nutanix script-based VSS snapshots (ESXi and AHV)
  • NGT not enabled: hypervisor-based or crash-consistent snapshots (ESXi); crash-consistent snapshots (AHV)

Creating a Recovery Plan (Leap)

To orchestrate the failover (disaster recovery) of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery site. If you have configured two on-prem recovery sites in a protection policy, create two recovery plans for DR—one for recovery to each recovery site. The recovery plan synchronizes continuously to the recovery site in a bidirectional way.

About this task

To create a recovery plan, do the following at the primary site. You can also create a recovery plan at a recovery site. The recovery plan you create or update at a recovery site synchronizes back to the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
    Figure. Recovery Plan Configuration: Recovery Plans

  3. Click Create Recovery Plan .
    Specify the following information in the Create Recovery Plan window.
    Figure. Recovery Plan Configuration: General

  4. In the General tab, enter Recovery Plan Name , Recovery Plan Description , Primary Location , Recovery Location , and click Next .
    From Primary Location and Recovery Location drop-down lists, you can select either the local availability zone (site) or a non-local site to serve as your primary and recovery sites respectively. Local AZ represents the local site (Prism Central). If you are configuring recovery plan to recover the protected guest VMs to another Nutanix cluster at the same site, select Local AZ from both Primary Location and Recovery Location drop-down lists.
  5. In the Power On Sequence tab, click + Add Entities to add the guest VMs to the start sequence.
    Figure. Recovery Plan Configuration: Add Entities

    1. In the Search Entities by , select VM Name from the drop-down list to specify guest VMs by name.
    2. In the Search Entities by , select Category from the drop-down list to specify guest VMs by category.
    3. To add the guest VMs or VM categories to the stage, select the VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
    4. Click Add .
      The selected guest VMs are added to the start sequence in a single stage by default. You can also create multiple stages to add guest VMs and define the order of their power-on sequence. For more information about stages, see Stage Management.
      Caution: Do not include the guest VMs protected with Asynchronous, NearSync, and Synchronous replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with the guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.
    5. To automate in-guest script execution on the guest VMs during recovery, select the individual guest VMs or VM categories in the stage and click Manage Scripts .
      Note: In-guest scripts allow you to automate various task executions upon recovery of the guest VMs. For example, in-guest scripts can help automate the tasks in the following scenarios.

      • After recovery, the guest VMs must use new DNS IP addresses and also connect to a new database server that is already running at the recovery site.

        Traditionally, to achieve this new configuration, you would manually log on to the recovered VM and modify the relevant files. With in-guest scripts, you write a script to automate the required steps and enable the script when you configure the recovery plan. The recovery plan execution automatically invokes the script, which reassigns the DNS IP addresses and reconnects to the database server at the recovery site.

      • If guest VMs are part of domain controller siteA.com at the primary site AZ1 , and after the guest VMs recover at the site AZ2 , you want to add the recovered guest VMs to the domain controller siteB.com .

        Traditionally, to reconfigure, you would manually log on to the VM, remove the VM from an existing domain controller, and then add the VM to a new domain controller. With in-guest scripts, you can automate the task of changing the domain controller.

      Note: In-guest script execution requires NGT version 1.9 or newer installed on the VM. The in-guest scripts run as a part of the recovery plan only if they have executable permissions for the following.
      • Administrator user (Windows)
      • Root user (Linux)
      Note: You can define a batch or shell script that executes automatically in the guest VMs after their disaster recovery. Place two scripts—one for production failover and the other for test failover—at the following locations in the guest VMs with the specified name.
      • In Windows VMs,
        • Batch script file path for production failover:
          C:\Program Files\Nutanix\scripts\production\vm_recovery
        • Batch script file path for test failover:
          C:\Program Files\Nutanix\scripts\test\vm_recovery
      • In Linux VMs,
        • Shell script file path for production failover:
          /usr/local/sbin/production_vm_recovery
        • Shell script file path for test failover:
          /usr/local/sbin/test_vm_recovery
      Note: When an in-guest script runs successfully, it returns code 0 . Any non-zero error code signifies that the execution of the in-guest script was unsuccessful.
      Figure. Recovery Plan Configuration: In-guest Script Execution

        1. To enable script execution, click Enable .

          A command prompt icon appears against the guest VMs or VM categories to indicate that in-guest script execution is enabled on those guest VMs or VM categories.

        2. To disable script execution, click Disable .
  6. In the Network Settings tab, map networks in the primary cluster to networks at the recovery cluster.
    Figure. Recovery Plan Configuration: Network Settings

    Network mapping enables you to replicate the network configurations of the primary Nutanix cluster to the recovery Nutanix cluster, and to recover guest VMs into the corresponding subnet at the recovery Nutanix cluster. For example, if a guest VM is in the vlan0 subnet at the primary Nutanix cluster, you can configure the network mapping to recover that guest VM in the same vlan0 subnet at the recovery Nutanix cluster. To specify the source (primary Nutanix cluster) and destination (recovery Nutanix cluster) network information for network mapping, do the following in the Local AZ (Primary) and PC 10.xx.xx.xxx (Recovery) panes.
    1. Under Production in Virtual Network or Port Group drop-down list, select the production subnet that contains the protected guest VMs. (optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    2. Under Test Failback in Virtual Network or Port Group drop-down list, select the test subnet that you want to use for testing failback from the recovery Nutanix cluster. (optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    3. To add more network mappings, click Add Networks at the top-right corner of the page, and then repeat the steps 6.a-6.b.
      Note: The mapped networks at the primary and recovery Nutanix clusters must have identical gateway IP addresses and prefix lengths. Therefore, you cannot use the same test failover network for two or more network mappings in the same recovery plan.
    4. Click Done .
    Note: For ESXi, you can configure network mapping for both standard and distributed (DVS) port groups. For more information about DVS, see VMware documentation.
    Caution: Leap does not support VMware NSX-T datacenters. For more information about NSX-T datacenters, see VMware documentation.
  7. To perform VM-level static IP address mapping between the primary and the recovery sites, click Advanced Settings , click Custom IP Mapping , and then do the following.
    Note: The Custom IP Mapping shows all the guest VMs with static IP address configured, NGT installed, and VNIC in the source subnet specified in the network mapping.
    1. To locate the guest VM, type the name of the guest VM in the filter field.
      A guest VM that has multiple NICs appears in multiple rows, allowing you to specify an IP address mapping for each VNIC. All the fields auto-populate with IP addresses generated by the offset-based IP address-mapping scheme.
    2. In the Test Failback field for the local site, Production field for the remote (recovery) site, and Test Failover for the remote site, edit the IP addresses.
      Perform this step for all the IP addresses that you want to map.
      Caution: Do not edit the IP address assigned to the VNIC in the local site. If you do not want to map static IP addresses for a particular VNIC, you can proceed with the default entries.
    3. Click Save .
    4. If you want to edit one or more VM-level static IP address mappings, click Edit , and then change the IP address mapping.
  8. If VM-level static IP address mapping is configured between the primary and the recovery Nutanix clusters and you want to use the default, offset-based IP address-mapping scheme, click Reset to Matching IP Offset .
  9. Click Done .
    The recovery plan is created. To verify the recovery plan, see the Recovery Plans page. You can modify the recovery plan to change the recovery location, add, or remove the protected guest VMs. For information about various operations that you can perform on a recovery plan, see Recovery Plan Management.
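The default, offset-based IP address-mapping scheme referenced in the steps above generates each mapped address by keeping the VM's host offset within its subnet and substituting the target subnet's network portion. A minimal sketch of one plausible reading of that arithmetic, restricted to /24 subnets for simplicity (the addresses and the helper name map_ip_24 are illustrative assumptions, not a Nutanix API):

```shell
# Sketch: offset-based mapping for /24 subnets. The VM keeps its host
# offset (the last octet for a /24) and the target subnet's network
# portion is substituted in. All values are illustrative.
map_ip_24() {
    ip="$1"             # VM IP in the source subnet, e.g. 10.1.1.25
    target_net="$2"     # target /24 network portion, e.g. 10.2.2
    offset="${ip##*.}"  # host offset = last octet for a /24
    echo "${target_net}.${offset}"
}

# A VM at 10.1.1.25 in the source subnet keeps its host offset in the
# target subnet: same offset, different network portion.
map_ip_24 "10.1.1.25" "10.2.2"
```

Clicking Reset to Matching IP Offset discards manual edits and restores mappings generated this way.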
Stage Management

A stage defines the order in which the protected guest VMs start at the recovery cluster. You can create multiple stages to prioritize the start sequence of the guest VMs. In the Power On Sequence , the VMs in a preceding stage start before the VMs in the succeeding stages. On recovery, it is often desirable to start some VMs before others. For example, database VMs must start before the application VMs: in the Power On Sequence , place all the database VMs in a stage before the stage containing the application VMs.

Figure. Recovery Plan Configuration: Power On Sequence
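The stage semantics described above can be sketched as a simple ordered loop: each stage's VMs start before the next stage's, with an optional inter-stage delay. The stage contents, the one-second delay, and the power_on helper (which only prints here, standing in for an actual VM start) are illustrative.

```shell
# Sketch of the Power On Sequence: stages run in order, with an
# optional delay between them. power_on is a stand-in that prints
# instead of starting a real VM.
power_on() { echo "powering on $1"; }

start_stage() {
    for vm in "$@"; do
        power_on "$vm"
    done
}

start_stage "db-vm1" "db-vm2"     # stage 1: database VMs first
sleep 1                           # inter-stage delay (seconds)
start_stage "app-vm1" "app-vm2"   # stage 2: application VMs
```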

To Add a Stage in the Power-On Sequence and Add Guest VMs to It, Do the Following.

  1. Click +Add New Stage .
  2. Click +Add Entities .
  3. To add guest VMs to the current stage in the power-on sequence, do the following.
    1. In the Search Entities by drop-down list, select VM Name to specify guest VMs by name.
    2. In the Search Entities by drop-down list, select Category to specify guest VMs by category.
    3. To add the guest VMs or VM categories to the stage, select the guest VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
  4. Click Add .

To Remove a Stage from the Power-On Sequence, Do the Following.

Click Actions > Remove Stage .
Note: You see Actions in a stage only when none of the VMs in the stage are selected. When one or more VMs in the stage are selected, you see More Actions .

To Change the Position of a Stage in the Power-On Sequence, Do the Following.

  • To move a stage up or down in the power-on sequence, click the up or down arrow icon, respectively, at the top-right corner of the stage.
  • To expand or collapse a stage, click + or - respectively, at the top-right corner of the stage.
  • To move VMs to a different stage, select the VMs, and then do the following.
    1. Click More Actions > Move .
    2. Select the target stage from the list.
    Note: You see Move in the More Actions only when you have defined two or more stages.

To Set a Delay Between the Power-On Sequence of Two Stages, Do the Following.

  1. Click +Add Delay .
  2. Enter the time in seconds.
  3. Click Add .

To Add Guest VMs to an Existing Stage, Do the Following.

  1. Click Actions > Add Entities .
    Note: You see Actions in a stage only when none of the VMs in the stage are selected. When one or more VMs in the stage are selected, you see More Actions .
  2. To add VMs to the current stage in the power-on sequence, do the following.
    1. In the Search Entities by drop-down list, select VM Name to specify guest VMs by name.
    2. In the Search Entities by drop-down list, select Category to specify guest VMs by category.
    3. To add the guest VMs or VM categories to the stage, select the guest VMs or VM categories from the list.
      Note: The VMs listed in the search result are in the active state of replication.
  3. Click Add .

To Remove Guest VMs from an Existing Stage, Do the Following.

  1. Select the VMs from the stage.
  2. Click More Actions > Remove .
    Note: You see More Actions in a stage only when one or more VMs in the stage are selected. When none of the VMs in the stage are selected, you see Actions .

To Move Guest VMs to a Different Stage, Do the Following.

  1. Select the VMs from the stage.
  2. Click More Actions > Move .
    Note: You see More Actions in a stage only when one or more VMs in the stage are selected. When none of the VMs in the stage are selected, you see Actions .
  3. Select the target stage from the list.

Failover and Failback Management

You perform failover of the protected guest VMs when unplanned failure events (for example, natural disasters) or planned events (for example, scheduled maintenance) happen at the primary availability zone (site) or the primary cluster. The protected guest VMs migrate to the recovery site where you perform the failover operations. On recovery, the protected guest VMs start in the Nutanix cluster you specify in the recovery plan that orchestrates the failover.

The following are the types of failover operations.

Test failover
To ensure that the protected guest VMs fail over efficiently to the recovery site, you perform a test failover. When you perform a test failover, the guest VMs recover in the virtual network designated for testing purposes at the recovery site. However, the guest VMs at the primary site are not affected. Test failovers rely on the presence of VM recovery points at the recovery sites.
Planned failover
To ensure VM availability when you foresee service disruption at the primary site, you perform a planned failover to the recovery site. For a planned failover to succeed, the guest VMs must be available at the primary site. When you perform a planned failover, the recovery plan first creates a recovery point of the protected guest VM, replicates the recovery point to the recovery site, and then starts the guest VM at the recovery site. The recovery point used for migration is retained indefinitely. After a planned failover, the guest VMs no longer run at the primary site.
Unplanned failover
To ensure VM availability when a disaster causing service disruption occurs at the primary site, you perform an unplanned failover to the recovery site. In an unplanned failover, you can expect some data loss to occur. The maximum data loss possible is equal to the least RPO you specify in the protection policy, or the data that was generated after the last manual recovery point for a given guest VM. In an unplanned failover, by default, the protected guest VMs recover from the most recent recovery point. However, you can recover from an earlier recovery point by selecting a date and time of the recovery point.

At the recovery site, the guest VMs can recover only from the recovery points replicated from the primary site. The guest VMs cannot recover from the local recovery points. For example, if you perform an unplanned failover from the primary site AZ1 to the recovery site AZ2 , the guest VMs recover at AZ2 using the recovery points replicated from AZ1 to AZ2 .

You can perform a planned or an unplanned failover in different scenarios of network failure. For more information about network failure scenarios, see Leap and Xi Leap Failover Scenarios.

At the recovery site after a failover, the recovery plan creates only the VM category that was used to include the guest VM in the recovery plan. Manually create the remaining VM categories at the recovery site and associate the guest VMs with those categories.

The recovered guest VMs continue to generate recovery points as per the replication schedule that protects them. The recovery points replicate back to the primary site when the primary site starts functioning again. This reverse replication enables you to fail over the guest VMs from the recovery site back to the primary site (failback). The same recovery plan applies to both the failover and the failback operations; the difference is that you perform the failover operations on the recovery plan at the recovery site for a failover, and at the primary site for a failback. For example, if a guest VM fails over from AZ1 (Local) to AZ2 , the failback fails over the same VM from AZ2 (Local) back to AZ1 .

Leap and Xi Leap Failover Scenarios

You have the flexibility to perform a real or simulated failover for full or partial workloads (with or without networking). The term virtual network is used differently on on-prem clusters and in Xi Cloud Services. In Xi Cloud Services, the term virtual network describes the two built-in virtual networks—production and test. Virtual networks on the on-prem clusters are virtual subnets bound to a single VLAN. Manually create these virtual subnets, with separate virtual subnets for production and test purposes, before you configure recovery plans. When configuring a recovery plan, you map the virtual subnets at the primary site to the virtual subnets at the recovery site.

Figure. Failover in Network Mapping

The following are the various scenarios that you can encounter in Leap configurations for disaster recovery (DR) to an on-prem availability zone (site) or to Xi Cloud (Xi Leap). Each scenario is explained with the required network-mapping configuration for Xi Leap. However, the configuration remains the same irrespective of disaster recovery (DR) using Leap or Xi Leap. You can either create a recovery plan with the following network mappings (see Creating a Recovery Plan (Leap)) or update an existing recovery plan with the following network mappings (see Updating a Recovery Plan).

Scenario 1: Leap Failover (Full Network Failover)

Full network failure is the most common scenario. In this case, it is desirable to bring up the whole primary site in the Xi Cloud. All the subnets must fail over, and the WAN IP address must change from the on-prem IP address to the Xi WAN IP address. Floating IP addresses can be assigned to individual guest VMs; otherwise, everything uses Xi network address translation (NAT) for external communication.

Perform the failover when the on-prem subnets are down and the jump host is reachable on the public Internet through the floating IP address of the Xi production network.

Figure. Full Network Failover

To set up the recovery plan that orchestrates the full network failover, perform the following.

  1. Open the Network Settings page to configure network mappings in a recovery plan.
  2. Select the Local AZ > Production > Virtual Network or Port Group .

    The selection auto-populates the Xi production and test failover subnets.

  3. Select the Outbound Internet Access switch to allow use of the Xi NAT for Internet access.
  4. Dynamically assign the floating IP addresses to the guest VMs you select in the recovery plan.

    Perform steps 1–4 for every subnet.

Figure. Recovery Plan Configuration: Network Settings

Scenario 2: Xi Network Failover (Partial Network Failover)

You want to fail over one or more subnets from the primary site to Xi Cloud. Communication between the sites happens through the VPN, or by using the external NAT or floating IP addresses. A use case for this scenario is that the primary site needs maintenance, but some of its subnets must see no downtime.

Perform a partial failover when some subnets are active in the production networks at both on-prem and Xi Cloud, and the jump host is reachable on the public Internet through the floating IP address of the Xi production network.

On-prem guest VMs can connect to the guest VMs on the Xi Cloud Services.

Figure. Partial Network Failover

To set up the recovery plan that orchestrates the partial network failover, perform the following.

  1. Open the Network Settings page to configure network mappings in a recovery plan.
  2. Select the Local AZ > Production > Virtual Network or Port Group .

    The selection auto-populates the Xi production and test failover subnets.

  3. Select the Outbound Internet Access switch to allow use of the Xi NAT for Internet access.
  4. Dynamically assign the floating IP addresses to the guest VMs you select in the recovery plan.

    Perform steps 1–4 for one or more subnets based on the maintenance plan.

Figure. Recovery Plan Configuration: Network Settings

Scenario 3: Xi Network Failover (Partial Subnet Network Failover)

You want to fail over some guest VMs to Xi Cloud, while keeping the other guest VMs up and running at the on-prem cluster (primary site). A use case for this scenario is that the primary site needs maintenance, but some of its guest VMs must see no downtime.

This scenario requires changing IP addresses for the guest VMs running at Xi Cloud. Because you cannot have the same subnet active at both sites, create a subnet to host the failed-over guest VMs. The jump host is reachable on the public Internet through the floating IP address of the Xi production network.

On-prem guest VMs can connect to the guest VMs on the Xi Cloud Services.

Figure. Partial Subnet Network Failover

To set up the recovery plan that orchestrates the partial subnet network failover, perform the following.

  1. Open the Network Settings page to configure network mappings in a recovery plan.
  2. Select the Local AZ > Production > Virtual Network or Port Group .

    The selection auto-populates the Xi production and test failover subnets for a full subnet failover.

      Note: In this case, you have also created subnets on the Xi Cloud Services. Choose those subnets to avoid a full subnet failover (scenario 1).
  3. Select the Outbound Internet Access switch to allow use of the Xi NAT for Internet access.
  4. Dynamically assign the floating IP addresses to the guest VMs you select in the recovery plan.

    Perform steps 1–4 for one or more subnets based on the maintenance plan.

Figure. Recovery Plan Configuration: Network Settings

Scenario 4: Xi Network Failover (Test Failover and Failback)

You want to test all the preceding three scenarios by creating an isolated test network so that no routing or IP address conflicts happen. All the guest VMs are cloned from a local recovery point and brought up to test the failover operations. Perform the test failover when all on-prem subnets are active and on-prem guest VMs can connect to the guest VMs at the Xi Cloud. The jump host is reachable on the public Internet through the floating IP address of the Xi production network.

Figure. Test Failover & Failback

In this case, focus on the test failover section when creating the recovery plan. When you select a local AZ production subnet, it is copied to the test network. You can go one step further and create a test subnet at the Xi Cloud.

Figure. Recovery Plan Configuration: Network Settings

After the guest VMs fail over to Xi Cloud in a test, you can perform a test failback to the primary site.
Note: Create a test subnet in advance for the failback to the on-prem cluster.
Figure. Recovery Plan Configuration: Network Settings

Failover and Failback Operations

You can perform test failover, planned failover, and unplanned failover of the guest VMs protected with Asynchronous replication schedule across different Nutanix clusters at the same or different on-prem availability zones (sites). The steps to perform test, planned, and unplanned failover are largely the same irrespective of the replication schedules that protect the guest VMs.

Performing a Test Failover (Leap)

After you create a recovery plan, you can run a test failover periodically to ensure that the failover occurs smoothly when required. To perform a test failover, do the following procedure at the recovery site. If you have two recovery sites for DR, perform the test at the site where you want to recover the guest VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select the recovery plan that you want to test.
  4. Click Test from the Actions drop-down menu.
    Figure. Test Failover (Drop-down)

    The Test Recovery Plan window appears. The window auto-populates the Failover From and Failover To locations from the recovery plan you selected in step 3. The Failover To location is Local AZ by default and is unavailable for editing.
    Figure. Test Recovery Plan

  5. Click + Add target clusters if you want to fail over to specific Nutanix clusters at the recovery site.
    If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the recovery site.
  6. Click Test .
    The Test Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the test operation. If there are no errors or you resolve the errors in step 7, the guest VMs fail over to the recovery cluster.
  7. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the test procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
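Recovery plan test failovers can also be driven through the Prism Central REST API. The following sketch only assembles and prints a JSON body for a recovery plan job; the endpoint /api/nutanix/v3/recovery_plan_jobs, the field names, and the TEST_FAILOVER action value are assumptions modeled on the v3 API conventions, so verify them in your Prism Central REST API Explorer before use.

```shell
# Sketch: build the JSON body for a test-failover recovery plan job.
# Field names and values are assumptions (see lead-in); this script
# only prints the payload and submits nothing.
RP_UUID="00000000-0000-0000-0000-000000000000"  # placeholder recovery plan UUID
ACTION="TEST_FAILOVER"                          # assumed action type value

build_payload() {
    cat <<EOF
{
  "spec": {
    "name": "rp-test-failover-job",
    "resources": {
      "recovery_plan_reference": {
        "kind": "recovery_plan",
        "uuid": "${RP_UUID}"
      },
      "execution_parameters": {
        "action_type": "${ACTION}"
      }
    }
  }
}
EOF
}

build_payload
# Illustrative submission (do not run as-is; endpoint is an assumption):
#   curl -k -u admin -X POST -H 'Content-Type: application/json' \
#     -d "$(build_payload)" https://<pc-ip>:9440/api/nutanix/v3/recovery_plan_jobs
```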
Cleaning up Test VMs (Leap)

After testing a recovery plan, you can remove the test VMs that the recovery plan creates in the recovery test network. To clean up the test VMs, do the following at the recovery site where the test failover created the test VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select the recovery plans whose test VMs you want to remove.
  4. Click Clean Up Test VMs from the Actions drop-down menu.
    The Clean Up Test VMs dialog box appears, showing the name of the recovery plan you selected in step 3.
  5. Click Clean .
    Figure. Clean Up Test VMs

Performing a Planned Failover (Leap)

If there is a planned event (for example, scheduled maintenance of guest VMs) at the primary availability zone (site), perform a planned failover to the recovery site. To perform a planned failover, do the following procedure at the recovery site. If you have two recovery sites for DR, perform the failover at the site where you want to recover the guest VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Planned Failover

    1. Failover Type : Click Planned Failover .
      Warning: Do not check Live Migrate VMs . Live migration works only for the planned failover of the guest VMs protected in Synchronous replication schedule. If you check Live Migrate VMs for the planned failover of the guest VMs protected in Asynchronous or NearSync replication schedule, the failover task fails.
    2. Click + Add target clusters if you want to fail over to specific Nutanix clusters at the recovery site.
      Figure. Planned Failover: Select Recovery Cluster

      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the recovery site.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs fail over to the recovery Nutanix cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Both the primary and the recovery Nutanix clusters run Prism Element version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

Performing an Unplanned Failover (Leap)

If there is an unplanned event (for example, a natural disaster or network failure) at the primary availability zone (site), perform an unplanned failover to the recovery site. To perform an unplanned failover, do the following procedure at the recovery site. If you have two recovery sites for DR, perform the failover at the site where you want to recover the guest VMs.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
        Note: If you click Recover from specific point in time , select the Nutanix cluster that hosts the recovery point from that point in time (step 4.b). If you do not select a cluster, or if you select multiple clusters where the same recovery points exist, the guest VMs fail to recover because the system encounters more than one recovery point at the recovery site. For example, if a primary site AZ1 replicates the same recovery points to two clusters CLA and CLB at site AZ2 , select either the cluster CLA or the cluster CLB as the target cluster when you recover from a specific point in time. If you select both CLA and CLB , the guest VMs fail to recover.

    2. Click + Add target clusters if you want to fail over to specific Nutanix clusters at the recovery site.
      Figure. Unplanned Failover: Select Recovery Cluster

      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the recovery site.
    Note: If recovery plans contain VM categories, the VMs from those categories recover in the same category after an unplanned failover to the recovery site. Also, the recovery points keep generating at the recovery site for those recovered VMs. Because the VM count represents the number of recoverable VMs (calculated from recovery points), the recovered VMs and their newly created recovery points add up, giving double the count of the originally recovered VMs on the recovery plans page. If some VMs belonging to the given category at the primary or recovery site are then deleted, the VM count at both sites still stays the same until the recovery points of the deleted VMs expire. For example, when two VMs have failed over, the recovery plans page at the recovery site shows four VMs (two replicated recovery points from the source and two newly generated recovery points). The page shows four VMs even if the VMs are deleted from the primary or recovery site. The VM count synchronizes and becomes consistent in the subsequent RPO cycle, conforming to the retention policy set in the protection policy (due to the expiration of recovery points).
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs fail over to the recovery Nutanix cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note: To avoid conflicts when the primary site becomes active after the failover, shut down the guest VMs associated with this recovery plan. Manually power off the guest VMs on either primary or recovery site after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Both the primary and the recovery Nutanix clusters run Prism Element version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

Performing Failback (Leap)

A failback is a failover of the guest VMs from the recovery availability zone (site) back to the primary site. The same recovery plan applies to both the failover and the failback operations; the difference is that you perform the failover operations on the recovery plan at the recovery site for a failover, and at the primary site for a failback.

About this task

To perform a failback, do the following procedure at the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      Tip: You can also click Planned Failover to perform a planned failover for the failback.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
    2. Click + Add target clusters if you want to failover to specific Nutanix clusters at the primary site.
      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the primary site.
    Note: If recovery plans contain VM categories, the VMs from those categories recover into the same category after an unplanned failover to the recovery site, and recovery points continue to generate at the recovery site for the recovered VMs. Because the VM count represents the number of recoverable VMs (calculated from recovery points), both the recovered VMs and their newly created recovery points are counted, so the recovery plans page shows double the count of the originally recovered VMs. If some VMs belonging to the given category at the primary or recovery site are later deleted, the VM count at both sites still stays the same until the recovery points of the deleted VMs expire. For example, when two VMs have failed over, the recovery plans page at the recovery site shows four VMs (two replicated recovery points from the source and two newly generated recovery points), even if the VMs are deleted from the primary or recovery site. The VM count synchronizes and becomes consistent in a subsequent RPO cycle, when the recovery points expire according to the retention policy set in the protection policy.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs failover to the recovery cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note: To avoid conflicts when the primary site becomes active again after the failover, shut down the guest VMs associated with this recovery plan. Manually power off the guest VMs on either the primary or the recovery site after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path (raising a VmRecoveredAtAlternatePath alert) only if the following conditions are met.

    • Prism Element on both the primary and the recovery Nutanix clusters is running version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

Monitoring a Failover Operation (Leap)

After you trigger a failover operation, you can monitor failover-related tasks. To monitor a failover, perform the following procedure at the recovery site. If you have two recovery sites for DR, perform the procedure at the site where you trigger the failover.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Click the name of the recovery plan for which you triggered failover.
  4. Click the Tasks tab.
    The left pane displays the overall status. The table in the details pane lists all the running tasks and their individual statuses.

Self-Service Restore

The self-service restore (also known as file-level restore) feature allows you to perform self-service data recovery from the Nutanix data protection recovery points with minimal intervention. You can perform self-service data recovery on both on-prem and Xi Cloud Services.

You must deploy NGT 2.0 or newer on guest VMs to enable self-service restore from Prism Central. For more information about enabling and mounting NGT, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide . When you enable self-service restore and attach a disk by logging into the VM, you can recover files within the guest OS. If you do not detach the disk from the VM, the disk is detached automatically after 24 hours.

Note:
  • You can enable self-service restore for a guest VM through a web interface or nCLI.
  • NGT performs the in-guest actions. For more information about in-guest actions, see Nutanix Guest Tools in the Prism Web Console Guide .
  • Self-service restore supports only full snapshots generated from Asynchronous and NearSync replication schedules.
Self-Service Restore Requirements

The requirements of self-service restore of Windows and Linux VMs are as follows.

General Requirements of Self-Service Restore

The following are the general requirements of self-service restore. Ensure that you meet the requirements before configuring self-service restore for guest VMs.

License Requirements

AOS Ultimate. For more information about the features available with each AOS license tier, see Software Options.

Hypervisor Requirements

Two AHV or ESXi clusters, each registered to the same or a different Prism Central instance.

  • AHV (running AOS 5.18 or newer)

    The on-prem clusters must be running the version of AHV that comes bundled with the supported version of AOS.

  • ESXi (running AOS 5.18 or newer)

    The on-prem clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

Prism Central instances and their registered on-prem clusters (Prism Elements) must meet the following requirements.

  • AOS 5.18 or newer with AHV.
  • AOS 5.18 or newer with ESXi.
  • You have installed NGT 2.0 or newer. For more information about enabling and mounting NGT, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide .
  • You have set disk.enableUUID=true in the .vmx file for the guest VMs running on ESXi.
  • You have configured Nutanix recovery points by adding guest VM to an Asynchronous protection policy.
  • You have attached an IDE, SCSI, or SATA disk.
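
For the ESXi requirement above, the disk.enableUUID flag is set in the guest VM's .vmx configuration file. As a sketch, the relevant line looks like the following (the exact file path and surrounding entries depend on your vSphere environment):

```
disk.enableUUID = "TRUE"
```

You can typically set this parameter through the vSphere Client (VM Options > Advanced > Edit Configuration) or by editing the .vmx file while the guest VM is powered off.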

Requirements for Guest VMs Running Windows OS

The following are the specific requirements of self-service restore for guest VMs running Windows OS. Ensure that you meet the requirements before proceeding.

  • You have enough free logical drive letters to bring the disk online.
  • You have one of the following Windows OS as the guest OS.
    • Windows Server 2008 R2 or newer
    • Windows 7 through Windows 10

Requirements for Guest VMs Running Linux OS

The following are the specific requirements of self-service restore for guest VMs running Linux OS. Ensure that you meet the requirements before proceeding.

  • You have appropriate file systems to recover. Self-service restore supports only extended file systems (ext2, ext3, and ext4) and XFS file systems.
  • Logical Volume Manager (LVM) disks are mounted only if the volume group corresponds to a single physical disk.
  • You have one of the following Linux OS as the guest OS.
    • CentOS 6.5 through 6.9 and 7.0 through 7.3
    • Red Hat Enterprise Linux (RHEL) 6.5 through 6.9 and 7.0 through 7.3
    • Oracle Linux 6.5 and 7.0
    • SUSE Linux Enterprise Server (SLES) 11 SP1 through 11 SP4 and 12 SP1 through 12 SP3
    • Ubuntu 14.04 for both AHV and ESXi
    • Ubuntu 16.10 for AHV only
Self-Service Restore Limitations

The limitations of self-service restore of Windows and Linux VMs are as follows.

General Limitations of Self-Service Restore

The following are the general limitations of self-service restore.

  • Volume groups are not supported.
  • Only snapshots created in AOS 4.5 or later releases are supported.
  • PCI and delta disks are not supported.

Limitations of VMs Running Windows OS

The following are the specific limitations of self-service restore for guest VMs running Windows OS.

  • File systems. Self-service restore does not support dynamic disks consisting of NTFS on simple volumes, spanned volumes, striped volumes, mirrored volumes, and RAID-5 volumes.
  • Only 64-bit OSes are supported.
  • Self-service restore does not support disks created as Microsoft Storage Space devices by using Microsoft Windows Server 2016 or newer.

Limitations of VMs Running Linux OS

Whenever the snapshot disk has an inconsistent filesystem (as indicated by the fsck check), the disk is only attached and not mounted.

Enabling Self-Service Restore

After enabling NGT for a guest VM, you can enable the self-service restore for that guest VM. Also, you can enable the self-service restore for a guest VM while you are installing NGT on that guest VM.

Before you begin

Ensure that you have installed and enabled NGT 2.0 or newer on the guest VM. For more information, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide .

About this task

To enable self-service restore, perform the following procedure.

Procedure

  1. Log on to Prism Central.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs in the left pane.
  3. Select the guest VM where you want to enable self-service restore.
  4. Click Manage NGT Applications from the Actions drop-down menu.
    Figure. Enabling Self-Service Restore

    Note: If the guest VM does not have NGT installed, click Install NGT from the Actions drop-down menu and select Self Service Restore (SSR) to enable it.
  5. Click Enable below the Self Service Restore (SSR) panel.
  6. Click Confirm .
    The self-service restore feature is enabled on the guest VM. You can now restore the desired files from the VM.
Self-Service Restore for Windows VMs

You can restore the desired files from the VM through the web interface or by using the ngtcli utility of self-service restore.

Restoring a File through Web Interface (Windows VM)

After you install NGT in the Windows guest VM, you can restore the desired files from the VM through the web interface.

Before you begin

Ensure that you have configured your Windows VM to use NGT. For more information, see Installing NGT on Windows Machines in the Prism Web Console Guide .

About this task

To restore a file in Windows guest VMs by using web interface, perform the following.

Procedure

  1. Log in to the guest Windows VM by using administrator credentials.
  2. Click the Nutanix SSR icon on the desktop.
  3. Type the administrator credentials of the VM.
    Note: If you use:
    • NETBIOS domain name in the username field (for example, domain\username ), then you can log on to SSR only if your account is explicitly added to the Administrators group on the server. If the username is added to a domain group, which is in turn added to the Administrators group, the logon fails. Also, you must type the NETBIOS domain name in capital letters (the domain name must be written exactly as it appears in the output of the command net localgroup administrators ).
    • FQDN in the username (for example, domain.com\username ), then you can log on only if the user is a member of the Domain Admins group.
    Note: The snapshots that are taken for that day are displayed. You also have an option to select the snapshots for the week, month, and the year. In addition, you can also define a custom range of dates and select the snapshot.
    Figure. Snapshot Selection

  4. Select the appropriate tab: This Week , This Month , or This Year .
    You can also customize the selection by clicking the Custom Range tab and selecting the date range in the From and To fields.
  5. Select the check box of the disks that you want to attach from the snapshot.
  6. Select Mount from the Disk Action drop-down menu.
    Figure. Mounting of Disks

    The selected disk or disks are mounted and the relevant disk label is displayed.
  7. Go to the attached disk label drive in the VM and restore the desired files.
  8. To view the list of all the mounted snapshots, select Mounted Snapshots .
    This page displays the original snapshot drive letters and their corresponding current drive letters. The original drive letters are those assigned to the disk at the time of the snapshot; the mounted drive letters are those on which the snapshot disk is currently mounted.
    Figure. List of Mounted Snapshots

    1. To detach a disk, click the disk label and click Unmount .
      You can unmount all the disks at once by clicking Select All and then clicking Unmount .
  9. To detach a disk, select the check box of the disk that you want to unmount and then from the Disk Action drop-down menu, select Unmount .
Restoring a File through Ngtcli (Windows VM)

After you install NGT in the Windows guest VM, you can restore the desired files from the VM through the ngtcli utility.

Before you begin

Ensure that you have configured your Windows VM to use NGT. For more information, see Installing NGT on Windows Machines in the Prism Web Console Guide .

About this task

To restore a file in Windows guest VMs by using ngtcli, perform the following.

Procedure

  1. Log in to the guest Windows VM by using administrator credentials.
  2. Open the command prompt as an administrator.
  3. Go to the ngtcli directory in Program Files > Nutanix .
    > cd c:\Program Files\Nutanix\ngtcli
    Tip:
    > python ngtcli.py
    creates a terminal with auto-complete.
  4. Run the ngtcli.cmd command.
  5. List the snapshots and virtual disks that are present for the guest VM.
    ngtcli> ssr ls-snaps

    The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. Use this information to decide which snapshot holds the data that you want to restore.

    To list a specific number of snapshots, run the following command.
    ngtcli> ssr ls-snaps snapshot-count=count_value

    Replace count_value with the number of snapshots that you want to list.

  6. Attach the disk from the snapshots.
    ngtcli> ssr attach-disk disk-label=disk_label snapshot-id=snap_id

    Replace disk_label with the name of the disk that you want to attach.

    Replace snap_id with the snapshot ID of the disk that you want to attach.

    For example, to attach a disk with snapshot ID 16353 and disk label scsi0:1, type the following command.

    ngtcli> ssr attach-disk snapshot-id=16353 disk-label=scsi0:1
    After the command runs successfully, a new disk with a new label is attached to the guest VM.
    Note: If sufficient logical drive letters are not present, the action of bringing the disk online fails. In this case, detach the current disk, create enough free slots by detaching other self-service disks, and then reattach the disk.
  7. Go to the attached disk label drive and restore the desired files.
  8. Detach a disk.
    ngtcli> ssr detach-disk attached-disk-label=attached_disk_label

    Replace attached_disk_label with the name of the disk that you want to detach.

    Note: If the disk is not removed by the guest VM administrator, the disk is automatically removed after 24 hours.
  9. View all the attached disks to the VM.
    ngtcli> ssr list-attached-disks
Self-Service Restore for Linux VMs

The Linux guest VM user with sudo privileges can restore the desired files from the VM through the web interface or by using the ngtcli utility.

Restoring a File through Web Interface (Linux VM)

After you install NGT in the Linux guest VM, you can restore the desired files from the VM through the web interface.

Before you begin

Ensure that you have configured your Linux VM to use NGT. For more information, see Installing NGT on Linux Machines in the Prism Web Console Guide .

About this task

To restore a file in Linux guest VMs by using web interface, perform the following.

Procedure

  1. Log in to the guest Linux VM as a user with sudo privileges.
  2. Click the Nutanix SSR icon on the desktop.
  3. Type the root or sudo user credentials of the VM.
    The snapshots that are taken for that day are displayed. You also have the option to select the snapshots for the week, month, and the year. In addition, you can define a custom range of dates and select a snapshot from it. For example, in the following figure, the snapshots taken this month are displayed.
    Figure. Snapshot Selection

  4. Select the appropriate tab: This Week , This Month , or This Year .
    You can also customize the selection by clicking the Custom Range tab and selecting the date range in the From and To fields.
  5. Select the check box of the disks that you want to attach from the snapshot.
  6. Select Mount from the Disk Action drop-down menu.

    The selected disk or disks are mounted and the relevant disk label is displayed.

    Figure. Mounting of Disks

  7. Go to the attached disk label partitions in the VM and restore the desired files.
    Note: If the disk gets updated between the snapshots, the restore process may not work as expected. If this scenario occurs, you need to contact support to help with the restore process.
  8. To view the list of all the mounted snapshots, select Mounted Snapshots .
    This page displays the original snapshot drive letters and their corresponding current drive letters. The original drive letters are those assigned to the disk at the time of the snapshot; the mounted drive letters are those on which the snapshot disk is currently mounted.
    Figure. List of Mounted Snapshots

    1. To detach a disk, click the disk label and click Unmount .
      You can unmount all the disks at once by clicking Select All and then clicking Unmount .
  9. To detach a disk, select the check box of the disk that you want to unmount and then from the Disk Action drop-down menu, select Unmount .
Restoring a File through Ngtcli (Linux VM)

After you install NGT in the Linux guest VM, you can restore the desired files from the VM through the ngtcli utility.

Before you begin

Ensure that you have configured your Linux VM to use NGT. For more information, see Installing NGT on Linux Machines in the Prism Web Console Guide .

About this task

To restore a file in Linux guest VMs by using ngtcli, perform the following.

Procedure

  1. Log in to the guest Linux VM with sudo or root user credentials.
  2. Go to the ngtcli directory.
    > cd /usr/local/nutanix/ngt/ngtcli
  3. Run the python ngtcli.py command.
    Tip: This command creates a terminal with auto-complete.
  4. List the snapshots and virtual disks that are present for the guest VM.
    ngtcli> ssr ls-snaps

    The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. Use this information to decide which snapshot holds the data that you want to restore.

    To list a specific number of snapshots, run the following command.
    ngtcli> ssr ls-snaps snapshot-count=count_value

    Replace count_value with the number of snapshots that you want to list.

  5. Attach the disk from the snapshots.
    ngtcli> ssr attach-disk disk-label=disk_label snapshot-id=snap_id

    Replace disk_label with the name of the disk that you want to attach.

    Replace snap_id with the snapshot ID of the disk that you want to attach.

    For example, to attach a disk with snapshot ID 1343 and disk label scsi0:2, type the following command.

    ngtcli> ssr attach-disk snapshot-id=1343 disk-label=scsi0:2

    After the command runs successfully, a new disk with a new label is attached to the guest VM.

  6. Go to the attached disk label partition and restore the desired files.
    Note: If the disk gets updated between the snapshots, the restore process may not work as expected. If this scenario occurs, you need to contact support to help with the restore process.
  7. Detach a disk.
    ngtcli> ssr detach-disk attached-disk-label=attached_disk_label

    Replace attached_disk_label with the name of the disk that you want to detach.

    For example, to remove the disk with disk label scsi0:3, type the following command.

    ngtcli> ssr detach-disk attached-disk-label=scsi0:3
    Note: If the disk is not removed by the guest VM administrator, the disk is automatically removed after 24 hours.
  8. View all the attached disks to the VM.
    ngtcli> ssr list-attached-disks

Protection with NearSync Replication Schedule and DR (Leap)

NearSync replication enables you to protect your guest VMs with an RPO as low as 1 minute. A protection policy with a NearSync replication schedule creates a recovery point at a minutely time interval (between 1–15 minutes) and replicates it to the recovery availability zones (sites) for High Availability. For guest VMs protected with a NearSync replication schedule, you can perform disaster recovery (DR) to a different Nutanix cluster at the same or a different site. In addition to DR to Nutanix clusters of the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR), that is, disaster recovery from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.

Note: Nutanix provides multiple DR solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

The following are the advantages of protecting your guest VMs with a NearSync replication schedule.

  • Protection for the mission-critical applications. Securing your data with minimal data loss if there is a disaster, and providing you with more granular control during the recovery process.
  • No minimum network latency or distance requirements.
  • Low stun time for guest VMs with heavy I/O applications.

    Stun time is the time of application freeze when the recovery point is taken.

  • Allows resolution to a disaster event in minutes.

To implement the NearSync feature, Nutanix has introduced a technology called lightweight snapshots (LWSs). LWS recovery points are created at the metadata level only, and they continuously replicate incoming data generated by workloads running on the active clusters. LWS recovery points are stored in the LWS store, which is allocated on the SSD tier. When you configure a protection policy with a NearSync replication schedule, the system allocates the LWS store automatically.

Note: The maximum LWS store allocation for each node is 360 GB. For the hybrid systems, it is 7% of the SSD capacity on that node.
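
The sizing rule in the note above can be sketched as a small calculation. This is illustrative only; the function name and interface are hypothetical, not a Nutanix API, and the system performs this allocation automatically when you configure a NearSync schedule:

```python
def lws_store_gb(node_ssd_capacity_gb: float, hybrid: bool) -> float:
    """Hypothetical sketch of the per-node LWS store allocation rule.

    Hybrid nodes get 7% of the node's SSD capacity; the allocation is
    capped at 360 GB per node (the maximum for any node).
    """
    cap_gb = 360.0
    if hybrid:
        return min(node_ssd_capacity_gb * 7 / 100, cap_gb)
    return cap_gb
```

For example, a hybrid node with 2 TB (2000 GB) of SSD capacity would get a 140 GB LWS store, while a hybrid node with 10 TB of SSD capacity would hit the 360 GB per-node cap.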

Transitioning in and out of NearSync

When you create a NearSync replication schedule, the schedule remains an hourly schedule until its transition into a minutely schedule is complete.

To transition into a NearSync (minutely) replication schedule, the system first seeds the recovery site with data: recovery points are taken on an hourly basis and replicated to the recovery site. After the system determines that the recovery points containing the seeding data have replicated within a specified amount of time (default is an hour), it automatically transitions the schedule into NearSync, depending on the available bandwidth and the change rate. After the transition into the NearSync replication schedule, you can see the configured minutely recovery points in the web interface.

The following are the characteristics of the process.

  • Until the transition into the NearSync replication schedule completes, you can see only the hourly recovery points in Prism Central.
  • If a guest VM transitions out of the NearSync replication schedule for any reason, the system raises alerts in the Alerts dashboard and the minutely replication schedule falls back to the hourly replication schedule. The system continuously retries the minutely replication schedule that you configured. If a retry succeeds, the schedule automatically transitions back into NearSync, and alerts specific to this condition are raised in the Alerts dashboard.

To transition out of the NearSync replication schedule, you can do one of the following.

  • Delete the NearSync replication schedule that you have configured.
  • Update the NearSync replication schedule to use an hourly RPO.
  • Unprotect the guest VMs.
    Note: There is no transitioning out of the NearSync replication schedule on the addition or deletion of a guest VM.

Repeated transitioning in and out of NearSync replication schedule can occur because of the following reasons.

  • LWS store usage is high.
  • The change rate of data is high for the available bandwidth between the primary and the recovery sites.
  • Internal processing of LWS recovery points is taking more time because the system is overloaded.

Retention Policy

Depending on the RPO (1–15 minutes), the system retains the recovery points for a specific time period. For a NearSync replication schedule, you can configure the retention policy for days, weeks, or months on both the primary and recovery sites instead of defining the number of recovery points you want to retain. For example, if you desire an RPO of 1 minute and want to retain the recovery points for 5 days, the retention policy works in the following way.

  • For every 1 minute, a recovery point is created and retained for a maximum of 15 minutes.
    Note: Only the most recent 15 recovery points are visible in Prism Central and available for the recovery operation.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 5 days.

You can also define recovery point retention in weeks or months. For example, if you configure a 3-month schedule, the retention policy works in the following way.

  • For every 1 minute, a recovery point is created and retained for 15 minutes.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 7 days.
  • One weekly recovery point is created and retained for 4 weeks.
  • One monthly recovery point is created and retained for 3 months.
Note:
  • You can define different retention policies on the primary and recovery sites.
  • The system retains subhourly and hourly recovery points for 15 minutes and 6 hours respectively. Maximum retention time for days, weeks, and months is 7 days, 4 weeks, and 12 months respectively.
  • If you change the replication schedule from an hourly schedule to a minutely schedule (Asynchronous to NearSync), the first recovery point is not created according to the new schedule. The recovery points are created according to the start time of the old hourly schedule (Asynchronous). If you want to get the maximum retention for the first recovery point after modifying the schedule, update the start time accordingly for NearSync.
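
The tiered retention described above can be summarized programmatically. The sketch below is illustrative only (not Nutanix code); it encodes the tiers for the 3-month example with a 1-minute RPO:

```python
# Tier -> (creation interval, retention period) for a NearSync schedule
# with a 1-minute RPO and 3-month retention, as described above.
RETENTION_TIERS = {
    "minutely": ("1 minute", "15 minutes"),
    "hourly": ("1 hour", "6 hours"),
    "daily": ("1 day", "7 days"),
    "weekly": ("1 week", "4 weeks"),
    "monthly": ("1 month", "3 months"),
}

for tier, (interval, retention) in RETENTION_TIERS.items():
    print(f"{tier}: one recovery point every {interval}, retained for {retention}")
```

For a longer schedule, only the last tier changes (up to the 12-month maximum); the minutely, hourly, daily, and weekly tiers stay fixed at 15 minutes, 6 hours, 7 days, and 4 weeks respectively.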

NearSync Replication Requirements (Leap)

The following are the specific requirements for protecting your guest VMs with NearSync replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Leap.

For more information about the general requirements of Leap, see Leap Requirements.

For information about node, disk and Foundation configurations required to support NearSync replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on version 20190916.189 or newer.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

Each on-prem site must have a Leap-enabled Prism Central instance.

The primary and recovery Prism Centrals and their registered Nutanix clusters must be running the following versions of AOS.

  • AOS 5.17.1 or newer for DR to different Nutanix clusters at the same site.
  • AOS 5.17 or newer for DR to Nutanix clusters at the different sites.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Guest VMs protected with a NearSync replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) of guest VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided you meet the following requirements.

  • Both the primary and the recovery Nutanix clusters must be running AOS 5.18 or newer.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI disks only.
    Tip: From AOS 5.19.1, CHDR supports SATA disks also.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see the Compatibility Matrix.

  • For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files. If you have delta disks attached to a VM and you proceed with failover, you get a validation warning and the VM does not recover. Contact Nutanix Support for assistance.
Table 1. Operating Systems Supported for CHDR (Asynchronous Replication)
Operating System Version Requirements and limitations
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer.
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • SLES operating system is not supported.

Additional Requirements

  • Both the primary and the recovery Nutanix clusters must have a minimum of three nodes.
  • The recovery site container must have as much space as the working set size of the VMs protected at the primary site. For example, if you are protecting a VM that uses 30 GB of space on the container of the primary site, the same amount of space is required on the recovery site container.

NearSync Replication Limitations (Leap)

Consider the following specific limitations before protecting your guest VMs with NearSync replication schedule. These limitations are in addition to the general limitations of Leap.

For information about the general limitations of Leap, see Leap Limitations.

  • All files associated with the VMs running on ESXi must be located in the same folder as the VMX configuration file. Files not located in the same folder as the VMX configuration file might not recover on a recovery cluster. On recovery, a guest VM with such files fails to start with the following error message: Operation failed: InternalTaskCreationFailure: Error creating host specific VM change power state task. Error: NoCompatibleHost: No host is compatible with the virtual machine
  • Deduplication enabled on storage containers having guest VMs protected with NearSync replication schedule lowers the replication speed.
  • Cross hypervisor disaster recovery (CHDR) does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

  • On CHDR, NearSync replication schedules do not support retrieving recovery points from the recovery sites.

    For example, suppose you have 1 day of retention at the primary site and 5 days of retention at the recovery site, and you want to go back to a recovery point from 5 days ago. A NearSync replication schedule does not support replicating that recovery point back from the recovery site to the primary site.

Creating a Protection Policy with a NearSync Replication Schedule (Leap)

To protect the guest VMs in a minutely replication schedule, configure a NearSync replication schedule while creating the protection policy. The policy takes recovery points of the protected guest VMs at the specified time interval (1–15 minutes) and replicates them to the recovery availability zone (site) for High Availability. To maintain the efficiency of minutely replication, the protection policy allows you to configure a NearSync replication schedule to only one recovery site. When creating a protection policy, you can specify only VM categories. If you want to include VMs individually, you must first create the protection policy (which can also include VM categories) and then include the VMs individually in the protection policy from the VMs page.
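The two schedule constraints above (an RPO of 1–15 minutes, and a single recovery site per NearSync schedule) can be sketched as a small validation routine. This is an illustration only; the function name and argument shapes are assumptions, not a Nutanix API.

```python
def validate_nearsync_schedule(rpo_minutes, recovery_sites):
    """Validate a NearSync schedule against the constraints described above.

    rpo_minutes: snapshot frequency in minutes (the RPO).
    recovery_sites: list of recovery sites attached to this NearSync schedule.
    Hypothetical helper for illustration; not a Nutanix API.
    """
    if not 1 <= rpo_minutes <= 15:
        raise ValueError("NearSync RPO must be between 1 and 15 minutes")
    if len(recovery_sites) != 1:
        raise ValueError("a NearSync schedule supports exactly one recovery site")
    return True
```

For example, `validate_nearsync_schedule(10, ["AZ-East"])` passes, while a 30-minute RPO or a second recovery site raises an error.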

Before you begin

Ensure that the primary and the recovery AHV or ESXi clusters at the same or different sites are NearSync capable. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB.
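The SSD sizing rule can be expressed as a quick check. This is a sketch assuming you already have the per-SSD raw capacities in TB from your cluster inventory; the helper name is hypothetical, not a Nutanix API.

```python
MIN_SSD_CAPACITY_TB = 1.2  # per-SSD minimum for NearSync capability

def is_nearsync_capable(ssd_capacities_tb):
    """Return True if every SSD in the cluster is at least 1.2 TB.

    ssd_capacities_tb: iterable of per-SSD raw capacities in TB.
    Illustration only; obtain real capacities from your cluster inventory.
    """
    capacities = list(ssd_capacities_tb)
    return bool(capacities) and all(
        cap >= MIN_SSD_CAPACITY_TB for cap in capacities
    )

print(is_nearsync_capable([1.92, 1.92, 1.92]))  # True
print(is_nearsync_capable([1.92, 0.96, 1.92]))  # False: one SSD is under 1.2 TB
```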

See NearSync Replication Requirements (Leap) and NearSync Replication Limitations (Leap) before you start.

About this task

To create a protection policy with a NearSync replication schedule, do the following at the primary site. You can also create a protection policy at the recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies

  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Select Primary Location

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric, dot, dash, and underscore characters.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, select an availability zone (site) that hosts the guest VMs to protect.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can select either the local site or a non-local site.

        2. Cluster : From the drop-down list, select the cluster that hosts the guest VMs to protect.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, select the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. After saving the primary site configuration, you can optionally add a local schedule to retain the recovery points at the primary site.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining them through a replication schedule (step d.iv). For example, you can create a local schedule that retains recovery points taken every 15 minutes locally, and a replication schedule that retains recovery points and replicates them to a recovery site every 2 hours. The two schedules apply to the guest VMs independently.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local site. If you set the retention number to n, the local site retains the n recent recovery points.

              When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local site.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on Local AZ:PE_A3_AHV : Specify the retention number for the local site.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the generated recovery points are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshots.
          5. Click Save Schedule .
    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the availability zone (site) where you want to replicate the recovery points.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). Select Local AZ if you want to configure DR to a different Nutanix cluster at the same site.

          If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).

        2. Cluster : From the drop-down list, select the cluster where you want to replicate the recovery points.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. You can select one cluster at the recovery site. To maintain the efficiency of minutely replication, a protection policy allows you to configure only one recovery site for a NearSync replication schedule. However, you can add another Asynchronous replication schedule for replicating recovery points to the same or different sites. For more information about adding another recovery site with a replication schedule, see step e.

          Note: Selecting auto-select from the drop-down list replicates the recovery points to any available cluster at the recovery site. Select auto-select from the drop-down list only if all the clusters at the recovery site are NearSync capable and are up and running. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB. All-flash clusters do not have any specific SSD sizing requirements.

          Caution: If the primary Nutanix cluster contains an IBM POWER Systems server, you can replicate recovery points to an on-prem site only if that on-prem site contains an IBM Power Systems server.
        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery site. After saving the recovery site configuration, you can optionally add a local schedule to retain the recovery points at the recovery site.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining them through a replication schedule (step d.iv). For example, you can create a local schedule that retains hourly recovery points locally to supplement the hourly replication schedule. The two schedules apply to the guest VMs independently after failover, when the recovery points replicate back to the primary site.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local site. If you set the retention number to n, the local site retains the n recent recovery points.

              When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local site.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the local site.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the generated recovery points are application-consistent; otherwise, they are crash-consistent. If the times in the local schedule and the replication schedule coincide, the single recovery point generated is application-consistent.

          5. Click Save Schedule .
    4. Click + Add Schedule to add a replication schedule between the primary and the recovery site.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (NearSync)

        1. Protection Type : Click Asynchronous .
        2. Take Snapshot Every : Specify the frequency in minutes (anywhere between 1-15 minutes) at which you want the recovery points to be taken.

          The specified frequency is the RPO. For more information about RPO, see Leap Terminology.

        3. Retention Type : When you enter the frequency in minutes in step ii, the system selects the Roll-up retention type by default because NearSync replication schedules do not support Linear retention types.
          Roll-up retention type rolls up the recovery points as per the RPO and retention period into a single recovery point at a site. For example, if you set the RPO to 1 hour, and the retention time to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) after every 24 hours. The system keeps one day (of rolled-up hourly recovery points) and 4 days of daily recovery points.
          Note:
          • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
          • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
          • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
          • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
          Note: The recovery points that are used to create a rolled-up recovery point are discarded.
          Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
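The roll-up tiering rules in the note above can be sketched as follows. The function name and return shape are illustrative only, not a Nutanix API; the sketch covers the day and week cases, and the month and year cases extend the same pattern.

```python
def rollup_tiers(n, unit):
    """Recovery-point tiers kept by a roll-up retention policy, per the
    rules above. Returns how long each granularity is retained.
    Illustration only; not a Nutanix API.
    """
    if unit == "days":
        # 1 day of RPO (rolled-up) points, n-1 days of daily points
        return {"rpo_rollup_days": 1, "daily_days": n - 1}
    if unit == "weeks":
        # 1 day of RPO points, 1 week of daily, n-1 weeks of weekly points
        return {"rpo_rollup_days": 1, "daily_days": 7, "weekly_weeks": n - 1}
    raise ValueError("this sketch handles only 'days' and 'weeks'")

# The example above: a 1-hour RPO retained for 5 days keeps 1 day of
# rolled-up hourly points and 4 days of daily points.
print(rollup_tiers(5, "days"))  # {'rpo_rollup_days': 1, 'daily_days': 4}
```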
        4. To specify the retention number for the primary and recovery sites, do the following.
          • Retention on Local AZ: PE_A3_AHV : Specify the retention number for the primary site.

            This field is unavailable if you do not specify a recovery location.

          • Retention on 10.xx.xx.xxx:PE_C1_AHV : Specify the retention number for the recovery site.
        5. If you want to enable reverse retention of the recovery points, check Reverse retention for VMs on recovery location .

          Reverse retention maintains the retention numbers of recovery points even after failover to a recovery site in the same or different availability zones. For example, if you retain two recovery points at the primary site and three recovery points at the recovery site, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary site. The recovery site still retains two recovery points while the primary site retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary site. The recovery site retains three recovery points while the primary site retains two recovery points.

          Maintaining the same retention numbers at a recovery site is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.
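The reverse-retention behavior described above can be sketched as a tiny helper that mirrors the worked example. This is an illustration of the documented outcome only; the function and its return shape are hypothetical, not a Nutanix API.

```python
def retention_after_failback(primary_n, recovery_n, reverse_retention):
    """Retention numbers after a failover-and-replicate-back cycle,
    following the example above. Returns a tuple of
    (primary_site_retention, recovery_site_retention).
    Hypothetical sketch; not a Nutanix API.
    """
    if reverse_retention:
        # The configured numbers travel with the replication direction,
        # so after failback the primary site holds the larger count.
        return (recovery_n, primary_n)
    # Without reverse retention, the numbers stay attached to the
    # source/target roles as initially configured.
    return (primary_n, recovery_n)

# Example above: primary retains 2, recovery retains 3.
print(retention_after_failback(2, 3, True))   # (3, 2)
print(retention_after_failback(2, 3, False))  # (2, 3)
```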

        6. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

          Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery sites.

          Caution: Application-consistent recovery points fail for EFI-boot enabled Windows 2019 VMs running on ESXi when NGT is not installed. Nutanix recommends installing NGT on guest VMs running on ESXi as well.
        7. Click Save Schedule .
    5. Click + Add Recovery Location if you want to add an additional recovery site for the guest VMs in the protection policy.
      • To add an on-prem site for recovery, see Protection and DR between On-Prem Sites (Leap)
      • To add Xi Cloud Services for recovery, see Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap).
      Figure. Protection Policy Configuration: Additional Recovery Location

    6. Click + Add Schedule to add a replication schedule between the primary site and the additional recovery site you specified in step e.

      The Add Schedule window appears with the Primary Location and the additional Recovery Location auto-populated. Perform step d again to add the replication schedule.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.

        1. Click Start protection at specific point in time .
        2. Specify the time at which you want to start taking recovery points.
        3. Click Save .
    7. Click Next .
      Clicking Next shows a list of VM categories where you can optionally check one or more VM categories to protect in the protection policy. DR configurations using Leap allow you to protect a guest VM with only one protection policy. Therefore, VM categories specified in another protection policy are not in the list. If a guest VM is protected in another protection policy through its VM category (category-based inclusion), and you protect the same guest VM from the VMs page in this policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protects the individual guest VM applies.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    8. If you want to protect the guest VMs category-wise, check the VM categories that you want to protect from the list and click Add .
      Figure. Protection Policy Configuration: Add VM Categories

      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs category-wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).

    9. Click Create .
      The protection policy with a NearSync replication schedule is created. To verify the protection policy, see the Protection Policies page. If you checked VM categories in step h, the protection policy starts generating recovery points of the guest VMs in those VM categories. To see the generated recovery points, click the hamburger icon at the top-left corner of the window and go to VM Recovery Points . Click a recovery point to see its information. You can see the estimated time for the very first replication (seeding) to the recovery sites.
      Figure. Recovery Points Overview

    Tip: DR using Leap with a NearSync replication schedule also allows you to recover the data of the minute just before an unplanned failover. For example, with a 10-minute RPO protection policy, you can use the internal lightweight snapshots (LWS) to recover the data of the ninth minute when there is an unplanned failover.

Creating a Recovery Plan (Leap)

To orchestrate the failover of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery site. If you have configured two recovery sites in a protection policy, create two recovery plans for DR—one for recovery to each recovery site. The recovery plan synchronizes continuously to the recovery site in a bidirectional way.

For more information about creating a recovery plan, see Creating a Recovery Plan (Leap).

Failover and Failback Operations (Leap)

You can perform test failover, planned failover, and unplanned failover of the guest VMs protected with NearSync replication schedule across different Nutanix clusters at the same or different on-prem availability zone (site). The steps to perform test, planned, and unplanned failover are largely the same irrespective of the replication schedules that protect the guest VMs.

Refer to Failover and Failback Management for test, planned, and unplanned failover procedures.

Protection with Synchronous Replication Schedule (0 RPO) and DR

Synchronous replication enables you to protect your guest VMs with a zero recovery point objective (0 RPO). A protection policy with a Synchronous replication schedule replicates all the writes on the protected guest VMs synchronously to the recovery availability zone (site) for High Availability. The policy also takes recovery points of those protected VMs every 6 hours (the first snapshot is taken immediately) for raw node (HDD+SSD) sizes up to 120 TB. Because the replication is synchronous, the recovery points are crash-consistent only. For guest VMs (AHV) protected with a Synchronous replication schedule, you can perform DR only to an AHV cluster at the same or a different site. Replicating writes synchronously while also generating recovery points helps eliminate data loss due to:

  • Unplanned failure events (for example, natural disasters and network failure).
  • Planned failover events (for example, scheduled maintenance).

Nutanix recommends that the round-trip latency (RTT) between AHV clusters be less than 5 ms for optimal performance of Synchronous replication schedules. Maintain adequate bandwidth to accommodate peak writes and have a redundant physical network between the clusters.

To perform the replications synchronously yet efficiently, the protection policy allows you to configure only one recovery site if you add a Synchronous replication schedule. If you configure a Synchronous replication schedule for a guest VM, you cannot add an Asynchronous or NearSync schedule to the same guest VM. Similarly, if you configure an Asynchronous or a NearSync replication schedule, you cannot add a Synchronous schedule to the same guest VM.

If you unpair the sites while the guest VMs in the Nutanix clusters are still synchronizing, the Nutanix cluster becomes unstable. Therefore, disable Synchronous replication and clear stale stretch parameters, if any, on both the primary and recovery Prism Element before unpairing the sites. For more information about disabling Synchronous replication, see Synchronous Replication Management.

Note: Nutanix provides multiple disaster recovery (DR) solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

Synchronous Replication Requirements

The following are the specific requirements for protecting your AHV guest VMs with Synchronous replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Leap.

For information about the general requirements of Leap, see Leap Requirements.

For information about node, disk and Foundation configurations required to support Synchronous replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV

The AHV clusters must be running on version 20190916.189 or newer.

Note: Synchronous replication schedules support only AHV.

Nutanix Software Requirements

  • Each on-prem availability zone (AZ) must have a Leap enabled Prism Central instance.

    The primary and recovery Nutanix clusters can be registered to a single Prism Central instance, or each can be registered to a different Prism Central instance.

  • The primary and recovery Prism Central and Prism Element on the registered Nutanix clusters must be running on the same AOS version.
    • AOS 5.17 or newer.
    • AOS 5.17.1 or newer to support Synchronous replications of UEFI secure boot enabled guest VMs.

    • AOS 5.19.2 or newer for DR to an AHV cluster in the same AZ (registered to the same Prism Central). For DR to an AHV cluster in the same AZ, Prism Central must be running version 2021.3 or newer.

Additional Requirements

  • For optimal performance, maintain a round-trip latency (RTT) of less than 5 ms between Nutanix clusters. Also, maintain adequate bandwidth to accommodate peak writes and have a redundant physical network between the clusters.
  • The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected guest VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
  • For hardware and Foundation configurations required to support Synchronous replication schedules, see On-Prem Hardware Resource Requirements.

  • The clusters on the primary site and the recovery site communicate over the ports 2030, 2036, 2073, and 2090. Ensure that these ports have open access between both the primary and the recovery clusters (Prism Element). For the complete list of required ports, see Port Reference.
  • If the primary and the recovery clusters (Prism Element) are in different subnets, open the ports manually for communication.
    Tip: If the primary and the recovery clusters (Prism Element) are in the same subnet, you need not open the ports manually.
    • To open the ports for communication to the recovery cluster, run the following command on all CVMs of the primary cluster.

      nutanix@cvm$ allssh 'modify_firewall -f -r remote_cvm_ip,remote_virtual_ip -p 2030,2036,2073,2090 -i eth0'

      Replace remote_cvm_ip with the IP address of the recovery cluster CVM. If there are multiple CVMs, replace remote_cvm_ip with the IP addresses of the CVMs separated by comma.

      Replace remote_virtual_ip with the virtual IP address of the recovery cluster.

    • To open the ports for communication to the primary cluster, run the following command on all CVMs of the recovery cluster.

      nutanix@cvm$ allssh 'modify_firewall -f -r source_cvm_ip,source_virtual_ip -p 2030,2036,2073,2090 -i eth0'

      Replace source_cvm_ip with the IP address of the primary cluster CVM. If there are multiple CVMs, replace source_cvm_ip with the IP addresses of the CVMs separated by comma.

      Replace source_virtual_ip with the virtual IP address of the primary cluster.

    Note: Use the eth0 interface only. eth0 is the default CVM interface that shows up when you install AOS.
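To verify reachability of the ports listed above from an admin workstation, a quick TCP check can be scripted. This is a sketch, not a Nutanix tool; a successful connect only shows that a port is reachable, not that the replication service behind it is healthy.

```python
import socket

STRETCH_PORTS = (2030, 2036, 2073, 2090)  # ports listed above

def reachable_ports(host, ports=STRETCH_PORTS, timeout=3.0):
    """Return the subset of ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # filtered, closed, or unreachable
    return open_ports

# Example: check the recovery cluster's virtual IP (address is illustrative).
# print(reachable_ports("10.0.0.50"))
```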

Synchronous Replication Limitations

Consider the following specific limitations before protecting your guest VMs with Synchronous replication schedule. These limitations are in addition to the general limitations of Leap.

For information about the general limitations of Leap, see Leap Limitations.

  • You cannot restore guest VMs with incompatible GPUs at the recovery cluster.
  • You cannot protect guest VMs configured as part of a network function chain.
  • You cannot protect guest VMs with affinity policies.
  • You cannot resize a guest VM disk while the guest VM is in replication. See KB-9986 for more information.

Creating a Protection Policy with the Synchronous Replication Schedule (Leap)

To protect the guest VMs in an instant replication schedule, configure a Synchronous replication schedule while creating the protection policy. The policy replicates all the writes on the protected guest VMs synchronously to the recovery availability zone (site) for High Availability. For raw node (HDD+SSD) sizes up to 120 TB, the policy also takes crash-consistent recovery points of those guest VMs every 6 hours and replicates them to the recovery site (the first snapshot is taken immediately). To maintain the efficiency of synchronous replication, the protection policy allows you to add only one recovery site for the protected VMs. When creating a protection policy, you can specify only VM categories. If you want to protect guest VMs individually, you must first create the protection policy (which can also include VM categories) and then include the guest VMs individually in the protection policy from the VMs page.

Before you begin

See Synchronous Replication Requirements and Synchronous Replication Limitations before you start.

About this task

To create a protection policy with the Synchronous replication schedule, do the following at the primary site. You can also create a protection policy at the recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies

  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Select Primary Location

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric, dot, dash, and underscore characters.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, select an availability zone (site) that hosts the guest VMs to protect.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can select either the local site or a non-local site.

        2. Cluster : From the drop-down list, select the AHV cluster that hosts the VMs to protect.

          The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple AHV clusters in the same protection policy, select the AHV clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central. Select All Clusters only if all the clusters are running AHV.

        3. Click Save .

          Clicking Save activates the Recovery Location pane. Do not add a local schedule to retain the recovery points locally. To maintain the replication efficiency, Synchronous replication allows only the replication schedule. If you add a local schedule, you cannot click Synchronous in step d.

    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the availability zone (site) where you want to replicate the recovery points.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). Select Local AZ if you want to configure DR to a different AHV cluster at the same site.

          If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).

        2. Cluster : From the drop-down list, select the AHV cluster where you want to replicate the guest VM writes synchronously and recovery points.

          The drop-down lists all the Nutanix clusters registered to the Prism Central representing the selected site. You can select one AHV cluster at the recovery site. Do not select an ESXi cluster, because DR configurations using Leap support only AHV clusters. If you select an ESXi cluster and configure a Synchronous replication schedule, replications fail.

          Note: Selecting auto-select from the drop-down list replicates the recovery points to any available cluster at the recovery site. Select auto-select only if all the clusters at the recovery site are running AHV and are up.
        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery site. Do not add a local schedule (a schedule that retains recovery points locally). To maintain replication efficiency, Synchronous replication allows only the replication schedule; if you add a local schedule, you cannot select Synchronous in step d.

    4. Click + Add Schedule to add a replication schedule between the primary and the recovery site.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (Synchronous)
      Click to enlarge Protection Policy Configuration: Add Schedule (Synchronous)

        1. Protection Type : Click Synchronous .
        2. Failure Handling : Select one of the following options to specify how a failure is handled (for example, when the connection between the primary and the recovery site breaks and VM writes on the primary cluster stop replicating).
          • Manual : Select this option if you want to resume the VM writes on the primary site only when you manually disable Synchronous replication.
          • Automatic : Select this option to resume VM writes on the primary site automatically after the specified Timeout after seconds.
            Note: The minimum timeout is 10 seconds.
        3. Click Save Schedule .

          Clicking Save Schedule disables the + Add Recovery Location button at the top-right because, to maintain the efficiency of Synchronous replication, the policy allows only one recovery site.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.

        1. Click Start protection at specific point in time .
        2. Specify the time at which you want to start taking recovery points.
        3. Click Save .
    5. Click Next .
      Clicking Next shows a list of VM categories where you can optionally check one or more VM categories to protect in the protection policy. DR configurations using Leap allow you to protect a guest VM with only one protection policy. Therefore, VM categories specified in another protection policy are not in the list. If you protect a guest VM in one protection policy by specifying its VM category (category-based inclusion), and then protect the same guest VM from the VMs page in another policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protects the guest VM individually protects it.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    6. If you want to protect the guest VMs category-wise, check the VM categories that you want to protect from the list and click Add .
      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs category-wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).

    7. Click Create .
      The protection policy with the Synchronous replication schedule is created. To verify the protection policy, see the Protection Policies page. If you checked VM categories in step f, the protection policy starts generating recovery points of the guest VMs in those VM categories. To see the generated recovery points, click the hamburger icon at the top-left corner of the window and go to VM Recovery Points . Click a recovery point to view its information. You can also see the time estimated for the first replication (seeding) to the recovery sites.
      Figure. Recovery Points Overview Click to enlarge Recovery Points Overview

Creating a Recovery Plan (Leap)

To orchestrate the failover of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery site. If you have configured two recovery sites in a protection policy, create two recovery plans for DR—one for recovery to each recovery site. The recovery plan synchronizes continuously to the recovery site in a bidirectional way.

For more information about creating a recovery plan, see Creating a Recovery Plan (Leap).

Synchronous Replication Management

Synchronous replication instantly replicates all writes on the protected guest VMs to the recovery cluster. Replication starts when you configure a protection policy and add the guest VMs to protect. You can manage replication by enabling, disabling, pausing, or resuming Synchronous replication on the protected guest VMs from Prism Central.

Enabling Synchronous Replication

When you configure a protection policy with Synchronous replication schedule and add guest VMs to protect, the replication is enabled by default. However, if you have disabled the Synchronous replication on a guest VM, you have to enable it to start replication.

About this task

To enable Synchronous replication on a guest VM, perform the following procedure at the primary availability zone (site). You can also perform the following procedure at the recovery site. The operations you perform at a recovery site synchronize back to the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs on which you want to enable Synchronous replication.
  4. Click Protect from the Actions drop-down menu.
  5. Select the protection policy in the table to include the guest VMs in the protection policy.
  6. Click Protect .
Pausing Synchronous Replication

The protected guest VMs on the primary cluster stop responding when the recovery cluster is disconnected abruptly (for example, due to network outage or internal service crash). To come out of the unresponsive state, you can pause Synchronous replication on the guest VMs. Pausing Synchronous replication temporarily suspends the replication state of the guest VMs without completely disabling the replication relationship.

About this task

To pause Synchronous replication on a guest VM, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs on which you want to pause Synchronous replication.
  4. Click Pause Synchronous Replication from the Actions drop-down menu.
Resuming Synchronous Replication

You can resume the Synchronous replication that you had paused to come out of the unresponsive state of the primary cluster. Resuming Synchronous replication restores the replication status and reconciles the state of the guest VMs. To resume Synchronous replication on a guest VM, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs on which you want to resume Synchronous replication.
  4. Click Resume Synchronous Replication from the Actions drop-down menu.

Failover and Failback Operations (Leap)

You can perform test, planned, and unplanned failovers of the guest VMs protected with a Synchronous replication schedule across AHV clusters at different on-prem availability zones (sites). The steps to perform test, planned, and unplanned failovers are largely the same irrespective of the replication schedule that protects the guest VMs. Additionally, a planned failover of guest VMs protected with a Synchronous replication schedule also allows live migration of the protected guest VMs.

Refer to Failover and Failback Management for test, planned, and unplanned failover procedures.

Cross-Cluster Live Migration

Planned failover of the guest VMs protected with Synchronous replication schedule supports live migration to another AHV cluster. Live migration offers zero downtime for your applications during a planned failover event to the recovery cluster (for example, during scheduled maintenance).

Cross-Cluster Live Migration Requirements

The following are the specific requirements to successfully migrate your guest VMs with Live Migration.

Ensure that you meet the following requirements in addition to the requirements of Synchronous replication schedule (Synchronous Replication Requirements) and general requirements of Leap (Leap Requirements).

  • Stretch L2 networks across the primary and recovery sites.

    Network stretch spans your network across different sites. A stretched L2 network retains the IP addresses of guest VMs after their Live Migration to the recovery site.

  • Both the primary and recovery Nutanix clusters must have identical CPU types.

    The primary and recovery Nutanix clusters must have identical CPU feature sets. If the CPU feature sets (the sets of CPU flags) are not identical, Live Migration fails.

  • Both the primary and recovery Nutanix clusters must run on the same AHV version.
  • If the primary and the recovery Nutanix clusters (Prism Element) are in different subnets, open the ports 49250–49260 for communication. For the complete list of required ports, see Port Reference.
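The port requirement above lends itself to a quick reachability probe before a planned failover. The sketch below checks TCP connectivity to ports 49250-49260 on a given host; the host address in the usage comment is a placeholder, not a real cluster address.

```python
# Sketch: probe TCP reachability of the replication ports (49250-49260)
# between clusters in different subnets. The target host is a placeholder.
import socket

REPLICATION_PORTS = range(49250, 49261)  # 49250-49260 inclusive

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def unreachable_ports(host: str) -> list[int]:
    """List the replication ports that could not be reached on the given host."""
    return [p for p in REPLICATION_PORTS if not port_open(host, p)]

# Example (placeholder address):
# print(unreachable_ports("10.0.0.20"))
```

An empty list from `unreachable_ports` suggests the firewall rules are in place; any listed ports need to be opened per the Port Reference before attempting live migration.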
Cross-Cluster Live Migration Limitations

Consider the following limitation in addition to the limitations of Synchronous replication schedule (Synchronous Replication Limitations) and general limitations of Leap (Leap Limitations) before performing live migration of your guest VMs.

  • Live migration of guest VMs fails if the guest VMs are part of Flow security policies.
    Tip: To enable the guest VMs to retain the Flow security policies after the failover (live migration), revoke the policies on the guest VMs and Export them to the recovery site. At the recovery site, Import the policies. The guest VMs read the policies automatically after recovery.

Performing Cross-Cluster Live Migration

If a planned event at the primary availability zone (site), for example, scheduled maintenance of guest VMs, requires you to migrate your applications to another AHV cluster without downtime, perform a planned failover with Live Migration to the recovery site.

About this task

To live migrate the guest VMs, do the following procedure at the recovery site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
    Caution: The Recovery Plans page displays many recovery plans. Select the recovery plan that has Stretch Networks . If you select a recovery plan having Non-stretch networks , the migration fails. For more information about selection of stretch and non-stretch networks, see Creating a Recovery Plan (Leap).
  4. Click Failover from the Actions drop-down menu.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Figure. Planned Failover
    Click to enlarge Planned Failover

    1. Failover Type : Click Planned Failover and check Live Migrate VMs .
    2. Click + Add target clusters if you want to failover to specific clusters at the recovery site.
      If you do not add target clusters, the recovery plan migrates the guest VMs to any AHV cluster at the recovery site.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors, or you resolve the errors in step 6, the guest VMs migrate to and start on the recovery cluster. The migration might show a network latency of 300-600 ms. You cannot see the migrated guest VMs on the primary cluster because those VMs come up on the recovery cluster after the migration.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.

Converting a Multi-AZ Deployment to Single-AZ

To use disaster recovery (DR) features that support only single Prism Central (AZ) managed deployments, you can convert your multi-AZ deployment to a single-AZ deployment. For example, in a two-AZ deployment where each Prism Central instance (Prism Central A, Prism Central B) manages one Prism Element cluster (Prism Element A, Prism Element B), you can perform the following procedure to convert to a single-AZ deployment (Prism Central A managing both Prism Element A and Prism Element B).

Before you begin

This procedure converts deployments protected with Synchronous replication schedules. See Synchronous Replication Requirements for the supported Prism Central and AOS versions. To avoid a single point of failure in such deployments, Nutanix recommends installing the single Prism Central at a different AZ (a different fault domain).

You can also perform this procedure to convert deployments protected with Asynchronous and NearSync replication schedules. The conversion procedure for such deployments is identical, except that the protection status (step 3 in the described procedure) of Asynchronous and NearSync replication schedules is available only in Focus > Data Protection .

Figure. Focus
Click to enlarge Focus

About this task

Procedure

  1. Log on to the web console of Prism Central A .
  2. Modify all the protection policies and recovery plans that refer to Prism Element B and Prism Central B .
    1. Modify the protection policies to either remove all the references to Prism Element B and Prism Central B or remove all the guest VMs from the policy.
      For more information about updating a protection policy, see Updating a Protection Policy.
    2. Modify the recovery plans to remove all the references to Prism Element B and Prism Central B .
      Note: If you do not modify the recovery plans, the recovery plans become invalid after the unregistration of Prism Element B from Prism Central B in step 5. For more information about updating a recovery plan, see Updating a Recovery Plan.
    3. Ensure that there are no issues (in Alerts ) with the modified protection policies and recovery plans.
      Note: Before unregistering Prism Element B from Prism Central B in step 5, ensure that no guest VM is protected to and from Prism Element B .
  3. Unprotect all the guest VMs replicating to and from Prism Element B and Prism Central B .
    Note: If the guest VMs are protected by VM categories, update or delete the VM categories from the protection policies and recovery plans.
    To see the unprotect status of the guest VMs, click Focus > Data Protection .
  4. Ensure that the guest VMs are completely unprotected.
    • To ensure all the stretch states are deleted, log on to Prism Element B through SSH as the nutanix user and run the following command.
      nutanix@cvm$ stretch_params_printer
      Empty response indicates that all stretch states are deleted.
    • To ensure all the stretch states between Prism Central B and Prism Element B are deleted, log on to Prism Central B through SSH as the nutanix user and run the following commands.
      pcvm$ mcli
      mcli> mcli dr_coordinator.list
      Empty response indicates that all stretch states are deleted.
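The empty-response checks above are easy to script when converting many deployments. The sketch below captures the rule stated in the procedure (an empty response means the stretch states are deleted); the SSH invocation in the comment is illustrative.

```python
# Sketch: decide from command output whether all stretch states are deleted.
# Per the procedure, an empty response from `stretch_params_printer` (on a CVM)
# or `mcli dr_coordinator.list` (on Prism Central) means the states are gone.

def stretch_states_cleared(command_output: str) -> bool:
    """Return True when the command produced no stretch-state entries."""
    return command_output.strip() == ""

# Example: run the check over SSH (the SSH invocation is illustrative).
# import subprocess
# out = subprocess.run(
#     ["ssh", "nutanix@prism-element-b", "stretch_params_printer"],
#     capture_output=True, text=True,
# ).stdout
# print("cleared" if stretch_states_cleared(out) else "stretch states remain")
```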
  5. Unregister Prism Element B from Prism Central B .
    After unregistering Prism Element B from Prism Central B , the system deletes all Prism Central B attributes and policies applied to guest VMs on Prism Element B (for example, VM categories).
  6. Register Prism Element B to Prism Central A .
    After registering Prism Element B to Prism Central A , reconfigure all Prism Central B attributes and policies applied to entities on the Prism Element B (for example, VM categories).
  7. Modify the protection policies and recovery plans to refer to Prism Central A and Prism Element B .
  8. Unpair Prism Central B .
    To ensure all the stretch states between Prism Central A and Prism Central B are deleted, log on to both Prism Central A and Prism Central B through SSH and run the following commands.
    pcvm$ mcli
    mcli> mcli dr_coordinator.list
    Empty response indicates that all stretch states are deleted.
    The multi-AZ deployment is converted to single-AZ. Prism Element A and Prism Element B are registered to a single Prism Central ( Prism Central A ) managed deployment.

Protection Policy Management

A protection policy automates the creation and replication of recovery points. When creating a protection policy, you specify replication schedules, retention policies for the recovery points, and the guest VMs you want to protect. You also specify one or two recovery availability zones (sites) if you want to automate recovery point replication to those sites.

When you create, update, or delete a protection policy, it synchronizes to the recovery sites and works bidirectionally. The recovery points generated at the recovery sites replicate back to the primary site when the primary site starts functioning. For information about how Leap determines the list of sites for synchronization, see Entity Synchronization Between Paired Availability Zones.

Adding Guest VMs individually to a Protection Policy

You can also protect guest VMs individually in a protection policy from the VMs page, without the use of a VM category. To protect guest VMs individually in a protection policy, perform the following procedure.

About this task

Note: If you protect a guest VM individually, you can remove the guest VM from the protection policy only by using the procedure in Removing Guest VMs individually from a Protection Policy.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs in the left pane.
    Figure. Virtual Infrastructure: VMs
    Click to enlarge Protect Guest VMs Individually: Virtual Infrastructure

  3. Select the guest VMs that you want to add to a protection policy.
  4. Click Protect from the Actions drop-down menu.
    Figure. Protect Guest VMs Individually: Actions
    Click to enlarge Protect Guest VMs Individually: Actions

  5. Select the protection policy in the table to protect the selected guest VMs.
    Figure. Protect Guest VMs Individually: Protection Policy Selection
    Click to enlarge Protect Guest VMs Individually: Protection Policy Selection

  6. Click Protect .

Removing Guest VMs individually from a Protection Policy

You can remove guest VMs individually from a protection policy from the VMs page. To remove guest VMs individually from a protection policy, perform the following procedure.

About this task

Note: If a guest VM is protected under a VM category, you cannot remove the guest VM from the protection policy by the following procedure. You can remove the guest VM from the protection policy only by dissociating the guest VM from the VM category.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs in the left pane.
    Figure. Virtual Infrastructure: VMs
    Click to enlarge Protect Guest VMs Individually: Virtual Infrastructure

  3. Select the guest VMs that you want to remove from a protection policy.
  4. Click UnProtect from the Actions drop-down menu.

Cloning a Protection Policy

If the requirements of the protection policy that you want to create are similar to an existing protection policy, you can clone the existing protection policy and update the clone. To clone a protection policy, perform the following procedure.

About this task

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies
    Click to enlarge Protection Policy Configuration: Protection Policies

  3. Select the protection policy that you want to clone.
  4. Click Clone from the Actions drop-down menu.
  5. Make the required changes on the Clone Protection Policy page.
    For information about the fields on the page, see:
    • Creating a Protection Policy with an Asynchronous Replication Schedule (Leap)
    • Creating a Protection Policy with a NearSync Replication Schedule (Leap)
    • Creating a Protection Policy with the Synchronous Replication Schedule (Leap)
  6. Click Save .

Updating a Protection Policy

You can modify an existing protection policy in Prism Central. To update an existing protection policy, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Protection Policies in the left pane.
    Figure. Protection Policy Configuration: Protection Policies
    Click to enlarge Protection Policy Configuration: Protection Policies

  3. Select the protection policy that you want to update.
  4. Click Update from the Actions drop-down menu.
  5. Make the required changes on the Update Protection Policy page.
    For information about the fields on the page, see:
    • Creating a Protection Policy with an Asynchronous Replication Schedule (Leap)
    • Creating a Protection Policy with a NearSync Replication Schedule (Leap)
    • Creating a Protection Policy with the Synchronous Replication Schedule (Leap)
  6. Click Save .

Finding the Protection Policy of a Guest VM

You can use the data protection focus on the VMs page to determine the protection policies to which a guest VM belongs. To determine the protection policy, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Click Data Protection from the Focus menu at the top-right corner.
    The Protection Policy column that is displayed shows the protection policy to which the guest VMs belong.
    Figure. Focus
    Click to enlarge Focus

Recovery Plan Management

A recovery plan orchestrates the recovery of protected VMs at the recovery site. Recovery plans are predefined procedures (runbooks) that use stages to enforce VM power-on sequence. You can also specify the inter-stage delays to recover applications.

When you create, update, or delete a recovery plan, it synchronizes to the recovery sites and works bidirectionally. For information about how Leap determines the list of sites for synchronization, see Entity Synchronization Between Paired Availability Zones. After a failover from the primary site to a recovery site, you can failback to the primary site by using the same recovery plan.

Recovery plans are independent of protection policies and do not reference protection policies in their configuration information. Also, they do not create recovery points. While the process of planned failover includes the creation of a recovery point so that the latest data can be used for recovery, unplanned and test failovers rely on the availability of the required recovery points at the recovery site. A recovery plan therefore requires the guest VMs in the recovery plan to also be associated with a protection policy.

Adding Guest VMs individually to a Recovery Plan

You can also add guest VMs individually to a recovery plan from the VMs page, without the use of a VM category. To add VMs individually to a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Compute & Storage > VMs in the left pane.
    Figure. Virtual Infrastructure: VMs
    Click to enlarge Protect Guest VMs Individually: Virtual Infrastructure

  3. Select the guest VMs that you want to add to a recovery plan.
  4. Click Add to Recovery Plan from the Actions drop-down menu.
  5. Select the recovery plan where you want to add the guest VMs in the Add to Recovery Plan page.
    Tip: Click +Create New if you want to create another recovery plan to add the selected guest VM. For more information about creating a recovery plan, see Creating a Recovery Plan (Leap).
  6. Click Add .
    The Update Recovery Plan page appears. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Leap).

Removing Guest VMs individually from a Recovery Plan

You can also remove guest VMs individually from a recovery plan. To remove guest VMs individually from a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Recovery Plans in the left pane.
  3. Select the recovery plan from which you want to remove guest VMs.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan page appears. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Leap).

Updating a Recovery Plan

You can update an existing recovery plan. To update a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Recovery Plans in the left pane.
  3. Select the recovery plan that you want to update.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan dialog box appears. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Leap).

Validating a Recovery Plan

You can validate a recovery plan from the recovery site. Recovery plan validation does not perform a failover like the test failover does, but reports warnings and errors. To validate a recovery plan, perform the following procedure.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection > Recovery Plans in the left pane.
  3. Select the recovery plan that you want to validate.
  4. Click Validate from the Actions drop-down menu.
  5. In the Validate Recovery Plan page, do the following.
    1. In Primary Location , select the primary location.
    2. In Recovery Location , select the recovery location.
    3. Click Proceed .
    The validation process lists any warnings and errors.
  6. Click Back .
    A summary of the validation is displayed. You can close the dialog box.
  7. To return to the detailed results of the validation, click the link in the Validation Errors column.
    The selected recovery plan is validated for its correct configuration.

Manual Disaster Recovery (Leap)

Manual data protection involves manually creating recovery points, manually replicating recovery points, and manually recovering the VMs at the recovery site. You can also automate some of these tasks. For example, the last step—that of manually recovering VMs at the recovery site—can be performed by a recovery plan while the underlying recovery point creation and replication can be performed by protection policies. Conversely, you can configure protection policies to automate recovery point creation and replication and recover VMs at the recovery site manually.

Creating Recovery Points Manually (Out-of-Band Snapshots)

About this task

To create recovery points manually, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Select the guest VMs for which you want to create a recovery point.
  4. Click Create Recovery Point from the Actions drop-down menu.
  5. To verify that the recovery point is created, click the name of the VM, click the Recovery Points tab, and verify that a recovery point is created.

Replicating Recovery Points Manually

You can manually replicate recovery points only from the availability zone (site) where the recovery points exist.

About this task

To replicate recovery points manually, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Click the guest VM whose recovery points you want to replicate, and then click Recovery Points on the left.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery points that you want to replicate.
  5. Click Replicate from the Actions drop-down menu.
  6. In the Replicate dialog box, do the following.
    1. In Recovery Location , select the location where you want to replicate the recovery point.
    2. In Target Cluster , select the cluster where you want to replicate the recovery point.
    3. Click Replicate Recovery Point .
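The Replicate dialog's inputs can be captured as a small request builder for automation. The sketch below is illustrative only; the field names are hypothetical and do not reflect the exact Prism Central API schema, but they mirror the two selections the dialog requires.

```python
# Sketch: the manual Replicate dialog as a request payload.
# Field names are hypothetical, not the exact Prism Central API schema.

def build_replicate_request(recovery_point_uuid: str,
                            recovery_location: str,
                            target_cluster: str) -> dict:
    """Mirror the Replicate dialog: which recovery point goes where."""
    for name, value in (("recovery_point_uuid", recovery_point_uuid),
                        ("recovery_location", recovery_location),
                        ("target_cluster", target_cluster)):
        if not value:
            raise ValueError(f"{name} is required")
    return {
        "recovery_point_uuid": recovery_point_uuid,
        "recovery_location": recovery_location,  # availability zone (site)
        "target_cluster": target_cluster,        # cluster at that site
    }
```

Note that, as stated above, replication can only be initiated from the site where the recovery point exists, so a builder like this would run against that site's Prism Central.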

Recovering a Guest VM from a Recovery Point Manually (Clone)

You can recover a guest VM by cloning the guest VM from a recovery point.

About this task

To recover a guest VM from a recovery point, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Virtual Infrastructure > VMs > List in the left pane.
  3. Click the VM that you want to recover, and then click Recovery Points on the left.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery point from which you want to recover the VM.
  5. Click Restore from the Actions drop-down menu.
  6. In the Restore dialog box, do the following.
    1. In the text box provided for specifying a name for the VM, specify a new name or do nothing to use the automatically generated name.
    2. Click Restore .
    Warning: The following limitations apply to manually recovered VMs (VMs recovered without a recovery plan).
    • The VMs recover without a VNIC if the recovery is performed at the remote site.
    • VM categories are not applied.
    • NGT must be reconfigured.
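The limitations above can be turned into a post-recovery checklist. The following sketch is purely illustrative; the `vm` dictionary shape is a hypothetical stand-in for whatever inventory data you query.

```python
def post_restore_tasks(vm, recovered_at_remote_site):
    """List the manual follow-up tasks for a VM recovered without a
    recovery plan (encodes the warning above; 'vm' shape is illustrative)."""
    tasks = []
    if recovered_at_remote_site:
        # VMs recover without a VNIC at the remote site.
        tasks.append("re-attach a VNIC")
    if vm.get("categories"):
        # VM categories are not applied on manual recovery.
        tasks.append("re-apply VM categories")
    if vm.get("ngt_installed"):
        # NGT must be reconfigured after recovery.
        tasks.append("reconfigure NGT")
    return tasks
```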

Entity Synchronization Between Paired Availability Zones

When paired with each other, availability zones (sites) synchronize disaster recovery (DR) configuration entities. Paired sites synchronize the following DR configuration entities.

Protection Policies
A protection policy is synchronized whenever you create, update, or delete the protection policy.
Recovery Plans
A recovery plan is synchronized whenever you create, update, or delete the recovery plan. The list of availability zones (sites) to which the on-prem site must synchronize a recovery plan is derived from the guest VMs included in the recovery plan, both through VM categories and as individually added guest VMs.

If you specify VM categories in a recovery plan, Leap determines which protection policies use those VM categories, and then synchronizes the recovery plans to the availability zones specified in those protection policies.

If you include guest VMs individually (without VM categories) in a recovery plan, Leap uses the recovery points of those guest VMs to determine which protection policies created those recovery points, and then synchronizes the recovery plans to the availability zones (sites) specified in those protection policies.

If you create a recovery plan for VM categories or guest VMs that are not associated with a protection policy, Leap cannot determine the availability zone list and therefore cannot synchronize the recovery plan. Similarly, if a recovery plan includes only individually added guest VMs and the protection policy associated with a guest VM has not yet created recovery points for that guest VM, Leap cannot synchronize the recovery plan to the availability zone specified in that protection policy.

However, recovery plans are monitored every 15 minutes for the availability of recovery points that can help derive availability zone information. When recovery points become available, the paired on-prem site derives the availability zones by the process described earlier and synchronizes the recovery plan to them.
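The derivation described above can be sketched as follows. The data shapes are hypothetical simplifications, not the product's internal model: categories in the plan are matched against protection policies first, then the recovery points of individually added guest VMs are traced back to the policies that created them.

```python
def derive_sync_targets(plan, policies, recovery_points):
    """Derive the availability zones a recovery plan must synchronize to.
    'plan', 'policies', and 'recovery_points' use illustrative dict shapes."""
    targets = set()
    # VM categories in the plan: match the policies that use those categories.
    for policy in policies:
        if set(plan["categories"]) & set(policy["categories"]):
            targets.update(policy["recovery_zones"])
    # Individually added VMs: trace their recovery points back to the
    # protection policy that created them.
    for rp in recovery_points:
        if rp["vm"] in plan["vms"]:
            targets.update(rp["source_policy"]["recovery_zones"])
    return targets  # empty set => the plan cannot be synchronized yet
```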

VM Categories used in Protection Policies and Recovery Plans
A VM category is synchronized when you specify the VM category in a protection policy or recovery plan.
Issues such as a loss of network connectivity between paired availability zones, or user actions such as unpairing availability zones and then pairing them again, can affect VM synchronization.
Tip: Nutanix recommends unprotecting all the VMs in an availability zone before unpairing it, to avoid a state where entities have stale configurations after the availability zones are paired again.

If you update guest VMs in either or both availability zones before such issues are resolved, or before unpaired availability zones are paired again, VM synchronization is not possible. Also, during VM synchronization, if a guest VM cannot be synchronized because of an update failure or conflict (for example, you updated the same VM in both availability zones during a network connectivity issue), no further VMs are synchronized. Entity synchronization can resume only after you resolve the error or conflict. To resolve a conflict, use the Entity Sync option in the web console and force synchronization from the availability zone that has the desired configuration. Forced synchronization overwrites the conflicting configurations in the paired availability zone.
Note: Forced synchronization cannot resolve errors arising from conflicting values in guest VM specifications (for example, the paired availability zone already has a VM with the same name).

If you do not update entities before a connectivity issue is resolved or before you pair the availability zones again, the synchronization behavior described earlier resumes. Also, pairing previously unpaired availability zones triggers an automatic synchronization event. For recommendations on avoiding such issues, see Entity Synchronization Recommendations (Leap).

Entity Synchronization Recommendations (Leap)

Consider the following recommendations to avoid inconsistencies and the resulting synchronization issues.

  • During network connectivity issues, do not update entities at both the availability zones (sites) in a pair. You can safely make updates at any one site. After the connectivity issue is resolved, force synchronization from the site in which you made updates. Failure to adhere to this recommendation results in synchronization failures.

    You can safely create entities at either or both the sites as long as you do not assign the same name to entities at the two sites. After the connectivity issue is resolved, force synchronization from the site where you created entities.

  • If one of the sites becomes unavailable, or if any service in the paired site is down, perform force synchronization from the paired availability zone after the issue is resolved.

Forcing Entity Synchronization (Leap)

Entity synchronization, when forced from an availability zone (site), overwrites the corresponding entities in the paired sites. Forced synchronization also creates, updates, and removes entities at the paired sites as required.

About this task

The availability zone (site) to which a particular entity is forcefully synchronized depends on which site requires the entity (see Entity Synchronization Between Paired Availability Zones). To avoid inadvertently overwriting required entities, ensure that you force synchronization from the site in which the entities have the desired configuration.

If a site is paired with two or more availability zones (sites), you cannot select one or more sites with which to synchronize entities.

To force entity synchronization, do the following.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Entity Sync in the left pane.
  4. In the Entity Sync dialog box, review the message at the top of the dialog box, and then do the following.
    1. To review the list of entities that will be synchronized to an AVAILABILITY ZONE , click the number of ENTITIES adjacent to an availability zone.
    2. After you review the list of entities, click Back .
  5. Click Sync Entities .

Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap)

Xi Leap is essentially an extension of Leap to Xi Cloud Services. Xi Leap protects your guest VMs and orchestrates their disaster recovery (DR) to Xi Cloud Services when events causing service disruption occur at the primary availability zone (site). For protection of your guest VMs, protection policies with Asynchronous and NearSync replication schedules generate and replicate recovery points to Xi Cloud Services. Recovery plans orchestrate DR from the replicated recovery points to Xi Cloud Services.

Protection policies create a recovery point, and set its expiry time, at every interval of the specified RPO. For example, a policy with an RPO of 1 hour creates a recovery point every hour. The recovery point expires at its designated expiry time based on the retention policy; see step 3 in Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap). If there is a prolonged outage at a site, the Nutanix cluster retains the last recovery point to ensure that you do not lose all the recovery points. For NearSync replication (lightweight snapshots), the Nutanix cluster retains the last full hourly snapshot. During the outage, the Nutanix cluster does not clean up expired recovery points. When the Nutanix cluster comes back online, it immediately cleans up the recovery points that are past expiry.
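A simplified model of this retention behavior, assuming a flat list of recovery points with creation and expiry timestamps (the data shape is illustrative, not the product's internal model):

```python
from datetime import datetime, timedelta

def expire_recovery_points(points, now):
    """Drop recovery points that are past expiry, but always keep the newest
    one so a prolonged outage never leaves zero recovery points."""
    if not points:
        return []
    newest = max(points, key=lambda p: p["created"])
    return [p for p in points if p["expiry"] > now or p is newest]

# Hourly RPO: one recovery point per hour, each expiring after 24 hours.
start = datetime(2022, 7, 25, 0, 0)
points = [{"created": start + timedelta(hours=h),
           "expiry": start + timedelta(hours=h + 24)} for h in range(3)]
```

After a 48-hour outage, every point in this example is past expiry, yet the newest one survives the cleanup.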

If you remove a guest VM from a protection policy, delete all the recovery points associated with that guest VM. If you do not delete the recovery points explicitly, they adhere to the expiration period set in the protection policy and continue to incur charges until they expire. To stop the charges immediately, log on to Xi Cloud Services and delete all of these recovery points explicitly.

For high availability of a guest VM, Leap can replicate recovery points to one or more sites. A protection policy can replicate recovery points to a maximum of two sites, one of which can be in the cloud (Xi Cloud Services). For replication to Xi Cloud Services, you must add a replication schedule between the on-prem site and Xi Cloud Services. You can set up the on-prem site and Xi Cloud Services in the following arrangements.

Figure. The Primary Nutanix Cluster at on-prem site and recovery Xi Cloud Services
Click to enlarge Disaster recovery to an on-prem site and Xi Cloud

Figure. The Primary Xi Cloud Services and recovery Nutanix Cluster at on-prem site
Click to enlarge Disaster recovery to an on-prem site and Xi Cloud

The replication schedule between an on-prem site and Xi Cloud Services enables replication to Xi Cloud Services. To perform DR to Xi Cloud Services, you must also create a recovery plan. In addition to performing DR from AHV clusters to Xi Cloud Services (AHV only), you can also perform cross-hypervisor disaster recovery (CHDR), that is, DR from ESXi clusters to Xi Cloud Services.

The protection policies and recovery plans you create or update synchronize continuously between the on-prem site and Xi Cloud Services. The reverse synchronization enables you to create or update entities (protection policies, recovery plans, and guest VMs) at either the primary or the recovery site.

This section describes protection of your guest VMs and DR from Xi Cloud Services to a Nutanix cluster at the on-prem site. From Xi Cloud Services, you can protect your guest VMs and perform DR to a Nutanix cluster at only one on-prem site. For information about protection of your guest VMs and DR to Xi Cloud Services, see Protection and DR between On-Prem Sites (Leap).

Xi Leap Requirements

The following are the general requirements of Xi Leap. Along with the general requirements, there are specific requirements for protection with the following supported replication schedules.

  • For information about the on-prem node, disk and Foundation configurations required to support Asynchronous and NearSync replication schedules, see On-Prem Hardware Resource Requirements.
  • For specific requirements of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Requirements (Xi Leap).
  • For specific requirements of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Requirements (Xi Leap).

License Requirements

The AOS license required depends on the features that you want to use. For information about the features that are available with an AOS license, see Software Options.

Hypervisor Requirements

The underlying hypervisors required differ in all the supported replication schedules. For more information about underlying hypervisor requirements for the supported replication schedules, see:

  • Asynchronous Replication Requirements (Xi Leap)
  • NearSync Replication Requirements (Xi Leap)

Nutanix Software Requirements

  • Each on-prem availability zone (site) must have a Leap enabled Prism Central instance. To enable Leap in Prism Central, see Enabling Leap for On-Prem Site.
    Note: If you are using ESXi, register at least one vCenter Server to Prism Central. You can also register two vCenter Servers, each to a Prism Central at a different site. If you register both Prism Central instances to a single vCenter Server, ensure that each ESXi cluster is part of a different datacenter object in vCenter.

  • The on-prem Prism Central and its registered Nutanix clusters (Prism Element) must be running on the supported AOS versions. For more information about the required versions for the supported replication schedules, see:
    • Asynchronous Replication Requirements (Xi Leap)
    • NearSync Replication Requirements (Xi Leap)
    Tip:

    Nutanix supports replication between all the latest supported LTS and STS AOS releases. To check the list of the latest supported AOS versions, see KB-5505. To determine whether the AOS versions currently running on your clusters are EOL, see the EOL document.

    Upgrade the AOS version to the next available supported LTS/STS release. To determine if an upgrade path is supported, check the Upgrade Paths page before you upgrade the AOS.

    Note: If both clusters are running different EOL AOS versions, upgrade the cluster with the lower AOS version to match the cluster with the higher AOS version, and then perform the upgrade to the next supported LTS version.

    For example, suppose the clusters are running AOS versions 5.5.x and 5.10.x. Upgrade the cluster on 5.5.x to 5.10.x. After both clusters are on 5.10.x, upgrade each cluster to 5.15.x (supported LTS). Once both clusters are on 5.15.x, you can upgrade them to 5.20.x or newer.
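The hop-by-hop upgrade in this example can be expressed as a small lookup. The path table below encodes only the example above and is not the authoritative matrix; always check the Upgrade Paths page.

```python
# Supported upgrade hops from the example above (illustrative only;
# the Upgrade Paths page is the authoritative source).
UPGRADE_PATH = {"5.5": "5.10", "5.10": "5.15", "5.15": "5.20"}

def plan_upgrade(current, target):
    """Return the ordered list of hops from current to target,
    or raise if no supported path exists."""
    hops = []
    while current != target:
        if current not in UPGRADE_PATH:
            raise ValueError(f"no supported upgrade path from {current}")
        current = UPGRADE_PATH[current]
        hops.append(current)
    return hops
```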

    Nutanix recommends that both the primary and the replication clusters or sites run the same AOS version.

User Requirements

You must have one of the following roles in Xi Cloud Services.

  • User admin
  • Prism Central admin
  • Prism Self Service admin
  • Xi admin

Firewall Port Requirements

To allow two-way replication between an on-prem Nutanix cluster and Xi Cloud Services, you must enable certain ports in your external firewall. For the required ports, see Disaster Recovery - Leap in Port Reference.

Networking Requirements

Requirements for static IP address preservation after failover
You can preserve one IP address of a guest VM (with a static IP address) for its failover (DR) to an IPAM network. After the failover, you must reconfigure the other IP addresses of the guest VM manually. To preserve an IP address of a guest VM with a static IP address, ensure that:
Caution: By default, you cannot preserve statically assigned DNS IP addresses after failover (DR) of guest VMs. However, you can create custom in-guest scripts to preserve the statically assigned DNS IP addresses. For more information, see Creating a Recovery Plan (Xi Leap).
  • Both the primary and the recovery Nutanix clusters run AOS 5.11 or newer.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery site.

  • The protected guest VMs have NetworkManager command-line tool (nmcli) version 0.9.10.0 or newer installed.
    Also, NetworkManager must manage the networks on Linux VMs. To enable NetworkManager on a Linux VM, set the value of the NM_CONTROLLED field to yes in the interface configuration file, and then restart the network service on the VM.
    Tip: In CentOS, the interface configuration file is /etc/sysconfig/network-scripts/ifcfg-eth0 .
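As a sketch, the version and CD-ROM requirements in the list above can be checked programmatically before a failover test. The `vm` dictionary shape is hypothetical; gather the real values from your inventory tooling.

```python
def version_at_least(version, minimum):
    """Compare dotted version strings numerically (e.g. '0.9.10.0')."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(version) >= to_tuple(minimum)

def static_ip_preservation_blockers(vm):
    """Return the requirements (listed above) that the VM still fails.
    The 'vm' dict shape is illustrative."""
    blockers = []
    if not version_at_least(vm["aos"], "5.11"):
        blockers.append("AOS 5.11 or newer required on both clusters")
    if not version_at_least(vm["ngt"], "1.5"):
        blockers.append("NGT 1.5 or newer required")
    if vm["free_cdrom_slots"] < 1:
        blockers.append("at least one empty CD-ROM slot required")
    if not version_at_least(vm["nmcli"], "0.9.10.0"):
        blockers.append("nmcli 0.9.10.0 or newer required")
    return blockers
```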
Requirements for static IP address mapping of guest VMs between source and target virtual networks
You can explicitly define IP addresses for protected guest VMs that have static IP addresses at the primary site. On recovery, such guest VMs retain the explicitly defined IP address. To map static IP addresses of guest VMs between source and target virtual networks, ensure that:
  • Both the primary and the recovery Nutanix clusters run AOS 5.17 or newer.
  • The protected guest VMs have static IP addresses at the primary site.
  • The protected guest VMs have Nutanix Guest Tools (NGT) version 1.5 or newer installed.

    For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .

  • The protected guest VMs have at least one empty CD-ROM slot.

    The empty CD-ROM is required for mounting NGT at the recovery site.

  • The protected guest VMs can reach the Controller VM from both the sites.
  • The recovery plan selected for failover has VM-level IP address mapping configured.
Virtual Network Design Requirements
You can design the virtual subnets that you plan to use for DR to the recovery site so that they can accommodate the guest VMs running in the source virtual network.
  • To use a virtual network as a recovery virtual network, ensure that the virtual network meets the following requirements.
    • The network prefix is the same as the network prefix of the source virtual network. For example, if the source network address is 192.0.2.0/24, the network prefix of the recovery virtual network must also be 24.
    • The gateway IP address is the same as the gateway IP address in the source network. For example, if the gateway IP address in the source virtual network 192.0.2.0/24 is 192.0.2.10, the last octet of the gateway IP address in the recovery virtual network must also be 10.
  • To use a single Nutanix cluster as a target for DR from multiple primary Nutanix clusters, ensure that the number of virtual networks on the recovery cluster is equal to the sum of the number of virtual networks on the individual primary Nutanix clusters. For example, if there are two primary Nutanix clusters, with one cluster having m networks and the other cluster having n networks, ensure that the recovery cluster has m + n networks. Such a design ensures that all recovered VMs attach to a network.
  • After the recovery of guest VMs to Xi Cloud Services, ensure that the router in your primary site stops advertising the subnet that hosted the guest VMs.
  • The protected guest VMs and Prism Central VM must be on different networks.

    If protected guest VMs and Prism Central VM are on the same network, the Prism Central VM becomes inaccessible when the route to the network is removed after failover.

  • Xi Cloud Services supports the following third-party VPN gateway solutions.
    • CheckPoint
    • Cisco ASA
    • PaloAlto
      Note: If you are using the Palo Alto VPN gateway solution, set the MTU value to 1356 in the Tunnel Interface settings. The replication fails for the default MTU value ( 1427 ).

    • Juniper SRX
    • Fortinet
    • SonicWall
    • VyOS
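The first two design rules above (matching network prefix length and a gateway with the same host part) can be validated with the standard `ipaddress` module. This is a sketch of the rules as stated, not a Nutanix-provided check.

```python
import ipaddress

def valid_recovery_network(source_cidr, source_gw, recovery_cidr, recovery_gw):
    """Check that the recovery network keeps the source network's prefix
    length and that its gateway has the same host part (e.g. last octet
    10 for a /24, per the example above)."""
    src = ipaddress.ip_network(source_cidr)
    dst = ipaddress.ip_network(recovery_cidr)
    if src.prefixlen != dst.prefixlen:
        return False
    # Host part = gateway address minus the network address.
    src_host = int(ipaddress.ip_address(source_gw)) - int(src.network_address)
    dst_host = int(ipaddress.ip_address(recovery_gw)) - int(dst.network_address)
    return src_host == dst_host
```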

Additional Requirements

  • Both the primary and recovery Nutanix clusters must have an external IP address.
  • Both the primary and recovery Prism Centrals and Nutanix clusters must have a data services IP address.
  • The Nutanix cluster that hosts the Prism Centrals must meet the following requirements.
    • The Nutanix cluster must be registered to the Prism Central instance.
    • The Nutanix cluster must have an iSCSI data services IP address configured on it.
    • The Nutanix cluster must also have sufficient memory to support a hot add of memory to all Prism Central nodes when you enable Leap. A small Prism Central instance (4 vCPUs, 16 GB memory) requires a hot add of 4 GB, and a large Prism Central instance (8 vCPUs, 32 GB memory) requires a hot add of 8 GB. If you enable Nutanix Flow, each Prism Central instance requires an extra hot-add of 1 GB.
  • Each node in a scaled-out Prism Central instance must have a minimum of 4 vCPUs and 16 GB memory.

    For more information about the scaled-out deployments of a Prism Central, see Leap Terminology.

  • The protected guest VMs must have Nutanix VM mobility drivers installed.

    Nutanix VM mobility drivers are required for accessing the guest VMs after failover. Without Nutanix VM mobility drivers, the guest VMs become inaccessible after a failover.

Xi Leap Limitations

Consider the following general limitations before configuring protection and disaster recovery (DR) with Xi Leap. Along with the general limitations, there are specific limitations of protection with the following supported replication schedules.

  • For specific limitations of protection with Asynchronous replication schedule (1 hour or greater RPO), see Asynchronous Replication Limitations (Xi Leap).
  • For specific limitations of protection with NearSync replication schedule (1–15 minutes RPO), see NearSync Replication Limitations (Xi Leap).

Virtual Machine Limitations

  • You cannot start or replicate the following guest VMs at Xi Cloud Services.

    • VMs configured with a GPU resource.
    • VMs configured with four or more vNUMA sockets.
    • VMs configured with more than 24 vCPUs.
    • VMs configured with more than 128 GB memory.
  • You cannot deploy witness VMs.
  • You cannot protect multiple guest VMs that use disk sharing (for example, multi-writer sharing, Microsoft Failover Clusters, Oracle RAC).

  • You cannot protect VMware fault tolerance enabled guest VMs.

  • You cannot recover vGPU console enabled guest VMs efficiently.

    When you perform DR of vGPU console-enabled guest VMs, the VMs recover with the default VGA console instead of the vGPU console, without any alert. The guest VMs fail to recover when you perform cross-hypervisor disaster recovery (CHDR).

  • You cannot recover guest VMs with vGPU.

    However, you can manually restore guest VMs with vGPU.

  • You cannot configure NICs for a guest VM across both the virtual private clouds (VPC).

    You can configure NICs for a guest VM associated with either production or test VPC.

Volume Groups Limitation

You cannot protect volume groups.

Network Segmentation Limitation

You cannot apply network segmentation for management traffic (any traffic not on the backplane network) in Xi Leap.

You get an error when you try to enable network segmentation for management traffic on a Leap-enabled Nutanix cluster, or when you enable Leap on a Nutanix cluster that has network segmentation enabled. For more information about network segmentation, see Securing Traffic Through Network Segmentation in the Security Guide .
Note: However, you can apply network segmentation for backplane traffic at the primary and recovery clusters. Nutanix does not recommend this configuration because when you perform a planned failover of guest VMs that have network segmentation enabled for backplane traffic, the guest VMs fail to recover and the guest VMs at the primary AZ are removed.

Self-Service Restore

You cannot perform self-service restore.

Virtual Network Limitations

Although there is no limit to the number of VLANs that you can create, only the first 500 VLANs are listed in the Network Settings drop-down when you create a recovery plan. For more information about VLANs in recovery plans, see Nutanix Virtual Networks.

Xi Leap Configuration Maximums

For the maximum number of entities you can configure with different replication schedules and fail over (disaster recovery), see Nutanix Configuration Maximums. The limits have been tested for Xi Leap production deployments. Nutanix does not guarantee that the system can operate beyond these limits.

Tip: Upgrade your NCC version to 3.10.1 to get configuration alerts.

Xi Leap Recommendations

Nutanix recommends the following best practices for configuring protection and disaster recovery (DR) with Xi Leap.

Recommendation for Migrating Protection Domains to Protection Policies

You can protect a guest VM either with legacy DR solution (protection domain-based) or with Leap. To protect a legacy DR-protected guest VM with Leap, you must migrate the guest VM from protection domain to a protection policy. During the migration, do not delete the guest VM snapshots in the protection domain. Nutanix recommends keeping the guest VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.

Recommendation for Virtual Networks

  • Map the networks while creating a recovery plan in Prism Central.
  • Recovery plans do not support overlapping subnets in a network-mapping configuration. Do not create virtual networks that have the same name or overlapping IP address ranges.

General Recommendations

  • Create all entities (protection policies, recovery plans, and VM categories) at the primary availability zone (site).
  • Upgrade Prism Central before upgrading the Nutanix clusters (Prism Elements) registered to it.
  • Do not include the guest VMs protected with Asynchronous, NearSync, and Synchronous replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with the guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.

Xi Leap Service-Level Agreements (SLAs)

Xi Leap is essentially an extension of Leap to Xi Cloud Services. Xi Leap enables protection of your guest VMs and disaster recovery (DR) to Xi Cloud Services. With reverse synchronization, Xi Leap can protect your guest VMs and enable DR from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site). A Nutanix cluster is essentially an AHV or an ESXi cluster running AOS. In addition to performing DR from AHV clusters to Xi Cloud Services (AHV only), you can also perform cross-hypervisor disaster recovery (CHDR), that is, DR from ESXi clusters to Xi Cloud Services.

You can protect your guest VMs with the following replication schedules.

  • Asynchronous (1 hour or greater RPO). For information about protection with Asynchronous replication in Xi Leap, see Protection with Asynchronous Replication and DR (Xi Leap).
  • NearSync (1–15 minute RPO). For information about protection with NearSync replication in Xi Leap, see Protection with NearSync Replication and DR (Xi Leap).

Xi Leap Views

The disaster recovery views enable you to perform CRUD operations on the following types of Leap entities.

  • Configured entities (for example, availability zones, protection policies, and recovery plans)
  • Created entities (for example, VMs, and recovery points)

Some views available in the Xi Cloud Services differ from the corresponding view in on-prem Prism Central. For example, the option to connect to an availability zone is on the Availability Zones page in an on-prem Prism Central, but not on the Availability Zones page in Xi Cloud Services. However, the views of both user interfaces are largely the same. This chapter describes the views of Xi Cloud Services.

Availability Zones View in Xi Cloud Services

The Availability Zones view lists all of your paired availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. AZs View Click to enlarge AZs View

Table 1. Availability Zones View Fields
Field Description
Name Name of the availability zone.
Region Region to which the availability zone belongs.
Type Type of availability zone. Availability zones in Xi Cloud Services are shown as being of type Xi. Availability zones that are backed by on-prem Prism Central instances are shown to be of type physical. The availability zone that you are logged in to is shown as a local availability zone.
Connectivity Status Status of connectivity between the local availability zone and the paired availability zone.
Table 2. Workflows Available in the Availability Zones View
Workflow Description
Connect to Availability Zone (on-prem Prism Central only) Connect to an on-prem Prism Central or to a Xi Cloud Services for data replication.
Table 3. Actions Available in the Actions Menu
Action Description
Disconnect Disconnect the remote availability zone. When you disconnect an availability zone, the pairing is removed.

Protection Policies View in Xi Cloud Services

The Protection Policies view lists all the configured protection policies from all availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Protection Policies View Click to enlarge Protection Policies View

Table 1. Protection Policies View Fields
Field Description
Name Name of the protection policy.
Primary Location Replication source site for the protection policy.
Recovery Location Replication target site for the protection policy.
RPO Recovery point objective for the protection policy.
Remote Retention Number of retention points at the remote site.
Local Retention Number of retention points at the local site.
Table 2. Workflows Available in the Protection Policies View
Workflow Description
Create protection policy Create a protection policy.
Table 3. Actions Available in the Actions Menu
Action Description
Update Update the protection policy.
Clone Clone the protection policy.
Delete Delete the protection policy.

Recovery Plans View in Xi Cloud Services

The Recovery Plans view lists all the configured recovery plans from all availability zones.

The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.

Figure. Recovery Plans View Click to enlarge Recovery Plans View

Table 1. Recovery Plans View Fields
Field Description
Name Name of the recovery plan.
Source Replication source site for the recovery plan.
Destination Replication target site for the recovery plan.
Entities Sum of the following VMs:
  • Number of local, live VMs that are specified in the recovery plan.
  • Number of remote VMs that the recovery plan can recover at this site.
Last Validation Status Status of the most recent validation of the recovery plan.
Last Test Status Status of the most recent test performed on the recovery plan.
Table 2. Workflows Available in the Recovery Plans View
Workflow Description
Create Recovery Plan Create a recovery plan.
Table 3. Actions Available in the Actions Menu
Action Description
Validate Validates the recovery plan to ensure that the VMs in the recovery plan have a valid configuration and can be recovered.
Test Test the recovery plan.
Update Update the recovery plan.
Failover Perform a failover.
Delete Delete the recovery plan.

Dashboard Widgets in Xi Cloud Services

The Xi Cloud Services dashboard includes widgets that display the statuses of configured protection policies and recovery plans. If you have not configured these entities, the widgets display a summary of the steps required to get started with Leap.

To view these widgets, click the Dashboard tab.

The following figure is a sample view of the dashboard widgets.

Figure. Dashboard Widgets for Xi Leap Click to enlarge Dashboard Widgets

Enabling Leap in the On-Prem Site

To perform disaster recovery (DR) from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site), enable Leap at the on-prem site (Prism Central) only. You need not enable Leap in the Xi Cloud Services portal; Xi Cloud Services does that by default for you. Without enabling Leap, you can configure protection policies and recovery plans that synchronize to the on-prem site but you cannot perform failover and failback operations.

To enable Leap at the on-prem site, see Enabling Leap for On-Prem Site.

Xi Leap Environment Setup

You can set up a secure environment to enable replication between an on-prem site and Xi Cloud Services with virtual private network (VPN). To configure the required environment, perform the following steps.

  1. Pair your on-prem availability zone (site) with Xi Cloud Services. For more information about pairing, see Pairing Availability Zones (Xi Leap).
  2. Set up an on-prem VPN solution.
  3. Enable VPN on the production virtual private cloud by using the Xi Cloud Services portal.
  4. Set up a VPN client as a VM in Xi Cloud Services to enable connectivity to the applications that have failed over to the Xi Cloud Services.
  5. Configure policy-based routing (PBR) rules for the VPN to successfully work with the Xi Cloud Services. If you have a firewall in the Xi Cloud Services and a floating IP address is assigned to the firewall, create a PBR policy in the Xi Cloud Services to configure the firewall as the gateway to the Internet. For example, specify 10.0.0.2/32 (private IP address of the firewall) in the Subnet IP . For more information, see Policy Configuration in Xi Infrastructure Service Administration Guide .
  6. Configure the custom DNS in your virtual private cloud in the Xi Cloud Services. For more information, see Virtual Private Cloud Management in Xi Infrastructure Service Administration Guide .
Note: For more information about Xi Cloud Services, see Xi Infrastructure Service Administration Guide.

Pairing Availability Zones (Xi Leap)

To perform disaster recovery (DR) from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site), pair the on-prem site (Prism Central) only to Xi Cloud Services. For reverse synchronization, you need not pair again from the Xi Cloud Services portal; Xi Cloud Services captures the pairing configuration from the on-prem site that pairs with Xi Cloud Services.

To pair an on-prem site with Xi Cloud Services, see Pairing Availability Zones (Leap).

VPN Configuration (On-prem and Xi Cloud Services)

Xi Cloud Services enables you to set up a secure VPN connection between your on-prem sites and Xi Cloud Services to enable end-to-end disaster recovery services of Leap. A VPN solution between your on-prem site and Xi Cloud Services enables secure communication between your on-prem Prism Central instance and the production virtual private cloud (VPC) in Xi Cloud Services. If your workload fails over to Xi Cloud Services, the communication between the on-prem resources and failed over resources in Xi Cloud Services takes place over an IPSec tunnel established by the VPN solution.

Note: Set up the VPN connection before data replication begins.

You can connect multiple on-prem sites to Xi Cloud Services. If you have multiple remote sites, you can set up secure VPN connectivity between each of your remote sites and Xi Cloud Services. With this configuration, you do not need to force the traffic from your remote site through your main site to Xi Cloud Services.

A VPN solution to connect to Xi Cloud Services includes a VPN gateway appliance in the Xi Cloud and a VPN gateway appliance (remote peer VPN appliance) in your on-prem site. A VPN gateway appliance learns about the local routes, establishes an IPSec tunnel with its remote peer, exchanges routes with its peer, and directs network traffic through the VPN tunnel.

After you complete the VPN configuration in the Xi Cloud Services portal, Nutanix creates a virtual VPN gateway appliance in the Xi Cloud. To set up a remote peer VPN gateway appliance in your on-prem site, you can either use the On Prem - Nutanix VPN solution (provided by Nutanix) or use a third-party VPN solution:

  • On Prem - Nutanix (recommended): If you select this option, Nutanix creates a VPN gateway VM (remote peer appliance) on your on-prem cluster, connects the appliance to your network, and establishes an IPsec tunnel with the VPN gateway that is running in the Xi Cloud.

    The Nutanix VPN controller runs as a service in the Xi Cloud and on the on-prem Nutanix cluster and is responsible for the creation, setup, and lifecycle maintenance of the VPN gateway appliance (in the Xi Cloud and on-prem). The VPN controller deploys the virtual VPN gateway appliance in the Xi Cloud after you complete the VPN configuration in the Xi Cloud Services portal. The on-prem VPN controller deploys the virtual VPN gateway appliance on the on-prem cluster in the subnet you specify when you configure a VPN gateway in the Xi Cloud Services portal.

    The virtual VPN gateway appliance in the Xi Cloud and VPN gateway VM (peer appliance) in your on-prem cluster each consume 1 physical core, 4 GB RAM, and 10 GB storage.

  • On Prem - Third Party : If you select this option, you must manually set up a VPN solution as an on-prem VPN gateway (peer appliance) that can establish an IPsec tunnel with the VPN gateway VM in the Xi Cloud. The on-prem VPN gateway (peer appliance) can be a virtual or hardware appliance. See On-Prem - Third-Party VPN Solution for a list of supported third-party VPN solutions.

VPN Configuration Entities

To set up a secure VPN connection between your on-prem sites and Xi Cloud Services, configure the following entities in the Xi Cloud Services portal:

  • VPN Gateway : Represents the gateway of your VPN appliances.

    VPN gateways are of the following types:

    • Xi Gateway : Represents the Xi VPN gateway appliance
    • On Prem - Nutanix Gateway : Represents the VPN gateway appliance at your on-prem site if you are using the on-prem Nutanix VPN solution.
    • On Prem - Third Party Gateway : Represents the VPN gateway appliance at your on-prem site if you are using your own VPN solution (provided by a third-party vendor).
  • VPN Connection : Represents the VPN IPSec tunnel established between a VPN gateway in the Xi Cloud and VPN gateway in your on-prem site. When you create a VPN connection, you select a Xi gateway and on-prem gateway between which you want to create the VPN connection.

You configure a VPN gateway in the Xi Cloud and at each of the on-prem sites you want to connect to the Xi Cloud. You then configure a VPN connection between a VPN gateway in the Xi Cloud and VPN gateway in your on-prem site.

Single-Site Connection

If you want to connect only one on-prem site to the Xi Cloud, configure the following entities in the Xi Cloud Services portal:

  1. One Xi gateway to represent the Xi VPN gateway appliance
  2. One on-prem gateway (On-prem - Nutanix Gateway or on-prem - third-party Gateway) to represent the VPN gateway appliance at your on-prem site
  3. One VPN connection to connect the two VPN gateways
Figure. Single-Site Connection

Multi-Site Connection

If you want to connect multiple on-prem sites to the Xi Cloud, configure the following entities in the Xi Cloud Services portal:

  1. One Xi gateway to represent the Xi VPN gateway appliance
  2. On-prem gateways (On-prem - Nutanix Gateway or on-prem - third-party Gateway) for each on-prem site
  3. VPN connections to connect the Xi gateway and the on-prem gateway at each on-prem site

For example, if you want to connect two on-prem sites to the Xi Cloud, configure the following:

  1. One Xi gateway
  2. Two on-prem gateways for the two on-prem sites
  3. Two VPN connections
Figure. Multi-Site Connection for Less Than 1 Gbps Bandwidth

One Xi VPN gateway provides 1 Gbps of aggregate bandwidth for IPSec traffic. Therefore, connect only as many on-prem VPN gateways to one Xi VPN gateway as its 1 Gbps of aggregate bandwidth can accommodate.

If you require an aggregate bandwidth of more than 1 Gbps, configure multiple Xi VPN gateways.

Figure. Multi-Site Connection for More Than 1 Gbps Bandwidth
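As a rough sketch of the sizing rule above (assuming the 1 Gbps per-gateway limit and that site traffic can be distributed freely across gateways), the following illustrative Python snippet estimates a lower bound on the number of Xi VPN gateways required; the function name is hypothetical, not a Nutanix tool:

```python
import math

# Each Xi VPN gateway provides 1 Gbps of aggregate IPSec bandwidth.
XI_GATEWAY_CAPACITY_GBPS = 1.0

def xi_gateways_needed(site_bandwidths_gbps):
    """Return the minimum number of Xi VPN gateways required to carry
    the combined IPSec traffic of the given on-prem sites."""
    total = sum(site_bandwidths_gbps)
    return max(1, math.ceil(total / XI_GATEWAY_CAPACITY_GBPS))

# Three sites at 0.4 Gbps each exceed one gateway's 1 Gbps capacity,
# so at least two Xi gateways are required.
print(xi_gateways_needed([0.4, 0.4, 0.4]))  # 2
```

Because each on-prem gateway connects to exactly one Xi gateway, the actual count can be higher if individual site bandwidths do not pack evenly.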

On-Prem - Nutanix VPN Solution

You can use the on-prem - Nutanix VPN solution to set up VPN between your on-prem site and Xi Cloud Services. If you select this option, you are using an end-to-end VPN solution provided by Nutanix and you do not need to use your own VPN solution to connect to Xi Cloud Services.

After you complete the VPN configuration in the Xi Cloud Services portal, Nutanix creates a virtual VPN gateway appliance in the Xi Cloud. The On Prem - Nutanix VPN solution creates a VPN gateway VM (remote peer appliance) on your on-prem cluster, connects the appliance to your network, and establishes an IPsec tunnel with the VPN gateway VM that is running in the Xi Cloud.

Following is the workflow if you choose the On Prem - Nutanix VPN solution to set up a VPN connection between your on-prem site and Xi Cloud Services.

  1. Create one or more Xi VPN gateways.
  2. The VPN controller running in Xi Cloud Services creates a VPN gateway VM in the Xi Cloud. The Xi VPN gateway VM runs in your (tenant) overlay network.
  3. Create one or more on-prem VPN gateways.

    Create a VPN gateway for each on-prem site that you want to connect to the Xi Cloud.

  4. Create one or more VPN connections.

    Create a VPN connection between each on-prem site (on-prem VPN gateway) and Xi Cloud (Xi gateway).

  5. The VPN controller creates a VPN gateway VM on the on-prem cluster in the subnet you specify when you create an on-prem VPN gateway. The VPN gateway VM becomes the peer appliance to the VPN gateway VM in the Xi Cloud.
  6. Both the VPN appliances are now configured, and the appliances now proceed to perform the following:
    1. An on-prem router communicates the on-prem routes to the on-prem VPN gateway by using iBGP or OSPF.
    2. The Xi VPN controller communicates the Xi subnets to the Xi VPN gateway VM.
    3. The on-prem VPN gateway VM then establishes a VPN IPsec tunnel with the Xi VPN gateway VM. Both appliances establish an eBGP peering session over the IPsec tunnel and exchange routes.
    4. The on-prem VPN gateway VM publishes the Xi subnet routes to the on-prem router by using iBGP or OSPF.
Nutanix VPN Solution Requirements

In your on-prem site, ensure the following before you configure VPN on Xi Cloud Services:

  1. The Prism Central instance and cluster are running AOS 5.11 or newer for AHV and AOS 5.19 or newer for ESXi.

  2. A router with iBGP, OSPF, or Static support to communicate the on-prem routes to the on-prem VPN gateway VM.
  3. Depending on whether you are using iBGP or OSPF, ensure that you have one of the following:
    • Peer IP (for iBGP): The IP address of the on-prem router to exchange routes with the VPN gateway VM.
    • Area ID (for OSPF): The OSPF area ID for the VPN gateway in the IP address format.
  4. Determine the following details for the deployment of the on-prem VPN gateway VM.
    • Subnet UUID : The UUID of the subnet of the on-prem cluster in which you want to install the on-prem VPN gateway VM. Log on to your on-prem Prism Central web console to determine the UUID of the subnet.
    • Public IP address of the VPN Gateway Device : A public WAN IP address that you want the on-prem gateway to use to communicate with the Xi VPN gateway appliance.
    • VPN VM IP Address : A static IP address that you want to allocate to the on-prem VPN gateway VM.
    • IP Prefix Length : The subnet mask in CIDR format of the subnet on which you want to install the on-prem VPN gateway VM.
    • Default Gateway IP : The gateway IP address for the on-prem VPN gateway appliance.
    • On Prem Gateway ASN : The ASN must not be the same as any of your on-prem BGP ASNs. If you already have a BGP environment in your on-prem site, use your organization's ASN as the on-prem gateway ASN. If you do not have a BGP environment, you can choose any number; for example, a number in the 65000 range.
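Before entering the deployment details above in the Xi Cloud Services portal, you can sanity-check them with the Python standard library. This is an illustrative sketch (the function name and warning text are assumptions, and the private ASN range check reflects RFC 6996, which covers the 65000 range suggested above):

```python
import ipaddress

def check_vpn_gateway_inputs(vm_ip, prefix_len, gateway_ip, onprem_asn, xi_asn):
    """Sanity-check on-prem VPN gateway deployment details.
    Raises ValueError on an inconsistent input; returns the subnet."""
    # Derive the subnet from the VPN VM IP address and prefix length.
    network = ipaddress.ip_network(f"{vm_ip}/{prefix_len}", strict=False)
    if ipaddress.ip_address(gateway_ip) not in network:
        raise ValueError("Default Gateway IP is not in the VPN VM's subnet")
    if onprem_asn == xi_asn:
        raise ValueError("On-prem gateway ASN must differ from the Xi gateway ASN")
    if not 64512 <= onprem_asn <= 65534:
        # 64512-65534 is the private 16-bit ASN range (RFC 6996).
        print("warning: on-prem ASN is outside the private 16-bit ASN range")
    return network

net = check_vpn_gateway_inputs("10.10.1.50", 24, "10.10.1.1", 65100, 65200)
print(net)  # 10.10.1.0/24
```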
On-Prem Site Firewall Port Requirements

Configure rules for ports in your on-prem firewall depending on your deployment scenario.

On-Prem Behind a Network Address Translation or Firewall Device

In this scenario, the IPSec tunnel terminates behind a network address translation (NAT) or firewall device. For NAT to work, open UDP ports 500 and 4500 in both directions. Ports 1024–1034 are ephemeral ports used by the CVMs.

Enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

IPSec Terminates on the Firewall Device

In this scenario, you do not need to open the ports for NAT (500 and 4500), but enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

In the following table, PC subnet refers to the subnet where your on-prem Prism Central is running. The Xi infrastructure load balancer route is the route advertised for traffic to the Xi CVMs and Prism Central. You receive this information when you begin using Xi Cloud Services.

Table 1. Port Rules

Source address | Destination address | Source port | Destination port
PC subnet | Load balancer route advertised | Any | 1024–1034
Xi infrastructure load balancer route | PC and CVM subnet | Any | 2020, 2009, 9440

The following port requirements are applicable only if you are using the Nutanix VPN solution.

Nutanix VPN VM | 8.8.8.8 and 8.8.4.4 (IP addresses of the DNS servers) | VPN VM | DNS UDP port 53
Nutanix VPN VM | time.google.com, 0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org (NTP servers) | VPN VM | NTP UDP port 123
Nutanix VPN VM | ICMP ping to NTP servers | NA | NA
CVM IP addresses in AHV clusters | HTTPS requests to the Internet | AHV hosts | HTTPS port 443
CVM IP addresses in ESXi clusters | HTTPS and FTP requests to the Internet | ESXi hosts | HTTPS port 443 and FTP port 21
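The port rules above can also be expressed as data, which is convenient when auditing an existing firewall configuration. The following is an illustrative sketch covering the two infrastructure rules, not a Nutanix tool:

```python
# Port rules from Table 1, expressed as (source, destination, port_ranges).
PORT_RULES = [
    ("PC subnet", "Load balancer route advertised", [(1024, 1034)]),
    ("Xi infrastructure load balancer route", "PC and CVM subnet",
     [(2020, 2020), (2009, 2009), (9440, 9440)]),
]

def port_allowed(source, destination, port):
    """Return True if the destination port is permitted by a rule
    matching the given source/destination pair."""
    return any(
        src == source and dst == destination
        and any(lo <= port <= hi for lo, hi in ranges)
        for src, dst, ranges in PORT_RULES
    )

print(port_allowed("PC subnet", "Load balancer route advertised", 1030))  # True
print(port_allowed("PC subnet", "Load balancer route advertised", 443))   # False
```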
Creating a Xi VPN Gateway

Create a VPN gateway to represent the Xi VPN gateway appliance.

About this task

Perform the following to create a Xi VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create a Xi Gateway

    The Create VPN Gateway window appears.

  3. Do the following in the indicated fields.
    1. Name: Enter a name for the VPN gateway.
    2. VPC: Select the production VPC.
    3. Type: Select Xi Gateway.
    4. Routing Protocol : Select eBGP or Static to set up a routing protocol between the Xi and on-prem gateways.
    5. ( eBGP only): Select this option if you want to set up the eBGP routing protocol between the Xi and on-prem gateways. Do the following in the indicated fields.
      • In the ASN field, set an ASN for the Xi gateway. Ensure that the Xi gateway ASN is different from the on-prem gateway ASN.
      • In the eBGP Password field, set up a password for the eBGP session that is established between the on-prem VPN gateway and Xi VPN gateway. The eBGP password can be any string, preferably alphanumeric.
    6. ( Static only) If you select this option, manually set up static routes between the Xi and on-prem gateways.

      For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

  4. Click Save .
    The Xi gateway you create is displayed in the VPN Gateways page.
Creating an On-Prem VPN Gateway (Nutanix)

Create a VPN gateway to represent the on-prem VPN gateway appliance.

About this task

Perform the following to create an on-prem VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create an On-Prem Gateway

    The Create VPN Gateway window appears.

  3. Do the following in the indicated fields.
    1. Name : Enter a name for the VPN gateway.
    2. Type : Select On Prem - Nutanix .
    3. Automatically add route in PC and PE CVMs to enable replication : Select this option to automatically enable traffic between the on-prem CVMs and the CVMs in Xi Cloud Services. If you select this option, a route to the CVMs in Xi Cloud Services is added with the on-prem VPN gateway as the next hop. Therefore, even if you choose to have static routes between your on-prem router and the on-prem gateway, you do not need to manually add those static routes (see the Routing Protocol Configuration step).

      A route to Xi CVMs is added with the on-prem VPN gateway as the next-hop.

      Note: This option is only for the CVM-to-CVM (on-prem CVM and Xi Cloud CVMs) traffic.
    4. Under Routing Protocol (between Xi Gateway and On Prem Nutanix Gateway) , do the following to set up the eBGP routing protocol between the Xi and on-prem gateways:
      • In the ASN field, enter the ASN for your on-prem gateway. If you do not have a BGP environment in your on-prem site, you can choose any number. For example, you can choose a number in the 65000 range. Ensure that the Xi gateway ASN and on-prem gateway ASN are not the same.
      • In the eBGP Password field, enter the same eBGP password as the Xi gateway.
    5. Subnet UUID : Enter the UUID of the subnet of the on-prem cluster in which you want to install the on-prem VPN gateway VM. Log on to your on-prem Prism Central web console to determine the UUID of the subnet.
    6. Under IP Address Information , do the following in the indicated fields:
      • Public IP Address of the On Premises VPN Gateway Device : Enter a public WAN IP address for the on-prem VPN gateway VM.
      • VPN VM IP Address : Enter a static IP address that you want to allocate to the on-prem VPN gateway VM.
      • IP Prefix Length : Enter the prefix length (for example, 24) of the subnet on which you want to install the on-prem VPN gateway VM.
      • Default Gateway IP : Enter the gateway IP address of the subnet on which you want to install the on-prem VPN gateway VM.
    7. Under Routing Protocol Configuration , do the following in the indicated fields:
      • In the Routing Protocol drop-down list, select the dynamic routing protocol ( OSPF , iBGP , or Static ) to set up the routing protocol between the on-prem router and on-prem gateway.
      • ( Static only) If you select Static , manually add these routes in Xi Cloud Services.

        For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

        Note: You do not need to add static routes for CVM-to-CVM traffic (see the Automatically add route in PC and PE CVMs to enable replication option).
      • ( OSPF only) If you select OSPF , in the Area ID field, type the OSPF area ID for the VPN gateway in the IP address format. In the Password Type field, select MD5 and type a password for the OSPF session.
      • ( iBGP only) If you select iBGP , in the Peer IP field, type the IP address of the on-prem router to exchange routes with the VPN gateway VM. In the Password field, type the password for the iBGP session.
  4. Click Save .
    The on-prem gateway you create is displayed in the VPN Gateways page.
Creating a VPN Connection

Create a VPN connection to establish a VPN IPSec tunnel between a VPN gateway in the Xi Cloud and a VPN gateway in your on-prem site. Select the Xi gateway and on-prem gateway between which you want to create the VPN connection.

About this task

Perform the following to create a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Click Create VPN Connection .
    Figure. Create a VPN Connection

    The Create VPN Connection window appears.

  3. Do the following in the indicated fields:
    1. Name : Enter a name for the VPN connection.
    2. Description : Enter a description for the VPN connection.
    3. IPSec Secret : Enter an alphanumeric string as the IPSec secret (pre-shared key) for the VPN connection.
    4. Xi Gateway : Select the Xi gateway for which you want to create this VPN connection.
    5. On Premises Gateway : Select the on-prem gateway for which you want to create this VPN connection.
    6. Dynamic Route Priority : (Optional) Set this field if you have multiple routes to the same destination. For example, if you have VPN connection 1 and VPN connection 2 and you want VPN connection 1 to take precedence over VPN connection 2, set a higher priority for VPN connection 1. The higher the priority number, the higher the precedence of that connection. You can set a priority number from 10 through 1000.
      See the Routes Precedence section in Routes Management in Xi Infrastructure Service Administration Guide for more information.
  4. Click Save .
    The VPN connection you create is displayed in the VPN Connections page.
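The precedence rule above (higher priority number wins, values from 10 through 1000) can be sketched as follows; the connection names and function are hypothetical illustrations, not part of the product:

```python
def preferred_connection(connections):
    """Given (name, dynamic_route_priority) pairs for VPN connections that
    advertise a route to the same destination, return the name of the
    connection whose route takes precedence (highest priority number)."""
    for name, priority in connections:
        if not 10 <= priority <= 1000:
            raise ValueError(f"{name}: priority must be from 10 through 1000")
    return max(connections, key=lambda c: c[1])[0]

# VPN connection 1 (priority 200) takes precedence over VPN connection 2 (priority 100).
print(preferred_connection([("vpn-connection-1", 200), ("vpn-connection-2", 100)]))
# vpn-connection-1
```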

On-Prem - Third-Party VPN Solution

You can use your own VPN solution to connect your on-prem site to Xi Cloud Services. If you select this option, you must manually set up a VPN solution by using a supported third-party VPN solution as an on-prem VPN gateway (peer appliance) that can establish an IPsec tunnel with the VPN gateway VM in the Xi Cloud.

Following is the workflow if you want to use a third-party VPN solution to set up a VPN connection between your on-prem site and Xi Cloud Services.

  1. Create one or more Xi VPN gateways.
  2. The VPN controller running in Xi Cloud Services creates a VPN gateway VM in the Xi Cloud. The Xi VPN gateway VM runs in your (tenant) overlay network.
  3. Create one or more on-prem VPN gateways.

    Create a VPN gateway for each on-prem site that you want to connect to the Xi Cloud.

  4. Create one or more VPN connections.

    Create a VPN connection to create an IPSec tunnel between each on-prem site (on-prem VPN gateway) and Xi Cloud (Xi gateway).

  5. Configure a peer VPN gateway appliance (hardware or virtual) in your on-prem site. Depending upon your VPN solution, you can download detailed instructions about how to configure your on-prem VPN gateway appliance. For more information, see Downloading the On-Prem VPN Appliance Configuration.

    Xi Cloud Services supports the following third-party VPN gateway solutions.

    • CheckPoint
    • Cisco ASA
    • PaloAlto
      Note: If you are using the Palo Alto VPN gateway solution, set the MTU value to 1356 in the Tunnel Interface settings. The replication fails for the default MTU value (1427).
    • Juniper SRX
    • Fortinet
    • SonicWall
    • VyOS
Third-Party VPN Solution Requirements

Ensure the following in your on-prem site before you configure VPN in Xi Cloud Services.

  1. A third-party VPN solution in your on-prem site that functions as an on-prem VPN gateway (peer appliance).
  2. The on-prem VPN gateway appliance supports the following.
    • IPSec IKEv2
    • Tunnel interfaces
    • External Border Gateway Protocol (eBGP)
  3. Note the following details of the on-prem VPN gateway appliance.
    • On Prem Gateway ASN : Assign an ASN for your on-prem gateway. If you already have a BGP environment in your on-prem site, use your organization's ASN as the on-prem gateway ASN. If you do not have a BGP environment, you can choose any number; for example, a number in the 65000 range.
    • Xi Gateway ASN : Assign an ASN for the Xi gateway. The Xi gateway ASN must not be the same as the on-prem gateway ASN.
    • eBGP Password : The eBGP password is the shared password between the Xi gateway and on-prem gateway. Set the same password for both the gateways.
    • Public IP address of the VPN Gateway Device : Ensure that the public IP address of the on-prem VPN gateway appliance can reach the public IP address of Xi Cloud Services.
  4. The on-prem VPN gateway appliance can route the traffic from the on-prem CVM subnets to the established VPN tunnel.
  5. Ensure that the following ports are open in your on-prem VPN gateway appliance.
    • IKEv2: Port number 500 of the payload type UDP.
    • IPSec: Port number 4500 of the payload type UDP.
    • BGP: Port number 179 of the payload type TCP.
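The port requirements above can be kept as a checklist in code form when verifying a third-party appliance. A minimal illustrative sketch (the function and rule labels are assumptions):

```python
# Ports that must be open on the on-prem third-party VPN gateway appliance.
REQUIRED_PORTS = {
    ("udp", 500): "IKEv2",
    ("udp", 4500): "IPSec",
    ("tcp", 179): "BGP",
}

def missing_ports(open_ports):
    """Return the required (protocol, port) pairs that are absent from
    the appliance's currently open ports, sorted for stable output."""
    return sorted(set(REQUIRED_PORTS) - set(open_ports))

# An appliance with only IKEv2 and IPSec open is still missing BGP.
print(missing_ports({("udp", 500), ("udp", 4500)}))  # [('tcp', 179)]
```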
On-Prem Site Firewall Port Requirements

Configure rules for ports in your on-prem firewall depending on your deployment scenario.

On-Prem Behind a Network Address Translation or Firewall Device

In this scenario, the IPSec tunnel terminates behind a network address translation (NAT) or firewall device. For NAT to work, open UDP ports 500 and 4500 in both directions. Ports 1024–1034 are ephemeral ports used by the CVMs.

Enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

IPSec Terminates on the Firewall Device

In this scenario, you do not need to open the ports for NAT (500 and 4500), but enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.

In the following table, PC subnet refers to the subnet where your on-prem Prism Central is running. The Xi infrastructure load balancer route is the route advertised for traffic to the Xi CVMs and Prism Central. You receive this information when you begin using Xi Cloud Services.

Table 1. Port Rules

Source address | Destination address | Source port | Destination port
PC subnet | Load balancer route advertised | Any | 1024–1034
Xi infrastructure load balancer route | PC and CVM subnet | Any | 2020, 2009, 9440

The following port requirements are applicable only if you are using the Nutanix VPN solution.

Nutanix VPN VM | 8.8.8.8 and 8.8.4.4 (IP addresses of the DNS servers) | VPN VM | DNS UDP port 53
Nutanix VPN VM | time.google.com, 0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org (NTP servers) | VPN VM | NTP UDP port 123
Nutanix VPN VM | ICMP ping to NTP servers | NA | NA
CVM IP addresses in AHV clusters | HTTPS requests to the Internet | AHV hosts | HTTPS port 443
CVM IP addresses in ESXi clusters | HTTPS and FTP requests to the Internet | ESXi hosts | HTTPS port 443 and FTP port 21
Creating a Xi VPN Gateway

Create a VPN gateway to represent the Xi VPN gateway appliance.

About this task

Perform the following to create a Xi VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create a Xi Gateway

    The Create VPN Gateway window appears.

  3. Do the following in the indicated fields.
    1. Name: Enter a name for the VPN gateway.
    2. VPC: Select the production VPC.
    3. Type: Select Xi Gateway.
    4. Routing Protocol : Select eBGP or Static to set up a routing protocol between the Xi and on-prem gateways.
    5. ( eBGP only): Select this option if you want to set up the eBGP routing protocol between the Xi and on-prem gateways. Do the following in the indicated fields.
      • In the ASN field, set an ASN for the Xi gateway. Ensure that the Xi gateway ASN is different from the on-prem gateway ASN.
      • In the eBGP Password field, set up a password for the eBGP session that is established between the on-prem VPN gateway and Xi VPN gateway. The eBGP password can be any string, preferably alphanumeric.
    6. ( Static only) If you select this option, manually set up static routes between the Xi and on-prem gateways.

      For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

  4. Click Save .
    The Xi gateway you create is displayed in the VPN Gateways page.
Creating an On-Prem VPN Gateway (Third-Party)

Create a VPN gateway to represent the on-prem VPN gateway appliance.

Before you begin

Ensure that you have all the details about your on-prem VPN appliance as described in Third-Party VPN Solution Requirements.

About this task

Perform the following to create an on-prem VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click Create VPN Gateway .
    Figure. Create an On Prem Gateway

  3. Do the following in the indicated fields.
    1. Name : Enter a name for the VPN gateway.
    2. Type : Select On Prem - Third Party .
    3. IP Address of your Firewall or Router Device performing VPN : Enter the IP address of the on-prem VPN appliance.
    4. Routing Protocol : Select eBGP or Static to set up a routing protocol between the Xi and on-prem gateways.
    5. ( eBGP only) If you select eBGP, do the following:
      • In the ASN field, enter the ASN for your on-prem gateway. If you do not have a BGP environment in your on-prem site, you can choose any number. For example, you can choose a number in the 65000 range. Ensure that the Xi gateway ASN and on-prem gateway ASN are not the same.
      • In the eBGP Password field, enter the same eBGP password as the Xi gateway.
    6. ( Static only) If you select Static , manually set up static routes between the Xi and on-prem gateways.

      For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .

Creating a VPN Connection

Create a VPN connection to establish a VPN IPSec tunnel between a VPN gateway in the Xi Cloud and a VPN gateway in your on-prem site. Select the Xi gateway and on-prem gateway between which you want to create the VPN connection.

About this task

Perform the following to create a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Click Create VPN Connection .
    Figure. Create a VPN Connection

    The Create VPN Connection window appears.

  3. Do the following in the indicated fields:
    1. Name : Enter a name for the VPN connection.
    2. Description : Enter a description for the VPN connection.
    3. IPSec Secret : Enter an alphanumeric string as the IPSec secret (pre-shared key) for the VPN connection.
    4. Xi Gateway : Select the Xi gateway for which you want to create this VPN connection.
    5. On Premises Gateway : Select the on-prem gateway for which you want to create this VPN connection.
    6. Dynamic Route Priority : (Optional) Set this field if you have multiple routes to the same destination. For example, if you have VPN connection 1 and VPN connection 2 and you want VPN connection 1 to take precedence over VPN connection 2, set a higher priority for VPN connection 1. The higher the priority number, the higher the precedence of that connection. You can set a priority number from 10 through 1000.
      See the Routes Precedence section in Routes Management in Xi Infrastructure Service Administration Guide for more information.
  4. Click Save .
    The VPN connection you create is displayed in the VPN Connections page.
Downloading the On-Prem VPN Appliance Configuration

Depending upon your VPN solution, you can download detailed instructions about how to configure your on-prem VPN gateway appliance.

About this task

Perform the following to download the instructions to configure your on-prem VPN gateway appliance.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click an on-prem VPN gateway.
  3. In the details page, click On Prem Gateway Configuration .
    Figure. On-prem VPN Gateway Appliance Configuration

  4. Select the type and version of your on-prem VPN gateway appliance and click Download .
  5. Follow the instructions in the downloaded file to configure the on-prem VPN gateway appliance.

VPN Gateway Management

You can see the details of each VPN gateway, update the gateway, or delete the gateway.

All your VPN gateways are displayed in the VPN Gateways page.

Displaying the Details of a VPN Gateway

You can display the details such as the type of gateway, VPC, IP addresses, protocols, and connections associated with the gateways.

About this task

Perform the following to display the details of a VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
    A list of all your VPN gateways is displayed. The VPN gateways table displays details such as the name, type, VPC, status, public IP address, and VPN connections associated with each VPN gateway.
    Figure. VPN Gateways List

  2. Click the name of a VPN gateway to display additional details of that VPN gateway.
  3. In the details page, click the name of a VPN connection to display the details of that VPN connection associated with the gateway.
Updating a VPN Gateway

The details that you can update in a VPN gateway depend on the type of gateway (Xi gateway or On Prem gateway).

About this task

Perform the following to update a VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN gateway and, in the Actions drop-down list, click Update .
      Figure. Use the Actions drop-down list

    • Click the name of the VPN gateway and, in the details page that appears, click Update .

    The Update VPN Gateway dialog box appears.

  3. Update the details as required.
    The fields are similar to the Create VPN Gateway dialog box. For more information, see Creating a Xi VPN Gateway, Creating an On-Prem VPN Gateway (Nutanix), or Creating an On-Prem VPN Gateway (Third-Party) depending on the type of gateway you are updating.
  4. Click Save .
Deleting a VPN Gateway

To delete a VPN gateway, you must first delete all the VPN connections associated with the gateway.

About this task

Perform the following to delete a VPN gateway.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN gateway and, in the Actions drop-down list, click Delete .
    • Click the name of the VPN gateway and, in the details page that appears, click Delete .
  3. Click OK in the confirmation message that appears to delete the VPN gateway.

VPN Connection Management

You can see the details of each VPN connection, update the connection, or delete the connection.

All your VPN connections are displayed in the VPN Connections page.

Displaying the Details of a VPN Connection

You can display details such as the gateways associated with the connection, protocol details, Xi gateway routes, throughput of the connection, and logs of the IPSec and eBGP sessions for troubleshooting purposes.

About this task

Perform the following to display the details of a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
    A list of all your VPN connections is displayed. The VPN connections table displays details such as the name, IPSec and eBGP status, dynamic route priority, and the VPC and gateways associated with each VPN connection.
    Figure. VPN Connections List

  2. Click the name of a VPN connection to display more details of that VPN connection.
    The details page displays the following tabs:
    • Summary : Displays details of each gateway, protocol, and Xi gateway routes associated with the connection.
    • Throughput : Displays a graph for throughput of the VPN connection.
    • IPSec Logging : Displays logs of the IPSec sessions of the VPN connection. Use these logs to troubleshoot issues with the VPN connection.
    • EBGP Logging : Displays logs of the eBGP sessions of the VPN connection. Use these logs to troubleshoot issues with the VPN connection.

    Click a tab to display its details. For example, click the Summary tab to display the summary details.

    Figure. VPN Connection Summary Tab

Updating a VPN Connection

You can update the name, description, IPSec secret, and dynamic route priority of the VPN connection.

About this task

Perform the following to update a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN connection and, in the Actions drop-down list, click Update .
      Figure. Use the Actions drop-down list

    • Click the name of the VPN connection and, in the details page that appears, click Update .
      Figure. Click the name of the VPN connection

    The Update VPN Connection dialog box appears.

  3. Update the details as required.
    The fields are similar to the Create VPN Connection dialog box. See Creating a VPN Connection for more information.
  4. Click Save .
Deleting a VPN Connection

About this task

Perform the following to delete a VPN connection.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Connections .
  2. Do one of the following:
    • Select the checkbox next to the name of the VPN connection and, in the Actions drop-down list, click Delete .
    • Click the name of the VPN connection and, in the details page that appears, click Delete .
  3. Click OK in the confirmation message that appears to delete the VPN connection.

Upgrading the VPN Gateway Appliances

If you are using the On Prem - Nutanix VPN solution, you can use the Xi Cloud Services portal to upgrade both the VPN gateway VM in the Xi Cloud and the on-prem VPN gateway VM at your on-prem site. If you are using a third-party VPN solution, you can upgrade only the VPN gateway VM running in the Xi Cloud through the portal. To upgrade an on-prem VPN gateway appliance provided by a third-party vendor, see that vendor's documentation for upgrade instructions.

About this task

Note: The VPN gateway VM restarts after the upgrade is complete. Therefore, perform the upgrade during a scheduled maintenance window.

Perform the following to upgrade your VPN gateway appliances.

Procedure

  1. In the Xi Cloud Services portal, go to Explore -> Networking -> VPN Gateways .
  2. Click the name of the VPN gateway.

    To upgrade the VPN gateway VM running in the Xi Cloud, select a Xi gateway.

    To upgrade the VPN gateway VM running in your on-prem site, select the on-prem gateway associated with that on-prem VPN gateway VM.

  3. In the details page of the gateway, click the link in the Version row.

    The VPN Version dialog box appears.

    If you are using the latest version of the VPN gateway VM, the VPN Version dialog box displays a message that your VPN gateway VM is up to date.

    If your VPN gateway VM is not up to date, the VPN Version dialog box displays the Upgrade option.

  4. In the VPN Version dialog box, click Upgrade to upgrade your VPN gateway VM to the latest version.
    The VPN gateway VM restarts after the upgrade is complete and starts with the latest version.

Nutanix Virtual Networks

A planned or an unplanned failover of production workloads requires production virtual networks at both the primary and the recovery site. To ensure that a failover operation, whenever necessary, goes as expected, you also need test virtual networks at both sites for testing your recovery configuration in both directions (failover and failback). To isolate production and test workflows, a recovery plan in Leap uses four separate virtual networks, which are as follows.

Two Production Networks
A production virtual network in the primary site is mapped to a production network in the recovery site. Production failover and failback are confined to these virtual networks.
Two Test Networks
The production virtual network in each site is mapped to a test virtual network in the paired site. Test failover and failback are confined to these virtual networks.

The following figures show the source and target networks for planned, unplanned, and test failovers.

Figure. Virtual Network Mapping (On-Prem to On-Prem)

Figure. Virtual Network Mapping (On-Prem to Xi Cloud Services)

Virtual networks on on-prem Nutanix clusters are virtual subnets bound to a single VLAN. At on-prem sites (including the recovery site), you must manually create the production and test virtual networks before you create your first recovery plan.

The virtual networks required in Xi Cloud Services are contained within virtual private clouds (VPCs). Virtual networks required for production workloads are contained within a virtual private cloud named Production. Virtual networks required for testing failover from on-prem sites are contained within a virtual private cloud named Test. Creating virtual networks in these VPCs is optional. If you do not create a virtual network in a VPC, Leap dynamically creates the virtual networks for you when a failover operation is in progress, and cleans up the dynamically created virtual networks when they are no longer required (after failback).

Note: You cannot create more VPCs in Xi Cloud Services. However, you can update the VPCs to specify settings such as DNS and DHCP, and you can configure policies to secure the virtual networks.

Virtual Subnet Configuration in On-Prem Site

You can use your on-prem Prism Central instance to create, modify, and remove virtual networks. For information about how to perform these procedures by using Prism Central, see the Prism Central Guide .

Virtual Subnet Configuration in Xi Cloud Services

You can create virtual subnets in the production and test virtual networks. This is an optional task. You must perform these procedures in Xi Cloud Services. For more information, see the Xi Infrastructure Services Guide .

Xi Leap RPO Sizer

Nutanix offers standard service level agreements (SLAs) for data replication from your on-prem AHV clusters to Xi Cloud Services based on RPO and RTO. The replication to Xi Cloud Services occurs over the public Internet (VPN or DirectConnect), so the network bandwidth available for replication to Xi Cloud Services cannot be guaranteed. Unstable network bandwidth and a lack of network information affect the amount of data that can be replicated in a given time frame. You can test your RPO objectives by setting up a real protection policy, or use the Xi Leap RPO Sizer utility to simulate the protection plan (without replicating data to Xi Cloud Services). Xi Leap RPO Sizer provides the information required to determine whether the RPO SLAs are achievable. The utility provides insights into your network bandwidth, estimates performance, calculates the actual change rate, and calculates the feasible RPO for your data protection plan.
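The kind of feasibility estimate the utility performs can be sketched with simple arithmetic. The numbers below (change rate, bandwidth) are hypothetical illustrations, not defaults of the tool, and the model is a deliberate simplification: an RPO is feasible only if one RPO window's worth of data churn can replicate within that window.

```shell
# Simplified RPO feasibility check (illustrative only; the RPO Sizer
# measures these values instead of taking them as inputs).
rpo_seconds=3600            # desired RPO: 1 hour
change_gb_per_hour=20       # assumed data change rate
bandwidth_mbps=100          # assumed usable uplink to Xi Cloud Services

# Megabits of churn produced per RPO window (1 GB = 8 * 1024 Mb).
churn_mb=$(( change_gb_per_hour * 8 * 1024 * rpo_seconds / 3600 ))
# Megabits the link can move per RPO window.
capacity_mb=$(( bandwidth_mbps * rpo_seconds ))

if [ "$churn_mb" -le "$capacity_mb" ]; then
  echo "RPO feasible: churn ${churn_mb} Mb <= capacity ${capacity_mb} Mb per window"
else
  echo "RPO not feasible: churn ${churn_mb} Mb > capacity ${capacity_mb} Mb per window"
fi
```

In practice the sizer also accounts for compression, seeding, and bandwidth variability, which is why measuring with the utility is preferable to hand arithmetic.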

About this task

See Xi Leap Service-Level Agreements (SLAs) for more information about Nutanix SLAs for data replication to Xi Cloud Services. To use the Xi Leap RPO Sizer utility, perform the following steps.

Procedure

  1. Log on to the My Nutanix portal with your account credentials.
  2. Click Launch in Xi Leap RPO Sizer widget.
  3. (optional) Download the bundle (rpo_sizer.tar) using the hyperlink given in the instructions.
    Tip: You can also download the bundle directly (using wget command in CLI) into the directory after step 4.a.
  4. Log on to any on-prem guest VM through an SSH session and do the following.
    Note: The guest VM must have connectivity to the Prism Central VM and CVMs.
    1. Create a separate directory to ensure that all the downloaded and extracted files inside the downloaded bundle remain in one place.
      nutanix@cvm$ mkdir dir_name

      Replace dir_name with an identifiable name. For example, rpo_sizer.

    2. (optional) Copy the downloaded bundle into the directory created in the previous step.
      nutanix@cvm$ cp download_bundle_path/rpo_sizer.tar ./dir_name/

      Replace download_bundle_path with the path to the downloaded bundle.

      Replace dir_name with the directory name created in the previous step.

      Tip: If you download the bundle directly (using wget command in CLI) from the directory, you can skip this step.
    3. Go to the directory where the bundle is stored and extract the bundle.
      nutanix@cvm$ cd ./dir_name

      Replace dir_name with the directory name created in the step 4.a.

      nutanix@cvm$ tar -xvf rpo_sizer.tar
      This command generates rpo_sizer.sh and rposizer.tar in the same directory.
    4. Change the permissions to make the extracted shell file executable.
      nutanix@cvm$ chmod +x rpo_sizer.sh
    5. Run the shell script in the bundle.
      nutanix@cvm$ ./rpo_sizer.sh
      Note: If you ran the Xi Leap RPO Sizer previously on the Prism Central VM, ensure that you clean up the previous run before you run the shell script again. Run the command ./rpo_sizer.sh delete to clean up. If you do not clean up, you get an error similar to the following (where xxxx is the ID of the container already using the name):
      The container name "/rpo_sizer" is already in use by container "xxxx". You have to remove (or rename) that container to be able to reuse that name.
  5. Open a web browser and go to http://Prism_Central_IP_address:8001/ to run the RPO test.

    Replace Prism_Central_IP_address with the virtual IP address of your Prism Central deployment.

    Note: If you have set up a firewall on Prism Central, ensure that port 8001 is open.
    nutanix@cvm$ modify_firewall -p 8001 -o open -i eth0 -a
    Close the port after running the RPO test.
    nutanix@cvm$ modify_firewall -p 8001 -o close -i eth0 -a
  6. Click Configure and execute test and specify the following information in the Configuration Wizard .
    Note: If you are launching the Xi Leap RPO Sizer utility for the first time, generate an API key pair. To generate an API key pair, see Creating an API Key in the Nutanix Licensing Guide .
    1. In the API Key and PC Credentials tab, specify the following information.
        1. API Key : Enter the API key that you generated.
        2. Key ID : Enter the key ID that you generated.
        3. PC IP : Enter the IP address of the Prism Central VM.
        4. Username : Enter the username of your Prism Central deployment.
        5. Password : Enter the password of your Prism Central deployment.
        6. Click Next .
    2. In the Select Desired RPO and Entities tab, select the desired RPO from the drop-down list, select the VM Categories or individual VMs, and click + . To add more RPOs and entities to the test, repeat this step. Then click Next .
      Note: You can select the VM Categories or individual VMs on which to test the selected RPO only after you select the desired RPO.
      The system discovers Prism Element automatically based on the VM Categories and the individual VMs you choose.
    3. In the Enter PE credentials tab, enter the SSH password or SSH key for Prism Element (the "nutanix" user) running on the AHV cluster and click Next .
    4. In the Network Configuration tab, specify the following information.
        1. Select region : From the drop-down list, select the region closest to the Xi Leap datacenter to which the workloads should be copied.
        2. Select availability zone : Select an availability zone (site) from the drop-down list.
        3. NAT Gateway IPs : Enter the public-facing IP address of Prism Element running on your AHV cluster.
          Note: To find the NAT gateway IP address of Prism Element running on your AHV cluster, log on to Prism Element through an SSH session (as the "nutanix" user) and run the curl ifconfig.me command.

          Note: Do not turn on the Configure Advanced Options switch unless advised by the Nutanix Support.
        4. Click Next .
    5. In the View Configuration tab, review the RPO, entity, and network configuration and the estimated test duration, and click Submit .
    The new window shows the ongoing test status in a progress bar.
  7. When the RPO test completes, click Upload result to upload the test result. To view a detailed report of the test, click View Report . To abort the test, click X .
    Note: If a test is in progress, a new test cannot be triggered.

Protection and Automated DR (Xi Leap)

Automated disaster recovery (DR) configurations use protection policies to protect the guest VMs, and recovery plans to orchestrate the recovery of those guest VMs to Xi Cloud Services. With reverse synchronization, you can protect guest VMs and enable DR from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site). You can automate protection of your guest VMs with the following supported replication schedules in Xi Leap.

  • Asynchronous replication schedule (1 hour or greater RPO). For information about protection with Asynchronous replication schedule, see Protection with Asynchronous Replication and DR (Xi Leap).
  • NearSync replication schedule (1–15 minute RPO). For information about protection with NearSync replication schedule, see Protection with NearSync Replication and DR (Xi Leap).

Protection with Asynchronous Replication and DR (Xi Leap)

Asynchronous replication schedules enable you to protect your guest VMs with an RPO of 1 hour or greater. A protection policy with an Asynchronous replication schedule creates a recovery point at the specified hourly interval and replicates it to Xi Cloud Services for High Availability. For guest VMs protected with an Asynchronous replication schedule, you can perform disaster recovery (DR) to Xi Cloud Services. With reverse synchronization, you can perform DR from Xi Cloud Services to a Nutanix cluster at an on-prem site. In addition to performing DR from AHV clusters to Xi Cloud Services (only AHV), you can also perform cross-hypervisor disaster recovery (CHDR), that is, DR from ESXi clusters to Xi Cloud Services.

Note: Nutanix provides multiple disaster recovery (DR) solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

Asynchronous Replication Requirements (Xi Leap)

The following are the specific requirements for protecting your guest VMs with an Asynchronous replication schedule. Ensure that you meet these requirements in addition to the general requirements of Xi Leap.

For information about the general requirements of Xi Leap, see Xi Leap Requirements.

For information about the on-prem node, disk and Foundation configurations required to support Asynchronous replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi

  • The AHV clusters must be running on AHV versions that come bundled with the latest version of AOS.
  • The ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

The on-prem Prism Central and its registered clusters (Prism Elements) must be running the following versions of AOS.

  • AOS 5.10 or newer with AHV.
  • AOS 5.11 or newer with ESXi.

Xi Cloud Services runs the latest version of AOS.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Guest VMs protected with an Asynchronous replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) to recover guest VMs from ESXi clusters to AHV clusters (Xi Cloud Services) provided you meet the following requirements.

  • The on-prem Nutanix clusters must be running AOS 5.11 or newer.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI and SATA disks only.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.

    For operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files.

    If you have delta disks attached to a guest VM and you proceed with failover, you get a validation warning and the guest VM does not recover. Contact Nutanix Support for assistance.
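One of the requirements above is that the non-boot SCSI disks of Windows guest VMs have the SAN policy set to OnlineAll. On Windows, this is commonly done with the built-in diskpart utility from an elevated command prompt; the session below is a sketch, and the exact banner text can vary by Windows version.

```
DISKPART> san
SAN Policy  : Offline Shared

DISKPART> san policy=OnlineAll

DiskPart successfully changed the SAN Policy for the current operating system.
```

The first san command only displays the current policy; the policy change persists across restarts, so the recovered disks come online automatically after CHDR.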

Table 1. Operating Systems Supported for CHDR

Windows
  • Versions: Windows 2008 R2 or newer; Windows 7 or newer
  • Requirements and limitations: Only 64-bit operating systems are supported.
Linux
  • Versions: CentOS 6.5 and 7.0; RHEL 6.5 or newer and RHEL 7.0 or newer; Oracle Linux 6.5 and 7.0; Ubuntu 14.04
  • Requirements and limitations: SLES operating system is not supported.

Additional Requirement

The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
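A quick way to verify this requirement is to compare the container name lists of the two clusters. The script below is a generic sketch with sample names; on real clusters you would substitute the actual container lists (for example, from the cluster's container listing in Prism or ncli; that tooling choice is an assumption, not a requirement of Leap).

```shell
# Illustrative check that every storage container used on the primary
# cluster also exists on the recovery cluster. The two lists are sample
# data; gather the real names from your clusters.
primary_containers="SelfServiceContainer default-container-1"
recovery_containers="SelfServiceContainer default-container-1 archive"

missing=""
for c in $primary_containers; do
  case " $recovery_containers " in
    *" $c "*) ;;                       # name exists on the recovery cluster
    *) missing="$missing $c" ;;        # name is absent: create it before protecting VMs
  esac
done

if [ -z "$missing" ]; then
  echo "OK: all primary containers exist on the recovery cluster"
else
  echo "Missing on recovery cluster:$missing"
fi
```

Running this before you configure protection avoids replication failures caused by a container name that exists only on the primary cluster.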

Asynchronous Replication Limitations (Xi Leap)

Consider the following specific limitations before protecting your guest VMs with an Asynchronous replication schedule. These limitations are in addition to the general limitations of Xi Leap.

For information about the general limitations of Leap, see Xi Leap Limitations.

  • You cannot restore guest VMs with incompatible GPUs at the recovery site.
  • You cannot protect guest VMs configured as part of a network function chain.
  • You cannot retain hypervisor-specific properties after cross hypervisor disaster recovery (CHDR).

    Cross hypervisor disaster recovery (CHDR) does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)

To protect guest VMs with an hourly replication schedule, configure an Asynchronous replication schedule while creating the protection policy. The policy takes recovery points of those guest VMs at the specified time interval (hourly) and replicates them to Xi Cloud Services for High Availability. With reverse synchronization, you can create the policy at Xi Cloud Services and replicate to an on-prem availability zone (site). For protection from Xi Cloud Services to an on-prem site, the protection policy allows you to add only one Asynchronous replication schedule.

Before you begin

See Asynchronous Replication Requirements (Xi Leap) and Asynchronous Replication Limitations (Xi Leap) before you start.

About this task

To create a protection policy with an Asynchronous replication schedule, perform the following procedure at Xi Cloud Services. You can also create a protection policy at the on-prem site. Protection policies you create or update at the on-prem site synchronize back to Xi Cloud Services.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click the hamburger icon at the top-left corner of the window. Go to Data Protection & Recovery > Protection Policies in the left pane.
  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: Asynchronous

    1. Policy name : Enter a name for the protection policy.
      Caution: The name can contain only alphanumeric characters, dots, dashes, and underscores.
    2. In the Primary Location pane, specify the following information.
        1. Location : From the drop-down list, check the Xi Cloud Services availability zone (site) that hosts the guest VMs to protect.

          The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can check either the local site or a non-local site.

        2. Cluster : Xi Cloud Services automatically selects the cluster for you. Therefore the only option available is Auto .

        3. Click Save .

          Clicking Save activates the Recovery Location pane. After saving the primary site configuration, you can optionally add a local schedule (step iv) to retain the recovery points at the primary site.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule to retain 15-minute recovery points locally and also an hourly replication schedule that retains recovery points and replicates them to a recovery site every 2 hours. The two schedules apply differently to the guest VMs.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local site. If you set the retention number to n, the local site retains the n recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local site.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on XI-US-EAST-1A-PPD : Auto : Specify the retention number for the local site.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent and if you do not check Take App-Consistent Recovery Point , the recovery points generated are crash-consistent. If the time in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    3. In the Recovery Location pane, specify the following information.
      Figure. Protection Policy Configuration: Select Recovery Location

        1. Location : From the drop-down list, select the availability zone (site) where you want to replicate the recovery points.

          The drop-down lists all the sites paired with Xi Cloud Services. XI-US-EAST-1A-PPD : Auto represents the local site (Prism Central). Do not select XI-US-EAST-1A-PPD : Auto because a duplicate location is not supported in Xi Cloud Services.

          If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).

        2. Cluster : Xi Cloud Services automatically selects the cluster for you. Therefore the only option available is Auto .

        3. Click Save .

          Clicking Save activates the + Add Schedule button between the primary and the recovery site. After saving the recovery site configuration, you can optionally add a local schedule to retain the recovery points at the recovery site.

        4. Click + Add Local Schedule if you want to retain recovery points locally in addition to retaining recovery points in a replication schedule (step d.iv). For example, you can create a local schedule that retains hourly recovery points locally to supplement the hourly replication schedule. The two schedules apply differently to the guest VMs after failover, when the recovery points replicate back to the primary site.

          Specify the following information in the Add Schedule window.

          Figure. Protection Policy Configuration: Add Local Schedule

          1. Take Snapshot Every : Specify the frequency in minutes , hours , days , or weeks at which you want the recovery points to be taken locally.
          2. Retention Type : Specify one of the following two types of retention policy.
            • Linear : Implements a simple retention scheme at the local site. If you set the retention number to n, the local site retains the n recent recovery points.

              When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.

            • Roll-up : Rolls up the recovery points into a single recovery point at the local site.

              For more information about the roll-up recovery points, see step d.iii.

          3. Retention on PC_xx.xx.xxx:PE_yyy : Specify the retention number for the local site.
          4. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

            Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent and if you do not check Take App-Consistent Recovery Point , the recovery points generated are crash-consistent. If the time in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.

            Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshot.
          5. Click Save Schedule .
    4. Click + Add Schedule to add a replication schedule between the primary and the recovery site.

      Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.

      Figure. Protection Policy Configuration: Add Schedule (Asynchronous)

        1. Protection Type : Click Asynchronous .
        2. Take Snapshot Every : Specify the frequency in hours , days , or weeks at which you want the recovery points to be taken.

          The specified frequency is the RPO. For more information about RPO, see Leap Terminology.

        3. Retention Type : Specify one of the following two types of retention policy.
          • Linear : Implements a simple retention scheme at both the primary (local) and the recovery (remote) site. If you set the retention number for a given site to n, that site retains the n recent recovery points. For example, if the RPO is 1 hour, and the retention number for the local site is 48, the local site retains 48 hours (48 X 1 hour) of recovery points at any given time.
            Tip: Use linear retention policies for small RPO windows with shorter retention periods or in cases where you always want to recover to a specific RPO window.
          • Roll-up : Rolls up the recovery points as per the RPO and retention period into a single recovery point at a site. For example, if you set the RPO to 1 hour and the retention period to 5 days, the oldest 24 hourly recovery points roll up into a single daily recovery point (one daily recovery point = 24 hourly recovery points) every 24 hours. The system keeps one day of hourly recovery points and 4 days of daily recovery points.
            Note:
            • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
            • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
            • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
            • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
            Note: The recovery points that are used to create a rolled-up recovery point are discarded.
            Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
        4. To specify the retention number for the primary and recovery sites, do the following.
          • Retention on XI-US-EAST-1A-PPD : Auto : Specify the retention number for the primary site.

            This field is unavailable if you do not specify a recovery location.

          • Retention on PC_xx.xx.xx.xxx:PE_yyy : Specify the retention number for the recovery site.

            If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.

        5. If you want to enable reverse retention of the recovery points, check Reverse retention for VMs on recovery location .
          Note: Reverse retention for VMs on recovery location is available only when the retention numbers on the primary and recovery sites are different.

          Reverse retention maintains the retention numbers of recovery points even after failover to a recovery site in the same or different availability zones. For example, if you retain two recovery points at the primary site and three recovery points at the recovery site, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary site. The recovery site still retains two recovery points while the primary site retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary site. The recovery site retains three recovery points while the primary site retains two recovery points.

          Maintaining the same retention numbers at a recovery site is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.
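The reverse-retention behavior described above can be sketched as a small illustration. The retention counts (2 at the source, 3 at the target) are the example values from the text; the variable names are invented for illustration only.

```shell
#!/bin/sh
# Illustration of reverse retention, using the example numbers from the
# text: before failover, the primary (replication source) keeps 2 recovery
# points and the recovery site (replication target) keeps 3.

source_retention=2    # retention configured for the replication source
target_retention=3    # retention configured for the replication target
reverse_retention=true

# After failover, the VMs run at the recovery site and replication
# reverses direction (recovery site -> primary site).
if [ "$reverse_retention" = true ]; then
    # Retention follows the replication role: the new source (recovery
    # site) keeps 2 and the new target (primary site) keeps 3.
    recovery_site=$source_retention
    primary_site=$target_retention
else
    # Retention stays with the physical site as originally configured.
    recovery_site=$target_retention
    primary_site=$source_retention
fi
echo "recovery site keeps ${recovery_site}, primary site keeps ${primary_site}"
```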

        6. If you want to take application-consistent recovery points, check Take App-Consistent Recovery Point .

          Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery sites.

          Note: See Application-consistent Recovery Point Conditions and Limitations before you take application-consistent snapshots.
          Caution: Application-consistent recovery points fail for EFI-boot-enabled Windows Server 2019 VMs running on ESXi when NGT is not installed. Nutanix recommends installing NGT on guest VMs running on ESXi as well.
        7. Click Save Schedule .
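The linear and roll-up retention arithmetic from the schedule steps above can be sketched as follows. The RPO and retention values are the example figures from the text, not product defaults.

```shell
#!/bin/sh
# Sketch of the linear and roll-up retention schemes described above
# (illustrative only; the inputs are the example values from the text).

rpo_hours=1          # Take Snapshot Every: 1 hour
linear_retention=48  # Linear: number of recovery points kept at the site

# Linear: the site keeps the N most recent recovery points, so the
# covered window is simply N x RPO.
linear_window_hours=$((linear_retention * rpo_hours))
echo "Linear: ${linear_retention} recovery points = ${linear_window_hours} hours of coverage"

# Roll-up with a retention period of n days: 1 day of hourly (RPO)
# recovery points plus (n - 1) daily rolled-up recovery points.
rollup_days=5
hourly_points=$((24 / rpo_hours))
daily_points=$((rollup_days - 1))
echo "Roll-up: ${hourly_points} hourly + ${daily_points} daily recovery points"
```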
    5. Click Next .
      Clicking Next shows a list of VM categories where you can optionally check one or more VM categories to protect in the protection policy. DR configurations using Leap allow you to protect a guest VM with only one protection policy. Therefore, VM categories specified in another protection policy are not in the list. If you protect a guest VM in another protection policy by specifying the VM category of the guest VM (category-based inclusion), and you then protect the guest VM from the VMs page in this policy (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, only the protection policy that protects the individual guest VM protects the guest VM.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    6. If you want to protect the guest VMs by category, check the VM categories that you want to protect from the list and click Add .
      Figure. Protection Policy Configuration: Add VM Categories Click to enlarge Protection Policy Configuration: Add VM Categories

      Prism Central includes built-in VM categories for frequently encountered applications (for example, MS Exchange and Oracle). If the VM category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values you require. Doing so ensures that the VM categories and values are available for selection. You can add VMs to the category either before or after you configure the protection policy. If the guest VMs have a common characteristic, such as belonging to a specific application or location, create a VM category and add the guest VMs into the category.

      If you do not want to protect the guest VMs by category, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).

    7. Click Create .
      The protection policy with an Asynchronous replication schedule is created. To verify the protection policy, see the Protection Policies page. You can add VMs individually (without VM categories) to the protection policy or remove VMs from the protection policy. For information about the operations that you can perform on a protection policy, see Protection Policy Management.

Creating a Recovery Plan (Xi Leap)

To orchestrate the failover (disaster recovery) of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs at the recovery availability zone (site). To create a recovery plan, perform the following procedure in Xi Cloud Services. You can also create a recovery plan at the on-prem site. The recovery plan you create or update at the on-prem site synchronizes back to Xi Cloud Services.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Click Create Recovery Plan .
    Specify the following information in the Create Recovery Plan window.
    1. Primary Location : Select the primary site that hosts the guest VMs to protect. This list displays the Local AZ by default and is unavailable for editing.
    2. Recovery Location : Select the on-prem site where you want to replicate the recovery points.
    3. Click Proceed .
    Tip: After you create the recovery plan, you cannot change the Recovery Location from the Recovery Plans page. To change the recovery location on an existing recovery plan, do the following.
    • Update the protection policy to point to the new recovery location. For more information, see Updating a Protection Policy.
    • Configure the network mapping. For more information, see Nutanix Virtual Networks.
    Caution: If all the VMs in the recovery plan do not point to the new recovery location, you get an availability zone conflict alert.
  4. In the General tab, enter the Recovery Plan Name and Recovery Plan Description , and then click Next .
    Figure. Recovery Plan Configuration: General Click to enlarge Recovery Plan Configuration: General

  5. In the Power On Sequence tab, click + Add Entities to add VMs to the sequence and do the following.
    Figure. Recovery Plan Configuration: Add Entities
    Click to enlarge Recovery Plan Configuration: Adding Entities

    1. In Search Entities by , select VM Name from the drop-down list to specify VMs by name.
    2. In Search Entities by , select Category from the drop-down list to specify VMs by category.
    3. To add the VMs or VM categories to the stage, select the VMs or VM categories from the list.
      Note: The search result lists only the VMs that are in the active state of replication.
    4. Click Add .
    The selected VMs are added to the sequence. You can also create multiple stages and add VMs to those stages to define their power-on sequence. For more information about stages, see Stage Management.
    Caution: Do not include the guest VMs protected with Asynchronous, NearSync, and Synchronous replication schedules in the same recovery plan. You can include guest VMs protected with Asynchronous or NearSync replication schedules in the same recovery plan. However, if you combine these guest VMs with the guest VMs protected by Synchronous replication schedules in a recovery plan, the recovery fails.
  6. To manage in-guest script execution on guest VMs during recovery, select the individual VMs or VM categories in the stage. Click Manage Scripts and then do the following.
    Note: In-guest scripts allow you to automate various task executions upon recovery of the VMs. For example, in-guest scripts can help automate the tasks in the following scenarios.

    • After recovery, the VMs must use new DNS IP addresses and also connect to a new database server that is already running at the recovery site.

      Traditionally, to achieve this new configuration, you would manually log on to the recovered VM and modify the relevant files. With in-guest scripts, you write a script that automates the required steps and enable the script when you configure a recovery plan. The recovery plan execution automatically invokes the script, reassigning the DNS IP address and reconnecting to the database server at the recovery site.

    • If VMs are part of domain controller siteA.com at the primary site AZ1 , and after the VMs recover on the site AZ2 , you want to add the recovered VMs to the domain controller siteB.com .

      Traditionally, to reconfigure, you would manually log on to the VM, remove the VM from an existing domain controller, and then add the VM to a new domain controller. With in-guest scripts, you can automate the task of changing the domain controller.

    Note: In-guest script execution requires NGT version 1.9 or newer installed on the VM. The in-guest scripts run as a part of the recovery plan only if they have executable permissions for the following.
    • Administrator user (Windows)
    • Root user (Linux)
    Note: You can have only two in-guest batch or shell scripts: one for production (planned and unplanned failover) and one for test failover. One script, however, can invoke other scripts. Place the scripts at the following locations in the VMs.
    • In Windows VMs,
      • Batch script file path for production failover:
        C:\Program Files\Nutanix\scripts\production\vm_recovery
      • Batch script file path for test failover:
        C:\Program Files\Nutanix\scripts\test\vm_recovery
    • In Linux VMs,
      • Shell script file path for production failover:
        /usr/local/sbin/production_vm_recovery
      • Shell script file path for test failover:
        /usr/local/sbin/test_vm_recovery
    Note: When an in-guest script runs successfully, it returns code 0 . Error code 1 signifies that the execution of the in-guest script was unsuccessful.
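A minimal sketch of a Linux production failover script follows, modeled on the first scenario above (reassigning a DNS server after recovery). The DNS address and the demo output path are invented example values; a real script placed at /usr/local/sbin/production_vm_recovery would write /etc/resolv.conf, perform your site-specific reconfiguration, and exit with 0 on success or 1 on failure.

```shell
#!/bin/sh
# Hypothetical sketch of a Linux production failover in-guest script.
# The DNS address and the demo output path are example values only.

NEW_DNS="10.200.0.53"             # DNS server at the recovery site (example)
RESOLV_CONF="./resolv.conf.demo"  # demo path; a real script targets /etc/resolv.conf

# Point the recovered guest at the DNS server that exists at the recovery site.
update_dns() {
    printf 'nameserver %s\n' "$NEW_DNS" > "$RESOLV_CONF" || return 1
    # Further steps (re-joining a domain, restarting services) can be
    # invoked from here; one script may call other scripts.
    return 0
}

# The recovery plan treats exit code 0 as success and 1 as failure;
# a real script would end with: exit $status
update_dns
status=$?
echo "in-guest script status: ${status}"
```

Remember to give the script executable permission for the root user (for example, chmod 700 /usr/local/sbin/production_vm_recovery), because the script runs only if the root user (Linux) or Administrator user (Windows) can execute it.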
    Figure. Recovery Plan Configuration: In-guest Script Execution
    Click to enlarge Recovery Plan Configuration: In-guest Script execution

    1. To enable script execution, click Enable .
      A command prompt icon appears against the VMs or VM categories to indicate that in-guest script execution is enabled on those VMs or VM categories.
    2. To disable script execution, click Disable .
  7. In the Network Settings tab, map networks in the primary cluster to networks at the recovery cluster.
    Figure. Recovery Plan Configuration: Network Settings
    Click to enlarge Recovery Plan Configuration: Network Mapping

    Network mapping enables replicating the network configurations of the primary clusters to the recovery clusters, and recover VMs into the same subnet at the recovery cluster. For example, if a VM is in the vlan0 subnet at the primary cluster, you can configure the network mapping to recover that VM in the same vlan0 subnet at the recovery cluster. To specify the source and destination network information for a network mapping, do the following in Local AZ (Primary) and PC 10.51.1xx.xxx (Recovery) .
    1. Under Production in Virtual Network or Port Group , select the production subnet that contains the protected VMs for which you are configuring a recovery plan. (Optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    2. Under Test Failback in Virtual Network or Port Group , select the test subnet that you want to use for testing failback from the recovery cluster. (Optional) If the virtual network is a non-IPAM network, specify the gateway IP address and prefix length in the Gateway IP/Prefix Length field.
    3. To add a network mapping, click Add Networks at the top-right corner of the page, and then repeat the steps 7.a-7.b.
      Note: The primary and recovery Nutanix clusters must have identical gateway IP addresses and prefix length. Therefore you cannot use a test failover network for two or more network mappings in the same recovery plan.
    4. Click Done .
    Note: For ESXi, you can configure network mapping for both standard and distributed (DVS) port groups. For more information about DVS, see VMware documentation.
    Caution: Leap does not support VMware NSX-T datacenters. For more information about NSX-T datacenters, see VMware documentation.
  8. If you want to enable the VMs in the production VPC to access the Internet, enable Outbound Internet Access .
  9. To assign floating IP addresses to the VMs when they are running in Xi Cloud Services, click + Floating IPs in Floating IPs section and do the following.
    Figure. Recovery Plan Configuration: Assign Floating IP Address
    Click to enlarge Recovery Plan Configuration: Assign Floating IP Addresses

    1. In NUMBER OF FLOATING IPS , enter the number of floating IP addresses that you need to assign to VMs.
    2. In ASSIGN FLOATING IPS TO VMS (OPTIONAL) , enter the name of each VM and select an IP address for it.
    3. In Actions , click Save .
    4. To assign a floating IP address to another VM, click + Assign Floating IP , and then repeat the steps for assigning a floating IP address.
  10. Click Done .

    The recovery plan is created. To verify the recovery plan, see the Recovery Plans page. You can modify the recovery plan to change the recovery location, add, or remove the protected guest VMs. For information about various operations that you can perform on a recovery plan, see Recovery Plan Management.

Failover and Failback Operations (Xi Leap)

You perform failover of the protected guest VMs when unplanned failure events (for example, natural disasters) or planned events (for example, scheduled maintenance) happen at the primary availability zone (site) or the primary cluster. The protected guest VMs migrate to the recovery site where you perform the failover operations. On recovery, the protected guest VMs start in the Xi Cloud Services region you specify in the recovery plan that orchestrates the failover.

The following are the types of failover operations in Xi Leap.

Test Failover
To ensure that the protected guest VMs failover efficiently to the recovery site, you perform a test failover. When you perform a test failover, the guest VMs recover in the virtual network designated for testing purposes at the recovery site (a manually created virtual subnet in the test VPC in Xi Cloud Services). However, the guest VMs at the primary site are not affected. Test failovers rely on the presence of VM recovery points at the recovery sites.
Planned Failover
To ensure VM availability when you foresee service disruption at the primary site, you perform a planned failover to the recovery site. For a planned failover to succeed, the guest VMs must be available at the primary site. When you perform a planned failover, the recovery plan first creates a recovery point of each protected guest VM, replicates the recovery point to the recovery site, and then starts the guest VM at the recovery site. The recovery point used for migration is retained indefinitely. After a planned failover, the guest VMs no longer run at the primary site.
Unplanned Failover
To ensure VM availability when a disaster causing service disruption occurs at the primary site, you perform an unplanned failover to the recovery site. In an unplanned failover, you can expect some data loss to occur. The maximum data loss possible is equal to the least RPO you specify in the protection policy, or the data that was generated after the last manual recovery point for a given guest VM. In an unplanned failover, by default, the protected guest VMs recover from the most recent recovery point. However, you can recover from an earlier recovery point by selecting a date and time of the recovery point.

After the failover, replication begins in the reverse direction. You can perform an unplanned failover operation only if recovery points have replicated to the recovery cluster. At the recovery site, failover operations cannot use recovery points that were created locally in the past. For example, if you perform an unplanned failover from the primary site AZ1 to recovery site AZ2 in Xi Cloud Services and then attempt an unplanned failover (failback) from AZ2 to AZ1 , the recovery succeeds at AZ1 only if the recovery points are replicated from AZ2 to AZ1 after the unplanned failover operation. The unplanned failover operation cannot perform recovery based on the recovery points that were created locally when the VMs were running in AZ1 .

The procedure for performing a planned failover is the same as the procedure for performing an unplanned failover. You can perform a failover even in different scenarios of network failure. For more information about network failure scenarios, see Leap and Xi Leap Failover Scenarios.

Performing a Test Failover (Xi Leap)

After you create a recovery plan, you can run a test failover periodically to ensure that the failover occurs smoothly when required. You can perform the test failover from Xi Cloud Services.

About this task

To perform a test failover to Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan that you want to test.
  4. Click Test from the Actions drop-down menu.
  5. In the Test Recovery Plan dialog box, do the following.
    1. In Primary Location , select the primary availability zone (site).
    2. In Recovery Location , select the recovery availability zone.
    3. Click Test .
    If you get errors or warnings, see the failure report that is displayed. Click the report to review the errors and warnings. Resolve the error conditions and then restart the test procedure.
  6. Click Close .
Cleaning up Test VMs (Xi Leap)

After testing a recovery plan, you can remove the test VMs that the recovery plan created in the recovery test network on Xi Cloud Services. To clean up the test VMs created when you test a recovery plan, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Click the recovery plans whose VMs you want to remove.
  4. Click Clean Up Test VMs from the Actions drop-down menu.
  5. In the Clean Up Test VMs dialog box, click Clean .
    Test VMs are deleted. If you get errors or warnings, see the failure report that is displayed. Click the report to review the errors and warnings. Resolve the error conditions and then restart the test procedure.
Performing a Planned Failover (Xi Leap)

Perform a planned failover at the recovery site. To perform a planned failover to Xi Cloud Services, do the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
    Figure. Planned Failover
    Click to enlarge Planned Failover

  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu. Specify the following information in the Failover from Recovery Plan dialog box.
    Note: The Failover action is available only when all the selected recovery plans have the same primary and recovery locations.
    Figure. Planned Failover
    Click to enlarge Planned Failover

    1. Failover Type : Click Planned Failover .
    2. Failover From (Primary) : Select the protected primary cluster.
    3. Failover To (Recovery) : Select the recovery cluster where you want the VMs to failover. This list displays Local AZ by default and is unavailable for editing.
    Note: Click + to add more combinations of primary and recovery clusters. You can add as many primary clusters as there are in the selected recovery plan.
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation.
  6. If you see errors, do the following.
    1. To review errors or warnings, click View Details in the description.
    2. Click Cancel to return to the Failover from Recovery Plan dialog box.
    3. Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
      You cannot continue the failover operation when the validation fails with errors.
    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

    • Both the primary and the recovery clusters (Prism Elements) are of version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

Performing an Unplanned Failover (Xi Leap)

Perform an unplanned failover at the recovery site. To perform an unplanned failover to Xi Cloud Services, do the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu. Specify the following information in the Failover from Recovery Plan dialog box.
    Note: The Failover action is available only when all the selected recovery plans have the same primary and recovery locations.
    Figure. Unplanned Failover
    Click to enlarge Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
    2. Failover From (Primary) : Select the protected primary cluster.
    3. Failover To (Recovery) : Select the recovery cluster where you want the VMs to failover. This list displays Local AZ by default and is unavailable for editing.
    Note: Click + to add more combinations of primary and recovery clusters. You can add as many primary clusters as there are in the selected recovery plan.
    Note: If recovery plans contain VM categories, the VMs from those categories recover in the same category after an unplanned failover to the recovery site. The recovery points also keep generating at the recovery site for those recovered VMs. Because the VM count represents the number of recoverable VMs (calculated from recovery points), the replicated recovery points and the newly generated recovery points are both counted, showing double the count of the originally recovered VMs on the Recovery Plans page. If some VMs belonging to the given category at the primary or recovery site are deleted, the VM count at both sites still stays the same until the recovery points of the deleted VMs expire. For example, when two VMs have failed over, the Recovery Plans page at the recovery site shows four VMs (two replicated recovery points from the source and two newly generated recovery points). The page shows four VMs even if the VMs are deleted from the primary or recovery site. The VM count synchronizes and becomes consistent in the subsequent RPO cycle, conforming to the retention policy set in the protection policy (due to the expiration of recovery points).
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation.
  6. If you see errors, do the following.
    1. To review errors or warnings, click View Details in the description.
    2. Click Cancel to return to the Failover from Recovery Plan dialog box.
    3. Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
      Note: You cannot continue the failover operation when the validation fails with errors.
      Note:

      The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

      However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.

      • Both the primary and the recovery clusters (Prism Elements) are of version 5.17 or newer.
      • A path for the entity recovery is not defined while initiating the failover operation.
      • The protected entities do not have shared disks.

      If these conditions are not satisfied, the failover operation fails.

    Note: To avoid conflicts when the primary site becomes active after the failover, shut down the guest VMs associated with this recovery plan. Manually power off the guest VMs on either primary or recovery site after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

Performing Failback (Xi Leap)

A failback is similar to a failover, but in the reverse direction. The same recovery plan applies to both the failover and the failback operations. Therefore, you perform a failback the same way you perform a failover. Log on to the site where you want the VMs to fail back, and then perform a failover. For example, if you failed over VMs from an on-prem site to Xi Cloud Services, to fail back to the on-prem site, perform the failover from the on-prem site.

About this task

To perform a failback, do the following procedure at the primary site.

Procedure

  1. Log on to the Prism Central web console.
  2. Click the hamburger icon at the top-left corner of the window. Go to Policies > Recovery Plans in the left pane.
  3. Select a recovery plan for the failover operation.
  4. Click Failover from the Actions drop-down menu.
    Note: If you select more than one recovery plan in step 3, the Failover action is available only when the selected recovery plans have the same primary and recovery locations.
    Specify the following information in the Failover from Recovery Plan window. The window auto-populates the Failover From and Failover To locations from the recovery plan you select in step 3.
    Figure. Unplanned Failover
    Click to enlarge Unplanned Failover

    1. Failover Type : Click Unplanned Failover and do one of the following.
      Tip: You can also click Planned Failover to perform planned failover procedure for a failback.
      • Click Recover from latest Recovery Point to use the latest recovery point for recovery.
      • Click Recover from specific point in time to use a recovery point taken at a specific point in time for recovery.
    2. Click + Add target clusters if you want to failover to specific Nutanix clusters at the primary site.
      If you do not add target clusters, the recovery plan recovers the guest VMs to any eligible cluster at the primary site.
    Note: If recovery plans contain VM categories, the VMs from those categories recover in the same category after an unplanned failover to the recovery site. The recovery points also keep generating at the recovery site for those recovered VMs. Because the VM count represents the number of recoverable VMs (calculated from recovery points), the replicated recovery points and the newly generated recovery points are both counted, showing double the count of the originally recovered VMs on the Recovery Plans page. If some VMs belonging to the given category at the primary or recovery site are deleted, the VM count at both sites still stays the same until the recovery points of the deleted VMs expire. For example, when two VMs have failed over, the Recovery Plans page at the recovery site shows four VMs (two replicated recovery points from the source and two newly generated recovery points). The page shows four VMs even if the VMs are deleted from the primary or recovery site. The VM count synchronizes and becomes consistent in the subsequent RPO cycle, conforming to the retention policy set in the protection policy (due to the expiration of recovery points).
  5. Click Failover .
    The Failover from Recovery Plan dialog box lists the errors and warnings, if any, and allows you to stop or continue the failover operation. If there are no errors or you resolve the errors in step 6, the guest VMs failover to the recovery cluster.
  6. If you see errors, do the following.
    • To review errors or warnings, click View Details in the description.

      Resolve the error conditions and then restart the failover procedure.

    • Select one of the following.
      • To stop the failover operation, click Abort .
      • To continue the failover operation despite the warnings, click Execute Anyway .
        Note: You cannot continue the failover operation when the validation fails with errors.
    Note:

    The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.

    However, the entities recover at a different path (with a VmRecoveredAtAlternatePath alert) only if the following conditions are met.

    • Prism Element on both the primary and the recovery Nutanix clusters runs version 5.17 or newer.
    • A path for the entity recovery is not defined while initiating the failover operation.
    • The protected entities do not have shared disks.

    If these conditions are not satisfied, the failover operation fails.

    Note: To avoid conflicts when the primary site becomes active after the failover, shut down the guest VMs associated with this recovery plan. Manually power off the guest VMs on either primary or recovery site after the failover is complete. You can also block the guest VMs associated with this recovery plan through the firewall.

Monitoring a Failover Operation (Xi Leap)

After you trigger a failover operation, you can monitor failover-related tasks. To monitor a failover, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Click the name of the recovery plan for which you triggered failover.
  4. Click the Tasks tab.
    The left pane displays the overall status. The table in the details pane lists all the running tasks and their individual statuses.

UEFI and Secure Boot Support for CHDR

Nutanix supports CHDR migrations of guest VMs that have UEFI or Secure Boot enabled.

Table 1. Nutanix Software - Minimum Requirements
Nutanix Software Minimum Supported Version
AOS 5.19.1
Prism Central pc.2021.1
NGT 2.1.1
Table 2. Applications and Operating Systems Requirements - UEFI
Operating Systems Versions
Microsoft Windows
  • Microsoft Windows 10
  • Microsoft Windows Server 2016
  • Microsoft Windows Server 2019
Linux
  • CentOS Linux 7.3
  • Ubuntu 18.04
  • Red Hat Enterprise Linux Server versions 7.1 and 7.7
Table 3. Applications and Operating Systems Requirements - Secure Boot
Operating Systems Versions
Microsoft Windows
  • Microsoft Windows Server 2016
  • Microsoft Windows Server 2019
Linux
  • CentOS Linux 7.3
  • Red Hat Enterprise Linux Server versions 7.7
Table 4. Recovery Limitations
System Configuration Limitation

Microsoft Windows Defender Credential Guard

VMs that have Credential Guard enabled cannot be recovered with the CHDR recovery solution.

IDE + Secure Boot

VMs on ESXi that have IDE disks or a CD-ROM and Secure Boot enabled cannot be recovered on AHV.

UEFI VMs on CentOS 7.4, CentOS 7.5 and Ubuntu 16.04 may fail to boot after CHDR migration.

CentOS 7.4, CentOS 7.5 and Ubuntu 16.04 UEFI VMs do not boot after cross-hypervisor disaster recovery migrations.

See KB-10633 for more information about this limitation. Contact Nutanix Support for assistance with this limitation.

UEFI VM may fail to boot after failback.

When a UEFI VM is booted on AHV for the first time, UEFI firmware settings of the VM are initialized. The next step is to perform a guest reboot or guest shutdown to fully flush the settings into persistent storage in the NVRAM.

If this UEFI VM is failed over to an ESXi host without performing the guest reboot/shutdown, the UEFI settings of the VM remain partial. Although the VM boots on ESXi, it fails to boot on AHV when a failback is performed.

See KB-10631 for more information about this limitation. Contact Nutanix Support for assistance with this limitation.

Protection with NearSync Replication and DR (Xi Leap)

NearSync replication enables you to protect your data with an RPO as low as 1 minute. You can configure a protection policy with NearSync replication by defining the VMs or VM categories. The policy creates a recovery point of the VMs in minutes (1–15 minutes) and replicates it to Xi Cloud Services. You can configure disaster recovery with NearSync replication between on-prem AHV or ESXi clusters and Xi Cloud Services. You can also perform cross-hypervisor disaster recovery (CHDR), that is, disaster recovery of VMs from AHV clusters to ESXi clusters or from ESXi clusters to AHV clusters.

Note: Nutanix provides multiple disaster recovery (DR) solutions to secure your environment. See Nutanix Disaster Recovery Solutions for the detailed representation of the DR offerings of Nutanix.

The following are the advantages of NearSync replication.

  • Protection for mission-critical applications, securing your data with minimal data loss in a disaster and providing more granular control during the recovery process.
  • No minimum network latency or distance requirements.
    Note: However, a maximum of 75 ms network latency is allowed for replication between an AHV cluster and Xi Cloud Services.
  • Low stun time for VMs with heavy I/O applications.

    Stun time is the duration for which an application freezes while a recovery point is taken.

  • Allows resolution to a disaster event in minutes.

To implement the NearSync feature, Nutanix has introduced a technology called lightweight snapshots (LWSs). LWS recovery points are created at the metadata level only, and they continuously replicate incoming data generated by workloads running on the active clusters. LWS recovery points are stored in the LWS store, which is allocated on the SSD tier. When you configure a protection policy with NearSync replication, the system allocates the LWS store automatically.

Note: The maximum LWS store allocation for each node is 360 GB. For the hybrid systems, it is 7% of the SSD capacity on that node.
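The per-node sizing rule above can be sketched as a small helper, assuming the documented values (7% of the node's SSD capacity for hybrid systems, capped at 360 GB). This is an illustration of the stated rule, not a Nutanix tool or API.

```python
def lws_store_per_node_gb(ssd_capacity_gb: float, hybrid: bool = True) -> float:
    """Approximate per-node LWS store allocation as documented:
    7% of the node's SSD capacity for hybrid systems, capped at 360 GB."""
    MAX_ALLOCATION_GB = 360.0
    if hybrid:
        # Integer-friendly form of 7% to avoid float noise in the example.
        return min(MAX_ALLOCATION_GB, ssd_capacity_gb * 7 / 100)
    return MAX_ALLOCATION_GB  # the documented upper bound for each node

# A hybrid node with 4 TB of SSD gets 7% = 280 GB, under the 360 GB cap.
print(lws_store_per_node_gb(4000))   # 280.0
# A hybrid node with 8 TB of SSD would get 560 GB, so the 360 GB cap applies.
print(lws_store_per_node_gb(8000))   # 360.0
```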

Transitioning in and out of NearSync

When you configure a protection policy with NearSync replication, the policy remains on an hourly schedule until its transition into NearSync completes.

To transition into NearSync, the system first seeds the recovery site with data: recovery points are taken hourly and replicated to the recovery site. After the system determines that the recovery points containing the seeding data have replicated within a specified amount of time (one hour by default), the protection policy automatically transitions into NearSync, depending on the bandwidth and the change rate. After the transition, you can see the configured NearSync recovery points in the web interface.

The following are the characteristics of the process.

  • Until the transition into NearSync completes, only the hourly recovery points are visible in Prism Central.
  • If a VM transitions out of NearSync for any reason, the system raises alerts in the Alerts dashboard and the protection policy falls back to the hourly schedule. The system continuously retries the NearSync schedule that you configured. If a retry succeeds, the protection policy automatically transitions back into NearSync, and alerts specific to this condition are raised in the Alerts dashboard.

To transition out of NearSync, you can do one of the following.

  • Delete the protection policy with NearSync replication that you have configured.
  • Update the protection policy with NearSync replication to use an hourly RPO.
  • Unprotect the VMs.
    Note: Adding or deleting a VM does not transition the protection policy out of NearSync.

Repeated transitioning in and out of NearSync can occur because of the following reasons.

  • LWS store usage is high.
  • The change rate of data is high for the available bandwidth between the primary and the recovery sites.
  • Internal processing of LWS recovery points is taking more time because the system is overloaded.

Retention Policy

Depending on the RPO (1–15 minutes), the system retains recovery points for a specific amount of time. For a protection policy with NearSync replication, you configure the retention policy in days, weeks, or months on both the primary and recovery sites instead of defining the number of recovery points to retain. For example, if you want an RPO of 1 minute and want to retain recovery points for 5 days, the following retention policy is applied.

  • For every 1 minute, a recovery point is created and retained for a maximum of 15 minutes.
    Note: Only the most recent 15 recovery points are visible in Prism Central and available for the restore operation.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 5 days.

You can also define recovery point retention in weeks or months. For example, if you configure a 3-month schedule, the following retention policy is applied.

  • For every 1 minute, a recovery point is created and retained for 15 minutes.
  • For every hour, a recovery point is created and retained for 6 hours.
  • One daily recovery point is created and retained for 7 days.
  • One weekly recovery point is created and retained for 4 weeks.
  • One monthly recovery point is created and retained for 3 months.
Note:
  • You can define different retention policies on the primary and recovery sites.
  • The system retains subhourly and hourly recovery points for 15 minutes and 6 hours respectively. Maximum retention time for days, weeks, and months is 7 days, 4 weeks, and 12 months respectively.
  • If you change the protection policy configuration from hourly schedule to minutely schedule (Asynchronous to NearSync), the first recovery point is not created according to the new schedule. The recovery points are created according to the start time of the old hourly schedule (Asynchronous). If you want to get the maximum retention for the first recovery point after modifying the schedule, update the start time accordingly for NearSync.
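The tiered retention described in the two examples above can be sketched as a small lookup. This is an illustration of the documented roll-up rules only (minutely points kept 15 minutes, hourly points kept 6 hours, then daily/weekly/monthly tiers capped at 7 days, 4 weeks, and 12 months), not a Nutanix API.

```python
def nearsync_rollup_tiers(amount: int, unit: str) -> dict:
    """Return, per tier, how long its recovery points are retained for a
    NearSync roll-up retention of `amount` days, weeks, or months."""
    caps = {"days": 7, "weeks": 4, "months": 12}  # documented maximums
    if unit not in caps:
        raise ValueError("unit must be 'days', 'weeks', or 'months'")
    if not 1 <= amount <= caps[unit]:
        raise ValueError(f"retention in {unit} must be 1-{caps[unit]}")
    tiers = {"minutely": "15 minutes", "hourly": "6 hours"}
    if unit == "days":
        tiers["daily"] = f"{amount} days"
    elif unit == "weeks":
        tiers["daily"] = "7 days"
        tiers["weekly"] = f"{amount} weeks"
    else:  # months
        tiers["daily"] = "7 days"
        tiers["weekly"] = "4 weeks"
        tiers["monthly"] = f"{amount} months"
    return tiers

# The 5-day example above:
print(nearsync_rollup_tiers(5, "days"))
# The 3-month example above:
print(nearsync_rollup_tiers(3, "months"))
```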

NearSync Replication Requirements (Xi Leap)

The following are the requirements specific to configuring protection policies with a NearSync replication schedule in Xi Leap. Meet these requirements in addition to the general requirements of Xi Leap.

For more information about the general requirements of Xi Leap, see Xi Leap Requirements.

For information about the on-prem node, disk and Foundation configurations required to support NearSync replication schedules, see On-Prem Hardware Resource Requirements.

Hypervisor Requirements

AHV or ESXi clusters running AOS 5.17 or newer, each registered to a different Prism Central

  • The on-prem AHV clusters must be running on version 20190916.189 or newer.
  • The on-prem ESXi clusters must be running on version ESXi 6.5 GA or newer.

Nutanix Software Requirements

The on-prem Prism Central and its registered clusters (Prism Elements) must be running the following versions of AOS.

  • AOS 5.17 or newer with AHV.
  • AOS 5.17 or newer with ESXi.

Cross Hypervisor Disaster Recovery (CHDR) Requirements

Data protection with NearSync replication supports cross-hypervisor disaster recovery. You can configure disaster recovery to recover VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided the following CHDR requirements are met.

  • The on-prem clusters must be running AOS 5.18 or newer.
  • Install and configure Nutanix Guest Tools (NGT) on all the guest VMs. For more information, see Enabling and Mounting Nutanix Guest Tools in Prism Web Console Guide .

    NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .

  • CHDR supports guest VMs with flat files only.
  • CHDR supports IDE/SCSI disks only.
    Tip: From AOS 5.19.1, CHDR also supports SATA disks.
  • For all the non-boot SCSI disks of Windows guest VMs, set the SAN policy to OnlineAll so that they come online automatically.
  • In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.

  • For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.

  • Nutanix does not support vSphere inventory mapping (for example, VM folder and resource pools) when protecting workloads between VMware clusters.

  • Nutanix does not support vSphere snapshots or delta disk files. If you have delta disks attached to a VM and you proceed with failover, you get a validation warning and the VM does not recover. Contact Nutanix Support for assistance.
Note: CHDR does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).

Table 1. Operating System Supported for CHDR
Operating System Version Requirements and Limitations
Windows
  • Windows 2008 R2 or newer versions
  • Windows 7 or newer versions
  • Only 64-bit operating systems are supported.
Linux
  • CentOS 6.5 and 7.0
  • RHEL 6.5 or newer and RHEL 7.0 or newer.
  • Oracle Linux 6.5 and 7.0
  • Ubuntu 14.04
  • SLES operating system is not supported.

Additional Requirements

  • Both the primary and the recovery clusters must have a minimum of three nodes.
  • See On-Prem Hardware Resource Requirements for the on-prem hardware and Foundation configurations required to support NearSync replication schedules.
  • Set the virtual IP address and the data services IP address in the primary and the recovery clusters.
  • The recovery site container must have at least as much space as the working set size of the protected VMs on the primary site. For example, if a protected VM uses 30 GB of space on a container of the primary site, the same amount of space is required on the recovery site container.
  • The bandwidth between the two sites must be approximately equal to or greater than the change rate of the protected VMs (the maximum supported change rate is 20 MBps).
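As a back-of-the-envelope check of the bandwidth requirement above, the rule can be sketched as follows (illustrative only; both values in MBps, megabytes per second):

```python
def bandwidth_sufficient(change_rate_mbps: float, link_mbps: float) -> bool:
    """NearSync needs inter-site bandwidth at or above the protected VMs'
    change rate; the supported change rate tops out at 20 MBps."""
    MAX_CHANGE_RATE_MBPS = 20.0
    if change_rate_mbps > MAX_CHANGE_RATE_MBPS:
        raise ValueError("change rate exceeds the supported 20 MBps maximum")
    return link_mbps >= change_rate_mbps

# A 12 MBps change rate over a 15 MBps link keeps up; over 10 MBps it does not.
print(bandwidth_sufficient(12, 15))  # True
print(bandwidth_sufficient(12, 10))  # False
```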

NearSync Replication Limitations (Xi Leap)

The following are the specific limitations of data protection with NearSync replication in Xi Leap. These limitations are in addition to the general limitations of Leap.

For information about the general limitations of Leap, see Xi Leap Limitations.

  • Deduplication enabled on storage containers having VMs protected with NearSync lowers the replication speed.
  • All files associated with VMs running on ESXi must be located in the same folder as the VMX configuration file. Files not located in the same folder as the VMX configuration file might not recover on a recovery cluster. On recovery, a VM with such files fails to start with the following error message: Operation failed: InternalTaskCreationFailure: Error creating host specific VM change power state task. Error: NoCompatibleHost: No host is compatible with the virtual machine
  • In CHDR, NearSync replication does not support retrieving recovery points from the recovery sites.

    For example, if you have 1-day retention at the primary site and 5-day retention at the recovery site, and you want to restore a recovery point from 5 days ago, NearSync does not support replicating that recovery point back from the recovery site to the primary site.

Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)

Create a NearSync protection policy in the primary site Prism Central. The policy schedules recovery points of the protected VMs as per the set RPO and replicates them to Xi Cloud Services for availability. When creating a protection policy, you can specify only VM categories. To include VMs individually, first create the protection policy (which can also include VM categories), and then add the VMs individually to the protection policy from the VMs page.

Before you begin

Ensure that the AHV or ESXi clusters on both the primary and recovery site are NearSync capable. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB.

See NearSync Replication Requirements (Xi Leap) and NearSync Replication Limitations (Xi Leap) before you start.
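The NearSync-capable check above can be expressed directly, assuming you know each SSD's capacity in TB (an illustrative sketch, not a Nutanix tool):

```python
def nearsync_capable(ssd_capacities_tb: list) -> bool:
    """A cluster is NearSync capable when every SSD is at least 1.2 TB."""
    MIN_SSD_TB = 1.2
    return bool(ssd_capacities_tb) and all(
        capacity >= MIN_SSD_TB for capacity in ssd_capacities_tb
    )

print(nearsync_capable([1.92, 1.92, 3.84]))  # True: all SSDs meet 1.2 TB
print(nearsync_capable([1.92, 0.96]))        # False: one SSD is under 1.2 TB
```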

About this task

To create a protection policy with NearSync replication in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Protection Policies in the left pane.
  3. Click Create Protection Policy .
    Specify the following information in the Create Protection Policy window.
    Figure. Protection Policy Configuration: NearSync

    1. Name : Enter a name for the policy.
      Caution: The name can contain only alphanumeric, dot, dash, and underscore characters.
    2. Primary Location : Select the primary availability zone that hosts the VMs to protect. This list displays the Local AZ by default and is unavailable for editing.
    3. Primary Cluster(s) : Select the cluster that hosts the VMs to protect.
    4. Recovery Location : Select the recovery availability zone where you want to replicate the recovery points.
      If you do not select a recovery location, the local recovery points that are created by this protection policy do not replicate automatically. You can, however, replicate recovery points manually and use recovery plans to recover the VMs. For more information, see Manual Disaster Recovery (Xi Leap).
    5. Target Cluster : Select the NearSync capable cluster where you want to replicate the recovery points.
      This field becomes available only if the recovery location is a physical remote site. If the specified recovery location is an availability zone in Xi Cloud Services, the Target Cluster field becomes unavailable because Xi Cloud Services selects a cluster for you. If the specified recovery location is a physical location, you can select a cluster of your choice.
      Caution: If the primary cluster contains an IBM Power Systems server, you cannot replicate recovery points to Xi Cloud Services. However, you can replicate recovery points to the on-prem target cluster if the target on-prem cluster also contains an IBM Power Systems server.
      Caution: Select auto-select from the drop-down list only if all the clusters at the recovery site are NearSync capable.

    6. Policy Type : Click Asynchronous .
    7. Recovery Point Objective : Specify the frequency in minutes (1–15) at which you want recovery points to be taken.

      By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Change , and then, in the Start Time dialog box, do the following.

      • Click Start from specific point in time .
      • In the time picker, specify the time at which you want to start taking recovery points.
      • Click Save .

      Tip: NearSync also allows you to recover the data of the minute just before the unplanned failover. For example, on a protection policy with 10 minute RPO, you can use the internal lightweight snapshots (LWS) to recover the data of the 9th minute when there is an unplanned failover.
    8. Retention Policy : Specify the type of retention policy.
      Figure. Roll-up Retention Policy

      • Roll-up : Rolls up the recovery points as per the RPO and retention period into a single recovery point at a site. For example, if you set the RPO to 1 hour, and the retention time to 5 days, the 24 oldest hourly recovery points roll up into a single daily recovery point (one recovery point = 24 hourly recovery points) after every 24 hours. The system keeps one day (of rolled-up hourly recovery points) and 4 days of daily recovery points.
        Note:
        • If the retention period is n days, the system keeps 1 day of RPO (rolled-up hourly recovery points) and n-1 days of daily recovery points.
        • If the retention period is n weeks, the system keeps 1 day of RPO, 1 week of daily and n-1 weeks of weekly recovery points.
        • If the retention period is n months, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
        • If the retention period is n years, the system keeps 1 day of RPO, 1 week of daily, 1 month of weekly, and n-1 months of monthly recovery points.
        Note: The recovery points that are used to create a rolled-up recovery point are discarded.
        Tip: Use roll-up retention policies for anything with a longer retention period. Roll-up policies are more flexible and automatically handle recovery point aging/pruning while still providing granular RPOs for the first day.
        Note: NearSync does not support Linear retention policies. When you enter a minutely time unit in the Recovery Point Objective , the Roll-up retention policy is automatically selected.
  4. To specify the retention number for the sites, do the following.
    1. Remote Retention : Specify the retention number for the remote site.
      This field is unavailable if you do not specify a recovery location.
    2. Local Retention : Specify the retention number for the local site.

      If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.

  5. If you want to take application consistent recovery points, select Take App-Consistent Recovery Point .
    Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the VMs running on AHV. For VMs running on ESXi, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs).
    Caution: Application-consistent recovery points fail for EFI-boot enabled Windows 2019 VMs running on ESXi when NGT is not installed. Nutanix recommends installing NGT on VMs running on ESXi as well.
  6. Associated Categories : To protect categories of VMs, perform the following.
    Tip: Before associating VM categories to a protection policy, determine how you want to identify the VMs you want to protect. If they have a common characteristic (for example, the VMs belong to a specific application or location), check the Categories page to ensure that both the category and the required value are available. Prism Central includes built-in categories for frequently encountered applications such as MS Exchange and Oracle. You can also create your custom categories. If the category or value you want is not available, first create the category with the required values, or update an existing category so that it has the values that you require. Doing so ensures that the categories and values are available for selection when creating the protection policy. You can add VMs to the category either before or after you configure the protection policy. For more information about VM categories, see Category Management in the Prism Central Guide .
    1. Click Add Categories .
    2. Select the VM categories from the list to add to the protection policy.
      Note:

      You cannot protect a VM by using two or more protection policies. Therefore, VM categories specified in another protection policy are not listed here. Also, if you included a VM in another protection policy by specifying the category to which it belongs (category-based inclusion), and if you add the VM to this policy by using its name (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, the VM is protected only by this protection policy and not by the protection policy in which its category is specified.

      For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .

    3. Click Save .
    Tip: To add or remove categories from the existing protection policy, click Update .
  7. Click Save .
    You have successfully created a protection policy with NearSync replication in Xi Leap. You can add VMs individually (without VM categories) to the protection policy or remove VMs from the protection policy. For information about the operations that you can perform on a protection policy, see Protection Policy Management.
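The wizard fields above map naturally onto a policy object. The sketch below builds an illustrative payload and enforces the documented constraints (1-15 minute RPO, the allowed name characters, roll-up-only retention for NearSync). The key names are simplified placeholders for illustration, not the exact Prism Central API schema.

```python
def build_nearsync_policy(name: str, rpo_minutes: int,
                          local_retention: int, remote_retention: int,
                          categories: dict) -> dict:
    """Illustrative policy payload mirroring the wizard fields above."""
    if not (1 <= rpo_minutes <= 15):
        raise ValueError("NearSync RPO must be between 1 and 15 minutes")
    # The wizard allows only alphanumeric, dot, dash, and underscore names.
    stripped = name.replace(".", "").replace("-", "").replace("_", "")
    if not stripped.isalnum():
        raise ValueError("name allows only alphanumeric, dot, dash, underscore")
    return {
        "name": name,
        "rpo_seconds": rpo_minutes * 60,
        "retention": {
            "local": local_retention,
            "remote": remote_retention,
            "type": "ROLLUP",  # NearSync does not support linear retention
        },
        "categories": categories,
    }

# Reusing the example names from this section:
policy = build_nearsync_policy("PP_AdminVMs", 5, 1, 5, {"Department": "Admin"})
print(policy["rpo_seconds"])  # 300
```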

Creating a Recovery Plan (Xi Leap)

Create a recovery plan in the primary Prism Central. The procedure for creating a recovery plan is the same for all the data protection strategies in Xi Leap.

For more information about creating a recovery plan in Xi Leap, see Creating a Recovery Plan (Xi Leap).

Protection Policy Management

A protection policy automates the creation and replication of recovery points. When configuring a protection policy for creating local recovery points, you specify the RPO, retention policy, and the VMs that you want to protect. You also specify the recovery location if you want to automate recovery point replication to Xi Cloud Services.

When you create, update, or delete a protection policy, it synchronizes to the paired Xi Cloud Services. The recovery points automatically start replicating in the reverse direction after you perform a failover at the recovery Xi Cloud Services. For information about how Xi Leap determines the list of availability zones for synchronization, see Entity Synchronization Between Paired Availability Zones.

Note: A VM cannot be simultaneously protected by a protection domain and a protection policy. If you want to use a protection policy to protect a VM that is part of a protection domain, first remove the VM from the protection domain, and then include it in the protection policy. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.

Adding Guest VMs individually to a Protection Policy

You can also add VMs directly to a protection policy from the VMs page, without the use of a VM category. To add VMs directly to a protection policy in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the VMs that you want to protect, and click Protect from the Actions drop-down menu.
    Figure. Protect VMs Individually

  4. Select the protection policy in the table to include the VMs in a protection policy.
    Figure. Protection Policy Selection

  5. Click Protect .
    The VMs are added to the selected protection policy. The updated protection policy starts synchronizing to the recovery Prism Central.

Removing Guest VMs Individually from a Protection Policy

You can directly remove guest VMs from a protection policy from the VMs page. To remove guest VMs from a protection policy in Xi Cloud Services, perform the following procedure.

About this task

Note: If a guest VM is protected individually (not through VM categories), you can remove it from the protection policy only by using this individual removal procedure.
Note: If a guest VM is protected under a VM category, you cannot remove the guest VM from the protection policy with this procedure. You can remove the guest VM from the protection policy only by dissociating the guest VM from the category.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the guest VMs that you want to remove from a protection policy.
  4. Click UnProtect from the Actions drop-down menu.
    The selected guest VMs are removed from the protection policy. The updated protection policy starts synchronizing to the recovery Prism Central.
    Note: Delete all the recovery points associated with the guest VM to avoid incurring subscription charges. The recovery points adhere to the expiration period set in the protection policy and, unless deleted individually, continue to incur charges until they expire.

Cloning a Protection Policy

If the requirements of the protection policy that you want to create are similar to an existing protection policy in Xi Cloud Services, you can clone the existing protection policy and update the clone.

About this task

To clone a protection policy from Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Protection Policies in the left pane.
  3. Select the protection policy that you want to clone.
  4. Click Clone from the Actions drop-down menu.
  5. Make the required changes on the Clone Protection Policy page. For information about the fields on the page, see:
    • Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)
    • Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)
  6. Click Save .
    The selected protection policy is cloned. The updated protection policy starts synchronizing to the recovery Prism Central.

Updating a Protection Policy

You can modify an existing protection policy in the Xi Cloud Services. To update an existing protection policy in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Protection Policies in the left pane.
  3. Select the protection policy that you want to update.
  4. Click Update from the Actions drop-down menu.
  5. Make the required changes on the Update Protection Policy page. For information about the fields on the page, see:
    • Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap)
    • Creating a Protection Policy with NearSync Replication Schedule (Xi Leap)
  6. Click Save .
    The selected protection policy is updated. The updated protection policy starts synchronizing to the recovery Prism Central.

Finding the Protection Policy of a Guest VM

You can use the data protection focus on the VMs page to determine the protection policies to which a VM belongs in Xi Cloud Services. To determine the protection policy in Xi Cloud Services to which a VM belongs, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Click Data Protection from the Focus menu at the top-right corner.
    The Protection Policy column that is displayed shows the protection policy to which the VMs belong.
    Figure. Focus

  4. After you review the information, to return the VM page to the previous view, remove the Focus Data Protection filter from the filter text box.

Recovery Plan Management

A recovery plan orchestrates the recovery of protected VMs at a recovery site. Recovery plans are predefined procedures (runbooks) that use stages to enforce VM power-on sequence. You can also configure the inter-stage delays to recover applications gracefully. Recovery plans that recover applications in Xi Cloud Services are also capable of creating the required networks during failover and can assign public-facing IP addresses to VMs.

A recovery plan created in one availability zone (site) replicates to the paired availability zone and works bidirectionally. After a failover from the primary site to a recovery site, you can failback to the primary site by using the same recovery plan.

After you create a recovery plan, you can validate or test it to ensure that recovery goes through smoothly when failover becomes necessary. Xi Cloud Services includes a built-in VPC for validating or testing failover.

Recovery plans are independent of protection policies and do not reference protection policies in their configuration information. Also, they do not create recovery points. While a planned failover includes the creation of a recovery point so that the latest data can be used for recovery, unplanned and test failovers rely on recovery points that already exist at the designated recovery site. A recovery plan therefore requires that the VMs it includes are also associated with a protection policy.

Recovery plans are synchronized to one or more paired sites when they are created, updated, or deleted. For information about how Leap determines the list of availability zones (sites) for synchronization, see Entity Synchronization Between Paired Availability Zones.

Adding Guest VMs individually to a Recovery Plan

You can also add VMs directly to a recovery plan on the VMs page, without using a category. To add VMs directly to a recovery plan in Xi Cloud Services, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the VMs that you want to add to a recovery plan.
  4. Click Add to Recovery Plan from the Actions drop-down menu.
    The Update Recovery Plan page is displayed.
  5. Select the recovery plan where you want to add the VMs in the Add to Recovery Plan dialog box.
  6. Click Add .
    The Update Recovery Plan dialog box appears.
  7. In the General tab, verify the Recovery Plan Name and Recovery Plan Description , and click Next .
  8. In the Power On Sequence tab, add the VMs to a stage. For more information, see Stage Management.
  9. Click Next .
  10. In the Network Settings tab, update the network settings as required for the newly added VMs. For more information, see Creating a Recovery Plan (Xi Leap).
  11. Click Done .
    The VMs are added to the recovery plan.

Removing Guest VMs individually from a Recovery Plan

You can also remove VMs directly from a recovery plan in Xi Cloud Services. To remove VMs directly from a recovery plan, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan from which you want to remove VMs.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan dialog box appears.
  5. In the General tab, verify the Recovery Plan Name and Recovery Plan Description , and click Next .
  6. In the Power On Sequence tab, select the VMs and click More Actions > Remove .
    Note: You see More Actions in a stage only when one or more VMs in the stage are selected. When none of the VMs in the stage are selected, you see Actions .
  7. Click Next .
  8. In the Network Settings tab, update the network settings as required. For more information, see Stage Management.
  9. Click Done .
    The VMs are removed from the selected recovery plan.

Updating a Recovery Plan

You can update an existing recovery plan in Xi Cloud Services. To update a recovery plan, perform the following procedure.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan that you want to update.
  4. Click Update from the Actions drop-down menu.
    The Update Recovery Plan dialog box appears.
  5. Make the required changes to the recovery plan. For information about the various fields and options, see Creating a Recovery Plan (Xi Leap).
  6. Click Done .
    The selected recovery plan is updated.

Validating a Recovery Plan

You can validate a recovery plan from the recovery site. For example, if you perform the validation in Xi Cloud Services (the primary site being an on-prem site), Leap validates failover from the on-prem site to Xi Cloud Services. Recovery plan validation only reports warnings and errors; no failover is performed. In this procedure, you specify which of the two paired sites to treat as the primary, and then select the other site as the recovery location.

About this task

To validate a recovery plan, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to Recovery Plans in the left pane.
  3. Select the recovery plan that you want to validate.
  4. Click Validate from the Actions drop-down menu.
  5. In the Validate Recovery Plan dialog box, do the following.
    1. In Primary Location , select the primary location.
    2. In Recovery Location , select the recovery location.
    3. Click Proceed .
    The validation process lists any warnings and errors.
  6. Click Back .
    A summary of the validation is displayed. You can close the dialog box.
  7. To return to the detailed results of the validation, click the link in the Validation Errors column.
    The selected recovery plan is validated for its correct configuration.

Manual Disaster Recovery (Xi Leap)

Manual data protection involves manually creating recovery points, manually replicating recovery points, and manually recovering the VMs at the recovery site. You can also automate some of these tasks. For example, the last step—that of manually recovering VMs at the recovery site—can be performed by a recovery plan while the underlying recovery point creation and replication can be performed by protection policies. Conversely, you can configure protection policies to automate recovery point creation and replication and recover VMs at the recovery site manually.

Creating Recovery Points Manually (Out-of-Band Snapshots)

About this task

To create recovery points manually in Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Select the VMs for which you want to create a recovery point.
  4. Click Create Recovery Point from the Actions drop-down menu.
  5. To verify that the recovery point is created, click the name of the VM and check the Recovery Points tab.

Replicating Recovery Points Manually

You can manually replicate recovery points only from the site where the recovery points exist.

About this task

To replicate recovery points manually from Xi Cloud Service, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Click the VM whose recovery point you want to replicate, and then click Recovery Points in the left pane.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery points that you want to replicate.
  5. Click Replicate from the Actions drop-down menu.
  6. In the Replicate dialog box, do the following.
    1. In Recovery Location , select the location where you want to replicate the recovery point.
    2. In Target Cluster , select the cluster where you want to replicate the recovery point.
    3. Click Replicate Recovery Point .

Recovering a Guest VM from a Recovery Point Manually (Clone)

You can recover a VM by cloning a VM from a recovery point.

About this task

To recover a VM from a recovery point at Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click Explore and go to VMs in the left pane.
  3. Click the VM that you want to recover, and then click Recovery Points in the left pane.
    The Recovery Points view lists all the recovery points of the VM.
  4. Select the recovery point from which you want to recover the VM.
  5. Click Restore from the Actions drop-down menu.
  6. In the Restore dialog box, do the following.
    1. In the text box provided for specifying a name for the VM, enter a new name or leave the field unchanged to use the automatically generated name.
    2. Click Restore .
    Warning: The following limitations apply to manually recovered VMs (VMs recovered without the use of a recovery plan).
    • The VMs recover without a VNIC if the recovery is performed at the remote site.
    • VM categories are not applied.
    • NGT must be reconfigured.

Entity Synchronization Between Paired Availability Zones

When paired with each other, availability zones (sites) synchronize disaster recovery configuration entities. Paired sites synchronize the following disaster recovery configuration entities.

Protection Policies
A protection policy is synchronized whenever you create, update, or delete the protection policy.
Recovery Plans
A recovery plan is synchronized whenever you create, update, or delete the recovery plan. The list of availability zones (sites) to which Xi Leap must synchronize a recovery plan is derived from the VMs included in the recovery plan: both the VM categories and the individually added VMs.

If you specify VM categories in a recovery plan, Leap determines which protection policies use those VM categories, and then synchronizes the recovery plan to the availability zones specified in those protection policies.

If you include VMs individually in a recovery plan, Leap uses the recovery points of those VMs to determine which protection policies created those recovery points, and then synchronizes the recovery plan to the availability zones specified in those protection policies.

If you create a recovery plan for VM categories or VMs that are not associated with a protection policy, Leap cannot determine the availability zone list and therefore cannot synchronize the recovery plan. Similarly, if a recovery plan includes only individually added VMs and a protection policy associated with a VM has not yet created VM recovery points, Leap cannot synchronize the recovery plan to the availability zone specified in that protection policy. However, Leap monitors recovery plans every 15 minutes for the availability of recovery points that can help derive availability zone information. When recovery points become available, Xi Leap derives the availability zones as described earlier and synchronizes the recovery plan to them.

VM Categories used in Protection Policies and Recovery Plans
A VM category is synchronized when you specify the VM category in a protection policy or recovery plan.
Issues such as a loss of network connectivity between paired availability zones, or user actions such as unpairing availability zones and then pairing them again, can affect entity synchronization.
Tip: Nutanix recommends unprotecting all the VMs on an availability zone before unpairing it, to avoid a state where entities have stale configurations after the availability zones are paired again.

If you update VMs in either or both availability zones before such issues are resolved or before unpaired availability zones are paired again, VM synchronization is not possible. Also, during VM synchronization, if a VM cannot be synchronized because of an update failure or conflict (for example, you updated the same VM in both availability zones during a network connectivity issue), no further VMs are synchronized. Entity synchronization can resume only after you resolve the error or conflict. To resolve a conflict, use the Entity Sync option, which is available in the web console. Force synchronization from the availability zone that has the desired configuration. Forced synchronization overwrites conflicting configurations in the paired availability zone.
Note: Forced synchronization cannot resolve errors arising from conflicting values in VM specifications (for example, the paired availability zone already has a VM with the same name).

If you do not update entities before a connectivity issue is resolved or before you pair the availability zones again, the synchronization behavior described earlier resumes. Also, pairing previously unpaired availability zones triggers an automatic synchronization event. For recommendations on avoiding such issues, see Entity Synchronization Recommendations (Xi Leap).
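The derivation of synchronization targets described above can be modeled as a small sketch. The policy names, categories, VM IDs, and zone names below are invented, and the data structures are simplified stand-ins for what Leap actually stores; the sketch only illustrates how the union of availability zones might be computed from categories and recovery points.

```python
# Hypothetical protection policies: each names the categories it protects
# and the availability zones it replicates to (all names are invented).
protection_policies = [
    {"name": "pp-gold", "categories": {"AppTier:DB"}, "zones": {"az-onprem", "az-xi"}},
    {"name": "pp-silver", "categories": {"AppTier:Web"}, "zones": {"az-onprem", "az-dr2"}},
]

# Each VM recovery point records which protection policy created it.
recovery_points = {"vm-101": "pp-gold", "vm-202": "pp-silver"}

def derive_sync_zones(plan_categories, plan_vms):
    """Union of availability zones from policies that cover the plan's
    VM categories, plus policies that created recovery points for its
    individually added VMs."""
    zones = set()
    for policy in protection_policies:
        if plan_categories & policy["categories"]:
            zones |= policy["zones"]
    for vm in plan_vms:
        policy_name = recovery_points.get(vm)  # None if no recovery point yet
        for policy in protection_policies:
            if policy["name"] == policy_name:
                zones |= policy["zones"]
    return zones
```

A VM with no recovery point contributes nothing, which mirrors why such plans cannot synchronize until the periodic monitoring finds a recovery point.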

Entity Synchronization Recommendations (Xi Leap)

Consider the following recommendations to avoid inconsistencies and the resulting synchronization issues.

  • During network connectivity issues, do not update entities at both the availability zones (sites) in a pair. You can safely make updates at any one site. After the connectivity issue is resolved, force synchronization from the site in which you made updates. Failure to adhere to this recommendation results in synchronization failures.

    You can safely create entities at either or both the sites as long as you do not assign the same name to entities at the two sites. After the connectivity issue is resolved, force synchronization from the site where you created entities.

  • If one of the sites becomes unavailable, or if any service in the paired site is down, force synchronization from the paired availability zone after the issue is resolved.

Forcing Entity Synchronization (Xi Leap)

Entity synchronization, when forced from an availability zone (site), overwrites the corresponding entities in the paired sites. Forced synchronization also creates, updates, and removes entities in the paired sites as required.

About this task

The availability zone (site) to which a particular entity is forcefully synchronized depends on which site requires the entity (see Entity Synchronization Between Paired Availability Zones). To avoid inadvertently overwriting required entities, ensure that you force synchronization from the site in which the entities have the desired configuration.

If a site is paired with two or more availability zones (sites), you cannot select individual sites with which to synchronize entities.

To force entity synchronization from Xi Cloud Services, do the following.

Procedure

  1. Log on to Xi Cloud Services.
  2. Click the settings button (gear icon) at the top-right corner of the window.
  3. Click Entity Sync in the menu.
  4. In the Entity Sync dialog box, review the message at the top of the dialog box, and then do the following.
    1. To review the list of entities that will be synchronized to an availability zone, click the number of ENTITIES adjacent to that availability zone.
    2. After you review the list of entities, click Back .
  5. Click Sync Entities .

Migrating Guest VMs from a Protection Domain to a Protection Policy

You can protect a guest VM either with a protection domain in Prism Element or with a protection policy in Prism Central. If you have guest VMs in protection domains, migrate those guest VMs to protection policies to orchestrate their disaster recovery using Leap.

Before you begin

Migration from protection domains to protection policies is a disruptive process. For a successful migration:
  • Ensure that the guest VMs have no on-going replication.
  • Ensure that the guest VMs do not have volume groups.
  • Ensure that the guest VMs are not in consistency groups.

About this task

To migrate a guest VM from a protection domain to a protection policy manually, perform the following procedure.

Tip: To automate the migration using a script, see KB 10323.

Procedure

  1. Unprotect the guest VM from the protection domain.
    Caution: Do not delete the guest VM snapshots in the protection domain. Prism Central reads those guest VM snapshots to generate new recovery points without full replication between the primary and recovery Nutanix clusters. If you delete the guest VM snapshots, the VM data replicates afresh (full replication). Nutanix recommends keeping the VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central.
    Caution: Use the automated script for migrating guest VMs from a large protection domain. A large protection domain consists of more than 500 guest VMs. If you migrate the guest VMs manually from a large protection domain, the VM data replicates afresh (full replication).

  2. Log on to Prism Central and protect the guest VMs with protection policies individually (see Adding Guest VMs individually to a Protection Policy) or through VM categories.

Epoch Documentation

21-Nov-2018

Epoch Documentation

For Epoch documentation, see https://docs.epoch.nutanix.com/


File Analytics Guide

Files 3.0

Last updated: 2022-06-14

File Analytics

File Analytics provides data and statistics on the operations and contents of a file server.

Once deployed, Nutanix Files adds a File Analytics VM (FAVM) to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. File Analytics protects data on the FAVM, which is kept in a separate volume group.

Once you deploy File Analytics, a new File Analytics link appears on the file server actions bar. Use the link to access File Analytics on any file server that has File Analytics enabled.

Figure. File Analytics VM

Display Features

The File Analytics web console includes the following display features:

Main menu bar : The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:

  • Dashboard tab : View widgets that present data on file trends, distribution, and operations, see Dashboard.
  • Audit Trails tab : Search for a specific user or file and view various widgets to audit activity, see Audit Trails.
  • Anomalies tab : Create anomaly policies and view anomaly trends, see Anomalies.
  • Ransomware tab : Configure ransomware protection and self-service restore (SSR) snapshots, see Ransomware Protection.
    Warning: Ransomware protection helps detect potential ransomware. Nutanix does not recommend using the File Analytics ransomware feature as an all-encompassing ransomware solution.
  • Reports tab : Create custom reports or use pre-canned report templates, see Reports.
  • Status icon : Check the file system scan status.
  • File server drop-down : View the name of the file server for which data is displayed.
  • Settings drop-down : Manage File Analytics and configure settings, see Administration and File Analytics Options.
  • Health icon : Check the health of File Analytics, see Health.
  • Admin drop-down : Collect logs and view the current File Analytics version.

Deployment Requirements

Meet the following requirements prior to deploying File Analytics.

Ensure that you have performed the following tasks and your Files deployment meets the following specifications.

  • Assign the file server administrator role to an Active Directory (AD) user, see Managing Roles in the Nutanix Files Guide .
  • Log on as the Prism admin user to deploy the File Analytics server.
  • Configure a VLAN with one dedicated IP address for File Analytics, or use an IP address from an existing Files external network. This IP address must have connectivity to AD, the Controller VM (CVM), and Files. See "Configuring a Virtual Network For guest VM Interfaces" in the Prism Web Console Guide.
    Note: Do not install File Analytics on the Files internal network.
  • (optional) Assign the file server administrator role to an LDAP user, see Managing Roles in the Nutanix Files Guide .
  • Ensure that all software components meet the supported configurations and system limits, see the File Analytics Release Notes .

Network Requirements

Open the required ports, and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.

The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.

In addition to meeting the File Analytics network requirements, ensure that you also meet the Nutanix Files port requirements as described in the Port Reference .
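Basic TCP reachability between the FAVM and the hosts it must talk to can be spot-checked with a small helper. This is an illustrative sketch using standard Python sockets, not a Nutanix tool; the hosts and ports to probe come from your deployment and the Port Reference.

```python
import socket

def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for spot-checking FAVM <-> CVM reachability
    before opening a support case about connectivity.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

ICMP checks (ping) require raw sockets and elevated privileges, so they are easier to run with the system ping utility than from a sketch like this.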

Limitations

File Analytics has the following limitations.

Note: Depending on data set size, file count, and workload type, enabling File Analytics can affect the performance of Nutanix Files. High latency is more common with heavy file-metadata operations (directory and file creation, deletion, permission changes, and so on). To minimize the impact on performance, ensure that the host has enough CPU and memory resources to handle the File Analytics VM (FAVM), file servers, and guest VMs (if any).
  • Only Prism admin can deploy File Analytics.
  • File Analytics retains data for a configurable period, from one day up to one year. File Analytics automatically deletes data older than the configured retention period.
    Note: After surpassing the audit event threshold, as specified in File Analytics Release Notes , Analytics archives the oldest events. Archived audit events do not appear in the Analytics UI.
  • You cannot deploy or decommission File Analytics when a file server has high-availability (HA) mode enabled.
  • You cannot use network segmentation for Nutanix Volumes with File Analytics.
  • If file server DNS or IP changes, File Analytics does not automatically reconfigure.
  • File Analytics does not collect metadata for files on Kerberos authenticated NFS v4.0 shares.
  • File Analytics does not support hard links.
  • You cannot enable File Analytics on a file server clone.
  • You cannot move File Analytics to another storage container.
  • File Analytics creates an unprotected Prism and an unprotected file server user for integration purposes. Do not delete these users.
  • The legacy file blocking policy has an upper limit of 300 ransomware extensions.
    Note: For higher limits, Nutanix recommends using Nutanix Data Lens.
  • File Analytics does not support the following operations for graceful shutdown:
    • AHV: power cycle, power off
    • ESXi: power off, reset

Administration

Overview of administrative processes for File Analytics.

As an admin, you have the required permissions for performing File Analytics administrative tasks. To add a file server admin user, see Managing Roles in the Nutanix Files Guide . The topics in this chapter describe the basics for administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.

Deploying File Analytics

Follow this procedure to deploy the File Analytics server.

Before you begin

Ensure that your environment meets all requirements prior to deployment, see Deployment Requirements.

Procedure

Deploying the File Analytics server.
  1. Go to Support Portal > Downloads > File Analytics .
  2. Download the File Analytics QCOW2 and JSON files.
  3. Log on to Prism with the user name and password of the Prism administrator.
    Note: An Active Directory (AD) user or an AD user mapped to a Prism admin role cannot deploy File Analytics.
  4. In Prism, go to the File Server view and click the Deploy File Analytics action link.
    Figure. File Analytics

  5. Review the File Analytics requirements and best practices in the Pre-Check dialog box.
  6. In the Deploy File Analytics Server dialog box, do the following in the Image tab.
    • Under Available versions , select one of the available File Analytics versions (continue to step 8).
    • Install by uploading the installation binary files (continue to the next step).
  7. Upload installation files.
    1. In the Upload binary section, click upload the File Analytics binary to upload the File Analytics JSON and QCOW2 files.
      Figure. Upload Binary Link
    2. Under File Analytics Metadata File (.Json) , click Choose File to choose the downloaded JSON file.
    3. Under File Analytics Installation Binary (.Qcow2) , click Choose File to choose the downloaded QCOW2 file.
      Figure. Upload Binary Files
    4. Click Upload Now after choosing the files.
  8. Click Next .
  9. In the VM Configuration tab, do the following in the indicated fields:
    1. Name : Enter a name for the File Analytics VM (FAVM).
    2. Server Size : Select either the small or large configuration. Large file servers require a larger FAVM configuration. By default, File Analytics selects the large configuration.
    3. Storage Container : Select a storage container from the drop-down list.
      The drop-down list displays only file server storage containers.
    4. Network List : Select a VLAN.
      Note: If the selected network is unmanaged , enter more network details in the Subnet Mask , Default Gateway IP , and IP Address fields as indicated.
      Note: The FAVM must use the client-side network.
  10. Click Deploy .
    In the main menu drop-down, select the Tasks view to monitor the deployment progress.

Results

Once deployment is complete, File Analytics creates an FAVM along with a Prism user and a file server user for REST API calls. Do not delete the FAVM or these users.

Enabling File Analytics

Steps for enabling File Analytics after deployment or disablement.

About this task

Attention: Nutanix recommends enabling File Analytics during off-peak hours.

Follow these steps to enable File Analytics after disabling the application.

Note: File Analytics saves all previous configurations.

Procedure

  1. In the File Server view in Prism , select the target file server.
  2. (Skip to step 3 if you are re-enabling a file server) Click Manage roles to add a file server admin user; see Managing Roles in the Nutanix Files Guide .
  3. In the File Server view, select the target file server and click File Analytics in the tabs bar.
  4. (Skip to step 5 if you are not re-enabling a disabled instance of File Analytics) To re-enable File Analytics, click Enable File Analytics in the message bar.
    Figure. Enabling File Analytics Link
    The Enable File Analytics dialog box appears. Skip the remaining steps.
  5. In the Data Retention field, select a data retention period. The data retention period refers to the length of time File Analytics retains audit events.
  6. In the Authentication section, enter the credentials as indicated:
    Note: AD passwords for the file server admin cannot contain the following special characters: comma (,), single quote ('), double quote ("). Using the special characters in passwords prevents File Analytics from performing file system scans.
    1. (For SMB users only) In the SMB section, do the following in the indicated fields to provide SMB authentication details:
      • Active Directory Realm Name : Confirm the AD realm name for the file server.
      • Username : Enter the AD username for the file server administrator, see File Analytics Prerequisites .
      • Password : Enter the AD user password for the file server administrator.
    2. (For NFS users only) In the NFS Authentication section, do the following in the indicated fields to provide NFS authentication details:
      • LDAP Server URI : Enter the URI of the LDAP server.
      • Base DN : Enter the base DN for the LDAP server.
      • Password : Enter the LDAP user password for the file server administrator.


  7. Click Enable .

Results

After enablement, File Analytics performs a one-time file system scan to pull metadata information. The duration of the scan varies depending on the protocol of the share. There is no system downtime during the scan.

Example

Scanning 3–4 million NFS files or 1 million SMB files takes about 1 hour.
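Using the rough rates implied by this example (3–4 million NFS files or about 1 million SMB files per hour), a back-of-the-envelope scan-duration estimate can be computed. These rates are approximations read off the example above, not official sizing guidance.

```python
# Approximate scan rates implied by the example above, in files per hour.
# "nfs" uses a midpoint of the stated 3-4 million range.
SCAN_RATE = {"nfs": 3_500_000, "smb": 1_000_000}

def estimate_scan_hours(file_count, protocol):
    """Rough scan-duration estimate in hours; illustrative only."""
    return file_count / SCAN_RATE[protocol.lower()]
```

For example, a share with 2 million SMB files would take roughly 2 hours by this estimate; actual durations depend on metadata workload and cluster load.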

Disabling File Analytics

About this task

Follow the steps as indicated to disable File Analytics.

Procedure

  1. In File Analytics click the gear icon > Disable File Analytics .
  2. In the dialog box, click Disable .
    Disabling File Analytics disables data collection. The following message banner appears.
     File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data. 

What to do next

To delete data, click the Delete File Analytics Data link in the banner described in Step 2.

Launching File Analytics

About this task

Do the following to launch File Analytics.

Procedure

  1. From the Prism views drop-down, select the File Server view.
  2. Select the target file server from the entity tab.
  3. Click the File Analytics action button below the entity table.
    Figure. Launch File Analytics. The File Analytics action button.

File Analytics VM Management

To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide .

Removing File Analytics VMs

Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.

About this task

Follow the steps as indicated to remove an FAVM.
Note: Do not delete an FAVM using the CLI, as this operation does not decommission the FAVM.

Procedure

  1. Disable File Analytics on all file servers in the cluster, see Disabling File Analytics.
  2. In the File Server view in Prism Element, do the following:
    1. In the top actions bar, click Manage File Analytics .
    2. Click Delete to remove the FAVM.
    When you delete an FAVM, you also delete all of your File Analytics configurations and audit data stored on the FAVM.

Updating Credentials

About this task

Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.

Procedure

  1. Click gear icon > Update AD/LDAP Configuration .
  2. To update Active Directory credentials, do the following in the indicated fields (otherwise move on to the next step).
    Note: AD passwords for the file server admin cannot contain the following special characters: comma (,), single quote ('), double quote ("). Using the special characters in passwords prevents File Analytics from performing file system scans.
    1. Active Directory Realm Name: confirm or replace the realm name.
    2. Username: confirm or replace the username.
    3. Password: type in the new password.
  3. To update NFS configuration, do the following (otherwise move on to the next step).
    1. LDAP Server URI: confirm or replace the server URI.
    2. Base DN: confirm or replace the base distinguished name (DN).
    3. Bind DN (Optional): confirm or replace the bind distinguished name (DN).
    4. Password: type in the new password.
  4. Click Save .

Managing Deleted Share/Export Audits

Manage the audit data of deleted shares and exports.

About this task

By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears next to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.

Follow the directions as indicated to delete audit data for the deleted share or export.

Note: You cannot restore the deleted audit data of a deleted share or export.

Procedure

  1. Click the gear icon > Manage Deleted Share/Export Audit .
  2. Check the box next to the share or export name.
  3. Click Delete .
  4. In the confirmation window, click Delete to confirm the deletion of data.
    In the Manage Deleted Share/Export Audit , a progress bar displays the progress of the deletion process next to the share name. File Analytics considers data deletion of a deleted share a low-priority task, which can take several hours to finish.

Changing an FAVM Password

Steps for updating the password of a File Analytics VM (FAVM).

About this task


Procedure

  1. Log on to an FAVM with SSH.
  2. Change the nutanix password.
    nutanix@favm$ sudo passwd nutanix
  3. Respond to the prompts, providing the current and new nutanix user password.
    Changing password for user nutanix.
    Old Password:
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.
    Note:

    The password must meet the following complexity requirements:

    • At least 8 characters long
    • At least 1 lowercase letter
    • At least 1 uppercase letter
    • At least 1 number
    • At least 1 special character
    • At least 4 characters difference from the old password
    • Should not be among the last 10 passwords
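
The complexity rules above can be sketched as a validation helper. This is an illustration only; the FAVM's PAM stack performs the real enforcement, and its character-difference and password-history checks may count differently. The function name is hypothetical, not a product API.

```python
import re

def validate_favm_password(new, old, history):
    """Check a candidate password against the documented rules.

    Returns a list of unmet requirements (empty list means the
    password passes this sketch's checks).
    """
    errors = []
    if len(new) < 8:
        errors.append("at least 8 characters")
    if not re.search(r"[a-z]", new):
        errors.append("at least 1 lowercase letter")
    if not re.search(r"[A-Z]", new):
        errors.append("at least 1 uppercase letter")
    if not re.search(r"\d", new):
        errors.append("at least 1 number")
    if not re.search(r"[^A-Za-z0-9]", new):
        errors.append("at least 1 special character")
    # Approximate the "4 characters difference" rule by counting
    # positions that differ, plus any length difference.
    if sum(a != b for a, b in zip(new, old)) + abs(len(new) - len(old)) < 4:
        errors.append("at least 4 characters different from the old password")
    if new in history[-10:]:
        errors.append("must not be among the last 10 passwords")
    return errors
```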

Upgrades

Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.

Before you upgrade File Analytics, ensure that you are running a compatible version of AOS and Files. Refer to File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element, see AOS Upgrade in the Prism Web Console Guide .

To upgrade File Analytics, perform inventory and updates using the Life Cycle Manager (LCM); see the Life Cycle Manager Guide for instructions on performing inventory and updates. LCM cannot upgrade File Analytics when the protection domain (PD) for the File Analytics VM (FAVM) includes any other entities.

Note: The File Analytics UI is not accessible during upgrades.

During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets the expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes the snapshot after completing the upgrade successfully. If any errors occur, the system keeps the snapshot for 30 days to troubleshoot the issue.

Upgrade File Analytics at a Dark Site

Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).

Before you begin

You need a local web server reachable by your Nutanix clusters to host the LCM repository.

Procedure

  1. From a device that has public Internet access, go to Nutanix Portal > Downloads > Tools & Firmware .
    1. Download the tar file lcm_dark_site_version.tar.gz .
    2. Transfer lcm_dark_site_version.tar.gz to your local web server and untar into the release directory.
  2. From a device that has public Internet access, go to the Nutanix portal and select Downloads > File Analytics .
    1. Download the following files.
      • file_analytics_dark_site_version.tar.gz
      • nutanix_compatibility.tgz
      • nutanix_compatibility.tgz.sign
    2. Transfer file_analytics_dark_site_version.tar.gz to your local web server and untar into the release directory.
    3. Transfer the nutanix_compatibility.tgz and nutanix_compatibility.tgz.sign files to your local web server (overwrite existing files as needed).
  3. Log on to Prism Element.
  4. Click Home > LCM > Settings .
    1. In the Fetch updates from field, enter the path to the directory where you extracted the tar file on your local server. Use the format http://webserver_IP_address/release .
    2. Click Save .
      You return to the Life Cycle Manager.
    3. In the LCM sidebar, click Inventory > Perform Inventory .
    4. Update the LCM framework before trying to update any other component.
      The LCM sidebar shows the LCM framework with the same version as the file you downloaded.

Dashboard

The Dashboard tab displays data on the operational trends of a file server.

Dashboard View

The Dashboard tab is the opening screen that appears after launching File Analytics from Prism. The dashboard displays widgets that present data on file trends, distribution, and operations.

Figure. File Analytics Dashboard Click to enlarge File Analytics data panes in the Dashboard view.

Table 1. Dashboard Widgets
Tile Name Description Intervals
Capacity trend Displays capacity trends for the file server including capacity added, capacity removed, and net changes.

Clicking an event period widget displays the Capacity Trend Details view.

Last 7 days, last 30 days, or last 1 year.
Data age Displays the percentage of data by age. Data age determines the data heat level: hot, warm, or cold. Default intervals are as follows:
  • Hot data – accessed within the last week.
  • Warm data – accessed within the last 2 to 4 weeks.
  • Cold data – last accessed more than 4 weeks ago.
Anomaly alerts Displays alerts for configured anomalies and ransomware detection based on blocked file types, see Configuring Anomaly Detection. [alert]
Permission denials Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details, see Audit Trails - Users for more. [user id], [number of permission denials]
File distribution by size Displays the number of files by file size. Provides trend details for the top 5 files. Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB to 1 GB, greater than 1 GB.
File distribution by type Displays the space taken up by various applications and file types. The file extension determines the file type. See the File types table for more details. MB or GB
File distribution by type details view Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days.

Clicking View Details displays the File Distribution by Type view.
Daily size trend for top 5 files (GB), file type (see the "File Type" table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB).
Top 5 active users Lists the users who have accessed the most files and number of operations the user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking the user name displays the audit view for the user, see Audit Trails - Users for more. 24 hours, 7 days, 1 month, or 1 year.
Top 5 accessed files Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files.

Clicking the file name displays the audit view details for the file, see Audit Trails - Files for more.

24 hours, 7 days, 1 month, or 1 year.
Files operations Displays the distribution of operation types for the specified period, including a count for each operation type and the total sum of all operations.

Operations include: create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, permission denied (file blocking).

Clicking an operation displays the File Operation Trend view.
24 hours, 7 days, 1 month, or 1 year.
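
The size ranges used by the File distribution by size widget can be expressed as a small bucketing helper. The function name is illustrative, and the binary (1024-based) bucket edges are an assumption; the product may use decimal units.

```python
def size_bucket(size_bytes):
    """Map a file size in bytes to the dashboard's distribution buckets."""
    mb = 1024 ** 2
    gb = 1024 ** 3
    if size_bytes < 1 * mb:
        return "Less than 1 MB"
    if size_bytes < 10 * mb:
        return "1–10 MB"
    if size_bytes < 100 * mb:
        return "10–100 MB"
    if size_bytes < 1 * gb:
        return "100 MB to 1 GB"
    return "Greater than 1 GB"
```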

Capacity Trend Details

Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export , Folder , and Category . Each tab includes columns detailing entity details: name, net capacity change, capacity added, and capacity removed.

Figure. Capacity Trend Details View Click to enlarge Clicking on the Capacity Trend widget in the Dashboard tab displays the Capacity Trend Details view.

Table 2. Capacity Trend Details
Column Description
Name Name of share/export, folder, or category.
Net capacity change The total difference between capacity at the beginning and the end of the specified period.
Share name (for folders only) The name of the share or export that the folder belongs to.
Capacity added Total added capacity for the specified period.
Capacity removed Total removed capacity for the specified period.

File Distribution by Type Details

Clicking View Details for the File Distribution by Type widget displays granular details of file distribution, see the File Types table for details.

Figure. File Distribution by Type Click to enlarge Clicking View Details on the File Distribution by Type widget displays the File Distribution by Type dashboard.

Table 3. Details of File Distribution Parameters
Parameter Description
File type Name of file type
Current space used Space capacity occupied by the file type
Current number of files Number of files for the file type
Change (in last 30 days) The increase in capacity over a 30-day period for the specified file type
Table 4. File Types
Category Supported File Type
Archives .cab, .gz, .rar, .tar, .z, .zip
Audio .aiff, .au, .mp3, .mp4, .wav, .wma
Backups .bak, .bkf, .bkp
CD/DVD images .img, .iso, .nrg
Desktop publishing .qxd
Email archives .pst
Hard drive images .tib, .gho, .ghs
Images .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff
Installers .msi, .rpm
Log Files .log
Lotus notes .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf
MS Office documents .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb
System files .bin, .dll, .exe
Text files .csv, .pdf, .txt
Video .avi, .mpg, .mpeg, .mov, .m4v
Disk image .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd

File Operation Trend

Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.

Figure. Operation Trend Click to enlarge A graph displays the number of times the specified operation took place over time.

Table 5. File Operation Trend View Parameters
Category Description
Operation type A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types.
Last (time period) A drop-down option to specify the period for the file operation trend.
File operation trend graph The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations during each interval.

Health

The Health dashboard displays dynamically updated health information about each File Analytics component.

The Health dashboard includes the following details:

  • Data Summary Data summary of all file servers with File Analytics enabled.
  • Host Memory Percent of used memory on the File Analytics VM (FAVM).
  • Host CPU Usage Percent of CPU used by the FAVM.
  • Storage Summary Amount of storage space used on the File Analytics data disk or FAVM disk.
  • Overall Health Overall health of File Analytics components.
  • Data Server Summary Data server usage by component.
Figure. Health Page Click to enlarge The Health page dashboard includes tiles that dynamically update to indicate the health of relevant entities.

Data Age

The Data Age widget in the Dashboard provides details on data heat.

Share-level data is displayed to provide details on share capacity trends. There are three levels of data heat.

  • Hot – frequently accessed data (last accessed within the last week).
  • Warm – infrequently accessed data (last accessed within the last 2 to 4 weeks).
  • Cold – rarely accessed data (last accessed longer than 4 weeks ago).

You can configure the definitions for each level of data heat rather than using the default values.
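
The default heat tiers can be sketched as a classification function. The function name and parameterized day boundaries are illustrative assumptions, not a product API; the configurable defaults mirror the hot/warm/cold definitions above.

```python
from datetime import datetime, timedelta

def classify_data_heat(last_accessed, now, hot_days=7, warm_days=28):
    """Classify data heat from a last-access timestamp.

    Defaults mirror the documented tiers: hot within the last week,
    warm within 2 to 4 weeks, cold beyond 4 weeks. Pass different
    hot_days/warm_days to model a custom data age configuration.
    """
    age = now - last_accessed
    if age <= timedelta(days=hot_days):
        return "hot"
    if age <= timedelta(days=warm_days):
        return "warm"
    return "cold"
```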

Configuring Data Heat Levels

Update the values that constitute different data heat levels.

Procedure

  1. In the Data Age widget, click Explore .
  2. Click Edit Data Age Configuration .
  3. Do the following in the Hot Data section:
    1. In the entry field next to Older Than , enter an integer.
    2. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
  4. Do the following in the Warm Data section to configure two ranges :
    1. In the first entry field, enter an integer to configure the first range.
    2. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    3. In the second entry field, enter an integer to configure the second range.
    4. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
  5. Do the following in the Cold Data section to configure four ranges :
    1. In the first entry field, enter an integer to configure the first range.
    2. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    3. In the second entry field, enter an integer to configure the second range.
    4. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    5. In the third entry field, enter an integer to configure the third range.
    6. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    7. (optional) In the fourth entry field, enter an integer to configure the fourth range.
    8. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
  6. Click Apply .
    Note: The new values do not affect already calculated heat statistics. File Analytics uses the updated values for future heat calculations.

Anomalies

Data panes in the Anomalies tab display data and trends for configured anomalies.

The Anomalies tab provides options for creating anomaly policies and displays dashboards for viewing anomaly trends.
Note: Configure an SMTP server to send anomaly alerts; see Configuring an SMTP Server.

You can configure anomalies for the following operations:

  • Creating files and directories
  • Deleting files and directories
  • Permission changes
  • Permission denials
  • Renaming files and directories
  • Reading files and directories

Define anomaly rules by specifying the following conditions:

  • Users exceed an operation count threshold
  • Users exceed an operation percentage threshold

Meeting the lower operation threshold triggers an anomaly.

Consider a scenario with 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. Because 10% of 1,000 files is 100 operations, which is greater than the count threshold of 10, the lower count threshold takes precedence: 10 operations trigger the anomaly.
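
The precedence rule amounts to taking the lower of the two configured thresholds. This sketch is illustrative only, not File Analytics code:

```python
def anomaly_trigger_threshold(total_files, count_threshold, percent_threshold):
    """Return the number of operations that triggers an anomaly alert.

    An anomaly fires on whichever configured threshold is met first,
    i.e. the lower of the absolute count and the percentage of total
    files converted to an operation count.
    """
    percent_as_count = total_files * percent_threshold / 100
    return min(count_threshold, percent_as_count)
```

For the scenario above, `anomaly_trigger_threshold(1000, 10, 10)` evaluates to 10, so the count threshold governs.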

Figure. Anomalies Dashboard Click to enlarge The Anomalies dashboard displays anomaly trends.

Table 1. Anomalies Data Pane Descriptions
Pane Name Description Values
Anomaly Trend Displays the number of anomalies per day or per month. Last 7 days, Last 30 days, Last 1 year
Top Users Displays the users with the most anomalies and the number of anomalies per user. Last 7 days, Last 30 days, Last 1 year
Top Folders Displays the folders with the most anomalies and the number of anomalies per folder. Last 7 days, Last 30 days, Last 1 year
Operation Anomaly Types Displays the percentage of occurrences per anomaly type. Last 7 days, Last 30 days, Last 1 year

Anomaly Details

Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.

Figure. Anomaly Details View Click to enlarge

Table 2. Anomalies Details View Total Results Table
Column Description
Anomaly Type The configured anomaly type. Anomaly types not configured do not show up in the table.
Total User Count The number of users that have performed the operation causing the specified anomaly during the specified time range.
Total Folder Count The numbers of folders in which the anomaly occurred during the specified time range.
Total Operation Count Total number of anomalies for the specified anomaly type that occurred during the specified time range.
Time Range The time range for which the total user count, total folder count, and total operation count are specified.
Table 3. Anomalies Details View Users/Folders Table
Column Description
Username or Folders Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders.
Operation count The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph.

Configuring Anomaly Detection

Steps for configuring anomaly rules.

About this task

Configure an SMTP server for File Analytics to send anomaly alerts, see Configuring an SMTP Server. To create an anomaly rule, do the following.

Procedure

  1. In the File Analytics web console, click the gear icon > Define Anomaly Rules .
  2. In the Anomaly Email Recipients field, enter a comma-separated list of email recipients for all anomaly alerts and data.
    Note: File Analytics sends anomaly alerts and data to recipients whenever File Analytics detects an anomaly.
  3. To configure a new anomaly, do the following in the indicated fields:
    1. Events : Select a rule for the anomaly from one of the following:
      • Permission changed
      • Permission denied
      • Delete
      • Create
      • Rename
      • Read
      The event defines the scenario type for the anomaly.
    2. Minimum Operations % : Enter a percentage value for the minimum threshold.
      File Analytics calculates the minimum operations percentage based on the number of files. For example, if there are 100 files, and you set the minimum operations percentage to 5, five operations within the scan interval would trigger an anomaly alert.
    3. Minimum Operation Count : Enter a value for a minimum operation threshold.
      File Analytics triggers an anomaly alert after meeting the threshold.
    4. User : Choose if the anomaly rule is applicable for All Users or an Individual user.
    5. Type : Select the interval type, which sets the unit for the detection interval.
      The interval determines how far back File Analytics monitors the anomaly.
    6. Interval : Enter a value for the detection interval.
    7. (optional) Actions : Click the pencil icon to update an anomaly rule. Click the x icon to delete an existing rule.
    Figure. Anomaly Configuration Fields Click to enlarge Fill out these fields to configure a new anomaly rule.

  4. Click Save .

Configuring an SMTP Server

File Analytics uses a simple mail transport protocol (SMTP) server to send anomaly alerts.

About this task

To configure an SMTP server, do the following:

Procedure

  1. In the File Analytics web console, click the gear icon > SMTP Configuration .
  2. In the SMTP Configuration window, enter the indicated details in the following fields:
    1. Hostname Or IP Address : Enter a fully qualified domain name or IP address for the SMTP server.
    2. Port : Enter the port to use.
      The standard SMTP ports are 25 (unencrypted), 587 (STARTTLS), and 465 (SSL).
    3. Security Mode : Select the desired security mode from the dropdown list.
      The options are:
      • NONE (unencrypted)
      • STARTTLS (TLS encryption)
      • SSL (SSL encryption)
    4. If the security mode is NONE , skip to step 7.
    5. User Name : Enter a user name for logging on to the SMTP server. Depending on the authentication method, the user name may require a domain.
    6. Password : Enter the password for the SMTP server user.
    7. From Email Address : Enter the email address from which File Analytics sends anomaly alerts.
    8. Recipient Email Address : Enter a recipient email address to test the SMTP configuration.
    Figure. SMTP Configuration Click to enlarge Fields for configuring an SMTP server.

  3. Click Save .
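
For reference, the dialog fields map onto a standard SMTP client session. The sketch below shows how each security mode is typically handled with Python's smtplib; hostnames, credentials, and function names are placeholders, and this is not File Analytics code.

```python
import smtplib
from email.message import EmailMessage

def build_test_alert(from_addr, to_addr):
    """Build a test message; separated out so it can be checked
    without a live SMTP server."""
    msg = EmailMessage()
    msg["Subject"] = "File Analytics SMTP test"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content("Test alert from the SMTP configuration sketch.")
    return msg

def send_test_alert(host, port, security, username, password,
                    from_addr, to_addr):
    """Connect using the selected security mode and send the test message."""
    msg = build_test_alert(from_addr, to_addr)
    if security == "SSL":
        server = smtplib.SMTP_SSL(host, port)   # typically port 465
    else:
        server = smtplib.SMTP(host, port)       # 25 (NONE) or 587 (STARTTLS)
        if security == "STARTTLS":
            server.starttls()
    try:
        if security != "NONE":
            server.login(username, password)
        server.send_message(msg)
    finally:
        server.quit()
```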

Audit Trails

Use audit trails to look up operation data for a specific user, file, folder, or client.

The Audit Trails tab includes Files , Folders , Users , and Client IP options for specifying the audit type. Use the search bar to specify the entity for the audit (user, folder, file, or client IP).

The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP number) takes you to the Audit Trails dashboard for the target entity.

View Audit Trails

Audit a user, file, client, or folder.

About this task

Procedure

  1. Click the Audit Trails tab.
  2. Select the Files , Folders , Users , or Client IP option.
  3. Enter the audit trails target into the search bar.
  4. Click Search .
  5. To display audit results in the Audit Trails window, click the entity name (or client IP number).

Audit Trails - Users

Details for user audit trails.

Audit Trails Search - Users

When you search by user in the Audit Trails tab, search results display the following information in a table.

  • User Name
  • Domain
  • Last Operation
  • Last Operation On
  • Share Name
  • Operation Date
  • Action
Figure. Users Search Results Click to enlarge A table displays user search results for the query.

Audit Details Page - Users

Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.

  • A User Events graph displays the operations the user performed during the selected period and the percentage of total operations that each operation type represents.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operations include:
      • Create File
      • Delete
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Read
      • Remove Directory
      • Rename
      • Set Attribute
      • Write
      • Symlink
    • The filter bar , above the User Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific information. See more details below.
  • The Reset Filters button removes all filters.
Figure. User Audit Details - Events Click to enlarge User Events table displays event rates for various operations performed by the user.

The Results table provides granular details of the audit results. The following data is displayed for every event.

  • User Name
  • User IP Address
  • Operation
  • Operation Date
  • Target File

Click the gear icon for options to download the data as an XLS, CSV, or JSON file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.
Figure. Results Table Click to enlarge The results table displays a detailed view of the audit data.
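
A downloaded audit export can be post-processed with standard tooling. This sketch summarizes an exported CSV by operation type; it assumes the export contains an Operation column matching the results table, so adjust the column name to your actual export.

```python
import csv
from collections import Counter

def top_operations(csv_path, n=5):
    """Return the n most frequent operation types in an audit export."""
    with open(csv_path, newline="") as f:
        ops = Counter(row["Operation"] for row in csv.DictReader(f))
    return ops.most_common(n)
```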

Audit Trails - Folders

Dashboard details for folder audits.

The following information displays when you search by folder in the Audit Trails tab.

  • Folder Name
  • Folder Owner Name
  • Share Name
  • Parent Folder
  • Last Operation
  • Last Operation By
  • Last Operation Date
  • Action
Figure. Folders Search Results Click to enlarge

The Audit Details page shows the following audit information for the selected folder.

  • A Folder Events graph displays the operations performed on the folder during the selected period and the percentage of total operations that each operation type represents.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operations include:
      • Select All
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Remove Directory
      • Rename
      • Set Attribute
    • A filter bar , above the Folder Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
  • The Reset Filters button removes all filters.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • User Name
  • Client IP
  • Operation
  • Operation Date

Click the gear icon for options to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.

Audit Trails - Files

Dashboard details for file audits.

Audit Trails for Files

When you search by file in the Audit Trails tab, the following information displays:

  • File Name
  • File Owner Name
  • Share Name
  • Parent Folder
  • Last Operation
  • Last Operation By
  • Last Operation Date
  • Action
Figure. Files Search Results Click to enlarge A table displays file search results for the query.

Note: File Analytics does not support regular-expression (RegEx) based search.

The Audit Details page shows the following audit information for the selected file.

  • A File Events graph displays the operations performed on the file during the selected period and the percentage of total operations that each operation type represents.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operation types include:
      • Close File
      • Create File
      • Delete
      • Make Directory
      • Open
      • Read
      • Rename
      • Set Attribute
      • Write
      • Symlink
    • A filter bar , above the File Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
    • The Reset Filters button removes all filters.
Figure. Files Audit Details - Events Click to enlarge File Events table displays event rates for various operations for the file.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • Username
  • Client IP
  • Operation
  • Operation Date

Click the gear icon for options to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.
Figure. Results Table Click to enlarge The results table displays a detailed view of the audit data.

Audit Trails - Client IP

Dashboard details for client IP Audit Trails.

Audit Trails Search - Client IP

When you search by client IP in the Audit Trails tab, search results display the following information in a table.

  • Client IP
  • User Name
  • Domain
  • Last Operation
  • Last Operation On
  • Share Name
  • Operation Date
  • Action
Figure. IP Search Results Click to enlarge A table displays IP search results for the query

The Audit Details page shows the following audit information for the selected client.

  • A User Events graph displays the operations performed from the client during the selected period and the percentage of total operations that each operation type represents.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operation types include:
      • Create File
      • Delete
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Read
      • Remove Directory
      • Rename
      • Set Attribute
      • Write
      • Symlink
      • Permission Denied (File Blocking)
    • A filter bar , above the User Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
    • The Reset Filters button removes all filters.
Figure. Client Audit Details - Events Click to enlarge User Events graph displays event rates for various operations performed from the client.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • User Name
  • Operation
  • Target File
  • Operation Date

Click the gear icon for an option to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.

Ransomware Protection

Ransomware protection for your file server.

Caution: Ransomware protection helps detect potential ransomware. Nutanix does not recommend using the File Analytics ransomware feature as an all-encompassing ransomware solution.

File Analytics scans files for ransomware in real time, and notifies you through email in the event of a ransomware attack. By using the Nutanix Files file blocking mechanism, File Analytics prevents files with signatures of potential ransomware from carrying out malicious operations. Ransomware protection automatically scans for ransomware based on a curated list of signatures that frequently appear in ransomware files. You can modify the list by manually adding other signatures.

Note: Nutanix does not recommend manipulating the blocked signatures through Nutanix Files.

File Analytics also monitors shares for self-service restore (SSR) policies and identifies shares that do not have SSR enabled in the ransomware dashboard. You can enable SSR through the ransomware dashboard by selecting shares identified by File Analytics.

Ransomware Protection Features

The ransomware dashboard includes panes for managing ransomware protection and self-service restore (SSR).

Ransomware Dashboard

The ransomware dashboard includes the following sections and options:

  • The SSR Status pane for viewing, enabling, and managing SSR, see Enabling SSR.
  • The Vulnerabilities (Infection Attempts) pane for viewing total vulnerabilities, vulnerable shares, malicious clients, and top recent ransomware attempts.
    • Clicking on the number of total vulnerabilities provides a detailed view of recent vulnerabilities.
    • Clicking on the number of vulnerable shares provides a detailed view of vulnerable shares.
    • Clicking on the number of malicious clients provides a detailed view of malicious clients.
  • Click Settings to enable and configure ransomware protection, see Enabling Ransomware Protection and Configuring Ransomware Protection.
  • Click Download (.csv) to download a list of blocked ransomware signatures.
Figure. Ransomware Dashboard Click to enlarge

Blocked Ransomware Extensions

File Analytics blocks the following ransomware signatures.

Table 1. Blocked Ransomware Signatures
Extension Known Ransomware
*.micro TeslaCrypt 3.0
*.zepto Locky
*.cerber3 Cerber 3
*.locky Locky
*.cerber Cerber
*.loli LOLI
*.mole CryptoMix (variant)
*.cryp1 CryptXXX
*.axx AxCrypt
*.onion Dharma
*.crypt Scatter
*.osiris Locky (variant)
*.crypz CryptXXX
*.ccc TeslaCrypt or Cryptowall
*.locked Various ransomware
*.odin Locky
*.cerber2 Cerber 2
*.sage Sage
*.globe Globe
*.good Scatter
*.exx Alpha Crypt
*.encrypt Alpha
*.encrypted Various ransomware
*.1txt Enigma
*.ezz Alpha Crypt
*.r5a 7ev3n
*.wallet Globe 3 (variant)
*.decrypt2017 Globe 3
*.zzzzz Locky
*.MERRY Merry X-Mas
*.enigma Coverton
*.ecc Cryptolocker or TeslaCrypt
*.cryptowall Cryptowall
*.aesir Locky
*.cryptolocker CryptoLocker
*.coded Anubis
*.sexy PayDay
*.pubg PUBG
*.ha3 El-Polocker
*.breaking_bad Files1147@gmail(.)com
*.dharma CrySiS
*.wcry WannaCry
*.lol! GPCode
*.damage Damage
*.MRCR1 Merry X-Mas
*.fantom Fantom
*.legion Legion
*.kratos KratosCrypt
*.crjoker CryptoJoker
*.LeChiffre LeChiffre
*.maya HiddenTear (variant)
*.kraken Rakhni
*.keybtc@inbox_com KeyBTC
*.rrk Radamant v2
*.zcrypt ZCRYPT
*.crinf DecryptorMax or CryptInfinite
*.enc TorrentLocker / Cryptorium
*.surprise Surprise
*.windows10 Shade
*.serp Serpent (variant)
*.file0locked Evil
*.ytbl Troldesh (variant)
*.pdcr PadCrypt
*.venusf Venus Locker
*.dale Chip
*.potato Potato
*.lesli CryptoMix
*.angelamerkel Angela Merkel
*.PEGS1 Merry X-Mas
*.R16m01d05 Evil-JS (variant)
*.zzz TeslaCrypt
*.wflx WildFire
*.serpent Serpent
*.Dexter Troldesh (variant)
*.rnsmwr Gremit
*.thor Locky
*.nuclear55 Nuke
*.xyz TeslaCrypt
*.encr FileLocker
*.kernel_time KeRanger OS X
*.darkness Rakhni
*.evillock Evil-JS (variant)
*.locklock LockLock
*.rekt HiddenTear (variant) / RektLocker
*.coverton Coverton
*.VforVendetta Samsam (variant)
*.remk STOP
*.1cbu1 Princess Locker
*.purge Globe
*.cry CryLocker
*.zyklon ZYKLON
*.dCrypt DummyLocker
*.raid10 Globe [variant]
*.derp Derp
*.zorro Zorro
*.AngleWare HiddenTear/MafiaWare (variant)
*.shit Locky
*.btc Jigsaw
*.atlas Atlas
*.EnCiPhErEd Xorist
*.xxx TeslaCrypt 3.0
*.realfs0ciety@sigaint.org.fs0ciety Fsociety
*.vbransom VBRansom 7
*.exotic Exotic
*.crypted Nemucod
*.fucked Manifestus
*.vvv TeslaCrypt 3.0
*.padcrypt PadCrypt
*.cryeye DoubleLocker
*.hush Jigsaw
*.RMCM1 Merry X-Mas
*.unavailable Al-Namrood
*.paym Jigsaw
*.stn Satan
*.braincrypt Braincrypt
*.ttt TeslaCrypt 3.0
*._AiraCropEncrypted AiraCrop
*.spora Spora
*.alcatraz Alcatraz Locker
*.reco STOP/DJVU
*.crypte Jigsaw (variant)
*.aaa TeslaCrypt
*.pzdc Scatter
*.RARE1 Merry X-Mas
*.ruby Ruby
*.fun Jigsaw
*.73i87A Xorist
*.abc TeslaCrypt
*.odcodc ODCODC
*.crptrgr CryptoRoger
*.herbst Herbst
*.comrade Comrade
*.szf SZFLocker
*.pays Jigsaw
*.antihacker2017 Xorist (variant)
*.rip KillLocker
*.rdm Radamant
*.CCCRRRPPP Unlock92
*.bript BadEncriptor
*.hnumkhotep Globe 3
*.helpmeencedfiles Samas/SamSam
*.BarRax BarRax (HiddenTear variant)
*.magic Magic
*.noproblemwedecfiles Samas/SamSam
*.bitstak Bitstak
*.kkk Jigsaw
*.kyra Globe
*.a5zfn Alma Locker
*.powerfulldecrypt Samas/SamSam
*.vindows Vindows Locker
*.payms Jigsaw
*.lovewindows Globe (variant)
*.p5tkjw Xorist
*.madebyadam Roga
*.conficker Conficker
*.SecureCrypted Apocalypse
*.perl Bart
*.paymts Jigsaw
*.kernel_complete KeRanger OS X
*.payrms Jigsaw
*.paymst Jigsaw
*.lcked Jigsaw (variant)
*.covid19 Phishing
*.ifuckedyou SerbRansom
*.d4nk PyL33T
*.grt Karmen HiddenTear (variant)
*.kostya Kostya
*.gefickt Jigsaw (variant)
*.covid-19 Phishing
*.kernel_pid KeRanger OS X
*.wncry Wana Decrypt0r 2.0
*.PoAr2w Xorist
*.Whereisyourfiles Samas/SamSam
*.edgel EdgeLocker
*.adk Angry Duck
*.oops Marlboro
*.theworldisyours Samas/SamSam
*.czvxce Coverton
*.crab GandCrab
*.paymrss Jigsaw
*.kimcilware KimcilWare
*.rmd Zeta
*.dxxd DXXD
*.razy Razy
*.vxlock vxLock
*.krab GandCrab v4
*.rokku Rokku
*.lock93 Lock93
*.pec PEC 2017
*.mijnal Minjal
*.kobos Kobos
*.bbawasted Bbawasted
*.rlhwasted RLHWasted
*.52pojie 52Pojie
*.FastWind Fastwind
*.spare Spare
*.eduransom Eduransom
*.RE78P RE78P
*.pstKll pstKll
*.erif
*.kook
*.xienvkdoc
*.deadfiles
*.mnbzr
*.silvertor
*.MH24
*.nile
*.ZaCaPa
*.tcwwasted
*.Spade
*.pandemic
*.covid
*.xati
*.Zyr
*.spybuster
*.ehre
*.wannacry WannaCry
*.jigsaaw
*.boop
*.Back
*.CYRAT
*.bmd
*.Fappy
*.Valley
*.copa
*.horse
*.CryForMe
*.easyransom
*.nginxhole
*.lockedv1 Lockedv1
*.ziggy Ziggy
*.booa Booa
*.nobu Nobu
*.howareyou Howareyou
*.FLAMINGO Flamingo
*.FUSION Fusion
*.pay2key Pay2Key
*.zimba Zimba, Dharma
*.luckyday Luckyday
*.bondy Bondy
*.cring Cring
*.boom Boom
*.judge Judge
*.LIZARD LIZARD
*.bonsoir Bonsoir
*.moloch Moloch
*.14x 14x
*.cnh CNH
*.DeroHE DeroHE
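
The extensions above are what File Analytics matches when blocking ransomware signatures. As an illustration only, the following shell sketch checks a directory tree for file names ending in a few of the listed extensions. The directory path and the extension subset are arbitrary examples, not part of File Analytics itself.

```shell
# Illustrative only: report files whose names end in a few of the ransomware
# extensions listed above. The extension subset is a hypothetical sample.
scan_for_ransomware_extensions() {
    dir="$1"
    for ext in wncry crab krab zcrypt thor; do
        # -name matches the literal extension at the end of the file name
        find "$dir" -type f -name "*.${ext}" -print
    done
}
```

For example, scan_for_ransomware_extensions /mnt/share would list any files under that path carrying one of the five sampled extensions.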

Enabling Ransomware Protection

Enable ransomware protection on your file server.

About this task

Procedure

  1. Go to dropdown menu > Ransomware .
  2. In the message banner, click Enable Ransomware Protection .
  3. (optional) Click Configure SMTP to add recipients .
    Note: This option appears only if you have not configured a simple mail transfer protocol (SMTP) server, see Configuring an SMTP Server.
  4. Under Ransomware Email Recipients , add at least one email address. If there is a ransomware attack, File Analytics sends a notification to the specified email address.
    Figure. Enable Ransomware

  5. Click Enable .

Configuring Ransomware Protection

Configure ransomware protection on file servers.

About this task

Do the following to add signatures to the blocked extension list.

Procedure

  1. Go to dropdown menu > Ransomware > Settings .
  2. (optional) Under Search for blocked File Signatures , enter ransomware signatures in the *. (signature) format.
    1. To check that the signature has been blocked, click Search .
    2. If the signature has not been blocked, click Add to Block List .

  3. (optional) Click Download (.csv) to download a list of blocked ransomware signatures.
  4. (optional) Under Ransomware Email Recipients , add a comma-separated list of email addresses. If there is a ransomware attack, File Analytics sends a notification to the specified email addresses.
  5. (optional) Click Disable Ransomware Protection to disable the ransomware protection feature.
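
Step 3 exports the blocked signatures as a CSV file. As a minimal sketch, the following function checks whether a given extension appears in that export. It assumes one *.extension entry per line, which may not match the actual export layout.

```shell
# Hypothetical check: does *.<ext> appear in the downloaded signature list?
# Assumes one signature per line, which may differ from the real CSV layout.
is_blocked() {
    ext="$1"
    csv="$2"
    # -F treats *. literally rather than as a pattern
    grep -q -F "*.${ext}" "$csv"
}
```

For example, is_blocked wncry blocked_signatures.csv returns success if that signature is present in the export.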

Enabling SSR

Enable self-service restore on shares identified by File Analytics.

About this task

File Analytics scans shares for SSR policies.

Procedure

  1. Go to dropdown menu > Ransomware .
  2. Click Enable SSR on Prism .
  3. Check the box next to the shares for which to enable SSR.

  4. Click Enable SSR .

Reports

Generate a report for entities on the file server.

Create a report with custom attribute values or use one of the File Analytics pre-configured report templates. To create a custom report, you must specify the entity, attributes, operators for some attributes, attribute values, column headings, and the number of columns.

The reports page displays a table of previously generated reports. You can rerun existing reports rather than creating a new template. After creating a report, download it as a JSON or CSV file.
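
Once downloaded, a CSV report can be inspected with standard command-line tools. The sketch below prints the column headings and the record count of an exported report; the file name and column layout are placeholders, since the actual headings depend on the columns you selected.

```shell
# Print the header row and the number of data records in an exported report.
# The report file name passed in is a placeholder for your downloaded file.
summarize_report() {
    file="$1"
    head -n 1 "$file"             # column headings
    tail -n +2 "$file" | wc -l    # number of records (rows after the header)
}
```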

Reports Dashboard

The reports dashboard includes options to create, view, and download reports.

The Reports dashboard includes options to create a report, download reports as JSON or CSV files, rerun reports, and delete reports.

The reports table includes columns for the report name, status, last run, and actions.

Figure. Reports Dashboard

Clicking Create a new report takes you to the report creation screen, which includes the Report Builder and Pre-Canned Reports Templates tabs. The tabs include report options and filters for report configuration.

Both tabs include the following elements:

  • The Define Report Type section includes an Entity drop-down menu to select an entity.
  • The Define Filters section includes an Attribute drop-down menu and an option to add more attributes by clicking + Add filter .
  • The Add/remove columns in this report section displays the default columns. Clicking the columns field lets you add additional columns to the report. Clicking the x next to a column name removes it from the report.
  • The Define number of maximum rows in this report section includes a Count section to specify the number of rows in the report.
Table 1. Report Builder – Filter Options
Entity Attributes (filters) Operator Value Column
Events event_date
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(date)
  • audit_path (object path)
  • audit_objectname (object name)
  • audit_operation (operation)
  • audit_machine_name (source of operation)
  • audit_event_date (event date in UTC)
  • audit_username (user name)
Event_operation N/A
  • file_write
  • file_read
  • file_create
  • file_delete
  • rename
  • directory_create
  • directory_delete
  • SecurityChange (permission change)
  • set_attr
  • sym_link
Files Category
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(date)
  • object_name (file name)
  • share_UUID (share name)
  • object_owner_name (owner name)
  • object_size_logical (size)
  • file_type (extension)
  • object_creation_date (creation date in UTC)
  • last_event_date (access date in UTC)
  • share_UUID (share name)
  • fileserver_protocol
  • object_ID (file id)
  • object_last_operation_name (last operation)
  • audit_username (last operation user)
  • object_last_operation_name (last operation)
  • file_path (file path)
Extensions N/A (type in value)
Deleted N/A Last (number of days from 1 to 30) days
creation_date
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(date)
access_date
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(date)
Size
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(number) (file size)

File size options:

  • B
  • KB
  • MB
  • GB
  • TB
Folders Deleted N/A Last (number of days from 1 to 30) days
  • object_name (Dir name)
  • object_owner_name (owner name)
  • object_creation_date (creation date in UTC)
  • last_event_date (access date in UTC)
  • share_UUID (share name)
  • object_last_operation_name (last operation)
  • audit_username (last operation user)
  • File server protocol
  • object_ID (file id)
  • file_path (Dir path)
creation_date
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(date)
Users last_event_date
  • equal_to
  • greater_than
  • greater_than_equal_to
  • less_than
  • less_than_equal_to
(date)
  • user_login_name (user name)
  • Last operation
  • last_event_date (access date in UTC)
  • last_operation_audit_path
Table 2. Pre-Canned Reports – Filters
Entity Pre-canned report template Columns
Events
  • PermissionDenied events
  • Permission Denied (file blocking) events
  • audit_path (object path)
  • audit_objectname (object name)
  • audit_operation (operation)
  • audit_machine_name (source of operation)
  • audit_event_date (event date in UTC)
  • audit_username (user name)
Files
  • Largest Files
  • Oldest Files
  • Files not accessed for last 1 year
  • Files accessed in last 30 days
  • object_name (file name)
  • share_UUID (share name)
  • object_owner_name (owner name)
  • object_size_logical (size)
  • file_type (extension)
  • object_creation_date (creation date in UTC)
  • last_event_date (access date in UTC)
  • share_UUID (share name)
  • fileserver_protocol
  • object_ID (file id)
  • object_last_operation_name (last operation)
  • audit_username (last operation user)
  • object_last_operation_name (last operation)
  • file_path (file path)
Users
  • Top owners with space consumed
  • Top active users
  • All users
  • user_login_name (user name)
  • Last operation
  • last_event_date (access date in UTC)
  • last_operation_audit_path

Creating a Custom Report

Create a custom report by defining the entity, attribute, filters, and columns.

About this task

Procedure

  1. Go to dropdown menu > Reports .
  2. Click Create a new report .
  3. In the Report Builder tab, do the following.
    1. In the Define Report Type section, select an entity from the drop-down menu.
    2. In the Define Filters section, select an attribute from the attributes dropdown.
    3. Under Value , specify the values for the attribute (some attributes also require you to specify an operator in the Operator field).
    4. (optional) Click + Add filter to add more attributes.
    5. In the Add/Remove column in this report section, click x for the columns you want to remove.
    6. In the Define maximum number of rows in this report section, type in a value, or use the - and + buttons, to specify the number of rows in your report. This value indicates the number of records in the report.
  4. Click Run Preview .
    The Report Preview section populates.
  5. Click Generate report .
    1. Select either the CSV or JSON option.

Creating a Pre-Canned Report

Use one of the pre-canned File Analytics templates for your report.

Procedure

  1. Go to dropdown menu > Reports .
  2. Click Create a new report .
  3. In the Pre-Canned Reports Templates tab, do the following.
    1. In the Define Report Type section, select an entity from the drop-down menu.
    2. In the Define Filters section, select an attribute from the attributes dropdown.
    3. In the Add/Remove column in this report section, click x for the columns you want to remove.
    4. In the Define maximum number of rows in this report section, type in a value, or use the - and + buttons, to specify the number of rows in your report. This value indicates the number of records in the report.
  4. Click Run Preview .
    The Report Preview section populates.
  5. Click Generate report .
    1. Select either the CSV or JSON option.

File Analytics Options

You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.

Updating Data Retention

The data retention period determines how long File Analytics retains event data.

About this task

Follow the steps as indicated to configure data retention.

Procedure

  1. In File Analytics, click gear icon > Update Data Retention .
  2. In the Data Retention Period drop-down, select the period for data retention.
  3. Click Update .

Scanning the File System

Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.

About this task

To scan shares, perform the following task:

Procedure

  1. In File Analytics, click the gear icon .
  2. In the drop-down list, click Scan File System .
    Figure. Scan File System Option

  3. In the list of shares, select the target shares for the scan.
    Figure. Select Scan Targets

  4. Click Scan .
    The status of the share is In Progress . Once the scan is complete, the status changes to Completed .

Blacklisting

Blacklist users, file extensions, and client IPs.

About this task

Use the blacklisting feature to stop the collection of audit events for operations on specified file extensions, or for operations performed by specified users and client IPs.

Procedure

  1. Click the gear icon > Define Blacklisting Rules .
  2. Click the pencil icon in the user, file extension, or client IP row.
  3. Add a comma-separated list of entities that you want blocked.
  4. Click save in the updated row.

Managing File Categories

File Analytics uses the file category configuration to classify file extensions.

About this task

The capacity widget in the dashboard uses the category configuration to calculate capacity details.

Procedure

  1. Click gear icon > Manage File Category .
  2. To create a category, click + New Category . (Otherwise, move on to step 3).
    1. In the Category column, name the category.
    2. In the Extensions column, specify file extensions for the category.
  3. To delete an existing category, click the x icon next to the category. (Otherwise, move on to step 4)
  4. To modify an existing category, click the pencil icon next to the category and modify the specified file extensions.
  5. Click save .
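
Conceptually, the category configuration is a mapping from file extension to category name, which the capacity widget then aggregates over. The sketch below illustrates that mapping; the category names and extensions shown are examples, not the shipped defaults.

```shell
# Hypothetical extension-to-category mapping, mirroring Manage File Category.
# The categories and extensions here are examples, not the shipped defaults.
category_for() {
    case "$1" in
        mp4|mov|avi)  echo "Video" ;;
        jpg|png|gif)  echo "Image" ;;
        doc|docx|pdf) echo "Document" ;;
        *)            echo "Others" ;;   # unmatched extensions fall through
    esac
}
```

For example, category_for pdf prints Document, while an unrecognized extension falls into the catch-all category.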

Data Protection

Configure File Analytics disaster recovery (DR) using Prism Element.

File Analytics supports only async disaster recovery. File Analytics does not support NearSync or metro availability.

Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have symmetric configurations to the primary site. The remote site must also deploy File Analytics to restore a File Analytics VM (FAVM).

The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.

Configuring Disaster Recovery

To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.

About this task

By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.

Procedure

  1. If you have not done so already, configure a remote site for the local cluster.
    See the Configuring a Remote Site (Physical Cluster) topic in the Prism Web Console Guide for this procedure.
  2. Create an async DR protection domain for the File Analytics volume group as the entity. The volume group name is File_Analytics_VG .
    See Configuring a Protection Domain (Async DR) in the Prism Web Console Guide .
  3. In the Schedule tab, click the New Schedule button to add a schedule.
    Add a schedule, as File Analytics does not provide a default schedule. See Creating a Protection Domain Schedule (Files) in the Nutanix Files Guide .
  4. Configure local and remote container mapping.
    See the Configuring Disaster Recovery (Files) section in the Nutanix Files Guide for steps to configure mapping between local and remote containers.
  5. Create a protection domain schedule.
    See Creating a Protection Domain Schedule (Files) in the Nutanix Files Guide .

Activating Disaster Recovery

Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.

About this task

Perform the following tasks on the remote site.

Procedure

  1. Fail over the protection domain to activate disaster recovery.
    See the Failing Over a Protection Domain topic in the Prism Web Console Guide .
  2. Fail back the protection domain to the primary site.
    See the Failing Back a Protection Domain topic in the Prism Web Console Guide .

Deploying File Analytics on a Remote Site (AHV)

Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.

About this task

To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.

Before you begin

Ensure that the Nutanix Files and AOS versions on the remote site match the versions on the primary site.

About this task

Run the following commands from the command prompt inside the FAVM.

Procedure

  1. Deploy a new File Analytics instance on the remote site, see Deploying File Analytics.
    Caution: Do not enable File Analytics.
    The remote site requires an iSCSI data service IP address to configure the FAVM on the remote site. This procedure deploys a new volume group File_Analytics_VG and deletes it in a subsequent step.
  2. On the remote site, create a volume group by restoring the snapshot of the File_Analytics_VG .
    See Restoring an Entity from a Protection Domain in Data Protection and Recovery with Prism Element . For the How to Restore step, use the Create new entities option, and specify a name in the Volume Group Name Prefix field. The restored volume group name format is prefix -File_Analytics_VG.
  3. To configure the FAVM on the remote site, follow these steps:
    Caution: If the IP address of the File Analytics VM has changed on the remote site, contact Nutanix Support before proceeding.
    1. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    2. To discover all storage devices accessed by the FAVM, run the following commands.
      nutanix@favm$  sudo blkid 
    3. Copy the cvm.config file to the temporary files directory.
      nutanix@favm$ sudo cp /mnt/containers/config/common_config/cvm.config /tmp
    4. Stop the File Analytics services.
      nutanix@favm$  sudo systemctl stop monitoring
      nutanix@favm$  docker stop $(docker ps -q)
      nutanix@favm$  sudo systemctl stop docker
    5. Unmount the volume group.
      nutanix@favm$ sudo umount /mnt
    6. Detach the volume group File_Analytics_VG from the FAVM.
      See the "Managing a VM (AHV)" topic in the Prism Web Console Guide .
    7. Attach the cloned volume group prefix -File_Analytics_VG to the FAVM.
      See "Managing a VM (AHV)" in the Prism Web Console Guide .
    8. Restart the FAVM to discover the attached volume group.
      nutanix@favm$ sudo reboot

    9. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    10. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      The FAVM discovers the attached volume group and assigns it to the /dev/sdb device.
    11. Delete the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    12. Rename the restored volume group prefix -File_Analytics_VG to File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    13. Create a backup of the cvm.config file.
      nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
      /mnt/containers/config/common_config/cvm_bck.config
    14. Copy the cvm.config file from the /tmp directory to /common_config/ on the FAVM.
      nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
    15. Reconfigure the password of the Prism user used for internal FAVM operations. Replace new password with a passphrase of your choice. File Analytics uses this password only for internal communication between Prism and the FAVM. Run both of the following commands.
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
       --password='new password' --local_update
      nutanix@favm$  sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
      --password='new password' --prism_user=admin --prism_password='Prism admin password'
    16. In File Analytics, go to gear icon > Scan File System to check if a file system scan can be initiated.
      Note: If you receive errors, disable and re-enable File Analytics, see "Disabling File Analytics" and "Enabling File Analytics."

Deploying File Analytics on a Remote Site (ESXi)

Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.

About this task

To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.

Before you begin

Ensure that the Nutanix Files and AOS versions on the remote site match the versions on the primary site.

About this task

Run the following commands from the command prompt inside the FAVM.

Procedure

  1. Deploy a new File Analytics instance on the remote site, see Deploying File Analytics.
    Caution: Do not enable File Analytics.
    The remote site requires an iSCSI data service IP address to configure the FAVM on the remote site. This procedure deploys a new volume group File_Analytics_VG and deletes it in a subsequent step.
  2. On the remote site, create a volume group by restoring the snapshot of the File_Analytics_VG .
    See Restoring an Entity from a Protection Domain in Data Protection and Recovery with Prism Element . For the How to Restore step, use the Create new entities option, and specify a name in the Volume Group Name Prefix field. The restored volume group name format is prefix -File_Analytics_VG.
  3. In the Storage Table view, go to the Volumes tab.
    1. Copy the target IQN prefix from the Volume Group Details column.
      Tip: Click the tooltip to see the entire IQN prefix.
  4. To configure the FAVM on the remote site, follow these steps:
    Caution: If the IP address of the File Analytics VM has changed on the remote site, contact Nutanix Support before proceeding.
    1. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    2. To discover all storage devices accessed by the FAVM, run the following commands.
      nutanix@favm$  sudo blkid 
    3. Copy the cvm.config file to the temporary files directory.
      nutanix@favm$ sudo cp /mnt/containers/config/common_config/cvm.config /tmp
    4. Stop the File Analytics services.
      nutanix@favm$  sudo systemctl stop monitoring
      nutanix@favm$  docker stop $(docker ps -q)
      nutanix@favm$  sudo systemctl stop docker
    5. Unmount and log off from all iSCSI targets.
      nutanix@favm$ sudo umount /mnt
      nutanix@favm$ sudo /sbin/iscsiadm -m node -u
      
    6. Remove the disconnected target records from the discoverydb mode of the FAVM.
      nutanix@favm$  sudo /sbin/iscsiadm -m node -o delete
    7. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      The output does not show the /dev/sdb device.
    8. Get the File Analytics Linux client iSCSI initiator name.
      nutanix@favm$  sudo cat /etc/iscsi/initiatorname.iscsi
      The output displays the initiator name.
      InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
    9. Copy the iSCSI initiator name.
    10. Remove the iSCSI initiator name from the client whitelist of the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    11. Whitelist the FAVM client on the cloned volume group prefix -File_Analytics_VG using the iSCSI initiator name of the FAVM client.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    12. Let the FAVM initiator discover the cluster and its volume groups.
      nutanix@favm$  sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal  data_services_IP_address:3260
      Clicking the Nutanix cluster name in Prism displays cluster details including the data service IP address. The output displays the restored iSCSI target from step 2.
    13. Connect to the volume target by specifying IQN prefix.
      nutanix@favm$  sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
    14. Restart the FAVM to restart the iSCSI host adapters, which allows the discovery of the attached volume group.
      nutanix@favm$  sudo reboot
    15. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    16. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      The FAVM discovers the attached iSCSI volume group and assigns it to the /dev/sdb device.
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      /dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
    17. Delete the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    18. Rename the restored volume group prefix -File_Analytics_VG to File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    19. Create a backup of the cvm.config file.
      nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
      /mnt/containers/config/common_config/cvm_bck.config
    20. Copy the cvm.config file from the /tmp directory to /common_config/ on the FAVM.
      nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
    21. Reconfigure the password of the Prism user used for internal FAVM operations. Replace new password with a passphrase of your choice. File Analytics uses this password only for internal communication between Prism and the FAVM. Run both of the following commands.
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
       --password='new password' --local_update
      nutanix@favm$  sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
      --password='new password' --prism_user=admin --prism_password='Prism admin password'
    22. In File Analytics, go to gear icon > Scan File System to check if a file system scan can be initiated.
      Note: If you receive errors, disable and re-enable File Analytics, see "Disabling File Analytics" and "Enabling File Analytics."

File Analytics Guide

Files 2.2

Last updated: 2022-06-14

File Analytics

File Analytics provides data and statistics on the operations and contents of a file server.

Once deployed, Files adds a File Analytics VM to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. Data on the File Analytics VM is protected and kept in a separate volume group.

Once you deploy File Analytics, a new File Analytics link appears on the file server actions bar. You can access File Analytics through this link for any file server where it is enabled.

Figure. File Analytics VM

Display Features

The File Analytics web console includes the following display features:

Main menu bar : The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:

  • Dashboard tab : View widgets that present data on file trends, distribution, and operations.
  • Audit Trails tab : Search for a specific user or file and view various widgets to audit activity.
  • Anomalies tab : Create anomaly policies and view anomaly trends.
  • Status icon : Check the file system scan status.
  • File server drop-down : View the name of the file server for which data is displayed.
  • Settings drop-down : Manage File Analytics and configure settings.
  • Health icon : Check the health of File Analytics.
  • Admin dropdown : Collect logs and view the current File Analytics version.

Deployment Requirements

Meet the following requirements prior to deploying File Analytics.

Ensure that you have performed the following tasks and your Files deployment meets the following specifications.

  • Assign the file server administrator role to an Active Directory (AD) user, see Managing Roles in the Nutanix Files Guide .
  • Log on as the Prism admin user to deploy the File Analytics server.
  • Configure a VLAN with one dedicated IP address for File Analytics, or you can use an IP address from an existing Files external network. This IP address must have connectivity to AD, the control VM (CVM), and Files. See "Configuring a Virtual Network For Guest VM Interfaces" in the Prism Web Console Guide.
    Note: Do not install File Analytics on the Files internal network.
  • (optional) Assign the file server administrator role to an LDAP user, see Managing Roles in the Nutanix Files Guide .
  • Ensure that all software components meet the supported configurations and system limits, see the File Analytics Release Notes .

Network Requirements

Open the required ports and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.

The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.

In addition to meeting the File Analytics network requirements, ensure that you meet the Nutanix Files port requirements as described in the Port Reference .

Limitations

File Analytics has the following limitations.

Note: Depending on data set size, file count, and workload type, enabling File Analytics can affect the performance of Nutanix Files. High latency is more common with heavy file-metadata operations (directory and file creation, deletion, permission changes, and so on). To minimize the impact on performance, ensure that the host has enough CPU and memory resources to handle the File Analytics VM (FAVM), file servers, and guest VMs (if any).
  • Only Prism admin users can deploy Analytics. Active Directory (AD) users and AD users mapped to Prism admin roles cannot deploy File Analytics.
  • Analytics analyzes data from 1 month up to 1 year based on the configuration. Analytics automatically deletes data beyond the defined configuration.
    Note: After surpassing the 750 million audit event threshold, Analytics archives the oldest events. Archived audit events do not appear in the Analytics UI.
  • You cannot deploy or decommission Analytics when a file server has high-availability (HA) mode enabled.
  • You cannot use network segmentation for Nutanix Volumes with File Analytics.
  • If file server DNS or IP changes, File Analytics does not automatically reconfigure.
  • File Analytics does not collect metadata for files on Kerberos authenticated NFS v4.0 shares.
  • If File Analytics is running on a one-node file server, you cannot upgrade File Analytics using the Life Cycle Manager (LCM).
  • File Analytics does not support hard links.
  • You cannot enable File Analytics on a file server clone.
  • You cannot move File Analytics to another storage container.
  • File Analytics creates an unprotected Prism and an unprotected file server user for integration purposes. Do not delete these users.
  • The legacy file blocking policy has an upper limit of 300 ransomware extensions.
    Note: For higher limits, it is recommended to use Nutanix Data Lens.

Administration

Overview of administrative processes for File Analytics.

As an admin, you have the privileges to perform administrative tasks for File Analytics. To add a file server admin user, see Managing Roles in the Nutanix Files Guide . The topics in this chapter describe the basics for administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.

Deploying File Analytics

Follow this procedure to deploy the File Analytics server.

Before you begin

Ensure that your environment meets all requirements prior to deployment, see Deployment Requirements.

Procedure

Deploying the File Analytics server.
  1. Go to Support Portal > Downloads > File Analytics .
  2. Download the File Analytics QCOW2 and JSON files.
  3. Log on to Prism with the user name and password of the Prism administrator.
    Note: An Active Directory (AD) user or an AD user mapped to a Prism admin role cannot deploy File Analytics.
  4. In Prism, go to the File Server view and click the Deploy File Analytics action link.
    Figure. File Analytics
    Click to enlarge

  5. Review the File Analytics requirements and best practices in the Pre-Check dialog box.
  6. In the Deploy File Analytics Server dialog box, do the following in the Image tab.
    • Under Available versions , select one of the available File Analytics versions, and continue to step 8.
    • To install by uploading installation binary files, continue to the next step.
  7. Upload installation files.
    1. In the Upload binary section, click upload the File Analytics binary to upload the File Analytics JSON and QCOW files.
      Figure. Upload Binary Link Click to enlarge
    2. Under File Analytics Metadata File (.Json) , click Choose File to choose the downloaded JSON file.
    3. Under File Analytics Installation Binary (.Qcow2) click Choose File to choose the downloaded QCOW file.
      Figure. Upload Binary Files Click to enlarge
    4. Click Upload Now after choosing the files.
  8. Click Next .
  9. In the VM Configuration tab, do the following in the indicated fields:
    1. Name : Enter a name for the File Analytics VM (FAVM).
    2. Server Size : Select either the small or large configuration. Large file servers require larger configurations for the FAVM. By default File Analytics selects the large configuration.
    3. Storage Container : Select a storage container from the drop-down.
      The drop-down only displays file server storage containers.
    4. Network List : Select a VLAN.
      Note: If the selected network is unmanaged , enter more network details in the Subnet Mask , Default Gateway IP , and IP Address fields as indicated.
      Note: The FAVM must use the client-side network.
  10. Click Deploy .
    In the main menu drop-down, select the Tasks view to monitor the deployment progress.

Results

Once deployment is complete, File Analytics creates an FAVM, CVM, and a new Files user to make REST API calls. Do not delete the CVM, FAVM, or the REST API user. A new Manage File Analytics link appears in the Prism Element File Server view.

Enabling File Analytics

Steps for enabling File Analytics after deployment or disablement.

About this task

Attention: Nutanix recommends enabling File Analytics during off-peak hours.

Follow these steps to enable File Analytics after disabling the application.

Note: File Analytics saves all previous configurations.

Procedure

  1. In the File Server view in Prism , select the target file server.
  2. (Skip to step 3 if you are re-enabling a file server.) Click Manage roles to add a file server admin user, see Managing Roles in the Nutanix Files Guide .
  3. In the File Server view, select the target file server and click File Analytics in the tabs bar.
  4. (Skip to step 5 if you are not re-enabling a disabled instance of File Analytics.) To re-enable File Analytics, click Enable File Analytics in the message bar.
    Figure. Enabling File Analytics Link Click to enlarge
    The Enable File Analytics dialog box appears. Skip the remaining steps.
  5. In the Data Retention field, select a data retention period. The data retention period refers to the length of time File Analytics retains audit events.
  6. In the Authentication section, enter the credentials as indicated:
    Note: AD passwords for the file server admin cannot contain the following special characters: comma (,), single quote ('), double quote ("). Using the special characters in passwords prevents File Analytics from performing file system scans.
    1. (For SMB users only) In the SMB section, do the following in the indicated fields to provide SMB authentication details:
      • Active Directory Realm Name : Confirm the AD realm name for the file server.
      • Username : Enter the AD username for the file server administrator, see File Analytics Prerequisites .
      • Password : Enter the AD user password for the file server administrator.
    2. (For NFS users only) In the NFS Authentication section, do the following in the indicated fields to provide NFS authentication details:
      • LDAP Server URI : Enter the URI of the LDAP server.
      • Base DN : Enter the base DN for the LDAP server.
      • Password : Enter the LDAP user password for the file server administrator.

    Click to enlarge

  7. Click Enable .

Results

After enablement, File Analytics performs a one-time file system scan to pull metadata information. The duration of the scan varies depending on the protocol of the share. There is no system downtime during the scan.

Example

Scanning 3–4 million NFS files or 1 million SMB files takes about 1 hour.
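As a rough sketch only, the scan duration can be estimated from the figures above. The rates below are assumptions derived from this example (3.5 million NFS files per hour is the midpoint of the 3–4 million range), and linear scaling is assumed, not guaranteed:

```python
# Rough scan-duration estimate based on the example figures above:
# ~3-4 million NFS files/hour (3.5M midpoint assumed) and ~1 million
# SMB files/hour. Illustrative only; actual scan times vary.
SCAN_RATES_PER_HOUR = {"NFS": 3_500_000, "SMB": 1_000_000}

def estimated_scan_hours(file_count: int, protocol: str) -> float:
    """Estimate the one-time metadata scan duration in hours."""
    return file_count / SCAN_RATES_PER_HOUR[protocol]

print(round(estimated_scan_hours(2_000_000, "SMB"), 1))  # -> 2.0
```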

Disabling File Analytics

About this task

Follow the steps as indicated to disable File Analytics.

Procedure

  1. In File Analytics click the gear icon > Disable File Analytics .
  2. In the dialog box, click Disable .
    Disabling File Analytics disables data collection. The following message banner appears.
     File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data. 

What to do next

To delete data, click the Delete File Analytics Data link in the banner described in Step 2.

Launching File Analytics

About this task

Do the following to launch File Analytics.

Procedure

  1. From the Prism views drop-down, select the File Server view.
  2. Select the target file server from the entity tab.
  3. Click the File Analytics action button below the entity table.
    Figure. Launch File Analytics Click to enlarge The File Analytics action button.

File Analytics VM Management

To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide .

Removing File Analytics VMs

Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.

About this task

Follow the steps as indicated to remove an FAVM.
Note: Do not delete an FAVM using the CLI, as this operation does not decommission the FAVM.

Procedure

  1. Disable File Analytics on all file servers in the cluster, see Disabling File Analytics.
  2. In the File Server view in Prism Element, do the following:
    1. In the top actions bar, click Manage File Analytics .
    2. Click Delete to remove the FAVM.
    When you delete an FAVM, you also delete all of your File Analytics configurations and audit data stored on the FAVM.

Updating Credentials

About this task

Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.

Procedure

  1. Click gear icon > Update AD/LDAP Configuration .
  2. To update Active Directory credentials, do the following in the indicated fields (otherwise move on to the next step).
    Note: AD passwords for the file server admin cannot contain the following special characters: comma (,), single quote ('), double quote ("). Using the special characters in passwords prevents File Analytics from performing file system scans.
    1. Active Directory Realm Name: confirm or replace the realm name.
    2. Username: confirm or replace the username.
    3. Password: type in the new password.
  3. To update NFS configuration, do the following (otherwise move on to the next step).
    1. LDAP Server URI: confirm or replace the server URI.
    2. Base DN: confirm or replace the base distinguished name (DN).
    3. Bind DN (Optional): confirm or replace the bind distinguished name (DN).
    4. Password: type in the new password.
  4. Click Save .

Managing Deleted Share/Export Audits

Manage the audit data of deleted shares and exports.

About this task

By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears adjacent to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.

Follow the directions as indicated to delete audit data for the deleted share or export.

Note: You cannot restore the deleted audit data of a deleted share or export.

Procedure

  1. Click the gear icon > Manage Deleted Share/Export Audit .
  2. Check the box adjacent to the share or export name.
  3. Click Delete .
  4. In the confirmation window, click Delete to confirm the deletion of data.
    In the Manage Deleted Share/Export Audit window, a progress bar displays the progress of the deletion process next to the share name. File Analytics treats data deletion for a deleted share as a low-priority task, which can take several hours to finish.

Upgrades

Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.

Before you upgrade File Analytics, ensure that you are running a compatible version of AOS and Files. Refer to File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element, see AOS Upgrade in the Prism Web Console Guide .

To upgrade File Analytics, perform inventory and updates using the Life-Cycle Manager (LCM), see the Life Cycle Manager Guide for instructions on performing inventory and updates. LCM cannot upgrade File Analytics when the protection domain (PD) for the File Analytics VM (FAVM) includes any other entities.

Note: The File Analytics UI is not accessible during upgrades.

During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets the expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes it after completing the upgrade successfully. If any errors occur, the system keeps the snapshot for 30 days to troubleshoot the issue.
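The snapshot-retention behavior described above can be summarized as a small decision function. This is an illustrative sketch of the documented rules, not product code; the function name and return strings are invented for clarity:

```python
# Sketch of the upgrade snapshot-retention rules described above.
def snapshot_retention(vg_protected: bool, upgrade_succeeded: bool) -> str:
    if vg_protected:
        # Protected VG: snapshot kept with a 30-day expiry.
        return "keep 30 days"
    if upgrade_succeeded:
        # Unprotected VG, clean upgrade: snapshot deleted afterwards.
        return "delete after upgrade"
    # Unprotected VG, upgrade error: kept 30 days for troubleshooting.
    return "keep 30 days"

print(snapshot_retention(False, True))  # -> delete after upgrade
```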

Upgrade File Analytics at a Dark Site

Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).

Before you begin

You need a local web server reachable by your Nutanix clusters to host the LCM repository.

Procedure

  1. From a device that has public Internet access, go to Nutanix Portal > Downloads > Tools & Firmware .
    1. Download the tar file lcm_dark_site_version.tar.gz .
    2. Transfer lcm_dark_site_version.tar.gz to your local web server and untar into the release directory.
  2. From a device that has public Internet access, go to the Nutanix portal and select Downloads > File Analytics .
    1. Download the following files.
      • file_analytics_dark_site_version.tar.gz
      • nutanix_compatibility.tgz
      • nutanix_compatibility.tgz.sign
    2. Transfer file_analytics_dark_site_version.tar.gz to your local web server and untar into the release directory.
    3. Transfer the nutanix_compatibility.tgz and nutanix_compatibility.tgz.sign files to your local web server (overwrite existing files as needed).
  3. Log on to Prism Element.
  4. Click Home > LCM > Settings .
    1. In the Fetch updates from field, enter the path to the directory where you extracted the tar file on your local server. Use the format http://webserver_IP_address/release .
    2. Click Save .
      You return to the Life Cycle Manager.
    3. In the LCM sidebar, click Inventory > Perform Inventory .
    4. Update the LCM framework before trying to update any other component.
      The LCM sidebar shows the LCM framework with the same version as the file you downloaded.

Dashboard

The Dashboard tab displays data on the operational trends of a file server.

Dashboard View

The Dashboard tab is the opening screen that appears after launching File Analytics from Prism. The dashboard displays widgets that present data on file trends, distribution, and operations.

Figure. File Analytics Dashboard Click to enlarge File Analytics data panes in the Dashboard view.

Table 1. Dashboard Widgets
Tile Name Description Intervals
Capacity Trend Displays capacity trends for the file server including capacity added, capacity removed, and net changes.

Clicking an event period widget displays the Capacity Trend Details view.

Last 7 days, last 30 days, or last 1 year.
Data Age Displays the percentage of data by age. Less than 3 months, 3–6 months, 6–12 months, and > 12 months.
Anomaly Alerts Displays alerts for configured anomalies, see Configuring Anomaly Detection.
Permission Denials Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details, see Audit Trails - Users for more. [user id], [number of permission denials]
File Distribution by Size Displays the number of files by file size. Provides trend details for top 5 files. Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB to 1 GB, greater than 1 GB.
File Distribution by Type Displays the space taken up by various applications and file types. The file type is determined by the file extension. See the File Types table for more details. MB or GB
File Distribution by Type Details view Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days.

Clicking View Details displays the File Distribution by Type view.
Daily size trend for top 5 files (GB), file type (see File Type table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB).
Top 5 active users Lists the users who have accessed the most files and number of operations the user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking the user name displays the audit view for the user, see Audit Trails - Users for more. 24 hours, 7 days, 1 month, or 1 year.
Top 5 accessed files Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files.

Clicking the file name displays the audit view details for the file, see Audit Trails - Files for more.

24 hours, 7 days, 1 month, or 1 year.
Files Operations Displays the distribution of operation types for the specified period including a count for each operation type and the total sum of all operations.

Operations include: create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, permission denied (file blocking).

Clicking an operation displays the File Operation Trend view.
24 hours, 7 days, 1 month, or 1 year.
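The size buckets used by the File Distribution by Size widget can be sketched as a simple classification function. The bucket boundaries come from the widget description above; the function itself is illustrative, not File Analytics code:

```python
# Illustrative sketch of the File Distribution by Size buckets.
MB = 1024 ** 2
GB = 1024 ** 3

def size_bucket(size_bytes: int) -> str:
    """Map a file size in bytes to its dashboard bucket label."""
    if size_bytes < 1 * MB:
        return "Less than 1 MB"
    if size_bytes < 10 * MB:
        return "1-10 MB"
    if size_bytes < 100 * MB:
        return "10-100 MB"
    if size_bytes < 1 * GB:
        return "100 MB to 1 GB"
    return "Greater than 1 GB"

print(size_bucket(5 * MB))  # -> 1-10 MB
print(size_bucket(2 * GB))  # -> Greater than 1 GB
```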

Capacity Trend Details

Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export , Folder , and Category . Each tab includes columns detailing entity details: Name, Net Capacity Change, Capacity Added, and Capacity Removed.

Figure. Capacity Trend Details View Click to enlarge Clicking on the Capacity Trend widget in the Dashboard tab displays the Capacity Trend Details view.

Table 2. Capacity Trend Details
Column Description
Name Name of share/export, folder, or category.
Net Capacity Change The total difference between capacity at the beginning and the end of the specified period.
Share Name (for folders only) The name of the share or export that the folder belongs to.
Capacity Added Total added capacity for the specified period.
Capacity Removed Total removed capacity for the specified period.
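The relationship among the capacity columns above can be expressed arithmetically: the net capacity change over a period equals the capacity added minus the capacity removed. A minimal sketch (values and function name are illustrative):

```python
# Sketch of the Capacity Trend Details arithmetic: net change over a
# period is capacity added minus capacity removed. Values are in GB.
def net_capacity_change(capacity_added_gb: float,
                        capacity_removed_gb: float) -> float:
    return capacity_added_gb - capacity_removed_gb

print(net_capacity_change(120.0, 45.0))  # -> 75.0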

File Distribution by Type Details

Clicking View Details for the File Distribution by Type widget displays granular details of file distribution, see the File Types table below for details.

Figure. File Distribution by Type Click to enlarge Clicking View Details on the File Distribution by Type widget displays the File Distribution by Type dashboard.

Table 3. Details of File Distribution Parameters
Parameter Description
File Type Name of file type
Current Space Used Space capacity occupied by the file type
Current Number of Files Number of files for the file type
Change (In Last 30 Days) The increase in capacity over the last 30 days for the specified file type.
Table 4. File Types
Category Supported File Type
Archives .cab, .gz, .rar, .tar, .z, .zip
Audio .aiff, .au, .mp3, .mp4, .wav, .wma
Backups .bak, .bkf, .bkp
CD/DVD Images .img, .iso, .nrg
Desktop Publishing .qxd
Email Archives .pst
Hard Drive images .tib, .gho, .ghs
Images .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff
Installers .msi, .rpm
Log Files .log
Lotus Notes .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf
MS Office Documents .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb
System Files .bin, .dll, .exe
Text Files .csv, .pdf, .txt
Video .avi, .mpg, .mpeg, .mov, .m4v
Disk Image .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd

File Operation Trend

Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.

Figure. Operation Trend Click to enlarge A graph displays the number of times the specified operation took place over time.

Table 5. File Operation Trend View Parameters
Category Description
Operation Type A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types.
Last (time period) A drop-down option to specify the period for the file operation trend.
File operation trend graph The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations trend over the extent of the intervals.

Managing File Categories

File Analytics uses the file category configuration to classify file extensions.

About this task

The capacity widget in the dashboard uses the category configuration to calculate capacity details.

Procedure

  1. Click gear icon > Manage File Category .
  2. To create a new category, click + New Category . (Otherwise, move on to step 3).
    1. In the Category column, name the category.
    2. In the Extensions column, specify file extensions for the category.
  3. To delete an existing category, click the x icon next to the category. (Otherwise, move on to step 4)
  4. To modify an existing category, click the pencil icon next to the category and modify the specified file extensions.
  5. Click Save .
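Conceptually, the category configuration is a mapping from category names to sets of file extensions, which File Analytics consults when classifying files. The sketch below is hypothetical (not the product's actual data structure); the sample categories and extensions mirror the File Types table in the Dashboard chapter:

```python
# Hypothetical sketch of a file category configuration: category
# names mapped to extension sets, as managed in Manage File Category.
import os

CATEGORY_CONFIG = {
    "Archives": {".cab", ".gz", ".rar", ".tar", ".z", ".zip"},
    "Audio": {".aiff", ".au", ".mp3", ".mp4", ".wav", ".wma"},
    "Log Files": {".log"},
    "Installers": {".msi", ".rpm"},
}

def categorize(path: str) -> str:
    """Return the category for a file based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    for category, extensions in CATEGORY_CONFIG.items():
        if ext in extensions:
            return category
    return "Others"  # extensions not in any category

print(categorize("/share/builds/release.tar"))  # -> Archives
print(categorize("/share/notes.txt"))           # -> Others
```

Adding a category in the dialog corresponds to adding a new key with its extension set; deleting one removes the key, and its files fall back to the unclassified bucket.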

Health

The Health dashboard displays dynamically updated health information about each File Analytics component.

The Health dashboard includes the following details:

  • Data Summary Data summary of all file servers with File Analytics enabled.
  • Host Memory Percent of used memory on the File Analytics VM (FAVM).
  • Host CPU Usage Percent of CPU used by the FAVM.
  • Storage Summary Amount of storage space used on the File Analytics data disk or FAVM disk.
  • Overall Health Overall health of File Analytics components.
  • Data Server Summary Data server usage by component.
Figure. Health Page Click to enlarge The Health page dashboard includes tiles that dynamically update to indicate the health of relevant entities.

Anomalies

Data panes in the Anomalies tab display data and trends for configured anomalies.

The Anomalies tab provides options for creating anomaly policies and displays dashboards for viewing anomaly trends.
Note: Configure an SMTP server to send anomaly alerts, see Configuring an SMTP Server.

You can configure anomalies for the following operations:

  • Creating files and directories
  • Deleting files and directories
  • Permission changes
  • Permission denials
  • Renaming files and directories
  • Reading files and directories

Define anomaly rules by specifying the following conditions:

  • Users exceed an operation count threshold
  • Users exceed an operation percentage threshold

An anomaly triggers when the lower of the two configured thresholds is met.

Consider a scenario with 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. The count threshold takes precedence, because 10% of 1,000 is 100 operations, which is greater than the count threshold of 10.
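The precedence rule from the scenario above can be sketched as a small function: the effective trigger point is whichever configured threshold resolves to the lower operation count. This is an illustrative sketch, not File Analytics' actual implementation:

```python
# Illustrative sketch of anomaly threshold precedence: the lower of
# the absolute count threshold and the percentage threshold (expressed
# as an operation count) triggers the anomaly.
def anomaly_trigger_threshold(total_files: int,
                              count_threshold: int,
                              percent_threshold: float) -> int:
    """Return the effective operation count that triggers an anomaly."""
    percent_as_count = int(total_files * percent_threshold / 100)
    return min(count_threshold, percent_as_count)

# The scenario from the text: 1,000 files, count threshold 10,
# percentage threshold 10%. 10% of 1,000 is 100, so the count
# threshold (10) takes precedence.
print(anomaly_trigger_threshold(1000, 10, 10.0))  # -> 10
```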

Figure. Anomalies Dashboard Click to enlarge The Anomalies dashboard displays anomaly trends.

Table 1. Anomalies Data Pane Descriptions
Pane Name Description Values
Anomaly Trend Displays the number of anomalies per day or per month. Last 7 days, Last 30 days, Last 1 year
Top Users Displays the users with the most anomalies and the number of anomalies per user. Last 7 days, Last 30 days, Last 1 year
Top Folders Displays the folders with the most anomalies and the number of anomalies per folder. Last 7 days, Last 30 days, Last 1 year
Operation Anomaly Types Displays the percentage of occurrences per anomaly type. Last 7 days, Last 30 days, Last 1 year

Anomaly Details

Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.

Figure. Anomaly Details View Click to enlarge

Table 2. Anomalies Details View Total Results Table
Column Description
Anomaly Type The configured anomaly type. Anomaly types not configured do not show up in the table.
Total User Count The number of users that have performed the operation causing the specified anomaly during the specified time range.
Total Folder Count The numbers of folders in which the anomaly occurred during the specified time range.
Total Operation Count Total number of anomalies for the specified anomaly type that occurred during the specified time range.
Time Range The time range for which the total user count, total folder count, and total operation count are specified.
Table 3. Anomalies Details View Users/Folders Table
Column Description
Username or Folders Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders.
Operation count The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph.

Configuring Anomaly Detection

Steps for configuring anomaly rules.

About this task

Configure an SMTP server for File Analytics to send anomaly alerts, see Configuring an SMTP Server. To create an anomaly rule, do the following.

Procedure

  1. In the File Analytics web console, click the gear icon > Define Anomaly Rules .
  2. In the Anomaly Email Recipients field, enter a comma-separated list of email recipients for all anomaly alerts and data.
    Note: File Analytics sends anomaly alerts and data to recipients whenever File Analytics detects an anomaly.
  3. To configure a new anomaly, do the following in the indicated fields:
    1. Events : Select a rule for the anomaly from one of the following:
      • Permission changed
      • Permission denied
      • Delete
      • Create
      • Rename
      • Read
      The event defines the scenario type for the anomaly.
    2. Minimum Operations % : Enter a percentage value for the minimum threshold.
      File Analytics calculates the minimum operations percentage based on the number of files. For example, if there are 100 files, and you set the minimum operations percentage to 5, five operations within the scan interval would trigger an anomaly alert.
    3. Minimum Operation Count : Enter a value for a minimum operation threshold.
      File Analytics triggers an anomaly alert after meeting the threshold.
    4. User : Choose if the anomaly rule is applicable for All Users or an Individual user.
    5. Type : Select the interval type.
      The interval determines how far back File Analytics monitors the anomaly.
    6. Interval : Enter a value for the detection interval.
    7. (optional) Actions : Click the pencil icon to update an anomaly rule. Click the x icon to delete an existing rule.
    Figure. Anomaly Configuration Fields Click to enlarge Fill out these fields to configure a new anomaly rule.

  4. Click Save .

Configuring an SMTP Server

File Analytics uses a simple mail transport protocol (SMTP) server to send anomaly alerts.

About this task

To configure an SMTP server, do the following:

Procedure

  1. In the File Analytics web console, click the gear icon > SMTP Configuration .
  2. In the SMTP Configuration window, enter the indicated details in the following fields:
    1. Hostname Or IP Address : Enter a fully qualified domain name or IP address for the SMTP server.
    2. Port : Enter the port to use.
      The standard SMTP ports are 25 (unencrypted), 587 (STARTTLS), and 465 (SSL).
    3. Security Mode : Enter the desired security mode from the dropdown list.
      The options are:
      • NONE (unencrypted)
      • STARTTLS (TLS encryption)
      • SSL (SSL encryption)
    4. (If the security mode is NONE , skip to step 7.)
    5. User Name : Enter a user name for logging on to the SMTP server. Depending on the authentication method, the user name may require a domain.
    6. Password : Enter the password for the SMTP user.
    7. From Email Address : Enter the email address from which File Analytics sends anomaly alerts.
    8. Recipient Email Address : Enter a recipient email address to test the SMTP configuration.
    Figure. SMTP Configuration Click to enlarge Fields for configuring an SMTP server.

  3. Click Save .
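Under the hood, an SMTP test like the one triggered by the Recipient Email Address field amounts to building a message and submitting it to the configured server. The sketch below uses Python's standard smtplib to illustrate; the hostname, port, credentials, and addresses are placeholders, and this is not File Analytics' own code:

```python
# Illustrative SMTP test using Python's standard library. All server
# details and addresses below are placeholder assumptions.
import smtplib
from email.message import EmailMessage

def build_test_email(sender: str, recipient: str) -> EmailMessage:
    """Compose a minimal test message for verifying SMTP settings."""
    msg = EmailMessage()
    msg["Subject"] = "File Analytics SMTP configuration test"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Test email to verify the SMTP configuration.")
    return msg

def send_test_email(host: str, port: int, user: str, password: str,
                    msg: EmailMessage, use_starttls: bool = True) -> None:
    # STARTTLS upgrades a plain connection (typically port 587);
    # for SSL on port 465, use smtplib.SMTP_SSL instead.
    with smtplib.SMTP(host, port, timeout=10) as server:
        if use_starttls:
            server.starttls()
        if user:  # skip login when the security mode needs no auth
            server.login(user, password)
        server.send_message(msg)

msg = build_test_email("analytics@example.com", "admin@example.com")
print(msg["Subject"])
```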

Audit Trails

Use audit trails to look up operation data for a specific user, file, folder, or client.

The Audit Trails tab includes Files , Folders , Users , and Client IP options for specifying the audit type. Use the search bar for specifying the specific entity for the audit (user, folder, file, or client IP).

The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP number) takes you to the Audit Trails dashboard for the target entity.

View Audit Trails

Audit a user, file, client, or folder.

About this task

Procedure

  1. Click the Audit Trails tab.
  2. Select the Files , Folders , Users , or Client IP option.
  3. Enter the audit trails target into the search bar.
  4. Click Search .
  5. To display audit results in the Audit Trails window, click the entity name (or client IP number).

Audit Trails - Users

Dashboard details for user Audit Trails.

Audit Trails Search - Users

When you search by user in the Audit Trails tab, search results display the following information in a table.

  • User Name
  • Domain
  • Last Operation
  • Last Operation On
  • Share Name
  • Operation Date
  • Action
Figure. Users Search Results Click to enlarge A table displays user search results for the query.

Audit Details Page - Users

Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.

  • A User Events graph displays various operations the user performed during the selected period and the percentage of time each operation has occurred per total operations during the specified period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operations include:
      • Create File
      • Delete
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Read
      • Remove Directory
      • Rename
      • Set Attribute
      • Write
      • Symlink
    • The filter bar , above the User Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific information. See more details below.
  • The Reset Filters button removes all filters.
Figure. User Audit Details - Events Click to enlarge User Events table displays event rates for various operations performed by the user.

The Results table provides granular details of the audit results. The following data is displayed for every event.

  • User Name
  • User IP Address
  • Operation
  • Operation Date
  • Target File

Click the gear icon for options to download the data as an XLS, CSV, or JSON file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.
Figure. Results Table Click to enlarge The results table displays a detailed view of the audit data.

Audit Trails - Folders

Dashboard details for folder audits.

The following information displays when you search by folder in the Audit Trails tab.

  • Folder Name
  • Folder Owner Name
  • Share Name
  • Parent Folder
  • Last Operation
  • Last Operation By
  • Last Operation Date
  • Action
Figure. Folders Search Results Click to enlarge

The Audit Details page shows the following audit information for the selected folder.

  • A Folder Events graph displays various operations performed on the folder during the selected period, and the percentage of time each operation has occurred per total operations during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operations include:
      • Select All
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Remove Directory
      • Rename
      • Set Attribute
    • A filter bar , above the File Events graph displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
  • The Reset Filters button removes all filters.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • User Name
  • Client IP
  • Operation
  • Operation Date

Click the gear icon for options to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.

Audit Trails - Files

Dashboard details for file audits.

Audit Trails for Files

When you search by file in the Audit Trails tab, the following information displays:

  • File Name
  • File Owner Name
  • Share Name
  • Parent Folder
  • Last Operation
  • Last Operation By
  • Last Operation Date
  • Action
Figure. Files Search Results Click to enlarge A table displays file search results for the query.

Note: File Analytics does not support regular-expression (RegEx) based search.

The Audit Details page shows the following audit information for the selected file.

  • A File Events graph displays various operations performed on the file during the selected period, and the percentage of time each operation has occurred per total operations during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operation types include:
      • Close File
      • Create File
      • Delete
      • Make Directory
      • Open
      • Read
      • Rename
      • Set Attribute
      • Write
      • Symlink
    • A filter bar , above the File Events graph displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
    • The Reset Filters button removes all filters.
Figure. Files Audit Details - Events Click to enlarge File Events table displays event rates for various operations for the file.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • Username
  • Client IP
  • Operation
  • Operation Date

Click the gear icon for options to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.
Figure. Results Table: The results table displays a detailed view of the audit data.

Audit Trails - Client IP

Dashboard details for client IP Audit Trails.

Audit Trails Search - Client IP

When you search by client IP in the Audit Trails tab, search results display the following information in a table.

  • Client IP
  • User Name
  • Domain
  • Last Operation
  • Last Operation On
  • Share Name
  • Operation Date
  • Action
Figure. IP Search Results: A table displays IP search results for the query.

The Audit Details page shows the following audit information for the selected client.

  • A User Events graph displays various operations performed on the client during the selected period, and the percentage of time each operation has occurred per total operations during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operation types include:
      • Create File
      • Delete
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Read
      • Removed Directory
      • Rename
      • Set Attribute
      • Write
      • Symlink
      • Permission Denied (File Blocking)
    • A filter bar above the User Events graph displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
    • The Reset Filters button removes all filters.
Figure. Client Audit Details - Events: The User Events table displays event rates for various operations for the client.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • User Name
  • Operation
  • Target File
  • Operation Date

Click the gear icon for an option to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.

File Analytics Options

You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.

Updating Data Retention

The data retention period determines how long File Analytics retains event data.

About this task

Follow the steps as indicated to configure data retention.

Procedure

  1. In File Analytics, click gear icon > Update Data Retention .
  2. In the Data Retention Period drop-down, select the period for data retention.
  3. Click Update .

Scanning the File System

Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.

About this task

To scan shares, perform the following task:

Procedure

  1. In File Analytics, click the gear icon .
  2. In the drop-down list, click Scan File System .
    Figure. Scan File System Option

  3. In the list of shares, select the target shares for the scan.
    Figure. Select Scan Targets

  4. Click Scan .
    The status of the share is In Progress . Once the scan is complete, the status changes to Completed .

Blacklisting

Blacklist users, file extensions, and client IPs.

About this task

Use the blacklisting feature to stop File Analytics from recording audit events for specified file extensions, users, and client IPs.

Procedure

  1. Click the gear icon > Define Blacklisting Rules .
  2. Click the pencil icon in the user, file extension, or client IP row.
  3. Add a comma-separated list of the entities that you want to block.
  4. Click Save in the updated row.

Data Protection

Configure File Analytics disaster recovery (DR) using Prism Element.

File Analytics supports only async disaster recovery. File Analytics does not support NearSync or metro availability.

Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have a configuration symmetric to the primary site. You must also deploy File Analytics on the remote site to restore a File Analytics VM (FAVM).

The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.

Configuring Disaster Recovery

To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.

About this task

By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.

Procedure

  1. If you have not done so already, configure a remote site for the local cluster.
    See the Configuring a Remote Site (Physical Cluster) topic in the Prism Web Console Guide for this procedure.
  2. Create an async DR protection domain for the File Analytics volume group as the entity. The volume group name is File_Analytics_VG .
    See Configuring a Protection Domain (Async DR) in the Prism Web Console Guide .
  3. In the Schedule tab, click the New Schedule button to add a schedule.
    Add a schedule, as File Analytics does not provide a default schedule. See Creating a Protection Domain Schedule (Files) in the Nutanix Files Guide .
  4. Configure local and remote container mapping.
    See the Configuring Disaster Recovery (Files) section in the Nutanix Files Guide for steps to configure mapping between local and remote containers.
  5. Create a protection domain schedule.
    See Creating a Protection Domain Schedule (Files) in the Nutanix Files Guide .

Activating Disaster Recovery

Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.

About this task

Perform the following tasks on the remote site.

Procedure

  1. Fail over to the protection domain for disaster recovery activation.
    See the Failing Over a Protection Domain topic in the Prism Web Console Guide .
  2. Fail back the protection domain to the primary site.
    See the Failing Back a Protection Domain topic in the Prism Web Console Guide .

Deploying File Analytics on a Remote Site

Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.

About this task

To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.

Before you begin

Ensure that the Nutanix Files and AOS versions on the remote site match the versions on the primary site.


Run the following commands from the command prompt inside the FAVM.

Procedure

  1. Deploy a new File Analytics instance on the remote site, see Deploying File Analytics.
    Caution: Do not enable File Analytics.
    The remote site requires an iSCSI data service IP address to configure the FAVM on the remote site. This procedure deploys a new volume group File_Analytics_VG and deletes it in a subsequent step.
  2. On the remote site, create a volume group by restoring the snapshot of the File_Analytics_VG .
    See Restoring an Entity from a Protection Domain in the Prism Web Console Guide . For the How to Restore step, use the Create new entities option, and specify a name in the Volume Group Name Prefix field. The restored volume group name format is prefix -File_Analytics_VG.
  3. In the Storage Table view, go to the Volumes tab.
    1. Copy the target IQN prefix from the Volume Group Details column.
      Tip: Click the tooltip to see the entire IQN prefix.
  4. To configure the FAVM on the remote site, follow these steps:
    Caution: If the IP address of the File Analytics VM has changed on the remote site, contact Nutanix Support before proceeding.
    1. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    2. To discover all storage devices accessed by the FAVM, run the following command.
      nutanix@favm$  sudo blkid 
    3. Copy the cvm.config file to the temporary files directory.
      nutanix@favm$ cd /mnt/containers/config/common_config/
      nutanix@favm$ sudo cp cvm.config /tmp
    4. Stop the File Analytics services.
      nutanix@favm$  sudo systemctl stop monitoring
      nutanix@favm$  docker stop $(docker ps -q)
      nutanix@favm$  sudo systemctl stop docker
    5. Unmount and log off from all iSCSI targets.
      nutanix@favm$ sudo umount /mnt
      nutanix@favm$ sudo /sbin/iscsiadm -m node -u
      
    6. Remove the disconnected target records from the discoverydb mode of the FAVM.
      nutanix@favm$  sudo /sbin/iscsiadm -m node -o delete
    7. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      The output does not show the /dev/sdb device.
    8. Get the File Analytics Linux client iSCSI initiator name.
      nutanix@favm$  sudo cat /etc/iscsi/initiatorname.iscsi
      The output displays the initiator name.
      InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
    9. Copy the iSCSI initiator name.
    10. Remove the iSCSI initiator name from the client whitelist of the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    11. Whitelist the FAVM client on the cloned volume group prefix -File_Analytics_VG using the iSCSI initiator name of the FAVM client.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    12. Let the Analytics initiator discover the cluster and its volume groups.
      nutanix@favm$  sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal  data_services_IP_address:3260
      Clicking the Nutanix cluster name in Prism displays cluster details including the data service IP address. The output displays the restored iSCSI target from step 2.
    13. Connect to the volume target by specifying IQN prefix.
      nutanix@favm$  sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
    14. Restart the FAVM to restart the iSCSI host adapters, which allows the discovery of the attached volume group.
      nutanix@favm$  sudo reboot
    15. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    16. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      The FAVM discovers the attached iSCSI volume group and assigns it to the /dev/sdb device.
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      /dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
    17. Delete the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    18. Rename the restored volume group prefix -File_Analytics_VG to File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    19. Create a backup of the cvm.config file.
      nutanix@favm$ cd /mnt/containers/config/common_config/
      nutanix@favm$ mv cvm.config cvm_bck.config
    20. Copy the cvm.config file from the /tmp directory to /common_config/ on the FAVM.
      nutanix@favm$ cd /tmp
      nutanix@favm$ mv cvm.config /mnt/containers/config/common_config/
    21. Reconfigure the password of the Prism user that File Analytics uses for internal FAVM operations. Specify a passphrase in place of new password . File Analytics uses the password only for internal communication between Prism and the FAVM. Run the reset_password.py command twice: first with the --local_update flag, and then with the Prism admin credentials.
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
       --password='new password' --local_update
      nutanix@favm$  sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
      --password='new password' --prism_user=admin --prism_password='Prism admin password'
    22. In File Analytics, go to gear icon > Scan File System to check if a file system scan can be initiated.
      Note: If you receive errors, disable and re-enable File Analytics, see "Disabling File Analytics" and "Enabling File Analytics."
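For orientation, the iSCSI reattachment sequence in step 4 (stop services, log off targets, rediscover, log in, reboot) can be sketched as a dry-run script that only prints each command in order. This is a non-authoritative sketch: the data services IP address and target IQN below are placeholder assumptions, and the whitelisting and Prism steps from the procedure still apply.

```shell
# Dry-run sketch of the FAVM iSCSI reattachment sequence. DSIP and IQN are
# placeholders (assumptions): substitute your cluster's data services IP and
# the restored volume group's target IQN from steps 2 and 3.
DSIP="10.0.0.50"
IQN="iqn.2010-06.com.nutanix:example-vg"

# Prints the command instead of executing it; swap echo for real execution
# only after substituting real values.
run() { echo "+ $*"; }

run sudo systemctl stop monitoring
run sudo systemctl stop docker
run sudo umount /mnt
run sudo /sbin/iscsiadm -m node -u
run sudo /sbin/iscsiadm -m node -o delete
run sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal "${DSIP}:3260"
run sudo /sbin/iscsiadm --mode node --targetname "$IQN" --portal "${DSIP}:3260,1" --login
run sudo reboot
```

Keeping the sequence in one place makes it easier to review the order of operations before touching a production FAVM.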

File Analytics Guide

Files 3.1

Product Release Date: 2022-04-05

Last updated: 2022-11-04

File Analytics

File Analytics provides data and statistics on the operations and contents of a file server.

Once deployed, Nutanix Files adds a File Analytics VM (FAVM) to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. File Analytics protects data on the FAVM, which is kept in a separate volume group.

Once you deploy File Analytics, a new File Analytics link appears on the file server actions bar. Use the link to access File Analytics on any file server that has File Analytics enabled.
Note: File Analytics supports dual NIC configuration for segmented networks. Contact Nutanix Support for assistance.
Figure. File Analytics VM

Display Features

The File Analytics web console consists of display features:

Main menu bar : The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:

  • Dashboard tab : View widgets that present data on file trends, distribution, and operations, see Dashboard.
  • Audit Trails tab : Search for a specific user or file and view various widgets to audit activity, see Audit Trails.
  • Anomalies tab : Create anomaly policies and view anomaly trends, see Anomalies.
  • Ransomware tab : Configure ransomware protection and self-service restore (SSR) snapshots, see Ransomware Protection.
    Warning: Ransomware protection helps detect potential ransomware. Nutanix does not recommend using the File Analytics ransomware feature as an all-encompassing ransomware solution.
  • Reports tab : Create custom reports or use pre-canned report templates, see Reports.
  • Status icon : Check the file system scan status.
  • File server drop-down : View the name of the file server for which data is displayed.
  • Settings drop-down : Manage File Analytics and configure settings, see Administration and File Analytics Options.
  • Health icon : Check the health of File Analytics, see Health.
  • Admin dropdown : Collect logs and view the current File Analytics version.

Deployment Requirements

Meet the following requirements prior to deploying File Analytics.

Ensure that you have performed the following tasks and your Files deployment meets the following specifications.

  • Assign the file server administrator role to an Active Directory (AD) user, see Managing Roles in the Nutanix Files Guide .
  • Log on as the Prism admin user to deploy the File Analytics server.
  • Configure a VLAN with one dedicated IP address for File Analytics, or use an IP address from an existing Files external network. This IP address must have connectivity to AD, the Controller VM (CVM), and Files. See "Configuring a Virtual Network for Guest VM Interfaces" in the Prism Web Console Guide.
    Note: Do not install File Analytics on the Files internal network.
  • (optional) Assign the file server administrator role to an LDAP user, see Managing Roles in the Nutanix Files Guide .
  • Ensure that all software components meet the supported configurations and system limits, see the File Analytics Release Notes .

Network Requirements

Open the required ports, and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.

The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.

In addition to meeting the File Analytics network requirements, ensure that you meet the Nutanix Files port requirements as described in the Port Reference .
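As a quick sanity check of the bi-directional ICMP and port requirements above, you can probe a CVM from the FAVM shell. This is a hedged sketch: the IP address and port below are placeholders, not values mandated by this guide; consult the Port Reference for the actual ports your deployment requires.

```shell
# Placeholder values (assumptions): substitute a real CVM IP and a port
# from the Port Reference for your deployment.
CVM_IP="10.0.0.20"
PORT=9440

# ICMP reachability: two pings with a 2-second per-reply timeout.
check_icmp() { ping -c 2 -W 2 "$1" >/dev/null 2>&1 && echo reachable || echo unreachable; }
# TCP reachability via bash's /dev/tcp, bounded by a 2-second timeout.
check_tcp()  { timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null && echo open || echo closed; }

echo "ICMP to $CVM_IP: $(check_icmp "$CVM_IP")"
echo "TCP $PORT on $CVM_IP: $(check_tcp "$CVM_IP" "$PORT")"
```

Run the same checks in the reverse direction (from a CVM toward the FAVM) to confirm that ICMP traffic is allowed both ways.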

Limitations

File Analytics has the following limitations.

Note: Depending on data set size, file count, and workload type, enabling File Analytics can affect the performance of Nutanix Files. High latency is more common with heavy file-metadata operations (directory and file creation, deletion, permission changes, and so on). To minimize the impact on performance, ensure that the host has enough CPU and memory resources to handle the File Analytics VM (FAVM), file servers, and guest VMs (if any).
  • Only the Prism admin user can deploy File Analytics.
  • File Analytics retains data for a configurable period ranging from one day up to one year. File Analytics automatically deletes data older than the configured period.
    Note: After surpassing the audit event threshold, as specified in File Analytics Release Notes , Analytics archives the oldest events. Archived audit events do not appear in the Analytics UI.
  • You cannot deploy or decommission File Analytics when a file server has high-availability (HA) mode enabled.
  • You cannot use network segmentation for Nutanix Volumes with File Analytics.
  • If file server DNS or IP changes, File Analytics does not automatically reconfigure.
  • File Analytics does not collect metadata for files on Kerberos authenticated NFS v4.0 shares.
  • File Analytics does not support hard links.
  • You cannot enable File Analytics on a file server clone.
  • You cannot move File Analytics to another storage container.
  • File Analytics creates an unprotected Prism and an unprotected file server user for integration purposes. Do not delete these users.
  • The legacy file blocking policy has an upper limit of 300 ransomware extensions.
    Note: For higher limits, it is recommended to use Nutanix Data Lens.
  • File Analytics does not support the following operations for graceful shutdown:
    • AHV: power cycle, power off
    • ESXi: power off, reset
  • File Analytics log collection from CVM fails with dual NIC setup.
  • File Analytics does not collect metadata information on shares, offline shares, and encrypted shares.
  • Teardown of File Analytics fails in case of dual NIC setup.

Administration

Overview of administrative processes for File Analytics.

As an admin, you have the required permissions for performing File Analytics administrative tasks. To add a file server admin user, see Managing Roles in the Nutanix Files Guide . The topics in this chapter describe the basics for administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.

Role-based Access Control for File Analytics

Prism Element supports role-based access control (RBAC) that allows you to configure and provide customized access to the users based on their assigned roles.

Note: Logging in to File Analytics with a local user created on Prism Central is not supported.

From the Prism Element dashboard, you can assign a set of predefined built-in roles (system roles) to users or user groups. File Analytics supports the following built-in roles (system roles), which are defined by default:

Note: Only administrators (Super Admin or a Prism Admin in Prism Element) can create roles for File Analytics.
    • Viewer : Gives users view-only access to information; viewers cannot perform any administrative (create or modify) tasks.
    • Cluster and User Admin : Allows users to view information, perform administrative tasks, and create and modify operations.
    For more information on Role Based Access Control, refer to the Controlling User Access (RBAC) , Built-in Role Management , Configuring Role Mapping , and Managing Local User Accounts sections in the Security Guide .

Deploying File Analytics

Follow this procedure to deploy the File Analytics server.

Before you begin

Ensure that your environment meets all requirements prior to deployment, see Deployment Requirements.

Procedure

  1. Go to Support Portal > Downloads > File Analytics .
  2. Download the File Analytics QCOW2 and JSON files.
  3. Log on to Prism with the user name and password of the Prism administrator.
    Note: An Active Directory (AD) user or an AD user mapped to a Prism admin role cannot deploy File Analytics.
  4. In Prism, go to the File Server view and click the Deploy File Analytics action link.
    Figure. File Analytics

  5. Review the File Analytics requirements and best practices in the Pre-Check dialog box.
  6. In the Deploy File Analytics Server dialog box, do the following in the Image tab.
    • Under Available versions , select one of the available File Analytics versions (continue to step 8).
    • Install by uploading installation binary files (continue to the next step).
  7. Upload installation files.
    1. In the Upload binary section, click upload the File Analytics binary to upload the File Analytics JSON and QCOW files.
      Figure. Upload Binary Link
    2. Under File Analytics Metadata File (.Json) , click Choose File to choose the downloaded JSON file.
    3. Under File Analytics Installation Binary (.Qcow2) click Choose File to choose the downloaded QCOW file.
      Figure. Upload Binary Files
    4. Click Upload Now after choosing the files.
  8. Click Next .
  9. In the VM Configuration tab, do the following in the indicated fields:
    1. Name : Enter a name for the File Analytics VM (FAVM).
    2. Server Size : Select either the small or large configuration. Large file servers require larger configurations for the FAVM. By default File Analytics selects the large configuration.
    3. Storage Container : Select a storage container from the drop-down.
      The drop-down displays the storage containers.
      Note: From AOS 5.15.3 version onward, the drop-down displays all storage containers. For earlier AOS versions, the drop-down only displays file server storage containers.
    4. Network List : Select a VLAN.
      Note: If the selected network is unmanaged , enter more network details in the Subnet Mask , Default Gateway IP , and IP Address fields as indicated.
      Note: The FAVM must use the client-side network.
      Note: For ESXi, do not use the Controller VM (CVM) backplane network. The CVM backplane network is not supported and any later upgrade operations might fail.
  10. Click Deploy .
    In the main menu drop-down, select the Tasks view to monitor the deployment progress.

Results

Once deployment is complete, File Analytics creates an FAVM, CVM, and a new Files user to make REST API calls. Do not delete the CVM, FAVM, or the REST API user.

Enabling File Analytics

Steps for enabling File Analytics after deployment or disablement.

About this task

Attention: Nutanix recommends enabling File Analytics during off-peak hours.

Follow these steps to enable File Analytics after disabling the application.

Note: File Analytics saves all previous configurations.

Procedure

  1. In the File Server view in Prism , select the target file server.
  2. (Skip to step 3 if you are re-enabling a file server) Click Manage roles to add a file server admin user, see Managing Roles in the Nutanix Files Guide .
  3. In the File Server view, select the target file server and click File Analytics in the tabs bar.
  4. (Skip to step 5 if you are not re-enabling a disabled instance of File Analytics) to re-enable File Analytics, click Enable File Analytics in the message bar.
    Figure. Enabling File Analytics Link
    The Enable File Analytics dialog-box appears. Skip the remaining steps.
  5. In the Data Retention field, select a data retention period. The data retention period refers to the length of time File Analytics retains audit events.
  6. In the Authentication section, enter the credentials as indicated:
    Note: AD passwords for the file server admin cannot contain the following special characters: comma (,), single quote ('), double quote ("). Using the special characters in passwords prevents File Analytics from performing file system scans.
    1. (For SMB users only) In the SMB section, do the following in the indicated fields to provide SMB authentication details:
      • Active Directory Realm Name : Confirm the AD realm name for the file server.
      • Username : Enter the AD username for the file server administrator, see File Analytics Prerequisites .
      • Password : Enter the AD user password for the file server administrator.
    2. (For NFS users only) In the NFS Authentication section, do the following in the indicated fields to provide NFS authentication details:
      • LDAP Server URI : Enter the URI of the LDAP server.
      • Base DN : Enter the base DN for the LDAP server.
      • Password : Enter the LDAP user password for the file server administrator.


  7. Click Enable .

Results

After enablement, File Analytics performs a one-time file system scan to pull metadata information. The duration of the scan varies depending on the protocol of the share. There is no system downtime during the scan.

Example

Scanning 3–4 million NFS files or 1 million SMB files takes about 1 hour.
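Assuming the quoted rates scale roughly linearly (an assumption, not a guarantee), you can estimate scan duration for your own file counts:

```shell
# Rough scan-time estimate from the rates quoted above. Assumes linear
# scaling; the quoted rates (~3.5M NFS files/hour, ~1M SMB files/hour)
# are approximate.
estimate_hours() {  # args: file_count rate_per_hour
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}

estimate_hours 7000000 3500000   # ~7M NFS files -> 2.0 (hours)
estimate_hours 2500000 1000000   # ~2.5M SMB files -> 2.5 (hours)
```

Actual duration depends on share protocol, file count, and cluster load, so treat the result as a planning figure only.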

Disabling File Analytics

About this task

Follow the steps as indicated to disable File Analytics.

Procedure

  1. In File Analytics click the gear icon > Disable File Analytics .
  2. In the dialog-box, click Disable .
    Disabling File Analytics disables data collection. The following message banner appears.
     File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data. 

What to do next

To delete data, click the Delete File Analytics Data link in the banner described in Step 2.

Launching File Analytics

About this task

Do the following to launch File Analytics.

Procedure

  1. From the Prism views drop-down, select the File Server view.
  2. Select the target file server from the entity tab.
  3. Click the File Analytics action button below the entity table.
    Figure. Launch File Analytics Click to enlarge The File Analytics action button.

File Analytics VM Management

To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide .

Removing File Analytics VMs

Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.

About this task

Follow the steps as indicated to remove an FAVM.
Note: Do not delete an FAVM using the CLI, as this operation does not decommission the FAVM.

Procedure

  1. Disable File Analytics on all file servers in the cluster, see Disabling File Analytics.
  2. In the File Server view in Prism Element, do the following:
    1. In the top actions bar, click Manage File Analytics .
    2. Click Delete to remove the FAVM.
    When you delete an FAVM, you also delete all of your File Analytics configurations and audit data stored on the FAVM.

Updating Credentials

About this task

Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.

Procedure

  1. Click gear icon > Update AD/LDAP Configuration .
  2. To update Active Directory credentials, do the following in the indicated fields (otherwise move on to the next step).
    Note: AD passwords for the file server admin cannot contain the following special characters: comma (,), single quote ('), double quote ("). Using the special characters in passwords prevents File Analytics from performing file system scans.
    1. Active Directory Realm Name: confirm or replace the realm name.
    2. Username: confirm or replace the username.
    3. Password: type in the new password.
  3. To update NFS configuration, do the following (otherwise move on to the next step).
    1. LDAP Server URI: confirm or replace the server URI.
    2. Base DN: confirm or replace the base distinguished name (DN).
    3. Bind DN (Optional): confirm or replace the bind distinguished name (DN).
    4. Password: type in the new password.
  4. Click Save .

Managing Deleted Share/Export Audits

Manage the audit data of deleted shares and exports.

About this task

By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears next to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.

Follow the directions as indicated to delete audit data for the deleted share or export.

Note: You cannot restore the deleted audit data of a deleted share or export.

Procedure

  1. Click the gear icon > Manage Deleted Share/Export Audit .
  2. Check the box next to the share or export name.
  3. Click Delete .
  4. In the confirmation window, click Delete to confirm the deletion of data.
    In the Manage Deleted Share/Export Audit , a progress bar displays the progress of the deletion process next to the share name. File Analytics considers data deletion of a deleted share a low-priority task, which can take several hours to finish.

Changing an FAVM Password

Steps for updating the password of a File Analytics VM (FAVM).

About this task

Follow the steps as indicated to change the password of the nutanix user on the FAVM.

Procedure

  1. Log on to an FAVM with SSH.
  2. Change the nutanix password.
    nutanix@favm$ sudo passwd nutanix
  3. Respond to the prompts, providing the current and new nutanix user password.
    Changing password for user nutanix.
    Old Password:
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.
    Note:

    The password must meet the following complexity requirements:

    • At least 8 characters long
    • At least 1 lowercase letter
    • At least 1 uppercase letter
    • At least 1 number
    • At least 1 special character
    • At least 4 characters difference from the old password
    • Should not be among the last 10 passwords
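A quick local check of a candidate password against the length and character-class rules above can be sketched in shell. The last two rules (difference from the old password and password history) are enforced by the system's PAM state and are not checked here.

```shell
# Sketch: validate a candidate password against the complexity rules listed
# above (length and character classes only). Illustrative helper, not part
# of the product.
check_password() {
  local p="$1"
  [ "${#p}" -ge 8 ] || { echo "too short"; return 1; }
  case "$p" in *[a-z]*) ;; *) echo "needs lowercase"; return 1;; esac
  case "$p" in *[A-Z]*) ;; *) echo "needs uppercase"; return 1;; esac
  case "$p" in *[0-9]*) ;; *) echo "needs number"; return 1;; esac
  case "$p" in *[!a-zA-Z0-9]*) ;; *) echo "needs special char"; return 1;; esac
  echo "ok"
}

check_password 'short1A'       # -> too short
check_password 'Nutanix/4u!'   # -> ok
```

Pre-checking a password this way avoids a failed round-trip through the passwd prompts.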

Upgrades

Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.

Before you proceed with the File Analytics upgrade, ensure that you meet the following requirements:

  • Have a compatible version of AOS and Files.

    Refer to File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element, see AOS Upgrade in the Prism Web Console Guide .

  • Check the health page of File Analytics to confirm if the overall health is green. See Health.
  • The protection domain (PD) for the File Analytics VM (FAVM) should not include any other entities.

To upgrade File Analytics, perform inventory and updates using the Life-Cycle Manager (LCM), see the Life Cycle Manager Guide for instructions on performing inventory and updates.

Note: The File Analytics UI is not accessible during upgrades.

During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets the expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes the snapshot after completing the upgrade successfully. If any errors occur, the system keeps the snapshot for 30 days to troubleshoot the issue.

Upgrade File Analytics at a Dark Site

Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).

About this task

Before you begin

You need a local web server reachable by your Nutanix clusters to host the LCM repository.

Procedure

  1. From a device that has public Internet access, go to Nutanix Portal > Downloads > Tools & Firmware .
    1. Download the tar file lcm_dark_site_version.tar.gz .
    2. Transfer lcm_dark_site_version.tar.gz to your local web server and untar into the release directory.
  2. From a device that has public Internet access, go to the Nutanix portal and select Downloads > File Analytics .
    1. Download the following files.
      • file_analytics_dark_site_version.tar.gz
      • nutanix_compatibility.tgz
      • nutanix_compatibility.tgz.sign
    2. Transfer file_analytics_dark_site_version.tar.gz to your local web server and untar into the release directory.
    3. Transfer the nutanix_compatibility.tgz and nutanix_compatibility.tgz.sign files to your local web server (overwrite existing files as needed).
  3. Log on to Prism Element.
  4. Click Home > LCM > Settings .
    1. In the Fetch updates from field, enter the path to the directory where you extracted the tar file on your local server. Use the format http://webserver_IP_address/release .
    2. Click Save .
      You return to the Life Cycle Manager.
    3. In the LCM sidebar, click Inventory > Perform Inventory .
    4. Update the LCM framework before trying to update any other component.
      The LCM sidebar shows the LCM framework with the same version as the file you downloaded.

Dashboard

The Dashboard tab displays data on the operational trends of a file server.

Dashboard View

The Dashboard tab is the opening screen that appears after launching File Analytics for a specific file server. The dashboard displays widgets that present data on file trends, distribution, and operations.

Note: Widgets refresh hourly.
Figure. Analytics Dashboard: Widgets in the dashboard view.

Table 1. Dashboard Widgets
Tile Name Description Intervals
Capacity trend Displays capacity trends for the file server including capacity added, capacity removed, and net changes.

Clicking an event period widget displays the Capacity Trend Details view.

Last 7 days, last 30 days, or last 1 year.
Data age Displays the percentage of data by age. Data age determines the data heat, including: hot, warm, and cold. Default intervals are as follows:
  • Hot data – accessed within the last week.
  • Warm data – accessed within 2 to 4 weeks.
  • Cold data – last accessed more than 4 weeks ago.
Permission denials Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details, see Audit Trails - Users for more. [user id], [number of permission denials]
File distribution by size Displays the number of files by file size. Provides trend details for top 5 files. Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB–1 GB, greater than 1 GB.
File distribution by type Displays the space taken up by various applications and file types. The file extension determines the file type. See the File types table for more details. MB or GB
File distribution by type details view Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days.

Clicking View Details displays the File Distribution by Type view.
Daily size trend for top 5 files (GB), file type (see the "File Type" table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB).
Top 5 active users Lists the users who have accessed the most files and the number of operations each user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking the user name displays the audit view for the user, see Audit Trails - Users for more. 24 hours, 7 days, 1 month, or 1 year.
Top 5 accessed files Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files.

Clicking the file name displays the audit view details for the file, see Audit Trails - Files for more.

24 hours, 7 days, 1 month, or 1 year.
Files operations Displays the distribution of operation types for the specified period, including a count for each operation type and the total sum of all operations.

Operations include: create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, permission denied (file blocking).

Clicking an operation displays the File Operation Trend view.
24 hours, 7 days, 1 month, or 1 year.

Capacity Trend Details

Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export , Folder , and Category . Each tab includes columns detailing the entity name, net capacity change, capacity added, and capacity removed.

Figure. Capacity Trend Details View: Clicking the Capacity Trend widget in the Dashboard tab displays the Capacity Trend Details view.

Table 2. Capacity Trend Details
Column Description
Name Name of share/export, folder, or category.
Net capacity change The total difference between capacity at the beginning and the end of the specified period.
Share name (for folders only) The name of the share or export that the folder belongs to.
Capacity added Total added capacity for the specified period.
Capacity removed Total removed capacity for the specified period.

File Distribution by Type Details

Clicking View Details for the File Distribution by Type widget displays granular details of file distribution, see the File Types table for details.

Figure. File Distribution by Type: Clicking View Details on the File Distribution by Type widget displays the File Distribution by Type dashboard.

Table 3. Details of File Distribution Parameters
Parameter Description
File type Name of file type
Current space used Space capacity occupied by the file type
Current number of files Number of files for the file type
Change (in last 30 days) The increase in capacity over a 30-day period for the specified file type
Table 4. File Types
Category Supported File Type
Archives .cab, .gz, .rar, .tar, .z, .zip
Audio .aiff, .au, .mp3, .mp4, .wav, .wma
Backups .bak, .bkf, .bkp
CD/DVD images .img, .iso, .nrg
Desktop publishing .qxd
Email archives .pst
Hard drive images .tib, .gho, .ghs
Images .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff
Installers .msi, .rpm
Log Files .log
Lotus notes .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf
MS Office documents .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb
System files .bin, .dll, .exe
Text files .csv, .pdf, .txt
Video .avi, .mpg, .mpeg, .mov, .m4v
Disk image .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd

File Operation Trend

Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.

Figure. Operation Trend: A graph displays the number of times the specified operation took place over time.

Table 5. File Operation Trend View Parameters
Category Description
Operation type A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types.
Last (time period) A drop-down option to specify the period for the file operation trend.
File operation trend graph The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations during each interval.

Health

The Health dashboard displays dynamically updated health information about each file server component.

The Health dashboard includes the following details:

  • Data Summary Data summary of all file servers with File Analytics enabled.
  • Host Memory Percent of used memory on the File Analytics VM (FAVM).
  • Host CPU Usage Percent of CPU used by the FAVM.
  • Storage Summary Amount of storage space used on the File Analytics data disk or FAVM disk.
  • Overall Health Overall health of File Analytics components.
  • Data Server Summary Data server usage by component.
Figure. Health Page: The Health page dashboard includes tiles that dynamically update to indicate the health of relevant entities.

Data Age

The Data Age widget in the dashboard provides details on data heat.

Share-level data is displayed to provide details on share capacity trends. There are three levels of data heat:

  • Hot – frequently accessed data (last accessed within the last week).
  • Warm – infrequently accessed data (last accessed within the last 2 to 4 weeks).
  • Cold – rarely accessed data (last accessed longer than 4 weeks ago).

You can configure the definitions for each level of data heat rather than using the default values. See Configuring Data Heat Levels.
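Using the default intervals above, classifying data by last-access time can be sketched as follows (an illustration of the default thresholds; the 1-to-2-week boundary is treated as warm here, and the function name is an assumption):

```python
from datetime import datetime, timedelta
from typing import Optional

def data_heat(last_access: datetime, now: Optional[datetime] = None) -> str:
    """Classify data heat using the default File Analytics intervals."""
    now = now or datetime.utcnow()
    age = now - last_access
    if age <= timedelta(weeks=1):
        return "hot"   # accessed within the last week
    if age <= timedelta(weeks=4):
        return "warm"  # accessed within the last 2 to 4 weeks
    return "cold"      # last accessed more than 4 weeks ago
```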

Configuring Data Heat Levels

Update the values that constitute different data heat levels.

Procedure

  1. In the Data Age widget, click Explore .
  2. Click Edit Data Age Configuration .
  3. Do the following in the Hot Data section:
    1. In the entry field next to Older Than , enter an integer.
    2. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
  4. Do the following in the Warm Data section to configure two ranges :
    1. In the first entry field, enter an integer to configure the first range.
    2. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    3. In the second entry field, enter an integer to configure the second range.
    4. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
  5. Do the following in the Cold Data section to configure four ranges :
    1. In the first entry field, enter an integer to configure the first range.
    2. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    3. In the second entry field, enter an integer to configure the second range.
    4. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    5. In the third entry field, enter an integer to configure the third range.
    6. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
    7. (optional) In the fourth entry field, enter an integer to configure the fourth range.
    8. In the dropdown, choose a value for Week(s) , Month(s) , or Year(s) .
  6. Click Apply .
    Note: The new values do not affect the already calculated heat statistics. File Analytics uses the updated values for future heat calculations.

Anomalies

Data panes in the Anomalies tab display data and trends for configured anomalies.

The Anomalies tab provides options for creating anomaly policies and displays dashboards for viewing anomaly trends.

You can configure anomalies for the following operations:

  • Creating files and directories
  • Deleting files and directories
  • Permission changes
  • Permission denials
  • Renaming files and directories
  • Reading files and directories

Define anomaly rules by specifying the following conditions:

  • Users exceed an operation count threshold
  • Users exceed an operation percentage threshold

An anomaly triggers when the lower of the two operation thresholds is met.

Consider a scenario where you have 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. Because 10% of 1,000 is 100 operations, which is greater than the count threshold of 10, the count threshold takes precedence and 10 operations trigger the anomaly.
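The precedence rule in the scenario above can be sketched as a one-line comparison (illustrative only, not the File Analytics implementation):

```python
def anomaly_trigger_threshold(total_files: int, count_threshold: int,
                              percent_threshold: float) -> int:
    """Return the operation count that triggers an anomaly: whichever
    of the two configured thresholds is reached first (i.e., the lower)."""
    # Convert the percentage threshold into an absolute operation count.
    percent_as_count = total_files * percent_threshold / 100
    return int(min(count_threshold, percent_as_count))
```

With 1,000 files, a count threshold of 10, and a percentage threshold of 10%, the function returns 10: the count threshold is lower than the 100 operations implied by the percentage.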

Figure. Anomalies Dashboard: The Anomalies dashboard displays anomaly trends.

Table 1. Anomalies Data Pane Descriptions
Pane Name Description Values
Anomaly Trend Displays the number of anomalies per day or per month. Last 7 days, Last 30 days, Last 1 year
Top Users Displays the users with the most anomalies and the number of anomalies per user. Last 7 days, Last 30 days, Last 1 year
Top Folders Displays the folders with the most anomalies and the number of anomalies per folder. Last 7 days, Last 30 days, Last 1 year
Operation Anomaly Types Displays the percentage of occurrences per anomaly type. Last 7 days, Last 30 days, Last 1 year

Anomaly Details

Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.

Figure. Anomaly Details View

Table 2. Anomalies Details View Total Results Table
Column Description
Anomaly Type The configured anomaly type. Anomaly types that are not configured do not appear in the table.
Total User Count The number of users that have performed the operation causing the specified anomaly during the specified time range.
Total Folder Count The number of folders in which the anomaly occurred during the specified time range.
Total Operation Count Total number of anomalies for the specified anomaly type that occurred during the specified time range.
Time Range The time range for which the total user count, total folder count, and total operation count are specified.
Table 3. Anomalies Details View Users/Folders Table
Column Description
Username or Folders Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders.
Operation count The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph.

Configuring Anomaly Detection

Steps for configuring anomaly rules.

About this task

To create an anomaly rule, do the following.

Note: Configure an SMTP server for File Analytics to send anomaly alerts, see Configuring an SMTP Server.

Procedure

  1. In the File Analytics web console, click the gear icon > Define Anomaly Rules .
  2. In the Anomaly Email Recipients field, enter a comma-separated list of email recipients for all anomaly alerts and data.
    Note: File Analytics sends anomaly alerts and data to recipients whenever File Analytics detects an anomaly.
  3. To configure a new anomaly, do the following in the indicated fields:
    1. Events : Select a rule for the anomaly from one of the following:
      • Permission changed
      • Permission denied
      • Delete
      • Create
      • Rename
      • Read
      The event defines the scenario type for the anomaly.
    2. Minimum Operations % : Enter a percentage value for the minimum threshold.
      File Analytics calculates the minimum operations percentage based on the number of files. For example, if there are 100 files, and you set the minimum operations percentage to 5, five operations within the scan interval would trigger an anomaly alert.
    3. Minimum Operation Count : Enter a value for a minimum operation threshold.
      File Analytics triggers an anomaly alert after meeting the threshold.
    4. User : Choose if the anomaly rule is applicable for All Users or an Individual user.
    5. Type : Select the interval type.
      The interval determines how far back File Analytics monitors the anomaly.
    6. Interval : Enter a value for the detection interval.
    7. (optional) Actions : Click the pencil icon to update an anomaly rule. Click the x icon to delete an existing rule.
    Figure. Anomaly Configuration Fields: Fill out these fields to configure a new anomaly rule.

  4. Click Save .

Configuring an SMTP Server

File Analytics uses a Simple Mail Transfer Protocol (SMTP) server to send anomaly alerts.

About this task

To configure an SMTP server, do the following:

Procedure

  1. In the File Analytics web console, click the gear icon > SMTP Configuration .
  2. In the SMTP Configuration window, enter the indicated details in the following fields:
    1. Hostname Or IP Address : Enter a fully qualified domain name or IP address for the SMTP server.
    2. Port : Enter the port to use.
      The standard SMTP ports are 25 (unencrypted), 587 (TLS), and 465 (SSL).
    3. Security Mode : Enter the desired security mode from the dropdown list.
      The options are:
      • NONE (unencrypted)
      • STARTTLS (TLS encryption)
      • SSL (SSL encryption)
    4. If the security mode is NONE , skip the user name and password fields.
    5. User Name : Enter a user name for logging in to the SMTP server. Depending on the authentication method, the user name may require a domain.
    6. Password : Enter the password for the SMTP user.
    7. From Email Address : Enter the email address from which File Analytics sends anomaly alerts.
    8. Recipient Email Address : Enter a recipient email address to test the SMTP configuration.
    Figure. SMTP Configuration: Fields for configuring an SMTP server.

  3. Click Save .
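The fields above map onto a standard SMTP client. A hedged sketch using Python's standard library follows; the host, port, credentials, and addresses are placeholders, and File Analytics performs the equivalent internally:

```python
import smtplib
from email.message import EmailMessage

def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build an alert message from the From/Recipient address fields."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_alert(msg: EmailMessage, host: str, port: int, security: str = "NONE",
               user: str = "", password: str = "") -> None:
    """Send the message using one of the security modes listed above."""
    if security == "SSL":                  # typically port 465
        server = smtplib.SMTP_SSL(host, port)
    else:
        server = smtplib.SMTP(host, port)  # typically port 25 or 587
        if security == "STARTTLS":
            server.starttls()
    if security != "NONE":                 # NONE skips authentication
        server.login(user, password)
    server.send_message(msg)
    server.quit()
```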

Audit Trails

Use audit trails to look up operation data for a specific user, file, folder, or client.

The Audit Trails tab includes Files , Folders , Users , and Client IP options for specifying the audit type. Use the search bar for specifying the specific entity for the audit (user, folder, file, or client IP).

The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP address) takes you to the Audit Trails dashboard for the target entity.

View Audit Trails

Audit a user, file, client, or folder.

About this task

Follow the steps as indicated.

Procedure

  1. Click the Audit Trails tab.
  2. Select the Files , Folders , Users , or Client IP option.
  3. Enter the audit trails target into the search bar.
  4. Click Search .
  5. To display audit results in the Audit Trails window, click the entity name (or client IP address).

Audit Trails - Users

Dashboard details for user audit trails.

Audit Trails Search - Users

When you search by user in the Audit Trails tab, search results display the following information in a table.

  • User Name
  • Domain
  • Last Operation
  • Last Operation On
  • Share Name
  • Operation Date
  • Action
Figure. Users Search Results: A table displays user search results for the query.

Audit Details Page - Users

Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.

  • A User Events graph displays various operations the user performed during the selected period and the percentage of total operations that each operation type represents during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operations include:
      • Create File
      • Delete
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Read
      • Remove Directory
      • Rename
      • Set Attribute
      • Write
      • Symlink
    • The filter bar , above the User Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific information. See more details below.
  • The Reset Filters button removes all filters.
Figure. User Audit Details - Events: The User Events graph displays event rates for various operations performed by the user.

The Results table provides granular details of the audit results. The following data is displayed for every event.

  • User Name
  • User IP Address
  • Operation
  • Operation Date
  • Target File

Click the gear icon for options to download the data as an xls, csv, or JSON file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.
Figure. Results Table: The results table displays a detailed view of the audit data.

Audit Trails - Folders

Dashboard details for folder audits.

The following information displays when you search by folder in the Audit Trails tab.

  • Folder Name
  • Folder Owner Name
  • Share Name
  • Parent Folder
  • Last Operation
  • Last Operation By
  • Last Operation Date
  • Action
Figure. Folders Search Results

The Audit Details page shows the following audit information for the selected folder.

  • A Folder Events graph displays various operations performed on the folder during the selected period, and the percentage of total operations that each operation type represents during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operations include:
      • Select All
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Remove Directory
      • Rename
        Note: The rename operation shows both name changes and path changes for a specific file or folder.
      • Set Attribute
    • A filter bar , above the Folder Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
  • The Reset Filters button removes all filters.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • User Name
  • Client IP
  • Operation
  • Operation Date

Click the gear icon for options to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.

Audit Trails - Files

Dashboard details for file audits.

Audit Trails for Files

When you search by file in the Audit Trails tab, the following information displays:

  • File Name
  • File Owner Name
  • Share Name
  • Parent Folder
  • Last Operation
  • Last Operation By
  • Last Operation Date
  • Action
Figure. Files Search Results: A table displays file search results for the query.

Note:
  • File Analytics does not support regular expression (RegEx) based search.
  • A file server supports up to 500 million files with the latest 3 months of audit data.

The Audit Details page shows the following audit information for the selected file.

  • A File Events graph displays various operations performed on the file during the selected period, and the percentage of total operations that each operation type represents during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operation types include:
      • Close File
      • Create File
      • Delete
      • Make Directory
      • Open
      • Read
      • Rename
        Note: The rename operation shows both name changes and path changes for a specific file or folder.
      • Set Attribute
      • Write
      • Symlink
    • A filter bar , above the File Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
    • The Reset Filters button removes all filters.
Figure. Files Audit Details - Events: The File Events graph displays event rates for various operations for the file.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • Username
  • Client IP
  • Operation
  • Operation Date

Click the gear icon for options to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.
Figure. Results Table: The results table displays a detailed view of the audit data.

Audit Trails - Client IP

Dashboard details for client IP Audit Trails.

Audit Trails Search - Client IP

When you search by client IP in the Audit Trails tab, search results display the following information in a table.

  • Client IP
  • User Name
  • Domain
  • Last Operation
  • Last Operation On
  • Share Name
  • Operation Date
  • Action
Figure. IP Search Results: A table displays IP search results for the query.

The Audit Details page shows the following audit information for the selected client.

  • A User Events graph displays various operations performed from the client during the selected period, and the percentage of total operations that each operation type represents during that period.
    • The Filter by operations dropdown contains operation filters, which you can use to filter the audit by operation type. Operation types include:
      • Create File
      • Delete
      • Make Directory
      • Permission Changed
      • Permission Denied
      • Read
      • Remove Directory
      • Rename
      • Set Attribute
      • Write
      • Symlink
      • Permission Denied (File Blocking)
    • A filter bar , above the User Events graph, displays the filters in use.
    • Use the From and To fields to filter by date.
  • The Results table displays operation-specific details.
    • The Reset Filters button removes all filters.
Figure. Client IP Audit Details - Events: The User Events graph displays event rates for various operations for the client.

The Results table provides granular details of the audit results. File Analytics displays the following data for every event.

  • User Name
  • Operation
  • Target File
  • Operation Date

Click the gear icon for an option to download the data as a CSV file.

Note: You can download a maximum of 10,000 events in CSV or JSON format.

Ransomware Protection

Ransomware protection for your file server.

Caution: Ransomware protection helps detect potential ransomware. Nutanix does not recommend using the File Analytics ransomware feature as an all-encompassing ransomware solution.

File Analytics scans files for ransomware in real time and notifies you in the event of a ransomware attack once you configure email notifications.

Using a curated list of over 250 signatures that frequently appear in ransomware files, the Nutanix Files file blocking mechanism identifies and blocks files with ransomware extensions from carrying out malicious operations. You can modify the list by manually adding or removing signatures.

Note: Removing curated blocked signatures can prevent File Analytics from blocking some ransomware files.
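The extension-based blocking described above can be sketched as a simple suffix check. The signature list below is a tiny illustrative sample only; the real curated list has over 250 signatures and is downloadable from the ransomware dashboard as a CSV:

```python
# Illustrative sample only; not the curated Nutanix signature list.
BLOCKED_SIGNATURES = {".locky", ".crypt", ".wncry"}

def is_blocked(filename: str, blocked=BLOCKED_SIGNATURES) -> bool:
    """Return True when the file name ends with a blocked ransomware
    signature, mirroring the *.(signature) file-blocking pattern."""
    name = filename.lower()
    return any(name.endswith(sig) for sig in blocked)
```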

File Analytics also monitors shares for self-service restore (SSR) policies and identifies shares that do not have SSR enabled in the ransomware dashboard. You can enable SSR through the ransomware dashboard.

Ransomware Protection Features

The ransomware dashboard includes panes for managing ransomware protection and self-service restore (SSR).

Ransomware Dashboard

The ransomware dashboard includes two main sections:

  • The SSR Status pane for viewing, enabling, and managing SSR, see Enabling SSR.
  • The Vulnerabilities (Infection Attempts) pane for viewing total vulnerabilities, vulnerable shares, malicious clients, and top recent ransomware attempts.
    • Clicking on the number of total vulnerabilities provides a detailed view of recent vulnerabilities.
    • Clicking on the number of vulnerable shares provides a detailed view of vulnerable shares.
    • Clicking on the number of malicious clients provides a detailed view of malicious clients.
  • Click Settings to enable and configure ransomware protection, see Enabling Ransomware Protection and Configuring Ransomware Protection.
  • Click Download (.csv) to download a list of blocked ransomware signatures.
Figure. Ransomware Dashboard

Enabling Ransomware Protection

Enable ransomware protection on your file server.

About this task

Procedure

  1. Go to dropdown menu > Ransomware .
  2. In the message banner, click Enable Ransomware Protection .
  3. (optional) Click Configure SMTP to Add Recipients.
    Note: This option appears only if you have not configured a simple mail transfer protocol (SMTP) server, see Configuring an SMTP Server.
  4. Under Ransomware Email Recipients , add at least one email address. If there is a ransomware attack, File Analytics sends a notification to the specified email address.
    Figure. Enable Ransomware

  5. Click Enable .
    See Configuring Ransomware Protection for configuration steps.

Configuring Ransomware Protection

Configure ransomware protection on file servers.

About this task

Do the following to add signatures to the blocked extension list.

Procedure

  1. Go to dropdown menu > Ransomware > Settings .
  2. (optional) Under Search for blocked File Signatures , enter ransomware signatures in the *.(signature) format.
    1. To check whether the signature is already blocked, click Search .
    2. If the signature is not blocked, click Add to Block List .
    Note: You can also remove ransomware signatures from the list.

  3. (optional) To download a list of blocked ransomware signatures, click Download (.csv) .
  4. (optional) Under Ransomware Email Recipients , add a comma separated list of email addresses. If there is a ransomware attack, File Analytics sends a notification to the specified email addresses.
  5. (optional) To disable the ransomware protection feature, click Disable Ransomware Protection .

Enabling SSR

Enable self-service restore on shares identified by File Analytics.

About this task

File Analytics scans shares for SSR policies.

Procedure

  1. Go to dropdown menu > Ransomware .
  2. Click Enable SSR on Prism .
  3. Check the box next to the shares for which to enable SSR.
    Figure. Enable SSR on Shares

  4. Click Enable SSR .

Reports

Generate a report for entities on the file server.

Create a report with custom attribute values or use one of the File Analytics pre-canned report templates. To create a custom report, specify the entity, attributes (and operators for some attributes), attribute values, column headings, and the number of columns. Pre-canned reports define most of the attributes and headings based on the entity and template that you choose.

The Reports dashboard displays a table of previously generated reports. You can rerun existing reports rather than re-creating them. After creating a report, you can download it as a JSON or CSV file.

Reports Dashboard

The reports dashboard includes options to create, view, and download reports.

The Reports dashboard includes options to create a report, download reports as a JSON, download reports as a CSV, rerun reports, and delete reports.

The reports table includes columns for the report name, status, last run, and actions.

Figure. Reports Dashboard

Clicking Create a new report takes you to the report creation screen, which includes Report builder and Pre-canned Reports Templates tabs. The tabs include report options and filters for report configuration.

Both tabs include the following elements:

  • The Define Report Type section includes an Entity drop-down menu to select an entity.
  • The Define Filters section includes an Attribute drop-down menu and an option to add more attributes by clicking + Add filter .
  • The Add/remove columns in this report section displays default columns. Clicking the columns field lets you add additional columns to the report. Clicking the x next to a column name removes it from the report.
  • The Define number of maximum rows in this report section includes a Count section to specify the number of rows in the report.
Table 1. Report Builder – Filter Options

Entity: Events

Attributes (filters):
  • event_date. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (date).
  • Event_operation. Operator: N/A. Value: file_write, file_read, file_create, file_delete, rename, directory_create, directory_delete, SecurityChange (permission change), set_attr, sym_link.

Columns:
  • audit_path (object path)
  • audit_objectname (object name)
  • audit_operation (operation)
  • audit_machine_name (source of operation)
  • audit_event_date (event date in UTC)
  • audit_username (user name)

Entity: Files

Attributes (filters):
  • Category. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (date).
  • Extensions. Operator: N/A. Value: (type in value).
  • Deleted. Operator: N/A. Value: Last (number of days from 1 to 30) days.
  • creation_date. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (date).
  • access_date. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (date).
  • Size. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (number) (file size in B, KB, MB, GB, or TB).

Columns:
  • object_name (file name)
  • share_UUID (share name)
  • object_owner_name (owner name)
  • object_size_logical (size)
  • file_type (extension)
  • object_creation_date (creation date in UTC)
  • last_event_date (access date in UTC)
  • fileserver_protocol
  • object_ID (file id)
  • object_last_operation_name (last operation)
  • audit_username (last operation user)
  • file_path (file path)

Entity: Folders

Attributes (filters):
  • Deleted. Operator: N/A. Value: Last (number of days from 1 to 30) days.
  • creation_date. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (date).

Columns:
  • object_name (Dir name)
  • object_owner_name (owner name)
  • object_creation_date (creation date in UTC)
  • last_event_date (access date in UTC)
  • share_UUID (share name)
  • object_last_operation_name (last operation)
  • audit_username (last operation user)
  • File server protocol
  • object_ID (file id)
  • file_path (Dir path)

Entity: Users

Attributes (filters):
  • last_event_date. Operators: equal_to, greater_than, greater_than_equal_to, less_than, less_than_equal_to. Value: (date).

Columns:
  • user_login_name (user name)
  • Last operation
  • last_event_date (access date in UTC)
  • last_operation_audit_path
Table 2. Pre-Canned Reports – Filters

Entity: Events

Pre-canned report templates:
  • PermissionDenied events
  • Permission Denied (file blocking) events

Columns:
  • audit_path (object path)
  • audit_objectname (object name)
  • audit_operation (operation)
  • audit_machine_name (source of operation)
  • audit_event_date (event date in UTC)
  • audit_username (user name)

Entity: Files

Pre-canned report templates:
  • Largest Files
  • Oldest Files
  • Files not accessed for last 1 year
  • Files accessed in last 30 days

Columns:
  • object_name (file name)
  • share_UUID (share name)
  • object_owner_name (owner name)
  • object_size_logical (size)
  • file_type (extension)
  • object_creation_date (creation date in UTC)
  • last_event_date (access date in UTC)
  • fileserver_protocol
  • object_ID (file id)
  • object_last_operation_name (last operation)
  • audit_username (last operation user)
  • file_path (file path)

Entity: Users

Pre-canned report templates:
  • Top owners with space consumed
  • Top active users
  • All users

Columns:
  • user_login_name (user name)
  • Last operation
  • last_event_date (access date in UTC)
  • last_operation_audit_path

Creating a Custom Report

Create a custom report by defining the entity, attribute, filters, and columns.

About this task

Follow the steps as indicated.

Procedure

  1. Go to dropdown menu > Reports .
  2. Click Create a new report .
  3. In the Report Builder tab, do the following:
    1. In the Define Report Type section, select an entity from the drop-down menu.
    2. In the Define Filters section, select an attribute from the attributes dropdown.
    3. Under Value , specify the values for the attribute (some attributes also require you to specify an operator in the Operator field).
    4. (Optional) Click + Add filter to add more attributes.
    5. In the Add/Remove column in this report section, click x for the columns you want to remove.
    6. In the Define maximum number of rows in this report section, type in a value, or use the - and + buttons, to specify the number of rows in your report. This value indicates the number of records in the report.
  4. Click Run Preview .
    The Report Preview section populates.
  5. Click Generate report .
    1. Select either the CSV or JSON option.
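Once downloaded, a CSV report can be inspected with standard command-line tools. The following is a minimal sketch; the file name and records are made up for illustration, and the columns shown (object_name, object_owner_name, object_size_logical) are ones listed in the Report Builder table:

```shell
# Hypothetical CSV layout for a Files report; the actual columns depend
# on the columns you selected when building the report.
cat > /tmp/files_report.csv <<'EOF'
object_name,object_owner_name,object_size_logical
big.iso,alice,4294967296
dump.log,bob,1073741824
EOF

# Count the records in the report, excluding the header row.
tail -n +2 /tmp/files_report.csv | wc -l
```

The record count should not exceed the Count value you set in the Define number of maximum rows section.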

Creating a Pre-Canned Report

Use one of the pre-canned File Analytics templates for your report.

Procedure

  1. Go to dropdown menu > Reports .
  2. Click Create a new report .
  3. In the Pre-Canned Reports Templates tab, do the following:
    1. In the Define Report Type section, select an entity from the drop-down menu.
    2. In the Define Filters section, select an attribute from the attributes dropdown.
    3. In the Add/Remove column in this report section, click x for the columns you want to remove.
    4. In the Define maximum number of rows in this report section, type in, or use the - and + buttons, to specify the number of rows in your report. This value indicates the number of records in the report.
  4. Click Run Preview .
    The Report Preview section populates.
  5. Click Generate report .
    1. Select either the CSV or JSON option.
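A report downloaded as JSON can be checked the same way with python3. This is a sketch with made-up records; the field names mirror the Users columns listed in Table 2:

```shell
# Hypothetical JSON report with two user records.
cat > /tmp/users_report.json <<'EOF'
[
  {"user_login_name": "alice", "last_event_date": "2022-07-01"},
  {"user_login_name": "bob",   "last_event_date": "2022-07-02"}
]
EOF

# Print the number of records in the report.
python3 -c 'import json; print(len(json.load(open("/tmp/users_report.json"))))'
```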

File Analytics Options

You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.

Updating Data Retention

The data retention period determines how long File Analytics retains event data.

About this task

Follow the steps as indicated to configure data retention.

Procedure

  1. In File Analytics, click gear icon > Update Data Retention .
  2. In the Data Retention Period drop-down, select the period for data retention.
  3. Click Update .

Scanning the File System

Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.

About this task

To scan shares, perform the following task.

Procedure

  1. In File Analytics, click the gear icon .
  2. In the drop-down list, click Scan File System .
  3. In the list of shares, select the target shares for the scan.
    Figure. Select Scan Targets

  4. Click Scan .
    The status of the share is In Progress . Once the scan is complete, the status changes to Completed .

Deny List

Block audit events for specified users, file extensions, and client IP addresses.

About this task

Use the Deny feature to block audit events from being performed on specified file extensions or by specified users and clients.
Note: Files with no extension cannot be denied.

Procedure

  1. Click the gear icon > Define Rules for Deny List .
  2. Click the pencil icon in the Client IPs , File Extensions , or Users row.
  3. Add a comma-separated list of the entities that you want to block (for example, for file extensions: exe, mp3, avi).
  4. Click the done icon in the updated row, and then click Close .

Managing File Categories

File Analytics uses the file category configuration to classify file extensions.

About this task

The capacity widget in the dashboard uses the category configuration to calculate capacity details.

Procedure

  1. Click gear icon > Manage File Category .
  2. To create a category, click + New Category . (Otherwise, move on to step 3).
    1. In the Category column, name the category.
    2. In the Extensions column, specify file extensions for the category.
  3. To delete an existing category, click the x icon next to the category. (Otherwise, move on to step 4).
  4. To modify an existing category, click the pencil icon next to the category and modify the specified file extensions.
  5. Click Save .
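Conceptually, the category configuration is a mapping from file extension to category name, which the capacity widget uses to group usage. The following is a hypothetical sketch of that classification; the category names and extensions here are examples, not the product defaults:

```shell
# Hypothetical extension-to-category mapping; File Analytics applies the
# same idea using the categories you define in Manage File Category.
classify() {
  case "${1##*.}" in
    mp3|wav|flac) echo "Audio" ;;
    jpg|png|gif)  echo "Image" ;;
    doc|pdf|txt)  echo "Text" ;;
    *)            echo "Others" ;;   # extensions not in any category
  esac
}

classify song.mp3    # Audio
classify report.pdf  # Text
```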

Data Protection

Configure File Analytics disaster recovery (DR) using Prism Element.

File Analytics only supports async disaster recovery. File Analytics does not support NearSync and metro availability.

Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have a configuration symmetric to the primary site. You must also deploy File Analytics on the remote site to restore a File Analytics VM (FAVM).

The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.

Configuring Disaster Recovery

To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.

About this task

By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.

Procedure

  1. If you have not done so already, configure a remote site for the local cluster.
    See the Configuring a Remote Site (Physical Cluster) topic in the Prism Web Console Guide for this procedure.
  2. Create an async DR protection domain for the File Analytics volume group as the entity. The volume group name is File_Analytics_VG .
    See Configuring a Protection Domain (Async DR) in the Prism Web Console Guide .
  3. In the Schedule tab, click the New Schedule button to add a schedule.
    Add a schedule, as File Analytics does not provide a default schedule. See Creating a Protection Domain Schedule (Files) in the Nutanix Files Guide .
  4. Configure local and remote container mapping.
    See the Configuring Disaster Recovery (Files) section in the Nutanix Files Guide for steps to configure mapping between local and remote containers.
  5. Create a protection domain schedule.
    See Creating a Protection Domain Schedule (Files) in the Nutanix Files Guide .

Activating Disaster Recovery

Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.

About this task

Perform the following tasks on the remote site.

Procedure

  1. Fail over to the protection domain for disaster recovery activation.
    See the Failing Over a Protection Domain topic in the Prism Web Console Guide .
  2. Fail back the protection domain to the primary site.
    See the Failing Back a Protection Domain topic in the Prism Web Console Guide .

Deploying File Analytics on a Remote Site (AHV)

Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.

About this task

To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.

Before you begin

Ensure that the Nutanix Files and AOS versions on the remote site match the versions on the primary site.

About this task

Run the following commands from the command prompt inside the FAVM.

Procedure

  1. Deploy a new File Analytics instance on the remote site; see Deploying File Analytics.
    Caution: Do not enable File Analytics.
    The remote site requires an iSCSI data services IP address to configure the FAVM. This procedure deploys a new volume group File_Analytics_VG , which you delete in a subsequent step.
  2. On the remote site, create a volume group by restoring the snapshot of the File_Analytics_VG .
    See Restoring an Entity from a Protection Domain in Data Protection and Recovery with Prism Element . For the How to Restore step, use the Create new entities option, and specify a name in the Volume Group Name Prefix field. The restored volume group name format is prefix -File_Analytics_VG.
  3. To configure the FAVM on the remote, follow these steps:
    Caution: If the IP address of the File Analytics VM has changed on the remote site, contact Nutanix Support before proceeding.
    1. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    2. To discover all storage devices accessed by the FAVM, run the following command.
      nutanix@favm$ sudo blkid
    3. Copy the cvm.config file to the temporary files directory.
      nutanix@favm$ cp /mnt/containers/config/common_config/cvm.config /tmp
    4. Stop the File Analytics services.
      nutanix@favm$  sudo systemctl stop monitoring
      nutanix@favm$  docker stop $(docker ps -q)
      nutanix@favm$  sudo systemctl stop docker
    5. Unmount the volume group.
      nutanix@favm$ sudo umount /mnt
    6. Detach the volume group File_Analytics_VG from the FAVM.
      See the "Managing a VM (AHV)" topic in the Prism Web Console Guide .
    7. Attach the cloned volume group prefix -File_Analytics_VG to the FAVM.
      See "Managing a VM (AHV)" in the Prism Web Console Guide .
    8. Restart the FAVM to discover the attached volume group.
      nutanix@favm$ sudo reboot

    9. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    10. Discover all storage devices accessed by the FAVM.
      nutanix@favm$ sudo blkid
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      /dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
      The FAVM discovers the attached volume group and assigns it to the /dev/sdb device.
    11. Delete the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    12. Rename the restored volume group prefix -File_Analytics_VG to File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    13. Create a backup of the cvm.config file.
      nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
      /mnt/containers/config/common_config/cvm_bck.config
    14. Copy the cvm.config file from the /tmp directory to /common_config/ on the FAVM.
      nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
    15. Reconfigure the password of the user on Prism for internal FAVM operations. Specify a passphrase for new password . File Analytics uses the password only for internal communication between Prism and the FAVM. Issue the reset_password.py command twice, as shown.
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
       --password='new password' --local_update
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
      --password='new password' --prism_user=admin --prism_password='Prism admin password'
    16. In File Analytics, go to gear icon > Scan File System to check if a file system scan can be initiated.
      Note: If you receive errors, disable and re-enable File Analytics, see "Disabling File Analytics" and "Enabling File Analytics."

Deploying File Analytics on a Remote Site (ESXi)

Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.

About this task

To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.

Before you begin

Ensure that the Nutanix Files and AOS versions match the versions on the remote and primary sites.

About this task

Run the following commands from the command prompt inside the FAVM.

Procedure

  1. Deploy a new File Analytics instance on the remote site; see Deploying File Analytics.
    Caution: Do not enable File Analytics.
    The remote site requires an iSCSI data services IP address to configure the FAVM. This procedure deploys a new volume group File_Analytics_VG , which you delete in a subsequent step.
  2. On the remote site, create a volume group by restoring the snapshot of the File_Analytics_VG .
    See Restoring an Entity from a Protection Domain in Data Protection and Recovery with Prism Element . For the How to Restore step, use the Create new entities option, and specify a name in the Volume Group Name Prefix field. The restored volume group name format is prefix -File_Analytics_VG.
  3. In the Storage Table view, go to the Volumes tab.
    1. Copy the target IQN prefix from the Volume Group Details column.
      Tip: Click the tooltip to see the entire IQN prefix.
  4. To configure the FAVM on the remote, follow these steps:
    Caution: If the IP address of the File Analytics VM has changed on the remote site, contact Nutanix Support before proceeding.
    1. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    2. To discover all storage devices accessed by the FAVM, run the following command.
      nutanix@favm$ sudo blkid
    3. Copy the cvm.config file to the temporary files directory.
      nutanix@favm$ cp /mnt/containers/config/common_config/cvm.config /tmp
    4. Stop the File Analytics services.
      nutanix@favm$  sudo systemctl stop monitoring
      nutanix@favm$  docker stop $(docker ps -q)
      nutanix@favm$  sudo systemctl stop docker
    5. Unmount and log off from all iSCSI targets.
      nutanix@favm$ sudo umount /mnt
      nutanix@favm$ sudo /sbin/iscsiadm -m node -u
    6. Remove the disconnected target records from the discoverydb mode of the FAVM.
      nutanix@favm$ sudo /sbin/iscsiadm -m node -o delete
    7. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      The output does not show the /dev/sdb device.
    8. Get the File Analytics Linux client iSCSI initiator name.
      nutanix@favm$  sudo cat /etc/iscsi/initiatorname.iscsi
      The output displays the initiator name.
      InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
    9. Copy the iSCSI initiator name.
    10. Remove the iSCSI initiator name from the client whitelist of the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    11. Whitelist the FAVM client on the cloned volume group prefix -File_Analytics_VG using the iSCSI initiator name of the FAVM client.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    12. Let the Analytics initiator discover the cluster and its volume groups.
      nutanix@favm$  sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal  data_services_IP_address:3260
      Clicking the Nutanix cluster name in Prism displays cluster details including the data service IP address. The output displays the restored iSCSI target from step 2.
    13. Connect to the volume target by specifying IQN prefix.
      nutanix@favm$  sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
    14. Restart the FAVM to restart the iSCSI host adapters, which allows the discovery of the attached volume group.
      nutanix@favm$  sudo reboot
    15. Log on to the FAVM with SSH.
      Tip: See KB 1661 for default credential details.
    16. Discover all storage devices accessed by the FAVM.
      nutanix@favm$  sudo blkid
      The FAVM discovers the attached iSCSI volume group and assigns to the /dev/sdb device.
      /dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" 
      /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
      /dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
    17. Delete the deployed volume group File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    18. Rename the restored volume group prefix -File_Analytics_VG to File_Analytics_VG.
      See the "Modifying a Volume Group" topic in the Prism Web Console Guide .
    19. Create a backup of the cvm.config file.
      nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
      /mnt/containers/config/common_config/cvm_bck.config
    20. Copy the cvm.config file from the /tmp directory to /common_config/ on the FAVM.
      nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
    21. Reconfigure the password of the user on Prism for internal FAVM operations. Specify a passphrase for new password . File Analytics uses the password only for internal communication between Prism and the FAVM. Issue the reset_password.py command twice, as shown.
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
       --password='new password' --local_update
      nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
      --password='new password' --prism_user=admin --prism_password='Prism admin password'
    22. In File Analytics, go to gear icon > Scan File System to check if a file system scan can be initiated.
      Note: If you receive errors, disable and re-enable File Analytics, see "Disabling File Analytics" and "Enabling File Analytics."
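The initiator name obtained in step 8 must be copied into the volume group whitelist exactly. The following sketch shows one way to extract the bare IQN from the initiatorname.iscsi file format shown in that step; it runs against a temporary copy of the file so the example is self-contained:

```shell
# Sample initiator file, matching the format shown in the procedure.
cat > /tmp/initiatorname.iscsi <<'EOF'
InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
EOF

# Strip the "InitiatorName=" prefix to get the IQN for the whitelist.
cut -d= -f2 /tmp/initiatorname.iscsi
```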