Product Release Date: 2021-05-17
Last updated: 2022-10-12
Legacy disaster recovery (DR) configurations use protection domains (PDs) and third-party integrations to protect your applications. These DR configurations replicate data between on-prem Nutanix clusters. Protection domains provide limited flexibility in terms of supporting complex operations (for example, VM boot order, network mapping). With protection domains, you have to perform manual tasks to protect new guest VMs as and when your application scales up.
Leap offers an entity-centric automated approach to protect and recover applications. It uses categories to group the guest VMs and automate the protection of the guest VMs as the application scales. Application recovery is more flexible with network mappings, an enforceable VM start sequence, and inter-stage delays. Application recovery can also be validated and tested without affecting your production workloads. Asynchronous, NearSync, and Synchronous replication schedules ensure that an application and its configuration details synchronize to one or more recovery locations for a smoother recovery.
Leap works with sets of physically isolated locations called availability zones. An instance of Prism Central represents an availability zone. One availability zone serves as the primary site for an application while one or more paired availability zones serve as the recovery sites.
When paired, the primary site replicates the entities (protection policies, recovery plans, and recovery points) to the recovery sites at the specified time intervals (RPO). This approach enables application recovery at any of the recovery sites when there is a service disruption at the primary site (for example, natural disasters or scheduled maintenance). The entities start replicating back to the primary site when the primary site is up and running, ensuring High Availability of applications. The entities you create or update synchronize continuously between the primary and recovery sites. The reverse synchronization enables you to create or update entities (protection policies, recovery plans, or guest VMs) at either the primary or the recovery sites.
This guide is primarily divided into the following two parts.
The section walks you through the procedure of application protection and DR to other Nutanix clusters at the same or different on-prem sites. The procedure also applies to protection and DR to other Nutanix clusters in supported public cloud.
Xi Leap is essentially an extension of Leap to Xi Cloud Services. You can protect applications and perform DR to Xi Cloud Services or from Xi Cloud Services to an on-prem availability zone. The section describes application protection and DR from Xi Cloud Services to an on-prem Nutanix cluster. For application protection and DR to Xi Cloud Services, refer to the supported capabilities in Protection and DR between On-Prem Sites (Leap) because the protection procedure remains the same when the primary site is an on-prem availability zone.
Configuration tasks and DR workflows are largely the same regardless of the type of recovery site. For more information about the protection and DR workflow, see Leap Deployment Workflow.
The following section describes the terms and concepts used throughout the guide. Nutanix recommends gaining familiarity with these terms before you begin configuring protection and Leap or Xi Leap disaster recovery (DR).
An availability zone (site) on your premises.
A site in the Nutanix Enterprise Cloud Platform (Xi Cloud Services).
A site that initially hosts guest VMs you want to protect.
A site where you can recover the protected guest VMs when a planned or an unplanned event occurs at the primary site causing its downtime. You can configure at most two recovery sites for a guest VM.
A cluster running AHV or ESXi nodes on an on-prem availability zone, Xi Cloud Services, or any supported public cloud. Leap does not support guest VMs from Hyper-V clusters.
The GUI that provides you the ability to configure, manage, and monitor a single Nutanix cluster. It is a service built into the platform for every Nutanix cluster deployed.
The GUI that allows you to monitor and manage many Nutanix clusters (Prism Element running on those clusters). Prism Starter, Prism Pro, and Prism Ultimate are the three flavors of Prism Central. For more information about the features available with these licenses, see Software Options.
Prism Central essentially is a VM that you deploy (host) in a Nutanix cluster (Prism Element). For more information about Prism Central, see Prism Central Guide. You can set up the following configurations of Prism Central VM.
A logically isolated network service in Xi Cloud Services. A VPC provides the complete IP address space for hosting user-configured VPNs. A VPC allows creating workloads manually or by failover from a paired primary site.
The following VPCs are available in each Xi Cloud Services account. You cannot create more VPCs in Xi Cloud Services.
The virtual network from which guest VMs migrate during a failover or failback.
The virtual network to which guest VMs migrate during a failover or failback operation.
A mapping between two virtual networks in paired sites. A network mapping specifies a recovery network for all guest VMs of the source network. When you perform a failover or failback, the guest VMs in the source network recover in the corresponding (mapped) recovery network.
A VM category is a key-value pair that groups similar guest VMs. Associating a protection policy with a VM category ensures that the protection policy applies to all the guest VMs in the group regardless of how the group scales with time. For example, you can associate a group of guest VMs with the Department: Marketing category, where Department is a category that includes the value Marketing along with other values such as Engineering and Sales.
VM categories work the same way on on-prem sites and in Xi Cloud Services. For more information about VM categories, see Category Management in the Prism Central Guide.
A copy of the state of a system at a particular point in time.
Application-consistent snapshots are best suited for systems and applications that can be quiesced and unquiesced (thawed), such as database applications (for example, SQL Server, Oracle, and Exchange).
A guest VM that you can recover from a recovery point.
A configurable policy that takes recovery points of the protected guest VMs in equal time intervals, and replicates those recovery points to the recovery sites.
A configurable policy that orchestrates the recovery of protected guest VMs at the recovery site.
The time interval that refers to the acceptable data loss if there is a failure. For example, if the RPO is 1 hour, the system creates a recovery point every 1 hour. On recovery, you can recover the guest VMs with data as of up to 1 hour ago. Take Snapshot Every in the Create Protection Policy GUI represents RPO.
The time period from the failure event to the restoration of service. For example, an RTO of 30 minutes means that the protected guest VMs are back up and running within 30 minutes of the failure event.
The following flowchart provides a detailed representation of the disaster recovery (DR) solutions of Nutanix. The decision tree covers both DR solutions—protection domain-based DR and Leap—helping you quickly decide which DR strategy best suits your environment.
For information about protection domain-based (legacy) DR, see Data Protection and Recovery with Prism Element guide. With Leap, you can protect your guest VMs and perform DR to on-prem availability zones (sites) or to Xi Cloud Services. A Leap deployment for DR from Xi Cloud Services to an on-prem Nutanix cluster is Xi Leap. The detailed information about Leap and Xi Leap DR configuration is available in the following sections of this guide.
Protection and DR between On-Prem Sites (Leap)
Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap)
The workflow for entity-centric protection and disaster recovery (DR) configuration is as follows. The workflow is largely the same for both Leap and Xi Leap configurations, except for a few extra steps that you must perform while configuring Xi Leap.
For DR solutions with Asynchronous, NearSync, and Synchronous replication schedules to succeed, the nodes in the on-prem Availability Zones (AZs or sites) must have certain resources. This section provides information about the node, disk and Foundation configurations necessary to support the RPO-based recovery point frequencies.
The conditions and configurations provided in this section apply to Local and Remote recovery points.
Any node configuration with two or more SSDs, each SSD being 1.2 TB or greater capacity, supports recovery point frequency for NearSync.
Any node configuration that supports recovery point frequency of six (6) hours also supports AHV-based Synchronous replication schedules because a protection policy with Synchronous replication schedule takes recovery points of the protected VMs every 6 hours. See Protection with Synchronous Replication Schedule (0 RPO) and DR for more details about Synchronous replication.
Both the primary cluster and replication target cluster must fulfill the same minimum resource requirements.
Ensure that any new node or disk additions made to the on-prem sites (Availability Zones) meet the minimum requirements.
Features such as Deduplication and RF3 may require additional memory depending on the DR schedules and other workloads run on the cluster.
The table lists the supported frequency for the recovery points across various hardware configurations.
Type of disk | Capacity per node | Minimum recovery point frequency | Foundation Configuration - SSD and CVM requirements |
---|---|---|---|
Hybrid | Total HDD tier capacity of 32 TB or lower. Total capacity (HDD + SSD) of 40 TB or lower. | NearSync | No change required—Default Foundation configuration. |
Hybrid | Total HDD tier capacity between 32-64 TB. Total capacity (HDD + SSD) of 92 TB or lower. Up to 64 TB HDD and up to 32 TB SSD (4 x 7.68 TB SSDs). | NearSync | Modify the Foundation configuration to the minimum SSD and CVM requirements. |
Hybrid | Total HDD tier capacity between 32-64 TB. Total capacity (HDD + SSD) of 92 TB or lower. Up to 64 TB HDD and up to 32 TB SSD. | Async (every 6 hours) | No change required—Default Foundation configuration. |
Hybrid | Total HDD tier capacity between 64-80 TB. Total capacity (HDD + SSD) of 96 TB or lower. | Async (every 6 hours) | No change required—Default Foundation configuration. |
Hybrid | Total HDD tier capacity greater than 80 TB. Total capacity (HDD + SSD) of 136 TB or lower. | Async (every 6 hours) | Modify the Foundation configuration to the minimum SSD and CVM requirements. |
All Flash | Total capacity of 48 TB or lower. | NearSync | No change required—Default Foundation configuration. |
All Flash | Total capacity between 48-92 TB. | NearSync | Modify the Foundation configuration to the minimum SSD and CVM requirements. |
All Flash | Total capacity between 48-92 TB. | Async (every 6 hours) | No change required—Default Foundation configuration. |
All Flash | Total capacity greater than 92 TB. | Async (every 6 hours) | Modify the Foundation configuration to the minimum SSD and CVM requirements. |
Leap protects your guest VMs and orchestrates their disaster recovery (DR) to other Nutanix clusters when events causing service disruption occur at the primary availability zone (site). For protection of your guest VMs, protection policies with Asynchronous, NearSync, or Synchronous replication schedules generate and replicate recovery points to other on-prem availability zones (sites). Recovery plans orchestrate DR from the replicated recovery points to other Nutanix clusters at the same or different on-prem sites.
Protection policies create a recovery point—and set its expiry time—in every iteration of the specified time period (RPO). For example, the policy creates a recovery point every 1 hour for an RPO schedule of 1 hour. The recovery point expires at its designated expiry time based on the retention policy—see step 3 in Creating a Protection Policy with an Asynchronous Replication Schedule (Leap). If there is a prolonged outage at a site, the Nutanix cluster retains the last recovery point to ensure you do not lose all the recovery points. For NearSync replication (lightweight snapshot), the Nutanix cluster retains the last full hourly snapshot. During the outage, the Nutanix cluster does not clean up the recovery points due to expiry. When the Nutanix cluster comes online, it cleans up the recovery points that are past expiry immediately.
For High Availability of a guest VM, Leap enables replication of its recovery points to one or more on-prem sites. A protection policy can replicate recovery points to a maximum of two on-prem sites. For replication, you must add a replication schedule between the sites. You can set up the on-prem sites for protection and DR in the following arrangements.
The replication to multiple sites enables DR to Nutanix clusters at all the sites where the recovery points replicate or exist. To enable performing DR to a Nutanix cluster at the same or different site (recovery site), you must create a recovery plan. To enable performing DR to two different Nutanix clusters at the same or different recovery sites, you must create two discrete recovery plans—one for each recovery site. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.
The protection policies and recovery plans you create or update synchronize continuously between the primary and recovery on-prem sites. The reverse synchronization enables you to create or update entities (protection policies, recovery plans, and guest VMs) at either the primary or the recovery sites.
The following section describes protection of your guest VMs and DR to a Nutanix cluster at the same or different on-prem sites. The workflow is the same for protection and DR to a Nutanix cluster in supported public cloud platforms. For information about protection of your guest VMs and DR from Xi Cloud Services to an on-prem Nutanix cluster (Xi Leap), see Protection and DR between On-Prem Site and Xi Cloud Service (Xi Leap).
The following are the general requirements of Leap. Along with the general requirements, there are specific requirements for protection with the following supported replication schedules.
The AOS license required depends on the features that you want to use. For information about the features that are available with an AOS license, see Software Options.
The underlying hypervisors required differ in all the supported replication schedules. For more information about underlying hypervisor requirements for the supported replication schedules, see:
Nutanix supports replication between all the latest supported LTS and STS AOS releases. To check the list of the latest supported AOS versions, see KB-5505. To determine whether the AOS versions currently running on your clusters are EOL, see the EOL document.
Upgrade the AOS version to the next available supported LTS/STS release. To determine if an upgrade path is supported, check the Upgrade Paths page before you upgrade the AOS.
For example, the clusters are running AOS versions 5.5.x and 5.10.x respectively. Upgrade the cluster on 5.5.x to 5.10.x. After both clusters are on 5.10.x, proceed to upgrade each cluster to 5.15.x (supported LTS). Once both clusters are on 5.15.x, you can upgrade the clusters to 5.20.x or newer.
Nutanix recommends that both the primary and the replication clusters or sites run the same AOS version.
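Before planning the upgrade path, you can confirm the AOS version that each cluster is running. The following is a minimal sketch that assumes CLI access to a Controller VM; the output field names can vary by AOS release.

# Run on a Controller VM of each cluster; look for the Cluster Version field.
ncli cluster info | grep -i "cluster version"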
You must have one of the following roles in Prism Central.
To view the available roles or create a role, click the hamburger icon at the top-left corner of the window and go to Administration > Roles in the left pane.
To allow two-way replication between Nutanix clusters at the same or different sites, you must enable certain ports in your external firewall. To know about the required ports, see Disaster Recovery - Leap in Port Reference.
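As a quick connectivity check, assuming you have already identified the required ports from the Port Reference, you can probe a remote Controller VM from a Controller VM at the other site. The IP address and port below are placeholders for illustration only.

# Replace the values with a remote CVM IP address and a port listed in the
# Disaster Recovery - Leap section of the Port Reference (values shown are examples only).
REMOTE_CVM_IP=203.0.113.10
PORT=2020
nc -zv "$REMOTE_CVM_IP" "$PORT"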
For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .
The empty CD-ROM is required for mounting NGT at the recovery site.
On Linux guest VMs, set the NM_CONTROLLED field to yes in the network interface configuration file. After setting the field, restart the network service on the VM.
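For example, on a RHEL or CentOS guest VM the change might look like the following sketch; the file path and interface name (eth0) are illustrative and depend on your distribution.

# Illustrative interface configuration file: /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=yes

# Restart the network service so that the change takes effect.
sudo systemctl restart network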
For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide .
The empty CD-ROM is required for mounting NGT at the recovery site.
For example, if you have VLAN with id 0 and network 10.45.128.0/17, and three clusters PE1, PE2, and PE3 at the site AZ1, all the clusters must maintain the same name, IP address range, and IP address prefix length ( (Gateway IP/Prefix Length) ), for VLAN with id 0.
If one cluster has m networks and the other cluster has n networks, ensure that the recovery cluster has m + n networks. Such a design ensures that all recovered VMs attach to a network.
For more information about the scaled-out deployments of a Prism Central, see Leap Terminology.
Nutanix VM mobility drivers are required for accessing the guest VMs after failover. Without Nutanix VM mobility drivers, the guest VMs become inaccessible after a failover.
For example, if you have VLAN with id 0 and network 10.45.128.0/17, and three clusters PE1, PE2, and PE3 at the site AZ1, all the clusters must maintain the same name, IP address range, and IP address prefix length ( (Gateway IP/Prefix Length) ), for VLAN with id 0.
Consider the following general limitations before configuring protection and disaster recovery (DR) with Leap. Along with the general limitations, there are specific protection limitations with the following supported replication schedules.
You cannot do or implement the following.
When you perform DR of vGPU console-enabled guest VMs, the VMs recover with the default VGA console (without any alert) instead of vGPU console. The guest VMs fail to recover when you perform cross-hypervisor disaster recovery (CHDR). For more information about DR and backup behavior of guest VMs with vGPU, see vGPU Enabled Guest VMs.
You can configure NICs for a guest VM associated with either production or test VPC.
You cannot protect volume groups.
You cannot apply network segmentation for management traffic (any traffic not on the backplane network) in Leap.
Although there is no limit to the number of VLANs that you can create, only the first 500 VLANs list in the drop-down of Network Settings while creating a recovery plan. For more information about VLANs in the recovery plan, see Nutanix Virtual Networks.
Due to the way the Nutanix architecture distributes data, there is limited support for mapping a Nutanix cluster to multiple vSphere clusters. If a Nutanix cluster is split into multiple vSphere clusters, migrate and recovery operations fail.
The following table lists the behavior of guest VMs with vGPU in disaster recovery (DR) and backup deployments.
Primary cluster | Recovery cluster | DR or Backup | Identical vGPU models | Non-identical vGPU models or no vGPU |
---|---|---|---|---|
AHV | AHV | Nutanix Disaster Recovery | Supported | Supported |
AHV | AHV | Backup: HYCU | Guest VMs with vGPU fail to recover. | Guest VMs with vGPU fail to recover. |
AHV | AHV | Backup: Veeam | Guest VMs with vGPU fail to recover. | Guest VMs with vGPU fail to recover. Tip: The VMs start when you disable vGPU on the guest VM. |
ESXi | ESXi | Nutanix Disaster Recovery | Guest VMs with vGPU cannot be protected. | Guest VMs with vGPU cannot be protected. |
ESXi | ESXi | Backup | Guest VMs with vGPU cannot be protected. | Guest VMs with vGPU cannot be protected. |
AHV | ESXi | Nutanix Disaster Recovery | vGPU is disabled after failover of guest VMs with vGPU. | vGPU is disabled after failover of guest VMs with vGPU. |
ESXi | AHV | Nutanix Disaster Recovery | Guest VMs with vGPU cannot be protected. | Guest VMs with vGPU cannot be protected. |
For the maximum number of entities you can configure with different replication schedules and perform failover (disaster recovery), see Nutanix Configuration Maximums. The limits have been tested for Leap production deployments. Nutanix does not guarantee the system to be able to operate beyond these limits.
Nutanix recommends the following best practices for configuring protection and disaster recovery (DR) with Leap.
If you unpair the AZs while the guest VMs in the Nutanix clusters are still in synchronization, the Nutanix cluster becomes unstable. For more information about disabling Synchronous replication, see Synchronous Replication Management.
You can protect a guest VM either with legacy DR solution (protection domain-based) or with Leap. To protect a legacy DR-protected guest VM with Leap, you must migrate the guest VM from protection domain to a protection policy. During the migration, do not delete the guest VM snapshots in the protection domain. Nutanix recommends keeping the guest VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.
If the single Prism Central that you use for protection and DR to Nutanix clusters at the same availability zone (site) becomes inactive, you cannot perform a failover when required. To avoid the single point of failure in such deployments, Nutanix recommends installing the single Prism Central at a different site (different fault domain).
Create storage containers with the same name on both the primary and recovery Nutanix clusters.
Leap automatically maps the storage containers during the first replication (seeding) of a guest VM. If a storage container with the same name exists on both the primary and recovery Nutanix clusters, the recovery points replicate to the same name storage container only. For example, if your protected guest VMs are in the SelfServiceContainer on the primary Nutanix cluster, and the recovery Nutanix cluster also has SelfServiceContainer , the recovery points replicate to SelfServiceContainer only. If a storage container with the same name does not exist at the recovery AZ, the recovery points replicate to a random storage container at the recovery AZ. For more information about creating storage containers on the Nutanix clusters, see Creating a Storage Container in Prism Web Console Guide .
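If a matching container does not already exist on the recovery cluster, you can create one with the same name. The following ncli sketch assumes a storage pool named default-storage-pool; the names are illustrative, and you should verify the exact ncli syntax for your AOS release.

# Run on a Controller VM of the recovery cluster. The container name must match
# the container name used on the primary cluster; the storage pool name is an assumption.
ncli container create name=SelfServiceContainer sp-name=default-storage-pool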
Leap enables protection of your guest VMs and disaster recovery (DR) to one or more Nutanix clusters at the same or different on-prem sites. A Nutanix cluster is essentially an AHV or an ESXi cluster running AOS. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.
Leap supports DR (and CHDR) to maximum two different Nutanix clusters at the same or different availability zones (sites). You can protect your guest VMs with the following replication schedules.
To maintain efficiency in protection and DR, Leap allows you to protect a guest VM with a Synchronous replication schedule to only one AHV cluster at a different on-prem availability zone.
The disaster recovery (DR) views enable you to perform CRUD operations on the following types of Leap entities.
This chapter describes the views of Prism Central (on-prem site).
The Availability Zones view under the hamburger icon > Administration lists all of your paired availability zones.
The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.
Field | Description |
---|---|
Name | Name of the availability zone. |
Region | Region to which the availability zone belongs. |
Type | Type of availability zone. Availability zones that are backed by on-prem Prism Central instances are shown to be of type physical. The availability zone that you are logged in to is shown as a local availability zone. |
Connectivity Status | Status of connectivity between the local availability zone and the paired availability zone. |
Workflow | Description |
---|---|
Connect to Availability Zone (on-prem Prism Central only) | Connect to an on-prem Prism Central or to a Xi Cloud Services for data replication. |
Action | Description |
---|---|
Disconnect | Disconnect the remote availability zone. When you disconnect an availability zone, the pairing is removed. |
The Protection Policies view under the hamburger icon > Data Protection lists all the configured protection policies from all the paired availability zones.
The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.
Field | Description |
---|---|
Policy Name | Name of the protection policy. |
Schedules | Number of schedules configured in the protection policy. If the protection policy has multiple schedules, a drop-down icon is displayed. Click the drop-down icon to see the primary location:primary Nutanix cluster , recovery location:recovery Nutanix cluster , and RPO of the schedules in the protection policy. |
Alerts | Number of alerts issued for the protection policy. |
Workflow | Description |
---|---|
Create protection policy | Create a protection policy. |
Action | Description |
---|---|
Update | Update the protection policy. |
Clone | Clone the protection policy. |
Delete | Delete the protection policy. |
The Recovery Plans view under the hamburger icon > Data Protection lists all the configured recovery plans from all the paired availability zones.
The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.
Field | Description |
---|---|
Name | Name of the recovery plan. |
Primary Location | Replication source site for the recovery plan. |
Recovery Location | Replication target site for the recovery plan. |
Entities | Sum of the VMs included in the recovery plan. |
Last Validation Status | Status of the most recent validation of the recovery plan. |
Last Test Status | Status of the most recent test performed on the recovery plan. |
Last Failover Status | Status of the most recent failover performed on the recovery plan. |
Workflow | Description |
---|---|
Create Recovery Plan | Create a recovery plan. |
Action | Description |
---|---|
Validate | Validates the recovery plan to ensure that the VMs in the recovery plan have a valid configuration and can be recovered. |
Test | Tests the recovery plan. |
Clean-up test VMs | Cleans up the VMs failed over as a result of testing recovery plan. |
Update | Updates the recovery plan. |
Failover | Performs a failover. |
Delete | Deletes the recovery plan. |
The dashboard includes widgets that display the statuses of configured protection policies and recovery plans. If you have not configured these entities, the widgets display a summary of the steps required to get started with Leap.
To view these widgets, click the Dashboard tab.
The following figure is a sample view of the dashboard widgets.
To perform disaster recovery (DR) to Nutanix clusters at different on-prem availability zones (sites), enable Leap at both the primary and recovery sites (Prism Central). Without enabling Leap, you can configure protection policies and recovery plans that synchronize to the paired sites, but you cannot perform failover and failback operations. To perform DR to different Nutanix clusters at the same site, enable Leap in the single Prism Central.
To enable Leap, perform the following procedure.
To replicate entities (protection policies, recovery plans, and recovery points) to different on-prem availability zones (sites) bidirectionally, pair the sites with each other. To replicate entities to different Nutanix clusters at the same site bidirectionally, you need not pair the sites because the primary and the recovery Nutanix clusters are registered to the same site (Prism Central). Without pairing the sites, you cannot perform DR to a different site.
To pair an on-prem AZ with another on-prem AZ, perform the following procedure at either of the on-prem AZs.
Automated disaster recovery (DR) configurations use protection policies to protect your guest VMs, and recovery plans to orchestrate the recovery of those guest VMs to different Nutanix clusters at the same or different availability zones (sites). You can automate protection of your guest VMs with the following supported replication schedules in Leap.
To maintain efficiency in protection and DR, Leap allows you to protect a guest VM with a Synchronous replication schedule to only one AHV cluster at a different on-prem availability zone.
Asynchronous replication schedules enable you to protect your guest VMs with an RPO of 1 hour or beyond. A protection policy with an Asynchronous replication schedule creates a recovery point in an hourly time interval, and replicates it to the recovery availability zones (sites) for High Availability. For guest VMs protected with Asynchronous replication schedule, you can perform disaster recovery (DR) to different Nutanix clusters at same or different sites. In addition to performing DR to Nutanix clusters running the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—DR from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.
The following are the specific requirements for protecting your guest VMs with Asynchronous replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Leap.
For information about the general requirements of Leap, see Leap Requirements.
For information about node, disk and Foundation configurations required to support Asynchronous replication schedules, see On-Prem Hardware Resource Requirements.
AHV or ESXi
Each on-prem site must have a Leap enabled Prism Central instance.
The primary and recovery Prism Central and Prism Element on the Nutanix clusters must be running the following versions of AOS.
Guest VMs protected with Asynchronous replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) to recover guest VMs from AHV clusters to ESXi clusters or guest VMs from ESXi clusters to AHV clusters by considering the following requirements.
NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .
For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.
If you have delta disks attached to a guest VM and you proceed with failover, you get a validation warning and the guest VM does not recover. Contact Nutanix Support for assistance.
Operating System | Version | Requirements and limitations |
---|---|---|
Windows | | |
Linux | | |
The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
Consider the following specific limitations before protecting your guest VMs with Asynchronous replication schedule. These limitations are in addition to the general limitations of Leap.
For information about the general limitations of Leap, see Leap Limitations.
CHDR does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).
To protect the guest VMs in an hourly replication schedule, configure an Asynchronous replication schedule while creating the protection policy. The policy takes recovery points of those guest VMs in the specified time intervals (hourly) and replicates them to the recovery availability zones (sites) for High Availability. To protect the guest VMs at the same or different recovery sites, the protection policy allows you to configure Asynchronous replication schedules to at most two recovery sites—a unique replication schedule to each recovery site. The policy synchronizes continuously to the recovery sites in a bidirectional way.
To create a protection policy with an Asynchronous replication schedule, do the following at the primary site. You can also create a protection policy at the recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can check either the local site or a non-local site.
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, check the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.
Clicking Save activates the Recovery Location pane. After saving the primary site configuration, you can optionally add a local schedule (step iv) to retain the recovery points at the primary site.
Specify the following information in the Add Schedule window.
When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.
For more information about the roll-up recovery points, see step d.iii.
Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent and if you do not check Take App-Consistent Recovery Point , the recovery points generated are crash-consistent. If the time in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). Select Local AZ if you want to configure DR to a different Nutanix cluster at the same site.
If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. You can select one cluster at the recovery site. If you want to replicate the recovery points to more clusters at the same or different sites, add another recovery site with a replication schedule. For more information to add another recovery site with a replication schedule, see step e.
Clicking Save activates the + Add Schedule button between the primary and the recovery site. After saving the recovery site configuration, you can optionally add a local schedule to retain the recovery points at the recovery site.
Specify the following information in the Add Schedule window.
When you enter the frequency in minutes , the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.
For more information about the roll-up recovery points, see step d.iii.
Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point , the recovery points generated are application-consistent and if you do not check Take App-Consistent Recovery Point , the recovery points generated are crash-consistent. If the time in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.
Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.
The specified frequency is the RPO. For more information about RPO, see Leap Terminology.
This field is unavailable if you do not specify a recovery location.
If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.
Reverse retention maintains the retention numbers of recovery points even after failover to a recovery site in the same or different availability zones. For example, if you retain two recovery points at the primary site and three recovery points at the recovery site, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary site. The recovery site still retains two recovery points while the primary site retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary site. The recovery site retains three recovery points while the primary site retains two recovery points.
Maintaining the same retention numbers at a recovery site is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.
Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery sites.
By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.
For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .
If you do not want to protect the guest VMs category wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).
This topic describes the conditions and limitations for application-consistent recovery points that you can generate through a protection policy. For information about the operating systems that support the AOS version you have deployed, see the Compatibility Matrix.
Applications running in your guest VM must be able to quiesce I/O operations. For example, you can quiesce I/O operations for database applications and similar workload types.
For installing and enabling NGT, see Nutanix Guest Tools in the Prism Web Console Guide .
For guest VMs running on ESXi, consider these points.
Operating system | Version |
---|---|
Windows | |
Linux | |
When you configure a protection policy and select Take App-Consistent Recovery Point , the Nutanix cluster transparently invokes the VSS (also known as Shadow copy or volume snapshot service).
Third-party backup products can choose between the VSS_BT_FULL (full backup) and VSS_BT_COPY (copy backup) backup types.
Nutanix VSS recovery points fail for such guest VMs.
C:\Program Files\Nutanix\Scripts\pre_freeze.bat
C:\Program Files\Nutanix\Scripts\post_thaw.bat
/usr/local/sbin/pre_freeze
Replace pre_freeze with the script name (without extension).
/usr/local/sbin/post_thaw
Replace post_thaw with the script name (without extension).
#!/bin/sh
#pre_freeze-script
date >> '/scripts/pre_root.log'
echo -e "\n attempting to run pre_freeze script for MySQL as root user\n" >> /scripts/pre_root.log
if [ "$(id -u)" -eq "0" ]; then
python '/scripts/quiesce.py' &
echo -e "\n executing query flush tables with read lock to quiesce the database\n" >> /scripts/pre_freeze.log
echo -e "\n Database is in quiesce mode now\n" >> /scripts/pre_freeze.log
else
date >> '/scripts/pre_root.log'
echo -e "not root useri\n" >> '/scripts/pre_root.log'
fi
#!/bin/sh
#post_thaw-script
date >> '/scripts/post_root.log'
echo -e "\n attempting to run post_thaw script for MySQL as root user\n" >> /scripts/post_root.log
if [ "$(id -u)" -eq "0" ]; then
python '/scripts/unquiesce.py'
else
date >> '/scripts/post_root.log'
echo -e "not root useri\n" >> '/scripts/post_root.log'
fi
@echo off
echo Running pre_freeze script >C:\Progra~1\Nutanix\script\pre_freeze_log.txt
@echo off
echo Running post_thaw script >C:\Progra~1\Nutanix\script\post_thaw_log.txt
If these requirements are not met, the system captures crash-consistent snapshots.
Server | NGT status (ESXi) | Result (ESXi) | NGT status (AHV) | Result (AHV) |
---|---|---|---|---|
Microsoft Windows Server edition | Installed and active; pre-freeze and post-thaw scripts are present. | Nutanix script-based VSS snapshots | Installed and active; pre-freeze and post-thaw scripts are present. | Nutanix script-based VSS snapshots |
Microsoft Windows Server edition | Installed and active | Nutanix VSS-enabled snapshots | Installed and active | Nutanix VSS-enabled snapshots |
Microsoft Windows Server edition | Not enabled | Hypervisor-based application-consistent or crash-consistent snapshots | Not enabled | Crash-consistent snapshots |
Microsoft Windows Client edition | Installed and active; pre-freeze and post-thaw scripts are present. | Nutanix script-based VSS snapshots | Installed and active; pre-freeze and post-thaw scripts are present. | Nutanix script-based VSS snapshots |
Microsoft Windows Client edition | Not enabled | Hypervisor-based snapshots or crash-consistent snapshots | Not enabled | Crash-consistent snapshots |
Linux VMs | Installed and active; pre-freeze and post-thaw scripts are present. | Nutanix script-based VSS snapshots | Installed and active; pre-freeze and post-thaw scripts are present. | Nutanix script-based VSS snapshots |
Linux VMs | Not enabled | Hypervisor-based snapshots or crash-consistent snapshots | Not enabled | Crash-consistent snapshots |
To orchestrate the failover (disaster recovery) of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery site. If you have configured two on-prem recovery sites in a protection policy, create two recovery plans for DR—one for recovery to each recovery site. The recovery plan synchronizes continuously to the recovery site in a bidirectional way.
To create a recovery plan, do the following at the primary site. You can also create a recovery plan at a recovery site. The recovery plan you create or update at a recovery site synchronizes back to the primary site.
Traditionally, to achieve this new configuration, you would manually log on to the recovered VM and modify the relevant files. With in-guest scripts, you write a script that automates the required steps and enable the script when you configure a recovery plan. The recovery plan execution automatically invokes the script to reassign the DNS IP address and reconnect to the database server at the recovery site (a sketch of such a script follows the script path list below).
Traditionally, to reconfigure, you would manually log on to the VM, remove the VM from an existing domain controller, and then add the VM to a new domain controller. With in-guest scripts, you can automate the task of changing the domain controller.
C:\Program Files\Nutanix\scripts\production\vm_recovery
C:\Program Files\Nutanix\scripts\test\vm_recovery
/usr/local/sbin/production_vm_recovery
/usr/local/sbin/test_vm_recovery
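The content of these scripts is entirely user-defined. As an illustration only, a minimal production_vm_recovery script for a Linux guest VM that re-points DNS after failover might look like the following sketch; the name server IP address and log path are placeholders.

#!/bin/sh
# Hypothetical example of /usr/local/sbin/production_vm_recovery.
# Re-point DNS to the name server used at the recovery site (placeholder IP address).
echo "nameserver 10.10.10.53" > /etc/resolv.conf
# Record the execution time to confirm that the recovery plan ran the script.
date >> /var/log/vm_recovery.log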
A command prompt icon appears against the guest VMs or VM categories to indicate that in-guest script execution is enabled on those guest VMs or VM categories.
A stage defines the order in which the protected guest VMs start at the recovery cluster. You can create multiple stages to prioritize the start sequence of the guest VMs. In the Power On Sequence , the VMs in the preceding stage start before the VMs in the succeeding stages. On recovery, it is desirable to start some VMs before the others. For example, database VMs must start before the application VMs. Place all the database VMs in the stage before the stage containing the application VMs, in the Power On Sequence .
You perform failover of the protected guest VMs when unplanned failure events (for example, natural disasters) or planned events (for example, scheduled maintenance) happen at the primary availability zone (site) or the primary cluster. The protected guest VMs migrate to the recovery site where you perform the failover operations. On recovery, the protected guest VMs start in the Nutanix cluster you specify in the recovery plan that orchestrates the failover.
The following are the types of failover operations.
At the recovery site, the guest VMs can recover using the recovery points replicated from the primary site only. The guest VMs cannot recover using the local recovery points. For example, if you perform an unplanned failover from the primary site AZ1 to the recovery site AZ2 , the guest VMs recover at AZ2 using the recovery points replicated from AZ1 to AZ2 .
You can perform a planned or an unplanned failover in different scenarios of network failure. For more information about network failure scenarios, see Leap and Xi Leap Failover Scenarios.
At the recovery site after a failover, the recovery plan creates only the VM category that was used to include the guest VM in the recovery plan. Manually create the remaining VM categories at the recovery site and associate the guest VMs with those categories.
The recovered guest VMs continue to generate recovery points as per the replication schedule that protects them. The recovery points replicate back to the primary site when the primary site starts functioning. This reverse replication enables you to fail over the guest VMs from the recovery site back to the primary site (failback). The same recovery plan applies to both the failover and the failback operations; for failover, you perform the failover operations on the recovery plan at the recovery site, whereas for failback, you perform them at the primary site. For example, if a guest VM fails over from AZ1 (Local) to AZ2, the failback fails over the same VM from AZ2 (Local) back to AZ1.
You have the flexibility to perform a real or simulated failover for the full and partial workloads (with or without networking). The term virtual network is used differently on on-prem clusters and Xi Cloud Services. In Xi Cloud Services, the term virtual network is used to describe the two built-in virtual networks—production and test. Virtual networks on the on-prem clusters are virtual subnets bound to a single VLAN. Manually create these virtual subnets, and create separate virtual subnets for production and test purposes. Create these virtual subnets before you configure recovery plans. When configuring a recovery plan, you map the virtual subnets at the primary site to the virtual subnets at the recovery site.
The following are the various scenarios that you can encounter in Leap configurations for disaster recovery (DR) to an on-prem availability zone (site) or to Xi Cloud (Xi Leap). Each scenario is explained with the required network-mapping configuration for Xi Leap. However, the configuration remains the same irrespective of disaster recovery (DR) using Leap or Xi Leap. You can either create a recovery plan with the following network mappings (see Creating a Recovery Plan (Leap)) or update an existing recovery plan with the following network mappings (see Updating a Recovery Plan).
Full network failure is the most common scenario. In this case, it is desirable to bring up the whole primary site in the Xi Cloud. All the subnets must fail over, and the WAN IP address must change from the on-prem IP address to the Xi WAN IP address. Floating IP addresses can be assigned to individual guest VMs; otherwise, everything else uses Xi network address translation (NAT) for external communication.
Perform the failover when the on-prem subnets are down and the jump host is available on the public Internet through the floating IP address of the Xi production network.
To set up the recovery plan that orchestrates the full network failover, perform the following.
The selection auto-populates the Xi production and test failover subnets.
Perform steps 1–4 for every subnet.
You want to failover one or more subnets from the primary site to Xi Cloud. The communications between the sites happen through the VPN or using the external NAT or floating IP addresses. A use case of this type of scenario is that the primary site needs maintenance, but some of its subnets must see no downtime.
Perform a partial failover when some subnets are active in the production networks both on-prem and in Xi Cloud, and the jump host is available on the public Internet through the floating IP address of the Xi production network.
On-prem guest VMs can connect to the guest VMs on the Xi Cloud Services.
To set up the recovery plan that orchestrates the partial network failover, perform the following.
The selection auto-populates the Xi production and test failover subnets.
Perform steps 1–4 for one or more subnets based on the maintenance plan.
You want to failover some guest VMs to Xi Cloud, while keeping the other guest VMs up and running at the on-prem cluster (primary site). A use case of this type of scenario is that the primary site needs maintenance, but some of its guest VMs must see no downtime.
This scenario requires changing IP addresses for the guest VMs running in Xi Cloud. Because you cannot have the subnet active at both sites, create a subnet to host the failed-over guest VMs. The jump host is available on the public Internet through the floating IP address of the Xi production network.
On-prem guest VMs can connect to the guest VMs on the Xi Cloud Services.
To set up the recovery plan that orchestrates the partial subnet network failover, perform the following.
The selection auto-populates the Xi production and test failover subnets for a full subnet failover.
Perform steps 1–4 for one or more subnets based on the maintenance plan.
You want to test all three preceding scenarios by creating an isolated test network so that no routing or IP address conflict occurs. Clone all the guest VMs from a local recovery point and bring them up to test failover operations. Test the failover when all on-prem subnets are active and the on-prem guest VMs can connect to the guest VMs in Xi Cloud. The jump host is available on the public Internet through the floating IP address of the Xi production network.
In this case, focus on the test failover section when creating the recovery plan. When you select a local AZ production subnet, it is copied to the test network. You can go one step further and create a test subnet in Xi Cloud.
You can perform test failover, planned failover, and unplanned failover of the guest VMs protected with Asynchronous replication schedule across different Nutanix clusters at the same or different on-prem availability zones (sites). The steps to perform test, planned, and unplanned failover are largely the same irrespective of the replication schedules that protect the guest VMs.
After you create a recovery plan, you can run a test failover periodically to ensure that the failover occurs smoothly when required. To perform a test failover, do the following procedure at the recovery site. If you have two recovery sites for DR, perform the test at the site where you want to recover the guest VMs.
Resolve the error conditions and then restart the test procedure.
After testing a recovery plan, you can remove the test VMs that the recovery plan creates in the recovery test network. To clean up the test VMs, do the following at the recovery site where the test failover created the test VMs.
If there is a planned event (for example, scheduled maintenance of guest VMs) at the primary availability zone (site), perform a planned failover to the recovery site. To perform a planned failover, do the following procedure at the recovery site. If you have two recovery sites for DR, perform the failover at the site where you want to recover the guest VMs.
Resolve the error conditions and then restart the failover procedure.
The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.
However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.
If these conditions are not satisfied, the failover operation fails.
If there is an unplanned event (for example, a natural disaster or network failure) at the primary availability zone (site), perform an unplanned failover to the recovery site. To perform an unplanned failover, do the following procedure at the recovery site. If you have two recovery sites for DR, perform the failover at the site where you want to recover the guest VMs.
Resolve the error conditions and then restart the failover procedure.
The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.
However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.
If these conditions are not satisfied, the failover operation fails.
A failback is a failover of the guest VMs from the recovery availability zone (site) back to the primary site. The same recovery plan applies to both the failover and the failback operations; for failover, you perform the failover operations on the recovery plan at the recovery site, whereas for failback, you perform them at the primary site.
To perform a failback, do the following procedure at the primary site.
Resolve the error conditions and then restart the failover procedure.
The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.
However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.
If these conditions are not satisfied, the failover operation fails.
After you trigger a failover operation, you can monitor failover-related tasks. To monitor a failover, perform the following procedure at the recovery site. If you have two recovery sites for DR, perform the procedure at the site where you trigger the failover.
The self-service restore (also known as file-level restore) feature allows you to do a self-service data recovery from the Nutanix data protection recovery points with minimal intervention. You can perform self-service data recovery on both on-prem and Xi Cloud Services.
You must deploy NGT 2.0 or newer on guest VMs to enable self-service restore from Prism Central. For more information about enabling and mounting NGT, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide. When you enable self-service restore and attach a disk by logging into the VM, you can recover files within the guest OS. If you do not detach the disk from the VM, the disk is detached automatically after 24 hours.
The requirements of self-service restore of Windows and Linux VMs are as follows.
The following are the general requirements of self-service restore. Ensure that you meet the requirements before configuring self-service restore for guest VMs.
AOS Ultimate. For more information about the features available with each AOS license, see Software Options.
Two AHV or ESXi clusters, each registered to the same or different Prism Centrals.
The on-prem clusters must be running the version of AHV that comes bundled with the supported version of AOS.
The on-prem clusters must be running on version ESXi 6.5 GA or newer.
Prism Centrals and their registered on-prem clusters (Prism Elements) must be running the following versions of AOS.
The following are the specific requirements of self-service restore for guest VMs running Windows OS. Ensure that you meet the requirements before proceeding.
The following are the specific requirements of self-service restore for guest VMs running Linux OS. Ensure that you meet the requirements before proceeding.
The limitations of self-service restore of Windows and Linux VMs are as follows.
The following are the general limitations of self-service restore.
The following are the specific limitations of self-service restore for guest VMs running Windows OS.
Whenever the snapshot disk has an inconsistent filesystem (as indicated by the fsck check), the disk is only attached and not mounted.
After enabling NGT for a guest VM, you can enable the self-service restore for that guest VM. Also, you can enable the self-service restore for a guest VM while you are installing NGT on that guest VM.
For more information, see Enabling and Mounting Nutanix Guest Tools in the Prism Web Console Guide.
Ensure that you have installed and enabled NGT 2.0 or newer on the guest VM.
To enable self-service restore, perform the following procedure.
You can restore the desired files from the VM through the web interface or by using the ngtcli utility of self-service restore.
After you install NGT in the Windows guest VM, you can restore the desired files from the VM through the web interface.
To restore a file in Windows guest VMs by using web interface, perform the following.
After you install NGT in the Windows guest VM, you can restore the desired files from the VM through the ngtcli utility.
To restore a file in Windows guest VMs by using ngtcli, perform the following.
> cd c:\Program Files\Nutanix\ngtcli
> python ngtcli.py
Running ngtcli.py creates a terminal with auto-complete.
ngtcli> ssr ls-snaps
The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. Use this information to decide which snapshot contains the data that you want to restore.
ngtcli> ssr ls-snaps snapshot-count=count_value
Replace count_value with the number of snapshots that you want to list.
ngtcli> ssr attach-disk disk-label=disk_label snapshot-id=snap_id
Replace disk_label with the name of the disk that you want to attach.
Replace snap_id with the snapshot ID of the disk that you want to attach.
For example, to attach a disk with snapshot ID 16353 and disk label scsi0:1, type the following command.
ngtcli> ssr attach-disk snapshot-id=16353 disk-label=scsi0:1
ngtcli> ssr detach-disk attached-disk-label=attached_disk_label
Replace attached_disk_label with the label of the disk that you want to detach.
ngtcli> ssr list-attached-disks
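As an end-to-end illustration, the commands above can be combined into one short restore session. The snapshot ID and disk labels reuse the illustrative values from the steps above, and the label of the newly attached disk (scsi0:2 here) is hypothetical; use the values reported on your VM. Between the attach and detach steps, copy the required files from the newly attached drive within the guest OS.
> cd c:\Program Files\Nutanix\ngtcli
> python ngtcli.py
ngtcli> ssr ls-snaps snapshot-count=5
ngtcli> ssr attach-disk snapshot-id=16353 disk-label=scsi0:1
ngtcli> ssr list-attached-disks
ngtcli> ssr detach-disk attached-disk-label=scsi0:2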
A Linux guest VM user with sudo privileges can restore the desired files from the VM through the web interface or by using the ngtcli utility.
After you install NGT in the Linux guest VM, you can restore the desired files from the VM through the web interface.
To restore a file in Linux guest VMs by using web interface, perform the following.
The selected disk or disks are mounted and the relevant disk label is displayed.
After you install NGT in the Linux guest VM, you can restore the desired files from the VM through the ngtcli utility.
To restore a file in Linux guest VMs by using ngtcli, perform the following.
> cd /usr/local/nutanix/ngt/ngtcli
ngtcli> ssr ls-snaps
The snapshot ID, disk labels, logical drives, and creation time of each snapshot are displayed. Use this information to decide which snapshot contains the data that you want to restore.
ngtcli> ssr ls-snaps snapshot-count=count_value
Replace count_value with the number of snapshots that you want to list.
ngtcli> ssr attach-disk disk-label=disk_label snapshot-id=snap_id
Replace disk_label with the name of the disk that you want to attach.
Replace snap_id with the snapshot ID of the disk that you want to attach.
For example, to attach a disk with snapshot ID 1343 and disk label scsi0:2, type the following command.
ngtcli> ssr attach-disk snapshot-id=1343 disk-label=scsi0:2
After the command runs successfully, a new disk with a new label is attached to the guest VM.
ngtcli> ssr detach-disk attached-disk-label=attached_disk_label
Replace attached_disk_label with the label of the disk that you want to detach.
For example, to remove the disk with disk label scsi0:3, type the following command.
ngtcli> ssr detach-disk attached-disk-label=scsi0:3
ngtcli> ssr list-attached-disks
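As an end-to-end illustration for Linux, the same workflow looks like the following. Launching the utility with python ngtcli.py is an assumption based on the Windows procedure; the snapshot ID and disk labels reuse the illustrative values from the steps above. Copy the required files from the mounted disk within the guest OS before detaching it.
> cd /usr/local/nutanix/ngt/ngtcli
> sudo python ngtcli.py
ngtcli> ssr ls-snaps
ngtcli> ssr attach-disk snapshot-id=1343 disk-label=scsi0:2
ngtcli> ssr list-attached-disks
ngtcli> ssr detach-disk attached-disk-label=scsi0:3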
NearSync replication enables you to protect your guest VMs with an RPO of as low as 1 minute. A protection policy with a NearSync replication creates a recovery point in a minutely time interval (between 1–15 minutes), and replicates it to the recovery availability zones (sites) for High Availability. For guest VMs protected with NearSync replication schedule, you can perform disaster recovery (DR) to a different Nutanix cluster at same or different sites. In addition to DR to Nutanix clusters of the same hypervisor type, you can also perform cross-hypervisor disaster recovery (CHDR)—disaster recovery from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters.
The following are the advantages of protecting your guest VMs with a NearSync replication schedule.
Stun time is the duration for which an application freezes while the recovery point is taken.
To implement the NearSync feature, Nutanix has introduced a technology called lightweight snapshots (LWSs). LWS recovery points are created at the metadata level only, and they continuously replicate incoming data generated by workloads running on the active clusters. LWS recovery points are stored in the LWS store, which is allocated on the SSD tier. When you configure a protection policy with a NearSync replication schedule, the system allocates the LWS store automatically.
When you create a NearSync replication schedule, the schedule remains an hourly schedule until its transition into a minutely schedule is complete.
To transition into the NearSync (minutely) replication schedule, the system first seeds the recovery site with data: recovery points are taken on an hourly basis and replicated to the recovery site. After the system determines that the recovery points containing the seeding data replicate within a specified amount of time (one hour by default), it automatically transitions the replication schedule into a NearSync schedule, depending on the bandwidth and the change rate. After you transition into the NearSync replication schedule, you can see the configured minutely recovery points in the web interface.
The following are the characteristics of the process.
To transition out of the NearSync replication schedule, you can do one of the following.
Repeated transitioning in and out of NearSync replication schedule can occur because of the following reasons.
Depending on the RPO (1–15 minutes), the system retains the recovery points for a specific time period. For a NearSync replication schedule, you can configure the retention policy for days, weeks, or months on both the primary and recovery sites instead of defining the number of recovery points you want to retain. For example, if you desire an RPO of 1 minute and want to retain the recovery points for 5 days, the retention policy works in the following way.
You can also define recovery point retention in weeks or months. For example, if you configure a 3-month schedule, the retention policy works in the following way.
The following are the specific requirements for protecting your guest VMs with NearSync replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Leap.
For more information about the general requirements of Leap, see Leap Requirements.
For information about node, disk and Foundation configurations required to support NearSync replication schedules, see On-Prem Hardware Resource Requirements.
AHV or ESXi
Each on-prem site must have a Leap enabled Prism Central instance.
The primary and recovery Prism Centrals and their registered Nutanix clusters must be running the following versions of AOS.
Guest VMs protected with a NearSync replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) of guest VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided you meet the following requirements.
NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide.
For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.
Operating System | Version | Requirements and limitations |
---|---|---|
Windows | | |
Linux | | |
Consider the following specific limitations before protecting your guest VMs with NearSync replication schedule. These limitations are in addition to the general limitations of Leap.
For information about the general limitations of Leap, see Leap Limitations.
Cross hypervisor disaster recovery (CHDR) does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).
For example, suppose you have 1 day of retention at the primary site and 5 days of retention at the recovery site, and you want to go back to a recovery point from 5 days ago. The NearSync replication schedule does not support replicating the 5-day retention back from the recovery site to the primary site.
To protect the guest VMs in a minutely replication schedule, configure a NearSync replication schedule while creating the protection policy. The policy takes recovery points of the protected guest VMs in the specified time intervals (1–15 minutes) and replicates them to the recovery availability zone (site) for High Availability. To maintain the efficiency of minutely replication, the protection policy allows you to configure a NearSync replication schedule to only one recovery site. When creating a protection policy, you can specify only VM categories. If you want to include VMs individually, you must first create the protection policy—which can also include VM categories—and then include the VMs individually in the protection policy from the VMs page.
Ensure that the primary and the recovery AHV or ESXi clusters at the same or different sites are NearSync capable. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB.
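For example, a cluster whose SSDs are each 1.92 TB is NearSync capable, whereas a cluster built with 960 GB SSDs is not, because each SSD must have a capacity of at least 1.2 TB.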
See NearSync Replication Requirements (Leap) and NearSync Replication Limitations (Leap) before you start.
To create a protection policy with a NearSync replication schedule, do the following at the primary site. You can also create a protection policy at the recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can check either the local site or a non-local site.
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, select the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.
Clicking Save activates the Recovery Location pane. After saving the primary site configuration, you can optionally add a local schedule to retain the recovery points at the primary site.
Specify the following information in the Add Schedule window.
When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.
For more information about the roll-up recovery points, see step d.iii.
Irrespective of whether the schedule is local or replication, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point, the recovery points generated are application-consistent; if you do not check it, the recovery points generated are crash-consistent. If the times in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). Select Local AZ if you want to configure DR to a different Nutanix cluster at the same site.
If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. You can select one cluster at the recovery site. To maintain the efficiency of minutely replication, a protection policy allows you to configure only one recovery site for a NearSync replication schedule. However, you can add another Asynchronous replication schedule for replicating recovery points to the same or different sites. For more information about adding another recovery site with a replication schedule, see step e.
Clicking Save activates the + Add Schedule button between the primary and the recovery site. After saving the recovery site configuration, you can optionally add a local schedule to retain the recovery points at the recovery site.
Specify the following information in the Add Schedule window.
When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.
For more information about the roll-up recovery points, see step d.iii.
Irrespective of whether the schedule is local or replication, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point, the recovery points generated are application-consistent; if you do not check it, the recovery points generated are crash-consistent. If the times in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.
Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.
The specified frequency is the RPO. For more information about RPO, see Leap Terminology.
This field is unavailable if you do not specify a recovery location.
Reverse retention maintains the retention numbers of recovery points even after failover to a recovery site in the same or different availability zones. For example, if you retain two recovery points at the primary site and three recovery points at the recovery site, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary site. The recovery site still retains two recovery points while the primary site retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary site. The recovery site retains three recovery points while the primary site retains two recovery points.
Maintaining the same retention numbers at a recovery site is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.
Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery sites.
The Add Schedule window appears and auto-populates the Primary Location and the additional Recovery Location. Perform step d again to add the replication schedule.
By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.
For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .
If you do not want to protect the guest VMs category wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).
To orchestrate the failover of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery site. If you have configured two recovery sites in a protection policy, create two recovery plans for DR—one for recovery to each recovery site. The recovery plan synchronizes continuously to the recovery site in a bidirectional way.
For more information about creating a recovery plan, see Creating a Recovery Plan (Leap).
You can perform test failover, planned failover, and unplanned failover of the guest VMs protected with NearSync replication schedule across different Nutanix clusters at the same or different on-prem availability zone (site). The steps to perform test, planned, and unplanned failover are largely the same irrespective of the replication schedules that protect the guest VMs.
See Failover and Failback Management for the test, planned, and unplanned failover procedures.
Synchronous replication enables you to protect your guest VMs with a zero recovery point objective (0 RPO). A protection policy with a Synchronous replication schedule replicates all the writes on the protected guest VMs synchronously to the recovery availability zone (site) for High Availability. The policy also takes recovery points of those protected VMs every 6 hours—the first snapshot is taken immediately—for a raw node (HDD+SSD) size up to 120 TB. Since the replication is synchronous, the recovery points are crash-consistent only. For guest VMs (AHV) protected with a Synchronous replication schedule, you can perform DR only to an AHV cluster at the same or different site. Replicating writes synchronously and also generating recovery points helps to eliminate data losses due to:
Nutanix recommends that the round-trip latency (RTT) between AHV clusters be less than 5 ms for optimal performance of Synchronous replication schedules. Maintain adequate bandwidth to accommodate peak writes and have a redundant physical network between the clusters.
To perform the replications synchronously yet efficiently, the protection policy limits you to configure only one recovery site if you add a Synchronous replication schedule. If you configure Synchronous replication schedule for a guest VM, you cannot add an Asynchronous or NearSync schedule to the same guest VM. Similarly, if you configure an Asynchronous or a NearSync replication schedule, you cannot add a Synchronous schedule to the same guest VM.
If you unpair the sites while the guest VMs in the Nutanix clusters are still in synchronization, the Nutanix cluster becomes unstable. Therefore, disable Synchronous replication and clear stale stretch parameters, if any, on both the primary and recovery Prism Elements before unpairing the sites. For more information about disabling Synchronous replication, see Synchronous Replication Management.
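For example, you can verify that no stale stretch parameters remain by running the following command on a CVM of both the primary and recovery clusters (the same check appears later in this guide). An empty response indicates that no stretch state remains.
nutanix@cvm$ stretch_params_printer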
The following are the specific requirements for protecting your AHV guest VMs with Synchronous replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Leap.
For information about the general requirements of Leap, see Leap Requirements.
For information about node, disk and Foundation configurations required to support Synchronous replication schedules, see On-Prem Hardware Resource Requirements.
AHV
The AHV clusters must be running on version 20190916.189 or newer.
The primary and recovery Nutanix Clusters can be registered with a single Prism Central instance or each can be registered with different Prism Central instances.
For hardware and Foundation configurations required to support Synchronous replication schedules, see On-Prem Hardware Resource Requirements.
nutanix@cvm$ allssh 'modify_firewall -f -r remote_cvm_ip,remote_virtual_ip -p 2030,2036,2073,2090 -i eth0'
Replace remote_cvm_ip with the IP address of the recovery cluster CVM. If there are multiple CVMs, replace remote_cvm_ip with the IP addresses of the CVMs separated by comma.
Replace remote_virtual_ip with the virtual IP address of the recovery cluster.
nutanix@cvm$ allssh 'modify_firewall -f -r source_cvm_ip,source_virtual_ip -p 2030,2036,2073,2090 -i eth0'
Replace source_cvm_ip with the IP address of the primary cluster CVM. If there are multiple CVMs, replace source_cvm_ip with the IP addresses of the CVMs separated by comma.
Replace source_virtual_ip with the virtual IP address of the primary cluster.
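For illustration, suppose the recovery cluster has CVMs at 10.10.20.11, 10.10.20.12, and 10.10.20.13 and a virtual IP address of 10.10.20.50 (hypothetical addresses). The command run on the primary cluster would then look like the following.
nutanix@cvm$ allssh 'modify_firewall -f -r 10.10.20.11,10.10.20.12,10.10.20.13,10.10.20.50 -p 2030,2036,2073,2090 -i eth0'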
Consider the following specific limitations before protecting your guest VMs with Synchronous replication schedule. These limitations are in addition to the general limitations of Leap.
For information about the general limitations of Leap, see Leap Limitations.
To protect the guest VMs in an instant replication schedule, configure a Synchronous replication schedule while creating the protection policy. The policy replicates all the writes on the protected guest VMs synchronously to the recovery availability zone (site) for High Availability. For a raw node (HDD+SSD) size up to 120 TB, the policy also takes crash-consistent recovery points of those guest VMs every 6 hours and replicates them to the recovery site—the first snapshot is taken immediately. To maintain the efficiency of synchronous replication, the protection policy allows you to add only one recovery site for the protected VMs. When creating a protection policy, you can specify only VM categories. If you want to protect guest VMs individually, you must first create the protection policy—which can also include VM categories, and then include the guest VMs individually in the protection policy from the VMs page.
To create a protection policy with the Synchronous replication schedule, do the following at the primary site. You can also create a protection policy at the recovery site. Protection policies you create or update at a recovery site synchronize back to the primary site.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can check either the local site or a non-local site.
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple AHV clusters in the same protection policy, select the AHV clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central. Select All Clusters only if all the clusters are running AHV.
Clicking Save activates the Recovery Location pane. Do not add a local schedule to retain the recovery points locally. To maintain the replication efficiency, Synchronous replication allows only the replication schedule. If you add a local schedule, you cannot click Synchronous in step d.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). Select Local AZ if you want to configure DR to a different AHV cluster at the same site.
If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. You can select one AHV cluster at the recovery site. Do not select an ESXi cluster because Synchronous replication schedules support only AHV clusters. If you select an ESXi cluster and configure a Synchronous replication schedule, replications fail.
Clicking Save activates the + Add Schedule button between the primary and the recovery site. Do not add a local schedule to retain the recovery points locally. To maintain the replication efficiency, Synchronous replication allows only the replication schedule. If you add a local schedule, you cannot click Synchronous in step d.
Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.
Clicking Save Schedule disables the + Add Recovery Location button at the top-right because to maintain the efficiency of synchronous replication, the policy allows you to add only one recovery site.
By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Immediately at the top-right corner, and then, in the Start Time dialog box, do the following.
For example, the guest VM VM_SherlockH is in the category Department:Admin , and you add this category to the protection policy named PP_AdminVMs . Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK , VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs .
If you do not want to protect the guest VMs category wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs individually to a Protection Policy).
To orchestrate the failover of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery site. If you have configured two recovery sites in a protection policy, create two recovery plans for DR—one for recovery to each recovery site. The recovery plan synchronizes continuously to the recovery site in a bidirectional way.
For more information about creating a recovery plan, see Creating a Recovery Plan (Leap).
Synchronous replication instantly replicates all writes on the protected guest VMs to the recovery cluster. Replication starts when you configure a protection policy and add the guest VMs to protect. You can manage the replication by enabling, disabling, pausing, or resuming the Synchronous replication on the protected guest VMs from Prism Central.
When you configure a protection policy with Synchronous replication schedule and add guest VMs to protect, the replication is enabled by default. However, if you have disabled the Synchronous replication on a guest VM, you have to enable it to start replication.
To enable Synchronous replication on a guest VM, perform the following procedure at the primary availability zone (site). You can also perform the following procedure at the recovery site. The operations you perform at a recovery site synchronize back to the primary site.
The protected guest VMs on the primary cluster stop responding when the recovery cluster is disconnected abruptly (for example, due to network outage or internal service crash). To come out of the unresponsive state, you can pause Synchronous replication on the guest VMs. Pausing Synchronous replication temporarily suspends the replication state of the guest VMs without completely disabling the replication relationship.
To pause Synchronous replication on a guest VM, perform the following procedure.
You can resume the Synchronous replication that you had paused to come out of the unresponsive state of the primary cluster. Resuming Synchronous replication restores the replication status and reconciles the state of the guest VMs. To resume Synchronous replication on a guest VM, perform the following procedure.
You can perform test, planned, and unplanned failover of the guest VMs protected with a Synchronous replication schedule across AHV clusters at the same or different on-prem availability zones (sites). The steps to perform test, planned, and unplanned failover are largely the same irrespective of the replication schedule that protects the guest VMs. Additionally, a planned failover of guest VMs protected with a Synchronous replication schedule also allows for live migration of the protected guest VMs.
See Failover and Failback Management for the test, planned, and unplanned failover procedures.
Planned failover of the guest VMs protected with Synchronous replication schedule supports live migration to another AHV cluster. Live migration offers zero downtime for your applications during a planned failover event to the recovery cluster (for example, during scheduled maintenance).
The following are the specific requirements to successfully migrate your guest VMs with Live Migration.
Ensure that you meet the following requirements in addition to the requirements of Synchronous replication schedule (Synchronous Replication Requirements) and general requirements of Leap (Leap Requirements).
Network stretch spans your network across different sites. A stretched L2 network retains the IP addresses of guest VMs after their Live Migration to the recovery site.
The primary and recovery Nutanix clusters must have identical CPU feature sets. If the CPU feature sets (the sets of CPU flags) are not identical, Live Migration fails. An informal way to compare the feature sets is sketched below.
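The following is an illustrative check only, not an official Nutanix validation step; it assumes shell access to an AHV host in each cluster. Dump the CPU flags on one host per cluster and then diff the resulting files.
# Collect the CPU flags of the local host into a sorted list (run on one host in each cluster).
root@ahv# grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > /tmp/cpu_flags_$(hostname).txt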
Consider the following limitation in addition to the limitations of Synchronous replication schedule (Synchronous Replication Limitations) and general limitations of Leap (Leap Limitations) before performing live migration of your guest VMs.
If a planned event (for example, scheduled maintenance of guest VMs) at the primary availability zone (site) requires you to migrate your applications to another AHV cluster without downtime, perform a planned failover with Live Migration to the recovery site.
To live migrate the guest VMs, do the following procedure at the recovery site.
Resolve the error conditions and then restart the failover procedure.
To use disaster recovery (DR) features that support only single Prism Central (AZ) managed deployments, you can convert your multi-AZ deployment to single-AZ deployment. For example, in two AZ deployments where each Prism Central (Prism Central A, Prism Central B) instance hosts one Prism Element cluster (Prism Element A, Prism Element B) , you can perform the following procedure to convert to a single-AZ deployment (Prism Central A managing both Prism Element A, Prism Element B) .
You can also perform this procedure to convert deployments protected by Asynchronous and NearSync replication schedules. The conversion procedure for deployments protected by Asynchronous and NearSync replication schedules is identical, except that the protection status (step 2 in the described procedure) of Asynchronous and NearSync replication schedules is available only in Focus > Data Protection .
nutanix@cvm$ stretch_params_printer
An empty response indicates that all stretch states are deleted.
pcvm$ mcli
mcli> mcli dr_coordinator.list
An empty response indicates that all stretch states are deleted.
pcvm$ mcli
mcli> mcli dr_coordinator.list
An empty response indicates that all stretch states are deleted.
A protection policy automates the creation and replication of recovery points. When creating a protection policy, you specify replication schedules, retention policies for the recovery points, and the guest VMs you want to protect. You also specify recovery availability zones (a maximum of two) if you want to automate recovery point replication to those recovery sites.
When you create, update, or delete a protection policy, it synchronizes to the recovery sites and works bidirectionally. The recovery points generated at the recovery sites replicate back to the primary site when the primary site starts functioning. For information about how Leap determines the list of sites for synchronization, see Entity Synchronization Between Paired Availability Zones.
You can also protect guest VMs individually in a protection policy from the VMs page, without the use of a VM category. To protect guest VMs individually in a protection policy, perform the following procedure.
You can remove guest VMs individually from a protection policy from the VMs page. To remove guest VMs individually from a protection policy, perform the following procedure.
If the requirements of the protection policy that you want to create are similar to an existing protection policy, you can clone the existing protection policy and update the clone. To clone a protection policy, perform the following procedure.
You can modify an existing protection policy in Prism Central. To update an existing protection policy, perform the following procedure.
You can use the data protection focus on the VMs page to determine the protection policies to which a guest VM belongs. To determine the protection policy, perform the following procedure.
A recovery plan orchestrates the recovery of protected VMs at the recovery site. Recovery plans are predefined procedures (runbooks) that use stages to enforce VM power-on sequence. You can also specify the inter-stage delays to recover applications.
When you create, update, or delete a recovery plan, it synchronizes to the recovery sites and works bidirectionally. For information about how Leap determines the list of sites for synchronization, see Entity Synchronization Between Paired Availability Zones. After a failover from the primary site to a recovery site, you can failback to the primary site by using the same recovery plan.
Recovery plans are independent of protection policies and do not reference protection policies in their configuration information. Also, they do not create recovery points. While the process of planned failover includes the creation of a recovery point so that the latest data can be used for recovery, unplanned and test failovers rely on the availability of the required recovery points at the recovery site. A recovery plan therefore requires the guest VMs in the recovery plan to also be associated with a protection policy.
You can also add guest VMs individually to a recovery plan from the VMs page, without the use of a VM category. To add VMs individually to a recovery plan, perform the following procedure.
You can also remove guest VMs individually from a recovery plan. To remove guest VMs individually from a recovery plan, perform the following procedure.
You can update an existing recovery plan. To update a recovery plan, perform the following procedure.
You can validate a recovery plan from the recovery site. Recovery plan validation does not perform a failover like the test failover does, but reports warnings and errors. To validate a recovery plan, perform the following procedure.
Manual data protection involves manually creating recovery points, manually replicating recovery points, and manually recovering the VMs at the recovery site. You can also automate some of these tasks. For example, the last step—that of manually recovering VMs at the recovery site—can be performed by a recovery plan while the underlying recovery point creation and replication can be performed by protection policies. Conversely, you can configure protection policies to automate recovery point creation and replication and recover VMs at the recovery site manually.
To create recovery points manually, do the following.
You can manually replicate recovery points only from the availability zone (site) where the recovery points exist.
To replicate recovery points manually, do the following.
You can recover a guest VM by cloning the guest VM from a recovery point.
To recover a guest VM from a recovery point, do the following.
When paired with each other, availability zones (sites) synchronize disaster recovery (DR) configuration entities. Paired sites synchronize the following DR configuration entities.
If you specify VM categories in a recovery plan, Leap determines which protection policies use those VM categories, and then synchronizes the recovery plans to the availability zones specified in those protection policies.
If you include guest VMs individually (without VM categories) in a recovery plan, Leap uses the recovery points of those guest VMs to determine which protection policies created those recovery points, and then synchronizes the recovery plans to the availability zones (sites) specified in those protection policies. If you create a recovery plan for VM categories or guest VMs that are not associated with a protection policy, Leap cannot determine the availability zone list and therefore cannot synchronize the recovery plan. If a recovery plan includes only individually added guest VMs and a protection policy associated with a guest VM has not yet created guest VM recovery points, Leap cannot synchronize the recovery plan to the availability zone specified in that protection policy. However, recovery plans are monitored every 15 minutes for the availability of recovery points that can help derive availability zone information. When recovery points become available, the paired on-prem site derives the availability zone by the process described earlier and synchronizes the recovery plan to the availability zone.
If you do not update entities before a connectivity issue is resolved or before you pair the availability zones again, the synchronization behavior described earlier resumes. Also, pairing previously unpaired availability zones triggers an automatic synchronization event. For recommendations to avoid facing such issues, see Entity Synchronization Recommendations (Leap).
Consider the following recommendations to avoid inconsistencies and the resulting synchronization issues.
You can safely create entities at either or both the sites as long as you do not assign the same name to entities at the two sites. After the connectivity issue is resolved, force synchronization from the site where you created entities.
Entity synchronization, when forced from an availability zone (site), overwrites the corresponding entities in paired sites. Forced synchronization also creates, updates, and removes those entities from paired sites.
The availability zone (site) to which a particular entity is forcefully synchronized depends on which site requires the entity (see Entity Synchronization Between Paired Availability Zones). To avoid inadvertently overwriting required entities, ensure that you force entity synchronization from the site in which the entities have the desired configuration.
If a site is paired with two or more availability zones (sites), you cannot select one or more sites with which to synchronize entities.
To force entity synchronization, do the following.
Xi Leap is essentially an extension of Leap to Xi Cloud Services. Xi Leap protects your guest VMs and orchestrates their disaster recovery (DR) to Xi Cloud Services when events causing service disruption occur at the primary availability zone (site). For protection of your guest VMs, protection policies with Asynchronous and NearSync replication schedules generate and replicate recovery points to Xi Cloud Services. Recovery plans orchestrate DR from the replicated recovery points to Xi Cloud Services.
Protection policies create a recovery point—and set its expiry time—in every iteration of the specified time period (RPO). For example, the policy creates a recovery point every 1 hour for an RPO schedule of 1 hour. The recovery point expires at its designated expiry time based on the retention policy—see step 3 in Creating a Protection Policy with Asynchronous Replication Schedule (Xi Leap). If there is a prolonged outage at a site, the Nutanix cluster retains the last recovery point to ensure you do not lose all the recovery points. For NearSync replication (lightweight snapshot), the Nutanix cluster retains the last full hourly snapshot. During the outage, the Nutanix cluster does not clean up the recovery points due to expiry. When the Nutanix cluster comes online, it cleans up the recovery points that are past expiry immediately.
If a guest VM is removed from a protection policy, delete all the recovery points associated with the guest VM. If the recovery points are not deleted explicitly, they adhere to the expiration period set in the protection policy and continue to incur charges until they expire. To stop the charges immediately, log on to Xi Cloud Services and delete all of these recovery points explicitly.
For High Availability of a guest VM, Leap can enable replication of recovery points to one or more sites. A protection policy can replicate recovery points to a maximum of two sites. One of the two sites can be in the cloud (Xi Cloud Services). For replication to Xi Cloud Services, you must add a replication schedule between the on-prem site and Xi Cloud Services. You can set up the on-prem site and Xi Cloud Services in the following arrangements.
The replication schedule between an on-prem site and Xi Cloud Services enables DR to Xi Cloud Services. To enable performing DR to Xi Cloud Services, you must create a recovery plan. In addition to performing DR from AHV clusters to Xi Cloud Services (only AHV), you can also perform cross-hypervisor disaster recovery (CHDR)—DR from ESXi clusters to Xi Cloud Services.
The protection policies and recovery plans you create or update synchronize continuously between the on-prem site and Xi Cloud Services. The reverse synchronization enables you to create or update entities (protection policies, recovery plans, and guest VMs) at either the primary or the recovery site.
This section describes protection of your guest VMs and DR from Xi Cloud Services to a Nutanix cluster at the on-prem site. In Xi Cloud Services, you can protect your guest VMs and DR to a Nutanix cluster at only one on-prem site. For information about protection of your guest VMs and DR to Xi Cloud Services, see Protection and DR between On-Prem Sites (Leap).
The following are the general requirements of Xi Leap. Along with the general requirements, there are specific requirements for protection with the following supported replication schedules.
The AOS license required depends on the features that you want to use. For information about the features that are available with an AOS license, see Software Options.
The underlying hypervisors required differ in all the supported replication schedules. For more information about underlying hypervisor requirements for the supported replication schedules, see:
Nutanix supports replication between all the latest supported LTS and STS AOS releases. To check the list of the latest supported AOS versions, see KB-5505. To determine whether the AOS versions currently running on your clusters are EOL, see the EOL document.
Upgrade the AOS version to the next available supported LTS/STS release. To determine if an upgrade path is supported, check the Upgrade Paths page before you upgrade the AOS.
For example, the clusters are running AOS versions 5.5.x and 5.10.x respectively. Upgrade the cluster on 5.5.x to 5.10.x. After both the clusters are on 5.10.x, proceed to upgrade each cluster to 5.15.x (supported LTS). Once both clusters are on 5.15.x you can upgrade the clusters to 5.20.x or newer.
Nutanix recommends that both the primary and the replication clusters or sites run the same AOS version.
You must have one of the following roles in Xi Cloud Services.
To allow two-way replication between an on-prem Nutanix cluster and Xi Cloud Services, you must open certain ports in your external firewall. For the required ports, see Disaster Recovery - Leap in Port Reference.
For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide.
The empty CD-ROM is required for mounting NGT at the recovery site.
Set the NM_CONTROLLED field to yes. After setting the field, restart the network service on the VM.
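A minimal sketch for a RHEL-family guest, assuming the interface configuration file is /etc/sysconfig/network-scripts/ifcfg-eth0 (a hypothetical path; adjust the interface name, file path, and service command for your distribution).
# Set NM_CONTROLLED=yes in the interface file (add the line manually if it is not present), then restart networking.
$ sudo sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED=yes/' /etc/sysconfig/network-scripts/ifcfg-eth0
$ sudo systemctl restart network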
For information about installing NGT, see Nutanix Guest Tools in Prism Web Console Guide.
The empty CD-ROM is required for mounting NGT at the recovery site.
If one protected cluster has m networks and the other protected cluster has n networks, ensure that the recovery cluster has m + n networks. Such a design ensures that all recovered VMs attach to a network.
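For example, if one cluster has three networks and the other cluster has two networks, the recovery cluster must have at least five networks.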
If protected guest VMs and Prism Central VM are on the same network, the Prism Central VM becomes inaccessible when the route to the network is removed after failover.
For more information about the scaled-out deployments of a Prism Central, see Leap Terminology.
Nutanix VM mobility drivers are required for accessing the guest VMs after failover. Without Nutanix VM mobility drivers, the guest VMs become inaccessible after a failover.
Consider the following general limitations before configuring protection and disaster recovery (DR) with Xi Leap. Along with the general limitations, there are specific limitations of protection with the following supported replication schedules.
When you perform DR of vGPU console-enabled guest VMs, the VMs recover with the default VGA console (without any alert) instead of vGPU console. The guest VMs fail to recover when you perform cross-hypervisor disaster recovery (CHDR).
However, you can manually restore guest VMs with vGPU.
You can configure NICs for a guest VM associated with either production or test VPC.
You cannot protect volume groups.
You cannot apply network segmentation for management traffic (any traffic not on the backplane network) in Xi Leap.
You cannot perform self-service restore.
Although there is no limit to the number of VLANs that you can create, only the first 500 VLANs are listed in the Network Settings drop-down while creating a recovery plan. For more information about VLANs in the recovery plan, see Nutanix Virtual Networks.
For the maximum number of entities you can configure with different replication schedules and perform failover (disaster recovery), see Nutanix Configuration Maximums. The limits have been tested for Xi Leap production deployments. Nutanix does not guarantee that the system can operate beyond these limits.
Nutanix recommends the following best practices for configuring protection and disaster recovery (DR) with Xi Leap.
You can protect a guest VM either with legacy DR solution (protection domain-based) or with Leap. To protect a legacy DR-protected guest VM with Leap, you must migrate the guest VM from protection domain to a protection policy. During the migration, do not delete the guest VM snapshots in the protection domain. Nutanix recommends keeping the guest VM snapshots in the protection domain until the first recovery point for the guest VM is available on Prism Central. For more information, see Migrating Guest VMs from a Protection Domain to a Protection Policy.
Xi Leap is essentially an extension of Leap to Xi Cloud Services. Xi Leap enables protection of your guest VMs and disaster recovery (DR) to Xi Cloud Services. With reverse synchronization, Xi Leap can also protect your guest VMs and enable DR from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site). A Nutanix cluster is essentially an AHV or an ESXi cluster running AOS. In addition to performing DR from AHV clusters to Xi Cloud Services (only AHV), you can also perform cross-hypervisor disaster recovery (CHDR)—DR from ESXi clusters to Xi Cloud Services.
You can protect your guest VMs with the following replication schedules.
The disaster recovery views enable you to perform CRUD operations on the following types of Leap entities.
Some views available in the Xi Cloud Services differ from the corresponding view in on-prem Prism Central. For example, the option to connect to an availability zone is on the Availability Zones page in an on-prem Prism Central, but not on the Availability Zones page in Xi Cloud Services. However, the views of both user interfaces are largely the same. This chapter describes the views of Xi Cloud Services.
The Availability Zones view lists all of your paired availability zones.
The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.
Field | Description |
---|---|
Name | Name of the availability zone. |
Region | Region to which the availability zone belongs. |
Type | Type of availability zone. Availability zones in Xi Cloud Services are shown as being of type Xi. Availability zones that are backed by on-prem Prism Central instances are shown to be of type physical. The availability zone that you are logged in to is shown as a local availability zone. |
Connectivity Status | Status of connectivity between the local availability zone and the paired availability zone. |
Workflow | Description |
---|---|
Connect to Availability Zone (on-prem Prism Central only) | Connect to an on-prem Prism Central or to a Xi Cloud Services for data replication. |
Action | Description |
---|---|
Disconnect | Disconnect the remote availability zone. When you disconnect an availability zone, the pairing is removed. |
The Protection Policies view lists all the configured protection policies from all availability zones.
The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.
Field | Description |
---|---|
Name | Name of the protection policy. |
Primary Location | Replication source site for the protection policy. |
Recovery Location | Replication target site for the protection policy. |
RPO | Recovery point objective for the protection policy. |
Remote Retention | Number of retention points at the remote site. |
Local Retention | Number of retention points at the local site. |
Workflow | Description |
---|---|
Create protection policy | Create a protection policy. |
Action | Description |
---|---|
Update | Update the protection policy. |
Clone | Clone the protection policy. |
Delete | Delete the protection policy. |
The Recovery Plans view lists all the configured recovery plans from all availability zones.
The following figure is a sample view, and the tables describe the fields and the actions that you can perform in this view.
Field | Description |
---|---|
Name | Name of the recovery plan. |
Source | Replication source site for the recovery plan. |
Destination | Replication target site for the recovery plan. |
Entities | Sum of the following VMs: |
Last Validation Status | Status of the most recent validation of the recovery plan. |
Last Test Status | Status of the most recent test performed on the recovery plan. |
Workflow | Description |
---|---|
Create Recovery Plan | Create a recovery plan. |
Action | Description |
---|---|
Validate | Validates the recovery plan to ensure that the VMs in the recovery plan have a valid configuration and can be recovered. |
Test | Test the recovery plan. |
Update | Update the recovery plan. |
Failover | Perform a failover. |
Delete | Delete the recovery plan. |
The Xi Cloud Services dashboard includes widgets that display the statuses of configured protection policies and recovery plans. If you have not configured these entities, the widgets display a summary of the steps required to get started with Leap.
To view these widgets, click the Dashboard tab.
The following figure is a sample view of the dashboard widgets.
To perform disaster recovery (DR) from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site), enable Leap at the on-prem site (Prism Central) only. You need not enable Leap in the Xi Cloud Services portal; Xi Cloud Services does that by default for you. Without enabling Leap, you can configure protection policies and recovery plans that synchronize to the on-prem site but you cannot perform failover and failback operations.
To enable Leap at the on-prem site, see Enabling Leap for On-Prem Site.
You can set up a secure environment to enable replication between an on-prem site and Xi Cloud Services with virtual private network (VPN). To configure the required environment, perform the following steps.
To perform disaster recovery (DR) from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site), pair the on-prem site (Prism Central) only to Xi Cloud Services. For reverse synchronization, you need not pair again from the Xi Cloud Services portal; Xi Cloud Services captures the pairing configuration from the on-prem site that pairs with Xi Cloud Services.
To pair an on-prem site with Xi Cloud Services, see Pairing Availability Zones (Leap).
Xi Cloud Services enables you to set up a secure VPN connection between your on-prem sites and Xi Cloud Services to enable end-to-end disaster recovery services of Leap. A VPN solution between your on-prem site and Xi Cloud Services enables secure communication between your on-prem Prism Central instance and the production virtual private cloud (VPC) in Xi Cloud Services. If your workload fails over to Xi Cloud Services, the communication between the on-prem resources and failed over resources in Xi Cloud Services takes place over an IPSec tunnel established by the VPN solution.
You can connect multiple on-prem sites to Xi Cloud Services. If you have multiple remote sites, you can set up secure VPN connectivity between each of your remote sites and Xi Cloud Services. With this configuration, you do not need to force the traffic from your remote site through your main site to Xi Cloud Services.
A VPN solution to connect to Xi Cloud Services includes a VPN gateway appliance in the Xi Cloud and a VPN gateway appliance (remote peer VPN appliance) in your on-prem site. A VPN gateway appliance learns about the local routes, establishes an IPSec tunnel with its remote peer, exchanges routes with its peer, and directs network traffic through the VPN tunnel.
After you complete the VPN configuration in the Xi Cloud Services portal, Nutanix creates a virtual VPN gateway appliance in the Xi Cloud. To set up a remote peer VPN gateway appliance in your on-prem site, you can either use the On Prem - Nutanix VPN solution (provided by Nutanix) or use a third-party VPN solution:
On Prem - Nutanix (recommended): If you select this option, Nutanix creates a VPN gateway VM (remote peer appliance) on your on-prem cluster, connects the appliance to your network, and establishes an IPsec tunnel with the VPN gateway that is running in the Xi Cloud.
The Nutanix VPN controller runs as a service in the Xi Cloud and on the on-prem Nutanix cluster and is responsible for the creation, setup, and lifecycle maintenance of the VPN gateway appliance (in the Xi Cloud and on-prem). The VPN controller deploys the virtual VPN gateway appliance in the Xi Cloud after you complete the VPN configuration in the Xi Cloud Services portal. The on-prem VPN controller deploys the virtual VPN gateway appliance on the on-prem cluster in the subnet you specify when you configure a VPN gateway in the Xi Cloud Services portal.
The virtual VPN gateway appliance in the Xi Cloud and VPN gateway VM (peer appliance) in your on-prem cluster each consume 1 physical core, 4 GB RAM, and 10 GB storage.
To set up a secure VPN connection between your on-prem sites and Xi Cloud Services, configure the following entities in the Xi Cloud Services portal:
VPN gateways are of the following types:
You configure a VPN gateway in the Xi Cloud and at each of the on-prem sites you want to connect to the Xi Cloud. You then configure a VPN connection between a VPN gateway in the Xi Cloud and VPN gateway in your on-prem site.
If you want to connect only one on-prem site to the Xi Cloud, configure the following entities in the Xi Cloud Services portal:
If you want to connect multiple on-prem sites to the Xi Cloud, configure the following entities in the Xi Cloud Services portal:
For example, if you want to connect two on-prem sites to the Xi Cloud, configure the following:
One Xi VPN gateway provides 1 Gbps of aggregate bandwidth for IPSec traffic. Therefore, connect only as many on-prem VPN gateways to one Xi VPN gateway as that 1 Gbps of aggregate bandwidth can accommodate.
If you require an aggregate bandwidth of more than 1 Gbps, configure multiple Xi VPN gateways.
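For a rough sizing check, total the expected replication traffic from your on-prem sites and divide by the 1 Gbps per-gateway limit. The following minimal bash sketch illustrates the arithmetic; the per-site figures are example assumptions, not measured values.

#!/usr/bin/env bash
# Illustrative only: estimate how many Xi VPN gateways you need when each
# Xi VPN gateway provides roughly 1 Gbps (1000 Mbps) of aggregate IPSec bandwidth.
site_bandwidth_mbps=(400 300 600)   # expected replication traffic per on-prem site (examples)
per_gateway_mbps=1000

total=0
for mbps in "${site_bandwidth_mbps[@]}"; do
  total=$(( total + mbps ))
done

# Round up to the next whole gateway.
gateways=$(( (total + per_gateway_mbps - 1) / per_gateway_mbps ))
echo "Total ${total} Mbps of replication traffic -> configure at least ${gateways} Xi VPN gateway(s)."

In this example, 1300 Mbps of combined traffic needs two Xi VPN gateways, with the on-prem VPN gateways split between them.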
You can use the On Prem - Nutanix VPN solution to set up a VPN between your on-prem site and Xi Cloud Services. If you select this option, you use an end-to-end VPN solution provided by Nutanix, and you do not need to use your own VPN solution to connect to Xi Cloud Services.
After you complete the VPN configuration in the Xi Cloud Services portal, Nutanix creates a virtual VPN gateway appliance in the Xi Cloud. The On Prem - Nutanix VPN solution creates a VPN gateway VM (remote peer appliance) on your on-prem cluster, connects the appliance to your network, and establishes an IPsec tunnel with the VPN gateway VM that is running in the Xi Cloud.
Following is the workflow if you choose the On Prem - Nutanix VPN solution to set up a VPN connection between your on-prem site and Xi Cloud Services.
Create a VPN gateway for each on-prem site that you want to connect to the Xi Cloud.
Create a VPN connection between each on-prem site (on-prem VPN gateway) and Xi Cloud (Xi gateway).
In your on-prem site, ensure the following before you configure VPN on Xi Cloud Services:
Configure rules for ports in your on-prem firewall depending on your deployment scenario.
In this scenario, the IPSec tunnel terminates behind a network address translation (NAT) or firewall device. For NAT to work, open UDP ports 500 and 4500 in both directions. Ports 1024–1034 are ephemeral ports used by the CVMs.
Enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.
In this scenario, you do not need to open the ports for NAT (500 and 4500), but enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.
In the following table, PC subnet refers to the subnet where your on-prem Prism Central is running. The Xi infrastructure load balancer route is the route over which traffic for the Xi CVMs and Prism Central flows. You receive this information when you begin using Xi Cloud Services.
Source address | Destination address | Source port | Destination port |
---|---|---|---|
PC subnet | Load balancer route advertised | Any | 1024–1034 |
Xi infrastructure load balancer route | PC and CVM subnet | Any | 2020, 2009, 9440 |
The following port requirements are applicable only if you are using the Nutanix VPN solution. | |||
Nutanix VPN VM | 8.8.8.8 and 8.8.4.4 IP addresses of the DNS server | VPN VM | DNS UDP port 53 |
Nutanix VPN VM | time.google.com, 0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org of the NTP server | VPN VM | NTP UDP port 123 |
Nutanix VPN VM | ICMP ping to NTP servers | NA | NA |
CVM IP address in AHV clusters | HTTPS request to the Internet | AHV hosts | HTTPS port 443 |
CVM IP address in ESXi clusters | HTTPS and FTP requests to the Internet | ESXi hosts | HTTPS port 443 and FTP 21 |
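If your on-prem firewall happens to be a Linux host managed with iptables, the table above might translate into rules like the following sketch. The subnet values, the load balancer route, and the use of TCP for the Nutanix service ports are illustrative assumptions; substitute the values you receive from Nutanix and adapt the syntax to your actual firewall product.

#!/usr/bin/env bash
# Illustrative iptables sketch only; not an official Nutanix firewall procedure.
PC_SUBNET="10.10.10.0/24"        # on-prem Prism Central / CVM subnet (example value)
XI_LB_ROUTE="192.0.2.0/24"       # Xi infrastructure load balancer route (example value)

# NAT scenario: allow IKE (UDP 500) and NAT-T (UDP 4500) in both directions.
iptables -A FORWARD -p udp --dport 500 -j ACCEPT
iptables -A FORWARD -p udp --dport 4500 -j ACCEPT

# PC subnet to the advertised load balancer route on ports 1024-1034 (assumed TCP).
iptables -A FORWARD -s "$PC_SUBNET" -d "$XI_LB_ROUTE" -p tcp --dport 1024:1034 -j ACCEPT

# Xi load balancer route back to the PC and CVM subnet on 2020, 2009, and 9440 (assumed TCP).
iptables -A FORWARD -s "$XI_LB_ROUTE" -d "$PC_SUBNET" -p tcp -m multiport --dports 2020,2009,9440 -j ACCEPT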
Create a VPN gateway to represent the Xi VPN gateway appliance.
Perform the following to create a Xi VPN gateway.
The Create VPN Gateway window appears.
For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .
Create a VPN gateway to represent the on-prem VPN gateway appliance.
Perform the following to create an on-prem VPN gateway.
The Create VPN Gateway window appears.
A route to Xi CVMs is added with the on-prem VPN gateway as the next-hop.
For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .
Create a VPN connection to establish a VPN IPSec tunnel between a VPN gateway in the Xi Cloud and a VPN gateway in your on-prem site. Select the Xi gateway and the on-prem gateway between which you want to create the VPN connection.
Perform the following to create a VPN connection.
The Create VPN Connection window appears.
You can use your own VPN solution to connect your on-prem site to Xi Cloud Services. If you select this option, you must manually set up a VPN solution by using a supported third-party VPN solution as an on-prem VPN gateway (peer appliance) that can establish an IPsec tunnel with the VPN gateway VM in the Xi Cloud.
Following is the workflow if you want to use a third-party VPN solution to set up a VPN connection between your on-prem site and Xi Cloud Services.
Create a VPN gateway for each on-prem site that you want to connect to the Xi Cloud.
Create a VPN connection to create an IPSec tunnel between each on-prem site (on-prem VPN gateway) and Xi Cloud (Xi gateway).
Xi Cloud Services supports the following third-party VPN gateway solutions.
Ensure the following in your on-prem site before you configure VPN in Xi Cloud Services.
Configure rules for ports in your on-prem firewall depending on your deployment scenario.
In this scenario, the IPSec tunnel terminates behind a network address translation (NAT) or firewall device. For NAT to work, open UDP ports 500 and 4500 in both directions. Ports 1024–1034 are ephemeral ports used by the CVMs.
Enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.
In this scenario, you do not need to open the ports for NAT (500 and 4500), but enable the on-prem VPN gateway to allow the traffic according to the rules described in the Port Rules table.
In the following table, PC subnet refers to the subnet where your on-prem Prism Central is running. The Xi infrastructure load balancer route is the route over which traffic for the Xi CVMs and Prism Central flows. You receive this information when you begin using Xi Cloud Services.
Source address | Destination address | Source port | Destination port |
---|---|---|---|
PC subnet | Load balancer route advertised | Any | 1024–1034 |
Xi infrastructure load balancer route | PC and CVM subnet | Any | 2020, 2009, 9440 |
The following port requirements are applicable only if you are using the Nutanix VPN solution. | |||
Nutanix VPN VM | 8.8.8.8 and 8.8.4.4 IP addresses of the DNS server | VPN VM | DNS UDP port 53 |
Nutanix VPN VM | time.google.com, 0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org of the NTP server | VPN VM | NTP UDP port 123 |
Nutanix VPN VM | ICMP ping to NTP servers | NA | NA |
CVM IP address in AHV clusters | HTTPS request to the Internet | AHV hosts | HTTPS port 443 |
CVM IP address in ESXi clusters | HTTPS and FTP requests to the Internet | ESXi hosts | HTTPS port 443 and FTP 21 |
Create a VPN gateway to represent the Xi VPN gateway appliance.
Perform the following to create a Xi VPN gateway.
The Create VPN Gateway window appears.
For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .
Create a VPN gateway to represent the on-prem VPN gateway appliance.
Perform the following to create an on-prem VPN gateway.
For more information, see Adding a Static Route in Xi Infrastructure Service Administration Guide .
Create a VPN connection to establish a VPN IPSec tunnel between a VPN gateway in the Xi Cloud and a VPN gateway in your on-prem site. Select the Xi gateway and the on-prem gateway between which you want to create the VPN connection.
Perform the following to create a VPN connection.
The Create VPN Connection window appears.
Depending upon your VPN solution, you can download detailed instructions about how to configure your on-prem VPN gateway appliance.
Perform the following to download the instructions to configure your on-prem VPN gateway appliance.
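As a purely illustrative sketch of what the on-prem side of such a configuration can look like on a generic Linux gateway running strongSwan (not necessarily one of the supported third-party solutions, and not a substitute for the downloaded vendor instructions), assuming the peer IP address and IPSec secret shown in the Xi Cloud Services portal:

#!/usr/bin/env bash
# Purely illustrative strongSwan sketch; dynamic routing (eBGP) over the tunnel
# is not covered here. All values below are placeholders.
XI_GATEWAY_IP="203.0.113.10"        # public IP address of the Xi VPN gateway (placeholder)
IPSEC_SECRET="replace-with-secret"  # IPSec secret configured on the VPN connection (placeholder)

cat >> /etc/ipsec.conf <<EOF
conn xi-leap
    keyexchange=ikev2
    left=%defaultroute
    leftsubnet=0.0.0.0/0
    right=${XI_GATEWAY_IP}
    rightsubnet=0.0.0.0/0
    authby=psk
    auto=start
EOF

cat >> /etc/ipsec.secrets <<EOF
%any ${XI_GATEWAY_IP} : PSK "${IPSEC_SECRET}"
EOF

ipsec restart   # reload the configuration and bring up the tunnel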
You can see the details of each VPN gateway, update the gateway, or delete the gateway.
All your VPN gateways are displayed in the VPN Gateways page.
You can display the details such as the type of gateway, VPC, IP addresses, protocols, and connections associated with the gateways.
Perform the following to display the details of a VPN gateway.
The details that you can update in a VPN gateway depend on the type of gateway (Xi gateway or On Prem gateway).
Perform the following to update a VPN gateway.
The Update VPN Gateway dialog box appears.
If you want to delete a VPN gateway, you must first delete all the VPN connections associated with the gateway; only then can you delete the gateway.
Perform the following to delete a VPN gateway.
You can see the details of each VPN connection, update the connection, or delete the connection.
All your VPN connections are displayed in the VPN Connections page.
You can display details such as the gateways associated with the connection, protocol details, Xi gateway routes, throughput of the connection, and logs of the IPSec and eBGP sessions for troubleshooting purposes.
Perform the following to display the details of a VPN connection.
Click the name of the tab to display the details in that tab. For example, click the Summary tab to display the details.
You can update the name, description, IPSec secret, and dynamic route priority of the VPN connection.
Perform the following to update a VPN connection.
The Update VPN Connection dialog box appears.
Perform the following to delete a VPN connection.
If you are using the On Prem - Nutanix VPN solution, you can use the Xi Cloud Services portal to upgrade both the VPN gateway VM in the Xi Cloud and the VPN gateway VM in your on-prem site. If you are using a third-party VPN solution, you can upgrade only the VPN gateway VM running in the Xi Cloud through the Xi Cloud Services portal. To upgrade an on-prem VPN gateway appliance provided by a third-party vendor, see that vendor's documentation for upgrade instructions.
Perform the following to upgrade your VPN gateway appliances.
To upgrade the VPN gateway VM running in the Xi Cloud, select a Xi gateway.
To upgrade the VPN gateway VM running in your on-prem site, select the on-prem gateway associated with that on-prem VPN gateway VM.
The VPN Version dialog box appears.
If you are using the latest version of the VPN gateway VM, the VPN Version dialog box displays a message that your VPN gateway VM is up to date.
If your VPN gateway VM is not up to date, the VPN Version dialog box displays the Upgrade option.
A planned or an unplanned failover for production workloads requires production virtual networks at both the primary and the recovery site. To ensure that a failover operation, whenever necessary, goes as expected, you also need test virtual networks at both sites for testing your recovery configuration in both directions (failover and failback). To isolate production and test workflows, a recovery plan in Leap uses four separate virtual networks, which are as follows.
The following figures show the source and target networks for planned, unplanned, and test failovers.
Virtual networks on on-prem Nutanix clusters are virtual subnets bound to a single VLAN. At on-prem sites (including the recovery site), you must manually create the production and test virtual networks before you create your first recovery plan.
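For reference, a minimal sketch of creating such subnets on an on-prem AHV cluster with acli follows; the network names, VLAN IDs, and gateway addresses are examples only, and you can equally create the subnets in Prism Central as described below.

nutanix@cvm$ acli net.create Production-Recovery vlan=101 ip_config=10.10.101.1/24   # production subnet (example values)
nutanix@cvm$ acli net.create Test-Recovery vlan=201 ip_config=10.10.201.1/24         # test subnet (example values)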
The virtual networks required in Xi Cloud Services are contained within virtual private clouds (VPCs). Virtual networks required for production workloads are contained within a virtual private cloud named Production, and virtual networks required for testing failover from on-prem sites are contained within a virtual private cloud named Test. Creating virtual networks in the VPCs in Xi Cloud Services is optional. If you do not create a virtual network in a VPC, Leap dynamically creates the virtual networks for you when a failover operation is in progress, and cleans up the dynamically created virtual networks when they are no longer required (after failback).
You can use your on-prem Prism Central instance to create, modify, and remove virtual networks. For information about how to perform these procedures by using Prism Central, see the Prism Central Guide .
You can create virtual subnets in the production and test virtual networks. This is an optional task. You must perform these procedures in Xi Cloud Services. For more information, see the Xi Infrastructure Services Guide .
Nutanix offers standard service level agreements (SLAs) for data replication from your on-prem AHV clusters to Xi Cloud Services based on RPO and RTO. The replication to Xi Cloud Services occurs over the public Internet (VPN or DirectConnect), and therefore the network bandwidth available for replication to Xi Cloud Services cannot be controlled. The unstable network bandwidth and the lack of network information affect the amount of data that can be replicated in a given time frame. You can test your RPO objectives by setting up a real protection policy, or use the Xi Leap RPO Sizer utility to simulate the protection plan (without replicating data to Xi Cloud Services). Xi Leap RPO Sizer provides the information required to determine whether the RPO SLAs are achievable. The utility provides insights on your network bandwidth, estimates performance, calculates the actual change rate, and calculates the feasible RPO for your data protection plan.
See Xi Leap Service-Level Agreements (SLAs) for more information about Nutanix SLAs for data replication to Xi Cloud Services. To use the Xi Leap RPO Sizer utility, perform the following steps.
nutanix@cvm$ mkdir dir_name
Replace dir_name with an identifiable name. For example, rpo_sizer.
nutanix@cvm$ cp download_bundle_path/rpo_sizer.tar ./dir_name/
Replace download_bundle_path with the path to the downloaded bundle.
Replace dir_name with the directory name created in the previous step.
nutanix@cvm$ cd ./dir_name
Replace dir_name with the directory name created in step 4.a.
nutanix@cvm$ tar -xvf rpo_sizer.tar
nutanix@cvm$ chmod +x rpo_sizer.sh
nutanix@cvm$ ./rpo_sizer.sh
The container name "/rpo_sizer" is already in use by container "xxxx"(where xxxx is the container name. You have to remove (or rename) that container to be able to reuse that name.
Open http://Prism_Central_IP_address:8001/ to run the RPO test.
Replace Prism_Central_IP_address with the virtual IP address of your Prism Central deployment.
nutanix@cvm$ modify_firewall -p 8001 -o open -i eth0 -a
Close the port after running the RPO test.
nutanix@cvm$ modify_firewall -p 8001 -o close -i eth0 -a
Automated disaster recovery (DR) configurations use protection policies to protect the guest VMs, and recovery plans to orchestrate the recovery of those guest VMs to Xi Cloud Services. With reverse synchronization, you can protect guest VMs and enable DR from Xi Cloud Services to a Nutanix cluster at an on-prem availability zone (site). You can automate protection of your guest VMs with the following supported replication schedules in Xi Leap.
Asynchronous replication schedules enable you to protect your guest VMs with an RPO of 1 hour or beyond. A protection policy with an Asynchronous replication schedule creates a recovery point in an hourly time interval, and replicates it to Xi Cloud Services for High Availability. For guest VMs protected with Asynchronous replication schedule, you can perform disaster recovery (DR) to Xi Cloud Services. With reverse synchronization, you can perform DR from Xi Cloud Services to a Nutanix cluster at an on-prem site. In addition to performing DR from AHV clusters to Xi Cloud Services (only AHV), you can also perform cross-hypervisor disaster recovery (CHDR)—DR from ESXi clusters to Xi Cloud Services.
The following are the specific requirements for protecting your guest VMs with Asynchronous replication schedule. Ensure that you meet the following requirements in addition to the general requirements of Xi Leap.
For information about the general requirements of Xi Leap, see Xi Leap Requirements.
For information about the on-prem node, disk and Foundation configurations required to support Asynchronous replication schedules, see On-Prem Hardware Resource Requirements.
AHV or ESXi
The on-prem Prism Central and their registered clusters (Prism Elements) must be running the following versions of AOS.
Xi Cloud Services runs the latest versions of AOS.
Guest VMs protected with an Asynchronous replication schedule support cross-hypervisor disaster recovery. You can perform failover (DR) to recover guest VMs from ESXi clusters to AHV clusters (Xi Cloud Services), provided you meet the following requirements.
NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .
In vSphere 6.7, guest VMs are configured with UEFI secure boot by default. Upon CHDR to an AHV cluster, these guest VMs do not start if the host does not support the UEFI secure boot feature. For more information about supportability of UEFI secure boot on Nutanix clusters, see Compatibility Matrix.
For operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.
If you have delta disks attached to a guest VM and you proceed with failover, you get a validation warning and the guest VM does not recover. Contact Nutanix Support for assistance.
Operating System | Version | Requirements and limitations |
---|---|---|
Windows |
|
|
Linux |
|
|
The storage container name of the protected guest VMs must be the same on both the primary and recovery clusters. Therefore, a storage container must exist on the recovery cluster with the same name as the one on the primary cluster. For example, if the protected VMs are in the SelfServiceContainer storage container on the primary cluster, there must also be a SelfServiceContainer storage container on the recovery cluster.
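To check this quickly, you can list the storage container names on a CVM of each cluster and compare them; a minimal sketch follows (the exact field labels in the output can vary slightly across AOS versions).

nutanix@cvm$ ncli container ls | grep -E "^\s*Name"   # list storage container names on this cluster

If a container such as SelfServiceContainer appears on the primary cluster but not on the recovery cluster, create it on the recovery cluster before protecting the guest VMs that reside in it.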
Consider the following specific limitations before protecting your guest VMs with Asynchronous replication schedule. These limitations are in addition to the general limitations of Xi Leap.
For information about the general limitations of Leap, see Xi Leap Limitations.
Cross hypervisor disaster recovery (CHDR) does not preserve hypervisor-specific properties (for example, multi-writer flags, independent persistent and non-persistent disks, changed block tracking (CBT), PVSCSI disk configurations).
To protect the guest VMs in an hourly replication schedule, configure an Asynchronous replication schedule while creating the protection policy. The policy takes recovery points of those guest VMs at the specified time intervals (hourly) and replicates them to Xi Cloud Services for High Availability. With reverse synchronization, you can create the policy at Xi Cloud Services and replicate recovery points to an on-prem availability zone (site). For protection from Xi Cloud Services to an on-prem site, the protection policy allows you to add only one Asynchronous replication schedule.
To create a protection policy with an Asynchronous replication schedule, perform the following procedure at Xi Cloud Services. You can also create a protection policy at the on-prem site. Protection policies you create or update at the on-prem site synchronize back to Xi Cloud Services.
The drop-down lists all the sites paired with the local site. Local AZ represents the local site (Prism Central). For your primary site, you can check either the local site or a non-local site.
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, check the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.
Clicking Save activates the Recovery Location pane. After saving the primary site configuration, you can optionally add a local schedule (step iv) to retain the recovery points at the primary site.
Specify the following information in the Add Schedule window.
When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.
For more information about the roll-up recovery points, see step d.iii.
Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point, the recovery points generated are application-consistent; if you do not check it, the recovery points generated are crash-consistent. If the times in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.
The drop-down lists all the sites paired with the Xi Cloud Services. XI-US-EAST-1A-PPD : Auto represents the local site (Prism Central). Do not select XI-US-EAST-1A-PPD : Auto because a duplicate location is not supported in Xi Cloud Services.
If you do not select a site, local recovery points that are created by the protection policy do not replicate automatically. You can, however, replicate the recovery points manually and use recovery plans to recover the guest VMs. For more information, see Manual Disaster Recovery (Leap).
The drop-down lists all the Nutanix clusters registered to Prism Central representing the selected site. If you want to protect the guest VMs from multiple Nutanix clusters in the same protection policy, check the clusters that host those guest VMs. All Clusters protects the guest VMs of all Nutanix clusters registered to Prism Central.
Clicking Save activates the + Add Schedule button between the primary and the recovery site. After saving the recovery site configuration, you can optionally add a local schedule to retain the recovery points at the recovery site.
Specify the following information in the Add Schedule window.
When you enter the frequency in minutes, the system selects the Roll-up retention type by default because minutely recovery points do not support Linear retention types.
For more information about the roll-up recovery points, see step d.iii.
Irrespective of the local or replication schedules, the recovery points are of the specified type. If you check Take App-Consistent Recovery Point, the recovery points generated are application-consistent; if you do not check it, the recovery points generated are crash-consistent. If the times in the local schedule and the replication schedule match, the single recovery point generated is application-consistent.
Specify the following information in the Add Schedule window. The window auto-populates the Primary Location and Recovery Location that you have selected in step b and step c.
The specified frequency is the RPO. For more information about RPO, see Leap Terminology.
This field is unavailable if you do not specify a recovery location.
If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.
Reverse retention maintains the retention numbers of recovery points even after failover to a recovery site in the same or different availability zones. For example, if you retain two recovery points at the primary site and three recovery points at the recovery site, and you enable reverse retention, a failover event does not change the initial retention numbers when the recovery points replicate back to the primary site. The recovery site still retains two recovery points while the primary site retains three recovery points. If you do not enable reverse retention, a failover event changes the initial retention numbers when the recovery points replicate back to the primary site. The recovery site retains three recovery points while the primary site retains two recovery points.
Maintaining the same retention numbers at a recovery site is required if you want to retain a particular number of recovery points, irrespective of where the guest VM is after its failover.
Application-consistent recovery points ensure that application consistency is maintained in the replicated recovery points. For application-consistent recovery points, install NGT on the guest VMs running on AHV clusters. For guest VMs running on ESXi clusters, you can take application-consistent recovery points without installing NGT, but the recovery points are hypervisor-based and lead to VM stuns (temporarily unresponsive VMs) after failover to the recovery sites.
For example, the guest VM VM_SherlockH is in the category Department:Admin, and you add this category to the protection policy named PP_AdminVMs. Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK, VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs.
If you do not want to protect the guest VMs category-wise, proceed to the next step without checking VM categories. You can add the guest VMs individually to the protection policy later from the VMs page (see Adding Guest VMs Individually to a Protection Policy).
To orchestrate the failover (disaster recovery) of the protected guest VMs to the recovery site, create a recovery plan. After a failover, a recovery plan recovers the protected guest VMs to the recovery availability zone (site). To create a recovery plan, perform the following procedure at Xi Cloud Services. You can also create a recovery plan at the on-prem site. The recovery plan you create or update at the on-prem site synchronizes back to Xi Cloud Services.
Traditionally, to achieve this new configuration, you would manually log on to the recovered VM and modify the relevant files. With in-guest scripts, you write a script to automate the required steps and enable the script when you configure a recovery plan. The recovery plan execution automatically invokes the script, reassigns the DNS IP address, and reconnects to the database server at the recovery site.
Traditionally, to reconfigure, you would manually log on to the VM, remove the VM from an existing domain controller, and then add the VM to a new domain controller. With in-guest scripts, you can automate the task of changing the domain controller.
C:\Program Files\Nutanix\scripts\production\vm_recovery
C:\Program Files\Nutanix\scripts\test\vm_recovery
/usr/local/sbin/production_vm_recovery
/usr/local/sbin/test_vm_recovery
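As an illustration of the kind of script you might place at the Linux production path above, the following is a minimal sketch for the DNS reassignment scenario described earlier. The IP address, configuration file path, and service name are hypothetical examples, not values defined by this guide.

#!/bin/bash
# Minimal illustrative in-guest recovery script for a Linux guest.
# Point the guest at the DNS server that serves the recovery site.
NEW_DNS="10.20.0.53"   # DNS server at the recovery site (hypothetical)
sed -i "s/^nameserver .*/nameserver ${NEW_DNS}/" /etc/resolv.conf

# Repoint the application at the database server reachable from the recovery
# site and restart it so the change takes effect (names are hypothetical).
sed -i "s/^db_host=.*/db_host=db-recovery.example.com/" /etc/myapp/app.conf
systemctl restart myapp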
The recovery plan is created. To verify the recovery plan, see the Recovery Plans page. You can modify the recovery plan to change the recovery location, add, or remove the protected guest VMs. For information about various operations that you can perform on a recovery plan, see Recovery Plan Management.
You perform failover of the protected guest VMs when unplanned failure events (for example, natural disasters) or planned events (for example, scheduled maintenance) happen at the primary availability zone (site) or the primary cluster. The protected guest VMs migrate to the recovery site where you perform the failover operations. On recovery, the protected guest VMs start in the Xi Cloud Services region you specify in the recovery plan that orchestrates the failover.
The following are the types of failover operations in Xi Leap.
After the failover, replication begins in the reverse direction. You can perform an unplanned failover operation only if recovery points have replicated to the recovery cluster. At the recovery site, failover operations cannot use recovery points that were created locally in the past. For example, if you perform an unplanned failover from the primary site AZ1 to recovery site AZ2 in Xi Cloud Services and then attempt an unplanned failover (failback) from AZ2 to AZ1 , the recovery succeeds at AZ1 only if the recovery points are replicated from AZ2 to AZ1 after the unplanned failover operation. The unplanned failover operation cannot perform recovery based on the recovery points that were created locally when the VMs were running in AZ1 .
The procedure for performing a planned failover is the same as the procedure for performing an unplanned failover. You can perform a failover even in different scenarios of network failure. For more information about network failure scenarios, see Leap and Xi Leap Failover Scenarios.
After you create a recovery plan, you can run a test failover periodically to ensure that the failover occurs smoothly when required. You can perform the test failover from Xi Cloud Services.
To perform a test failover to Xi Cloud Services, do the following.
After testing a recovery plan, you can remove the test VMs that the recovery plan created in the recovery test network on Xi Cloud Services. To clean up the test VMs created when you test a recovery plan, do the following.
Perform a planned failover at the recovery site. To perform a planned failover to Xi Cloud Services, do the following procedure.
The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.
However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.
If these conditions are not satisfied, the failover operation fails.
Perform an unplanned failover at the recovery site. To perform an unplanned failover to Xi Cloud Services, do the following procedure.
The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.
However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.
If these conditions are not satisfied, the failover operation fails.
A failback is similar to a failover, but in the reverse direction. The same recovery plan applies to both the failover and the failback operations; therefore, how you perform a failback is identical to how you perform a failover. Log on to the site where you want the VMs to fail back, and then perform a failover. For example, if you failed over VMs from an on-prem site to Xi Cloud Services, perform the failover from the on-prem site to fail back to it.
To perform a failback, do the following procedure at the primary site.
Resolve the error conditions and then restart the failover procedure.
The entities of AHV/ESXi clusters recover at a different path on the ESXi clusters if their files conflict with the existing files on the recovery ESXi cluster. For example, there is a file name conflict if a VM (VM1) migrates to a recovery cluster that already has a VM (VM1) in the same container.
However, the entities recover at a different path with VmRecoveredAtAlternatePath alert only if the following conditions are met.
If these conditions are not satisfied, the failover operation fails.
After you trigger a failover operation, you can monitor failover-related tasks. To monitor a failover, do the following.
Nutanix supports CHDR migrations of guest VMs having UEFI and Secure Boot.
Nutanix Software | Minimum Supported Version |
---|---|
Minimum AOS | 5.19.1 |
Minimum PC | pc.2021.1 |
Minimum NGT | 2.1.1 |
Operating Systems | Versions |
---|---|
Microsoft Windows |
|
Linux |
|
Operating Systems | Versions |
---|---|
Microsoft Windows |
|
Linux |
|
System Configuration | Limitation |
---|---|
Microsoft Windows Defender Credential Guard |
VMs which have Credential Guard enabled cannot be recovered with CHDR recovery solution. |
IDE + Secure Boot |
VMs on ESXi which have IDE Disks or CD-ROM and Secure Boot enabled cannot be recovered on AHV. |
UEFI VMs on CentOS 7.4, CentOS 7.5 and Ubuntu 16.04 may fail to boot after CHDR migration. |
CentOS 7.4, CentOS 7.5 and Ubuntu 16.04 UEFI VMs do not boot after cross-hypervisor disaster recovery migrations. See KB-10633 for more information about this limitation. Contact Nutanix Support for assistance with this limitation. |
UEFI VM may fail to boot after failback. |
When a UEFI VM is booted on AHV for the first time, UEFI firmware settings of the VM are initialized. The next step is to perform a guest reboot or guest shutdown to fully flush the settings into persistent storage in the NVRAM. If this UEFI VM is failed over to an ESXi host without performing the guest reboot/shutdown, the UEFI settings of the VM remain partial. Although the VM boots on ESXi, it fails to boot on AHV when a failback is performed. See KB-10631 for more information about this limitation. Contact Nutanix Support for assistance with this limitation. |
NearSync replication enables you to protect your data with an RPO as low as 1 minute. You can configure a protection policy with NearSync replication by defining the VMs or VM categories to protect. The policy creates a recovery point of the VMs every few minutes (1–15 minutes) and replicates it to Xi Cloud Services. You can configure disaster recovery with NearSync replication between on-prem AHV or ESXi clusters and Xi Cloud Services. You can also perform cross-hypervisor disaster recovery (CHDR): disaster recovery of VMs from AHV clusters to ESXi clusters or from ESXi clusters to AHV clusters.
The following are the advantages of NearSync replication.
Stun time is the time of application freeze when the recovery point is taken.
To implement the NearSync feature, Nutanix has introduced a technology called lightweight snapshots (LWSs). LWS recovery points are created at the metadata level only, and they continuously replicate incoming data generated by workloads running on the active clusters. LWS recovery points are stored in the LWS store, which is allocated on the SSD tier. When you configure a protection policy with NearSync replication, the system allocates the LWS store automatically.
When you configure a protection policy with NearSync replication, the policy remains in an hourly schedule until its transition into NearSync is complete.
To transition into NearSync, the system first seeds the recovery site with data: recovery points are taken on an hourly basis and replicated to the recovery site. After the system determines that the recovery points containing the seeding data have replicated within a specified amount of time (an hour by default), the system automatically transitions the protection policy into NearSync, depending on the bandwidth and the change rate. After the transition into NearSync, you can see the configured NearSync recovery points in the web interface.
The following are the characteristics of the process.
To transition out of NearSync, you can do one of the following.
Repeated transitioning in and out of NearSync can occur because of the following reasons.
Depending on the RPO (1–15 minutes), the system retains the recovery points for a specific amount of time. For a protection policy with NearSync replication, you configure the retention policy in days, weeks, or months on both the primary and recovery sites instead of defining the number of recovery points you want to retain. For example, if you want an RPO of 1 minute and want to retain the recovery points for 5 days, the following retention policy is applied.
You can also define recovery point retention in weeks or months. For example, if you configure a 3-month schedule, the following retention policy is applied.
The following are the specific requirements of configuring protection policies with NearSync replication schedule in Xi Leap. Ensure that you meet the following requirements in addition to the general requirements of Xi Leap.
For more information about the general requirements of Xi Leap, see Xi Leap Requirements.
For information about the on-prem node, disk and Foundation configurations required to support NearSync replication schedules, see On-Prem Hardware Resource Requirements.
AHV or ESXi clusters running AOS 5.17 or newer, each registered to a different Prism Central
The on-prem Prism Central and its registered clusters (Prism Elements) must be running the following versions of AOS.
Data protection with NearSync replication supports cross-hypervisor disaster recovery. You can configure disaster recovery to recover VMs from AHV clusters to ESXi clusters, or from ESXi clusters to AHV clusters, provided you meet the following CHDR requirement.
NGT configures the guest VMs with all the required drivers for VM portability. For more information about general NGT requirements, see Nutanix Guest Tools Requirements and Limitations in Prism Web Console Guide .
For information about operating systems that support UEFI and Secure Boot, see UEFI and Secure Boot Support for CHDR.
Operating System | Version | Requirements and Limitations |
---|---|---|
Windows |
|
|
Linux |
|
|
The following are the specific limitations of data protection with NearSync replication in Xi Leap. These limitations are in addition to the general limitations of Leap.
For information about the general limitations of Leap, see Xi Leap Limitations.
For example, suppose you have 1 day of retention at the primary site and 5 days of retention at the recovery site, and you want to go back to a recovery point from 5 days ago. NearSync does not support replicating the 5-day retention back from the recovery site to the primary site.
Create a NearSync protection policy in the primary site Prism Central. The policy schedules recovery points of the protected VMs as per the set RPO and replicates them to Xi Cloud Services for availability. When creating a protection policy, you can specify only VM categories. If you want to include VMs individually, first create the protection policy (which can also include VM categories), and then include the VMs individually in the protection policy from the VMs page.
Ensure that the AHV or ESXi clusters on both the primary and recovery site are NearSync capable. A cluster is NearSync capable if the capacity of each SSD in the cluster is at least 1.2 TB.
See NearSync Replication Requirements (Xi Leap) and NearSync Replication Limitations (Xi Leap) before you start.
To create a protection policy with NearSync replication in Xi Cloud Services, perform the following procedure.
By default, recovery point creation begins immediately after you create the protection policy. If you want to specify when recovery point creation must begin, click Change, and then, in the Start Time dialog box, do the following.
Click Start from specific point in time.
In the time picker, specify the time at which you want to start taking recovery points.
Click Save.
If you select linear retention, the remote and local retention count represents the number of recovery points to retain at any given time. If you select roll-up retention, these numbers specify the retention period.
You cannot protect a VM by using two or more protection policies. Therefore, VM categories specified in another protection policy are not listed here. Also, if you included a VM in another protection policy by specifying the category to which it belongs (category-based inclusion), and if you add the VM to this policy by using its name (individual inclusion), the individual inclusion supersedes the category-based inclusion. Effectively, the VM is protected only by this protection policy and not by the protection policy in which its category is specified.
For example, the guest VM VM_SherlockH is in the category Department:Admin, and you add this category to the protection policy named PP_AdminVMs. Now, if you add VM_SherlockH from the VMs page to another protection policy named PP_VMs_UK, VM_SherlockH is protected in PP_VMs_UK and unprotected from PP_AdminVMs.
Create a recovery plan in the primary Prism Central. The procedure for creating a recovery plan is the same for all the data protection strategies in Xi Leap.
For more information about creating a recovery plan in Xi Leap, see Creating a Recovery Plan (Xi Leap).
A protection policy automates the creation and replication of recovery points. When configuring a protection policy for creating local recovery points, you specify the RPO, retention policy, and the VMs that you want to protect. You also specify the recovery location if you want to automate recovery point replication to Xi Cloud Services.
When you create, update, or delete a protection policy, it synchronizes to the paired Xi Cloud Services. The recovery points automatically start replicating in the reverse direction after you perform a failover at the recovery Xi Cloud Services. For information about how Xi Leap determines the list of availability zones for synchronization, see Entity Synchronization Between Paired Availability Zones.
You can also add VMs directly to a protection policy from the VMs page, without the use of a VM category. To add VMs directly to a protection policy in Xi Cloud Services, perform the following procedure.
You can directly remove guest VMs from a protection policy from the VMs page. To remove guest VMs from a protection policy in Xi Cloud Services, perform the following procedure.
If the requirements of the protection policy that you want to create are similar to an existing protection policy in Xi Cloud Services, you can clone the existing protection policy and update the clone.
To clone a protection policy from Xi Cloud Services, perform the following procedure.
You can modify an existing protection policy in the Xi Cloud Services. To update an existing protection policy in Xi Cloud Services, perform the following procedure.
You can use the data protection focus on the VMs page to determine the protection policies to which a VM belongs in Xi Cloud Services. To determine the protection policy in Xi Cloud Services to which a VM belongs, do the following.
A recovery plan orchestrates the recovery of protected VMs at a recovery site. Recovery plans are predefined procedures (runbooks) that use stages to enforce VM power-on sequence. You can also configure the inter-stage delays to recover applications gracefully. Recovery plans that recover applications in Xi Cloud Services are also capable of creating the required networks during failover and can assign public-facing IP addresses to VMs.
A recovery plan created in one availability zone (site) replicates to the paired availability zone and works bidirectionally. After a failover from the primary site to a recovery site, you can failback to the primary site by using the same recovery plan.
After you create a recovery plan, you can validate or test it to ensure that recovery goes through smoothly when failover becomes necessary. Xi Cloud Services includes a built-in VPC for validating or testing failover.
Recovery plans are independent of protection policies and do not reference protection policies in their configuration information. Also, they do not create recovery points. While the process of planned failover includes the creation of a recovery point so that the latest data can be used for recovery, unplanned and test failovers rely on the availability of the required recovery points at the designated recovery site. A recovery plan therefore requires the VMs in the recovery plan to also be associated with a protection policy.
Recovery plans are synchronized to one or more paired sites when they are created, updated, or deleted. For information about how Leap determines the list of availability zones (sites) for synchronization, see Entity Synchronization Between Paired Availability Zones.
You can also add VMs directly to a recovery plan in the VMs page, without the use of a category. To add VMs directly to a recovery plan in Xi Cloud Services, perform the following procedure.
You can also remove VMs directly from a recovery plan in Xi Cloud Services. To remove VMs directly from a recovery plan, perform the following procedure.
You can update an existing recovery plan in Xi Cloud Services. To update a recovery plan, perform the following procedure.
You can validate a recovery plan from the recovery site. For example, if you perform the validation in the Xi Cloud Services (primary site being an on-prem site), Leap validates failover from the on-prem site to Xi Cloud Services. Recovery plan validation only reports warnings and errors. Failover is not performed. In this procedure, you need to specify which of the two paired sites you want to treat as the primary, and then select the other site as the secondary.
To validate a recovery plan, do the following.
Manual data protection involves manually creating recovery points, manually replicating recovery points, and manually recovering the VMs at the recovery site. You can also automate some of these tasks. For example, the last step—that of manually recovering VMs at the recovery site—can be performed by a recovery plan while the underlying recovery point creation and replication can be performed by protection policies. Conversely, you can configure protection policies to automate recovery point creation and replication and recover VMs at the recovery site manually.
To create recovery points manually in Xi Cloud Services, do the following.
You can manually replicate recovery points only from the site where the recovery points exist.
To replicate recovery points manually from Xi Cloud Service, do the following.
You can recover a VM by cloning a VM from a recovery point.
To recover a VM from a recovery point at Xi Cloud Services, do the following.
When paired with each other, availability zones (sites) synchronize disaster recovery configuration entities. Paired sites synchronize the following disaster recovery configuration entities.
If you specify VM categories in a recovery plan, Leap determines which protection policies use those VM categories, and then synchronizes the recovery plans to the availability zones specified in those protection policies.
If you include VMs individually in a recovery plan, Leap uses the recovery points of those VMs to determine which protection policies created those recovery points, and then synchronizes the recovery plans to the availability zones specified in those protection policies. If you create a recovery plan for VM categories or VMs that are not associated with a protection policy, Leap cannot determine the availability zone list and therefore cannot synchronize the recovery plan. If a recovery plan includes only individually added VMs and a protection policy associated with a VM has not yet created VM recovery points, Leap cannot synchronize the recovery plan to the availability zone specified in that protection policy. However, recovery plans are monitored every 15 minutes for the availability of recovery points that can help derive availability zone information. When recovery points become available, Xi Leap derives the availability zone by the process described earlier and synchronizes the recovery plan to the availability zone.
If you do not update entities before a connectivity issue is resolved or before you pair the availability zones again, the synchronization behavior described earlier resumes. Also, pairing previously unpaired availability zones triggers an automatic synchronization event. For recommendations on avoiding such issues, see Entity Synchronization Recommendations (Xi Leap).
Consider the following recommendations to avoid inconsistencies and the resulting synchronization issues.
You can safely create entities at either or both the sites as long as you do not assign the same name to entities at the two sites. After the connectivity issue is resolved, force synchronization from the site where you created entities.
Entity synchronization, when forced from an availability zone (site), overwrites the corresponding entities in paired sites. Forced synchronization also creates, updates, and removes those entities from paired sites.
The availability zone (site) to which a particular entity is forcefully synchronized depends on which site requires the entity (see Entity Synchronization Between Paired Availability Zones). To avoid inadvertently overwriting required entities, ensure that you force entity synchronization from the site in which the entities have the desired configuration.
If a site is paired with two or more availability zones (sites), you cannot select one or more sites with which to synchronize entities.
To force entity synchronization from Xi Cloud Services, do the following.
You can protect a guest VM either with a protection domain in Prism Element or with a protection policy in Prism Central. If you have guest VMs in protection domains, migrate those guest VMs to protection policies to orchestrate their disaster recovery using Leap.
To migrate a guest VM from a protection domain to a protection policy manually, perform the following procedure.
For Epoch documentation, see https://docs.epoch.nutanix.com/
Last updated: 2022-06-14
File Analytics provides data and statistics on the operations and contents of a file server.
Once deployed, Nutanix Files adds a File Analytics VM (FAVM) to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. File Analytics protects data on the FAVM, which is kept in a separate volume group.
Once you deploy File Analytics, a new File Analytics link appears on the file server actions bar. Use the link to access File Analytics on any file server that has File Analytics enabled.
The File Analytics web console consists of the following display features:
Main menu bar: The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:
Meet the following requirements prior to deploying File Analytics.
Ensure that you have performed the following tasks and your Files deployment meets the following specifications.
Open the required ports, and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.
The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.
In addition to meeting the File Analytics network requirements, ensure that you also meet the Nutanix Files port requirements as described in the Port Reference.
File Analytics has the following limitations.
Overview of administrative processes for File Analytics.
As an admin, you have the required permissions for performing File Analytics administrative tasks. To add a file server admin user, see Managing Roles in the Nutanix Files Guide . The topics in this chapter describe the basics for administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.
Follow this procedure to deploy the File Analytics server.
Steps for enabling File Analytics after deployment or disablement.
Follow these steps to enable File Analytics after disabling the application.
Follow the steps as indicated to disable File Analytics.
File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data.
Do the following to launch File Analytics.
To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide .
Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.
Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.
Manage the audit data of deleted shares and exports.
By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears next to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.
Follow the directions as indicated to delete audit data for the deleted share or export.
Steps for updating the password of a File Analytics VM (FAVM).
nutanix@fsvm$ sudo passwd nutanix
Changing password for user nutanix.
Old Password:
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
The password must meet the following complexity requirements:
Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.
Before you upgrade File Analytics, ensure that you are running a compatible version of AOS and Files. Refer to File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element, see AOS Upgrade in the Prism Web Console Guide .
To upgrade File Analytics, perform inventory and updates using the Life Cycle Manager (LCM); see the Life Cycle Manager Guide for instructions on performing inventory and updates. LCM cannot upgrade File Analytics when the protection domain (PD) for the File Analytics VM (FAVM) includes any other entities.
During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets the expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes it after the upgrade completes successfully. If any errors occur, the system keeps the snapshot for 30 days to troubleshoot the issue.
Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).
The Dashboard tab displays data on the operational trends of a file server.
The Dashboard tab is the opening screen that appears after launching File Analytics from Prism. The dashboard displays widgets that present data on file trends, distribution, and operations.
Tile Name | Description | Intervals |
---|---|---|
Capacity trend | Displays capacity trends for the file server, including capacity added, capacity removed, and net changes. Clicking an event period widget displays the Capacity Trend Details view. | Last 7 days, last 30 days, or last 1 year. |
Data age | Displays the percentage of data by age. Data age determines the data heat, including hot, warm, and cold. | Default intervals are as follows: |
Anomaly alerts | Displays alerts for configured anomalies and ransomware detection based on blocked file types, see Configuring Anomaly Detection. | [alert] |
Permission denials | Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details, see Audit Trails - Users for more. | [user id], [number of permission denials] |
File distribution by size | Displays the number of files by file size. Provides trend details for the top 5 files. | Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB to 1 GB, greater than 1 GB. |
File distribution by type | Displays the space taken up by various applications and file types. The file extension determines the file type. See the File types table for more details. | MB or GB |
File distribution by type details view | Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days. Clicking View Details displays the File Distribution by Type view. | Daily size trend for top 5 files (GB), file type (see the File Types table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB). |
Top 5 active users | Lists the users who have accessed the most files and number of operations the user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking the user name displays the audit view for the user, see Audit Trails - Users for more. | 24 hours, 7 days, 1 month, or 1 year. |
Top 5 accessed files | Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files. Clicking the file name displays the audit view details for the file; see Audit Trails - Files for more. | 24 hours, 7 days, 1 month, or 1 year. |
Files operations | Displays the distribution of operation types for the specified period, including a count for each operation type and the total sum of all operations. Operations include: create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, permission denied (file blocking). Clicking an operation displays the File Operation Trend view. | 24 hours, 7 days, 1 month, or 1 year. |
Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export, Folder, and Category. Each tab includes columns detailing the entity: name, net capacity change, capacity added, and capacity removed.
Column | Description |
---|---|
Name | Name of share/export, folder, or category. |
Net capacity change | The total difference between capacity at the beginning and the end of the specified period. |
Share name (for folders only) | The name of the share or export that the folder belongs to. |
Capacity added | Total added capacity for the specified period. |
Capacity removed | Total removed capacity for the specified period. |
Clicking View Details for the File Distribution by Type widget displays granular details of file distribution, see the File Types table for details.
Column | Description |
---|---|
File type | Name of file type |
Current space used | Space capacity occupied by the file type |
Current number of files | Number of files for the file type |
Change (in last 30 days) | The increase in capacity over a 30-day period for the specified file type |
Category | Supported File Type |
---|---|
Archives | .cab, .gz, .rar, .tar, .z, .zip |
Audio | .aiff, .au, .mp3, .mp4, .wav, .wma |
Backups | .bak, .bkf, .bkp |
CD/DVD images | .img, .iso, .nrg |
Desktop publishing | .qxd |
Email archives | .pst |
Hard drive images | .tib, .gho, .ghs |
Images | .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff |
Installers | .msi, .rpm |
Log Files | .log |
Lotus notes | .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf |
MS Office documents | .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb |
System files | .bin, .dll, .exe |
Text files | .csv, .pdf, .txt |
Video | .avi, .mpg, .mpeg, .mov, .m4v |
Disk image | .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd |
Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.
Category | Description |
---|---|
Operation type | A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types. |
Last (time period) | A drop-down option to specify the period for the file operation trend. |
File operation trend graph | The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations trend over the extent of the intervals. |
The Health dashboard displays dynamically updated health information about each File Analytics component.
The Health dashboard includes the following details:
The Data Age widget in the Dashboard provides details on data heat.
Share-level data is displayed to provide details on share capacity trends. There are three levels of data heat.
You can configure the definitions for each level of data heat rather than using the default values.
Update the values that constitute different data heat levels.
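For illustration only, the following shell sketch classifies a single file into a heat level from its last-access time. The 90-day and 365-day boundaries are assumptions chosen for this example; use the values you configure in File Analytics.
# Hypothetical data-heat classification; thresholds are illustrative only.
classify_heat() {
  local atime_epoch now_epoch age_days
  atime_epoch=$(stat -c %X "$1")        # last access time in epoch seconds
  now_epoch=$(date +%s)
  age_days=$(( (now_epoch - atime_epoch) / 86400 ))
  if [ "$age_days" -lt 90 ]; then echo "hot"
  elif [ "$age_days" -lt 365 ]; then echo "warm"
  else echo "cold"
  fi
}
classify_heat /path/to/file             # prints hot, warm, or cold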
Data panes in the Anomalies tab display data and trends for configured anomalies.
You can configure anomalies for the following operations:
Define anomaly rules by specifying the following conditions:
Meeting the lower operation threshold triggers an anomaly.
Consider a scenario where you have 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. The count threshold takes precedence because 10% of 1,000 files is 100 operations, which is greater than the count threshold of 10, and the lower of the two values triggers the anomaly.
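The following shell sketch works through that calculation. It illustrates the rule described above and is not the File Analytics implementation.
# Worked example: the lower of the two thresholds decides when the anomaly fires.
file_count=1000
count_threshold=10
percent_threshold=10
percent_as_count=$(( file_count * percent_threshold / 100 ))   # 100 operations
if [ "$count_threshold" -le "$percent_as_count" ]; then
  trigger=$count_threshold                                     # 10 operations
else
  trigger=$percent_as_count
fi
echo "Anomaly triggers after $trigger operations"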
Pane Name | Description | Values |
---|---|---|
Anomaly Trend | Displays the number of anomalies per day or per month. | Last 7 days, Last 30 days, Last 1 year |
Top Users | Displays the users with the most anomalies and the number of anomalies per user. | Last 7 days, Last 30 days, Last 1 year |
Top Folders | Displays the folders with the most anomalies and the number of anomalies per folder. | Last 7 days, Last 30 days, Last 1 year |
Operation Anomaly Types | Displays the percentage of occurrences per anomaly type. | Last 7 days, Last 30 days, Last 1 year |
Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.
Column | Description |
---|---|
Anomaly Type | The configured anomaly type. Anomaly types not configured do not show up in the table. |
Total User Count | The number of users that have performed the operation causing the specified anomaly during the specified time range. |
Total Folder Count | The numbers of folders in which the anomaly occurred during the specified time range. |
Total Operation Count | Total number of anomalies for the specified anomaly type that occurred during the specified time range. |
Time Range | The time range for which the total user count, total folder count, and total operation count are specified. |
Column | Description |
---|---|
Username or Folders | Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders. |
Operation count | The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph. |
Steps for configuring anomaly rules.
Configure an SMTP server for File Analytics to send anomaly alerts, see Configuring an SMTP Server. To create an anomaly rule, do the following.
File Analytics uses a simple mail transport protocol (SMTP) server to send anomaly alerts.
Use audit trails to look up operation data for a specific user, file, folder, or client.
The Audit Trails tab includes Files , Folders , Users , and Client IP options for specifying the audit type. Use the search bar for specifying the specific entity for the audit (user, folder, file, or client IP).
The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP number) takes you to the Audit Trails dashboard for the target entity.
Audit a user, file, client, or folder.
Details for client IP Audit Trails.
When you search by user in the Audit Trails tab, search results display the following information in a table.
Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.
The Results table provides granular details of the audit results. The following data is displayed for every event.
Click the gear icon for options to download the data as an XLS, CSV, or JSON file.
Dashboard details for folder audits.
The following information displays when you search by folder in the Audit Trails tab.
The Audit Details page shows the following audit information for the selected folder.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for file audits.
When you search by file in the Audit Trails tab, the following information displays:
The Audit Details page shows the following audit information for the selected file.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for client IP Audit Trails.
When you search by client IP in the Audit Trails tab, search results display the following information in a table.
The Audit Details page shows the following audit information for the selected client.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for an option to download the data as a CSV file.
Ransomware protection for your file server.
File Analytics scans files for ransomware in real time, and notifies you through email in the event of a ransomware attack. By using the Nutanix Files file blocking mechanism, File Analytics prevents files with signatures of potential ransomware from carrying out malicious operations. Ransomware protection automatically scans for ransomware based on a curated list of signatures that frequently appear in ransomware files. You can modify the list by manually adding other signatures.
File Analytics also monitors shares for self-service restore (SSR) policies and identifies shares that do not have SSR enabled in the ransomware dashboard. You can enable SSR through the ransomware dashboard by selecting shares identified by File Analytics.
The ransomware dashboard includes panes for managing ransomware protection and self-service restore (SSR).
The ransomware dashboard includes two main sections:
File Analytics blocks the following ransomware signatures.
Extension | Known Ransomware |
---|---|
*.micro | TeslaCrypt 3.0 |
*.zepto | Locky |
*.cerber3 | Cerber 3 |
*.locky | Locky |
*.cerber | Cerber |
*.loli | LOLI |
*.mole | CryptoMix (variant) |
*.cryp1 | CryptXXX |
*.axx | AxCrypt |
*.onion | Dharma |
*.crypt | Scatter |
*.osiris | Locky (variant) |
*.crypz | CryptXXX |
*.ccc | TeslaCrypt or Cryptowall |
*.locked | Various ransomware |
*.odin | Locky |
*.cerber2 | Cerber 2 |
*.sage | Sage |
*.globe | Globe |
*.good | Scatter |
*.exx | Alpha Crypt |
*.encrypt | Alpha |
*.encrypted | Various ransomware |
*.1txt | Enigma |
*.ezz | Alpha Crypt |
*.r5a | 7ev3n |
*.wallet | Globe 3 (variant) |
*.decrypt2017 | Globe 3 |
*.zzzzz | Locky |
*.MERRY | Merry X-Mas |
*.enigma | Coverton |
*.ecc | Cryptolocker or TeslaCrypt |
*.cryptowall | Cryptowall |
*.aesir | Locky |
*.cryptolocker | CryptoLocker |
*.coded | Anubis |
*.sexy | PayDay |
*.pubg | PUBG |
*.ha3 | El-Polocker |
*.breaking_bad | Files1147@gmail(.)com |
*.dharma | CrySiS |
*.wcry | WannaCry |
*.lol! | GPCode |
*.damage | Damage |
*.MRCR1 | Merry X-Mas |
*.fantom | Fantom |
*.legion | Legion |
*.kratos | KratosCrypt |
*.crjoker | CryptoJoker |
*.LeChiffre | LeChiffre |
*.maya | HiddenTear (variant) |
*.kraken | Rakhni |
*.keybtc@inbox_com | KeyBTC |
*.rrk | Radamant v2 |
*.zcrypt | ZCRYPT |
*.crinf | DecryptorMax or CryptInfinite |
*.enc | TorrentLocker / Cryptorium |
*.surprise | Surprise |
*.windows10 | Shade |
*.serp | Serpent (variant) |
*.file0locked | Evil |
*.ytbl | Troldesh (variant) |
*.pdcr | PadCrypt |
*.venusf | Venus Locker |
*.dale | Chip |
*.potato | Potato |
*.lesli | CryptoMix |
*.angelamerkel | Angela Merkel |
*.PEGS1 | Merry X-Mas |
*.R16m01d05 | Evil-JS (variant) |
*.zzz | TeslaCrypt |
*.wflx | WildFire |
*.serpent | Serpent |
*.Dexter | Troldesh (variant) |
*.rnsmwr | Gremit |
*.thor | Locky |
*.nuclear55 | Nuke |
*.xyz | TeslaCrypt |
*.encr | FileLocker |
*.kernel_time | KeRanger OS X |
*.darkness | Rakhni |
*.evillock | Evil-JS (variant) |
*.locklock | LockLock |
*.rekt | HiddenTear (variant) / RektLocker |
*.coverton | Coverton |
*.VforVendetta | Samsam (variant) |
*.remk | STOP |
*.1cbu1 | Princess Locker |
*.purge | Globe |
*.cry | CryLocker |
*.zyklon | ZYKLON |
*.dCrypt | DummyLocker |
*.raid10 | Globe (variant) |
*.derp | Derp |
*.zorro | Zorro |
*.AngleWare | HiddenTear/MafiaWare (variant) |
*.shit | Locky |
*.btc | Jigsaw |
*.atlas | Atlas |
*.EnCiPhErEd | Xorist |
*.xxx | TeslaCrypt 3.0 |
*.realfs0ciety@sigaint.org.fs0ciety | Fsociety |
*.vbransom | VBRansom 7 |
*.exotic | Exotic |
*.crypted | Nemucod |
*.fucked | Manifestus |
*.vvv | TeslaCrypt 3.0 |
*.padcrypt | PadCrypt |
*.cryeye | DoubleLocker |
*.hush | Jigsaw |
*.RMCM1 | Merry X-Mas |
*.unavailable | Al-Namrood |
*.paym | Jigsaw |
*.stn | Satan |
*.braincrypt | Braincrypt |
*.ttt | TeslaCrypt 3.0 |
*._AiraCropEncrypted | AiraCrop |
*.spora | Spora |
*.alcatraz | Alcatraz Locker |
*.reco | STOP/DJVU |
*.crypte | Jigsaw (variant) |
*.aaa | TeslaCrypt |
*.pzdc | Scatter |
*.RARE1 | Merry X-Mas |
*.ruby | Ruby |
*.fun | Jigsaw |
*.73i87A | Xorist |
*.abc | TeslaCrypt |
*.odcodc | ODCODC |
*.crptrgr | CryptoRoger |
*.herbst | Herbst |
*.comrade | Comrade |
*.szf | SZFLocker |
*.pays | Jigsaw |
*.antihacker2017 | Xorist (variant) |
*.rip | KillLocker |
*.rdm | Radamant |
*.CCCRRRPPP | Unlock92 |
*.bript | BadEncriptor |
*.hnumkhotep | Globe 3 |
*.helpmeencedfiles | Samas/SamSam |
*.BarRax | BarRax (HiddenTear variant) |
*.magic | Magic |
*.noproblemwedecfiles | Samas/SamSam |
*.bitstak | Bitstak |
*.kkk | Jigsaw |
*.kyra | Globe |
*.a5zfn | Alma Locker |
*.powerfulldecrypt | Samas/SamSam |
*.vindows | Vindows Locker |
*.payms | Jigsaw |
*.lovewindows | Globe (variant) |
*.p5tkjw | Xorist |
*.madebyadam | Roga |
*.conficker | Conficker |
*.SecureCrypted | Apocalypse |
*.perl | Bart |
*.paymts | Jigsaw |
*.kernel_complete | KeRanger OS X |
*.payrms | Jigsaw |
*.paymst | Jigsaw |
*.lcked | Jigsaw (variant) |
*.covid19 | Phishing |
*.ifuckedyou | SerbRansom |
*.d4nk | PyL33T |
*.grt | Karmen HiddenTear (variant) |
*.kostya | Kostya |
*.gefickt | Jigsaw (variant) |
*.covid-19 | Phishing |
*.kernel_pid | KeRanger OS X |
*.wncry | Wana Decrypt0r 2.0 |
*.PoAr2w | Xorist |
*.Whereisyourfiles | Samas/SamSam |
*.edgel | EdgeLocker |
*.adk | Angry Duck |
*.oops | Marlboro |
*.theworldisyours | Samas/SamSam |
*.czvxce | Coverton |
*.crab | GandCrab |
*.paymrss | Jigsaw |
*.kimcilware | KimcilWare |
*.rmd | Zeta |
*.dxxd | DXXD |
*.razy | Razy |
*.vxlock | vxLock |
*.krab | GandCrab v4 |
*.rokku | Rokku |
*.lock93 | Lock93 |
*.pec | PEC 2017 |
*.mijnal | Minjal |
*.kobos | Kobos |
*.bbawasted | Bbawasted |
*.rlhwasted | RLHWasted |
*.52pojie | 52Pojie |
*.FastWind | Fastwind |
*.spare | Spare |
*.eduransom | Eduransom |
*.RE78P | RE78P |
*.pstKll | pstKll |
*.erif | |
*.kook | |
*.xienvkdoc | |
*.deadfiles | |
*.mnbzr | |
*.silvertor | |
*.MH24 | |
*.nile | |
*.ZaCaPa | |
*.tcwwasted | |
*.Spade | |
*.pandemic | |
*.covid | |
*.xati | |
*.Zyr | |
*.spybuster | |
*.ehre | |
*.wannacry | WannaCry |
*.jigsaaw | |
*.boop | |
*.Back | |
*.CYRAT | |
*.bmd | |
*.Fappy | |
*.Valley | |
*.copa | |
*.horse | |
*.CryForMe | |
*.easyransom | |
*.nginxhole | |
*.lockedv1 | Lockedv1 |
*.ziggy | Ziggy |
*.booa | Booa |
*.nobu | Nobu |
*.howareyou | Howareyou |
*.FLAMINGO | Flamingo |
*.FUSION | Fusion |
*.pay2key | Pay2Key |
*.zimba | Zimba, Dharma |
*.luckyday | Luckyday |
*.bondy | Bondy |
*.cring | Cring |
*.boom | Boom |
*.judge | Judge |
*.LIZARD | LIZARD |
*.bonsoir | Bonsoir |
*.moloch | Moloch |
*.14x | 14x |
*.cnh | CNH |
*.DeroHE | DeroHE |
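As an illustration only, and not the Nutanix Files file-blocking implementation, the following shell sketch shows how a file name might be matched against glob-style signatures like those in the preceding table. The signature subset and the file name are example values.
# Hypothetical check of a file name against a few blocked signatures.
blocked_signatures=('*.locky' '*.wncry' '*.cerber' '*.crypt')
is_blocked() {
  local name=$1 sig
  for sig in "${blocked_signatures[@]}"; do
    case "$name" in
      $sig) echo "blocked: $name matches $sig"; return 0 ;;
    esac
  done
  return 1
}
is_blocked "invoice.docx.locky"         # prints: blocked: invoice.docx.locky matches *.locky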
Enable ransomware protection on your file server.
Configure ransomware protection on file servers.
Do the following to add a signature to the blocked extension list.
Enable self-service restore on shares identified by File Analytics.
File Analytics scans shares for SSR policies.
Generate a report for entities on the file server.
Create a report with custom attribute values or use one of the File Analytics pre-configured report templates. To create a custom report, you must specify the entity, attributes, operators for some attributes, attribute values, column headings, and the number of columns.
The reports page displays a table of previously generated reports. You can rerun existing reports rather than creating a new template. After creating a report, download it as a JSON or CSV file.
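For a quick look at a report downloaded as CSV, a shell sketch such as the following can summarize it. The file name and the column used for grouping are assumptions for this example; adjust them to match the report you generate.
report=files_report.csv                          # example file name
head -1 "$report"                                # show the column headings
tail -n +2 "$report" | wc -l                     # number of rows in the report
# Count rows grouped by the value in (assumed) column 3.
tail -n +2 "$report" | awk -F, '{count[$3]++} END {for (v in count) print count[v], v}' | sort -rn | head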
The reports dashboard includes options to create, view, and download reports.
The Reports dashboard includes options to create a report, download reports as a JSON, download reports as a CSV, rerun reports, and delete reports.
The reports table includes columns for the report name, status, last run, and actions.
Clicking Create a new report takes you to the report creation screen, which includes a Report builder tab and a Pre-canned Reports Templates tab. The tabs include report options and filters for report configuration.
Both tabs include the following elements:
Entity | Attributes (filters) | Operator | Value | Column |
---|---|---|---|---|
Events | event_date |
|
(date) |
|
Event_operation | N/A |
|
||
Files | Category |
|
(date) |
|
Extensions | N/A | (type in value) | ||
Deleted | N/A | Last (number of days from 1 to 30) days | ||
creation_date |
|
(date) | ||
access_date |
|
(date) | ||
Size |
|
(number) (file size)
File size options:
|
||
Folders | Deleted | N/A | Last (number of days from 1 to 30) days |
|
creation_date |
|
(date) | ||
Users | last_event_date |
|
(date) |
|
Entity | Pre-canned report template | Columns |
---|---|---|
Events |
|
|
Files |
|
|
Users |
|
|
Create a custom report by defining the entity, attribute, filters, and columns.
Use one of the pre-canned File Analytics templates for your report.
You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.
The data retention period determines how long File Analytics retains event data.
Follow the steps as indicated to configure data retention.
Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.
Blacklist users, file extensions, and client IPs.
File Analytics uses the file category configuration to classify file extensions.
The capacity widget in the dashboard uses the category configuration to calculate capacity details.
Configure File Analytics disaster recovery (DR) using Prism Element.
File Analytics only supports async disaster recovery. File Analytics does not support NearSync and metro availability.
Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have symmetric configurations to the primary site. The remote site must also deploy File Analytics to restore a File Analytics VM (FAVM).
The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.
To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.
By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.
Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.
Perform the following tasks on the remote site.
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cd /mnt/containers/config/common_config/
nutanix@favm$ sudo cp cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The FAVM discovers the attached volume group and assigns it to the /dev/sdb device.
nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
/mnt/containers/config/common_config/cvm_bck.config
nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cd /mnt/containers/config/common_config/
nutanix@favm$ sudo cp cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo /sbin/iscsiadm -m node -u
nutanix@favm$ sudo /sbin/iscsiadm -m node -o delete
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The output does not show the /dev/sdb device.
nutanix@favm$ sudo cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
nutanix@favm$ sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal data_services_IP_address:3260
Clicking the Nutanix cluster name in Prism displays cluster details including the data service IP address. The output displays the restored iSCSI target from step 2.
nutanix@favm$ sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4" /dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
/mnt/containers/config/common_config/cvm_bck.config
nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
Last updated: 2022-06-14
File Analytics provides data and statistics on the operations and contents of a file server.
Once deployed, Files adds a File Analytics VM to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. Data on the File Analytics VM is protected and is kept in a separate volume group.
Once you deploy File Analytics, a new File Analytics link appears on the file server actions bar. You can access File Analytics through this link for any file server where it is enabled.
The File Analytics web console consists of the following display features:
Main menu bar : The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:
Meet the following requirements prior to deploying File Analytics.
Ensure that you have performed the following tasks and your Files deployment meets the following specifications.
Open the required ports and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.
The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.
In addition to meeting the File Analytics network requirements, ensure that you meet the Nutanix Files port requirements as described in the Port Reference .
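As a rough illustration of these checks (not an authoritative list of File Analytics ports), the following commands, run from the FAVM, verify ICMP and TCP reachability toward a CVM; run the mirror-image checks from a CVM toward the FAVM. The addresses are placeholder values, and the ports you test should come from the Port Reference.
cvm_ip=10.0.0.10                        # placeholder CVM address
favm_ip=10.0.0.20                       # placeholder FAVM address
ping -c 3 "$cvm_ip"                     # confirms ICMP is allowed toward the CVM
nc -zv "$cvm_ip" 9440                   # TCP check toward the CVM (9440 is the Prism port)
# From a CVM, repeat toward the FAVM using the File Analytics ports from the Port Reference:
# ping -c 3 "$favm_ip" && nc -zv "$favm_ip" <port>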
File Analytics has the following limitations.
Overview of administrative processes for File Analytics.
As an admin, you have the privileges to perform administrative tasks for File Analytics. To add a file server admin user, see Managing Roles in the Nutanix Files Guide . The topics in this chapter describe the basics for administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.
Follow this procedure to deploy the File Analytics server.
Steps for enabling File Analytics after deployment or disablement.
Follow these steps to enable File Analytics after disabling the application.
Follow the steps as indicated to disable File Analytics.
File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data.
Do the following to launch File Analytics.
To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide .
Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.
Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.
Manage the audit data of deleted shares and exports.
By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears adjacent to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.
Follow the directions as indicated to delete audit data for the deleted share or export.
Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.
Before you upgrade File Analytics, ensure that you are running a compatible version of AOS and Files. Refer to File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element, see AOS Upgrade in the Prism Web Console Guide .
To upgrade File Analytics, perform inventory and updates using the Life-Cycle Manager (LCM), see the Life Cycle Manager Guide for instructions on performing inventory and updates. LCM cannot upgrade File Analytics when the protection domain (PD) for the File Analytics VM (FAVM) includes any other entities.
During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets its expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes the snapshot after completing the upgrade successfully. If any errors occur, the system keeps the snapshot for 30 days to troubleshoot the issue.
Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).
The Dashboard tab displays data on the operational trends of a file server.
The Dashboard tab is the opening screen that appears after launching File Analytics from Prism. The dashboard displays widgets that present data on file trends, distribution, and operations.
Tile Name | Description | Intervals |
---|---|---|
Capacity Trend | Displays capacity trends for the file server, including capacity added, capacity removed, and net changes. Clicking an event period widget displays the Capacity Trend Details view. | Seven days, the last 30 days, or the last 1 year. |
Data Age | Displays the percentage of data by age. | Less than 3 months, 3–6 months, 6–12 months, and > 12 months. |
Anomaly Alerts | Displays alerts for configured anomalies, see Configuring Anomaly Detection. | |
Permission Denials | Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details, see Audit Trails - Users for more. | [user id], [number of permission denials] |
File Distribution by Size | Displays the number of files by file size. Provides trend details for top 5 files. | Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB to 1 GB, greater than 1 GB. |
File Distribution by Type | Displays the space taken up by various applications and file types. The file type is determined by the file extension. See the File Types table for more details. | MB or GB |
File Distribution by Type Details view | Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days. Clicking View Details displays the File Distribution by Type view. | Daily size trend for top 5 files (GB), file type (see File Type table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB). |
Top 5 active users | Lists the users who have accessed the most files and number of operations the user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking the user name displays the audit view for the user, see Audit Trails - Users for more. | 24 hours, 7 days, 1 month, or 1 year. |
Top 5 accessed files | Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files. Clicking the file name displays the audit view details for the file, see Audit Trails - Files for more. | Twenty-four hours, 7 days, 1 month, or 1 year. |
Files Operations | Displays the distribution of operation types for the specified period, including a count for each operation type and the total sum of all operations. Operations include: create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, permission denied (file blocking). Clicking an operation displays the File Operation Trend view. | Twenty-four hours, 7 days, 1 month, or 1 year. |
Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export , Folder , and Category . Each tab includes the columns described in the following table.
Column | Description |
---|---|
Name | Name of share/export, folder, or category. |
Net Capacity Change | The total difference between capacity at the beginning and the end of the specified period. |
Share Name (for folders only) | The name of the share or export that the folder belongs to. |
Capacity Added | Total added capacity for the specified period. |
Capacity Removed | Total removed capacity for the specified period. |
Clicking View Details for the File Distribution by Type widget displays granular details of file distribution, see the File Types table below for details.
Column | Description |
---|---|
File Type | Name of file type |
Current Space Used | Space capacity occupied by the file type |
Current Number of Files | Number of files for the file type |
Change (In Last 30 Days) | The increase in capacity over a 30-day period for the specified file type. |
Category | Supported File Type |
---|---|
Archives | .cab, .gz, .rar, .tar, .z, .zip |
Audio | .aiff, .au, .mp3, .mp4, .wav, .wma |
Backups | .bak, .bkf, .bkp |
CD/DVD Images | .img, .iso, .nrg |
Desktop Publishing | .qxd |
Email Archives | .pst |
Hard Drive images | .tib, .gho, .ghs |
Images | .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff |
Installers | .msi, .rpm |
Log Files | .log |
Lotus Notes | .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf |
MS Office Documents | .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb |
System Files | .bin, .dll, .exe |
Text Files | .csv, .pdf, .txt |
Video | .avi, .mpg, .mpeg, .mov, .m4v |
Disk Image | .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd |
Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.
Category | Description |
---|---|
Operation Type | A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types. |
Last (time period) | A drop-down option to specify the period for the file operation trend. |
File operation trend graph | The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations trend over the extent of the intervals. |
File Analytics uses the file category configuration to classify file extensions.
The capacity widget in the dashboard uses the category configuration to calculate capacity details.
The Health dashboard displays dynamically updated health information about each File Analytics component.
The Health dashboard includes the following details:
Data panes in the Anomalies tab display data and trends for configured anomalies.
You can configure anomalies for the following operations:
Define anomaly rules by specifying the following conditions:
Meeting the lower operation threshold triggers an anomaly.
Consider a scenario where you have 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. The count threshold takes precedence because 10% of 1,000 files is 100 operations, which is greater than the count threshold of 10, and the lower of the two values triggers the anomaly.
Pane Name | Description | Values |
---|---|---|
Anomaly Trend | Displays the number of anomalies per day or per month. | Last 7 days, Last 30 days, Last 1 year |
Top Users | Displays the users with the most anomalies and the number of anomalies per user. | Last 7 days, Last 30 days, Last 1 year |
Top Folders | Displays the folders with the most anomalies and the number of anomalies per folder. | Last 7 days, Last 30 days, Last 1 year |
Operation Anomaly Types | Displays the percentage of occurrences per anomaly type. | Last 7 days, Last 30 days, Last 1 year |
Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.
Column | Description |
---|---|
Anomaly Type | The configured anomaly type. Anomaly types not configured do not show up in the table. |
Total User Count | The number of users that have performed the operation causing the specified anomaly during the specified time range. |
Total Folder Count | The numbers of folders in which the anomaly occurred during the specified time range. |
Total Operation Count | Total number of anomalies for the specified anomaly type that occurred during the specified time range. |
Time Range | The time range for which the total user count, total folder count, and total operation count are specified. |
Column | Description |
---|---|
Username or Folders | Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders. |
Operation count | The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph. |
Steps for configuring anomaly rules.
Configure an SMTP server for File Analytics to send anomaly alerts, see Configuring an SMTP Server. To create an anomaly rule, do the following.
File Analytics uses a simple mail transport protocol (SMTP) server to send anomaly alerts.
Use audit trails to look up operation data for a specific user, file, folder, or client.
The Audit Trails tab includes Files , Folders , Users , and Client IP options for specifying the audit type. Use the search bar for specifying the specific entity for the audit (user, folder, file, or client IP).
The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP number) takes you to the Audit Trails dashboard for the target entity.
Audit a user, file, client, or folder.
Details for client IP Audit Trails.
When you search by user in the Audit Trails tab, search results display the following information in a table.
Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.
The Results table provides granular details of the audit results. The following data is displayed for every event.
Click the gear icon for options to download the data as an XLS, CSV, or JSON file.
Dashboard details for folder audits.
The following information displays when you search by folder in the Audit Trails tab.
The Audit Details page shows the following audit information for the selected folder.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for file audits.
When you search by file in the Audit Trails tab, the following information displays:
The Audit Details page shows the following audit information for the selected file.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for client IP Audit Trails.
When you search by client IP in the Audit Trails tab, search results display the following information in a table.
The Audit Details page shows the following audit information for the selected client.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for an option to download the data as a CSV file.
You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.
The data retention period determines how long File Analytics retains event data.
Follow the steps as indicated to configure data retention.
Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.
Blacklist users, file extensions, and client IPs.
Configure File Analytics disaster recovery (DR) using Prism Element.
File Analytics only supports async disaster recovery. File Analytics does not support NearSync and metro availability.
Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have symmetric configurations to the primary site. The remote site must also deploy File Analytics to restore a File Analytics VM (FAVM).
The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.
To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.
By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.
Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.
Perform the following tasks on the remote site.
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cd /mnt/containers/config/common_config/
nutanix@favm$ sudo cp cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo /sbin/iscsiadm -m node -u
nutanix@favm$ sudo /sbin/iscsiadm -m node -o delete
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The output does not show the /dev/sdb device.
nutanix@favm$ sudo cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
nutanix@favm$ sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal data_services_IP_address:3260
Clicking the Nutanix cluster name in Prism displays cluster details including the data service IP address. The output displays the restored iSCSI target from step 2.
nutanix@favm$ sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660" /dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4" /dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
nutanix@favm$ cd /mnt/containers/config/common_config/
nutanix@favm$ mv cvm.config cvm_bck.config
nutanix@favm$ cd /tmp
nutanix@favm$ mv cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
Product Release Date: 2022-04-05
Last updated: 2022-11-04
File Analytics provides data and statistics on the operations and contents of a file server.
Once deployed, Nutanix Files adds a File Analytics VM (FAVM) to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. File Analytics protects data on the FAVM, which is kept in a separate volume group.
The File Analytics web console consists of the following display features:
Main menu bar : The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:
Meet the following requirements prior to deploying File Analytics.
Ensure that you have performed the following tasks and your Files deployment meets the following specifications.
Open the required ports, and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.
The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.
In addition to meeting the File Analytics network requirements, ensure that you meet the Nutanix Files port requirements as described in the Port Reference .
File Analytics has the following limitations.
Overview of administrative processes for File Analytics.
As an admin, you have the required permissions for performing File Analytics administrative tasks. To add a file server admin user, see Managing Roles in the Nutanix Files Guide . The topics in this chapter describe the basics for administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.
Prism Element supports role-based access control (RBAC) that allows you to configure and provide customized access to the users based on their assigned roles.
From the Prism Element dashboard, you can assign a set of predefined built-in roles (system roles) to users or user groups. File Analytics supports the following built-in roles (system roles), which are defined by default:
Follow this procedure to deploy the File Analytics server.
Steps for enabling File Analytics after deployment or disablement.
Follow these steps to enable File Analytics after disabling the application.
Follow the steps as indicated to disable File Analytics.
File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data.
Do the following to launch File Analytics.
To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide .
Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.
Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.
Manage the audit data of deleted shares and exports.
By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears next to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.
Follow the directions as indicated to delete audit data for the deleted share or export.
Steps for updating the password of a File Analytics VM (FAVM).
nutanix@favm$ sudo passwd nutanix
Changing password for user nutanix.
Old Password:
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
The password must meet the following complexity requirements:
Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.
Before you proceed with the File Analytics upgrade, ensure that you meet the following requirements:
Refer to File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element, see AOS Upgrade in the Prism Web Console Guide .
To upgrade File Analytics, perform inventory and updates using the Life-Cycle Manager (LCM), see the Life Cycle Manager Guide for instructions on performing inventory and updates.
During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets its expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes the snapshot after completing the upgrade successfully. If any errors occur, the system keeps the snapshot for 30 days to troubleshoot the issue.
Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).
The Dashboard tab displays data on the operational trends of a file server.
The Dashboard tab is the opening screen that appears after launching File Analytics for a specific file server. The dashboard displays widgets that present data on file trends, distribution, and operations.
Tile Name | Description | Intervals |
---|---|---|
Capacity trend | Displays capacity trends for the file server, including capacity added, capacity removed, and net changes. Clicking an event period widget displays the Capacity Trend Details view. | 7 days, the last 30 days, or the last 1 year. |
Data age | Displays the percentage of data by age. Data age determines the data heat, including: hot, warm, and cold. |
Default intervals are as follows:
|
Permission denials | Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details, see Audit Trails - Users for more. | [user id], [number of permission denials] |
File distribution by size | Displays the number of files by file size. Provides trend details for top 5 files. | Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB to 1 GB, greater than 1 GB. |
File distribution by type | Displays the space taken up by various applications and file types. The file extension determines the file type. See the File types table for more details. | MB or GB |
File distribution by type details view | Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days. Clicking View Details displays the File Distribution by Type view. | Daily size trend for top 5 files (GB), file type (see the "File Type" table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB). |
Top 5 active users | Lists the users who have accessed the most files and number of operations the user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking the user name displays the audit view for the user, see Audit Trails - Users for more. | 24 hours, 7 days, 1 month, or 1 year. |
Top 5 accessed files | Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files. Clicking the file name displays the audit view details for the file, see Audit Trails - Files for more. | 24 hours, 7 days, 1 month, or 1 year. |
Files operations | Displays the distribution of operation types for the specified period, including a count for each operation type and the total sum of all operations. Operations include: create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, permission denied (file blocking). Clicking an operation displays the File Operation Trend view. | 24 hours, 7 days, 1 month, or 1 year. |
Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export , Folder , and Category . Each tab includes the columns described in the following table.
Column | Description |
---|---|
Name | Name of share/export, folder, or category. |
Net capacity change | The total difference between capacity at the beginning and the end of the specified period. |
Share name (for folders only) | The name of the share or export that the folder belongs to. |
Capacity added | Total added capacity for the specified period. |
Capacity removed | Total removed capacity for the specified period. |
Clicking View Details for the File Distribution by Type widget displays granular details of file distribution, see the File Types table for details.
Column | Description |
---|---|
File type | Name of file type |
Current space used | Space capacity occupied by the file type |
Current number of files | Number of files for the file type |
Change (in last 30 days) | The increase in capacity over a 30-day period for the specified file type |
Category | Supported File Type |
---|---|
Archives | .cab, .gz, .rar, .tar, .z, .zip |
Audio | .aiff, .au, .mp3, .mp4, .wav, .wma |
Backups | .bak, .bkf, .bkp |
CD/DVD images | .img, .iso, .nrg |
Desktop publishing | .qxd |
Email archives | .pst |
Hard drive images | .tib, .gho, .ghs |
Images | .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff |
Installers | .msi, .rpm |
Log Files | .log |
Lotus notes | .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf |
MS Office documents | .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb |
System files | .bin, .dll, .exe |
Text files | .csv, .pdf, .txt |
Video | .avi, .mpg, .mpeg, .mov, .m4v |
Disk image | .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd |
Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.
Category | Description |
---|---|
Operation type | A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types. |
Last (time period) | A drop-down option to specify the period for the file operation trend. |
File operation trend graph | The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations trend over the extent of the intervals. |
The Health dashboard displays dynamically updated health information about each file server component.
The Health dashboard includes the following details:
The Data Age widget in the dashboard provides details on data heat.
Share-level data is displayed to provide details on share capacity trends. There are three levels of data heat:
You can configure the definitions for each level of data heat rather than using the default values. See Configuring Data Heat Levels.
Update the values that constitute different data heat levels.
Data panes in the Anomalies tab display data and trends for configured anomalies.
The Anomalies tab provides options for creating anomaly policies and displays dashboards for viewing anomaly trends.
You can configure anomalies for the following operations:
Define anomaly rules by specifying the following conditions:
Meeting the lower operation threshold triggers an anomaly.
Consider a scenario where you have 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. The count threshold takes precedence because 10% of 1,000 files is 100 operations, which is greater than the count threshold of 10, and the lower of the two values triggers the anomaly.
Pane Name | Description | Values |
---|---|---|
Anomaly Trend | Displays the number of anomalies per day or per month. | Last 7 days, Last 30 days, Last 1 year |
Top Users | Displays the users with the most anomalies and the number of anomalies per user. | Last 7 days, Last 30 days, Last 1 year |
Top Folders | Displays the folders with the most anomalies and the number of anomalies per folder. | Last 7 days, Last 30 days, Last 1 year |
Operation Anomaly Types | Displays the percentage of occurrences per anomaly type. | Last 7 days, Last 30 days, Last 1 year |
Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.
Column | Description |
---|---|
Anomaly Type | The configured anomaly type. Anomaly types not configured do not show up in the table. |
Total User Count | The number of users that have performed the operation causing the specified anomaly during the specified time range. |
Total Folder Count | The numbers of folders in which the anomaly occurred during the specified time range. |
Total Operation Count | Total number of anomalies for the specified anomaly type that occurred during the specified time range. |
Time Range | The time range for which the total user count, total folder count, and total operation count are specified. |
Column | Description |
---|---|
Username or Folders | Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders. |
Operation count | The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph. |
Steps for configuring anomaly rules.
To create an anomaly rule, do the following.
File Analytics uses a simple mail transport protocol (SMTP) server to send anomaly alerts.
Use audit trails to look up operation data for a specific user, file, folder, or client.
The Audit Trails tab includes Files , Folders , Users , and Client IP options for specifying the audit type. Use the search bar for specifying the specific entity for the audit (user, folder, file, or client IP).
The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP number) takes you to the Audit Trails dashboard for the target entity.
Audit a user, file, client, or folder.
Details for client IP Audit Trails.
When you search by user in the Audit Trails tab, search results display the following information in a table.
Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.
The Results table provides granular details of the audit results. The following data is displayed for every event.
Click the gear icon for options to download the data as an XLS, CSV, or JSON file.
Dashboard details for folder audits.
The following information displays when you search by folder in the Audit Trails tab.
The Audit Details page shows the following audit information for the selected folder.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for file audits.
When you search by file in the Audit Trails tab, the following information displays:
The Audit Details page shows the following audit information for the selected file.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for client IP Audit Trails.
When you search by client IP in the Audit Trails tab, search results display the following information in a table.
The Audit Details page shows the following audit information for the selected client.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for an option to download the data as a CSV file.
Ransomware protection for your file server.
File Analytics scans files for ransomware in real time and notifies you in the event of a ransomware attack once you configure email notifications.
Using a curated list of over 250 signatures that frequently appear in ransomware files, the Nutanix Files file blocking mechanism identifies and blocks files with ransomware extensions from carrying out malicious operations. You can modify the list by manually adding or removing signatures.
File Analytics also monitors shares for self-service restore (SSR) policies and identifies shares that do not have SSR enabled in the ransomware dashboard. You can enable SSR through the ransomware dashboard.
The ransomware dashboard includes panes for managing ransomware protection and self-service restore (SSR).
The ransomware dashboard includes two main sections:
Enable ransomware protection on your file server.
Configure ransomware protection on file servers.
Do the following to add a signature to the blocked extension list.
Enable self-service restore on shares identified by File Analytics.
File Analytics scans shares for SSR policies.
Generate a report for entities on the file server.
Create a report with custom attribute values or use one of the File Analytics pre-canned report templates. To create a custom report, specify the entity, attributes (and operators for some attributes), attribute values, column headings, and the number of columns. Pre-canned reports define most of the attributes and headings based on the entity and template that you choose.
The Reports dashboard displays a table of previously generated reports. You can rerun existing reports rather than creating a new template. After creating a report, you can download it as a JSON or CSV file.
The reports dashboard includes options to create, view, and download reports.
The Reports dashboard includes options to create a report, download reports as JSON or CSV files, rerun reports, and delete reports.
The reports table includes columns for the report name, status, last run, and actions.
Clicking Create a new report takes you to the report creation screen, which includes Report builder and Pre-canned Reports Templates tabs. The tabs include report options and filters for report configuration.
Both tabs include the following elements:
Entity | Attributes (filters) | Operator | Value | Column |
---|---|---|---|---|
Events | event_date | | (date) | |
Events | Event_operation | N/A | | |
Files | Category | | | |
Files | Extensions | N/A | (type in value) | |
Files | Deleted | N/A | Last (number of days from 1 to 30) days | |
Files | creation_date | | (date) | |
Files | access_date | | (date) | |
Files | Size | | (number) (file size) | |
Folders | Deleted | N/A | Last (number of days from 1 to 30) days | |
Folders | creation_date | | (date) | |
Users | last_event_date | | (date) | |
Pre-canned report templates are available for the Events, Files, and Users entities; each template defines its own set of columns based on the entity and template that you choose.
Create a custom report by defining the entity, attribute, filters, and columns.
Use one of the pre-canned File Analytics templates for your report.
You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.
The data retention period determines how long File Analytics retains event data.
Follow the steps as indicated to configure data retention.
Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.
Deny users, file extensions, and client IP addresses.
File Analytics uses the file category configuration to classify file extensions.
The capacity widget in the dashboard uses the category configuration to calculate capacity details.
Configure File Analytics disaster recovery (DR) using Prism Element.
File Analytics supports only asynchronous disaster recovery. File Analytics does not support NearSync or Metro Availability.
Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have a configuration symmetric to the primary site, and File Analytics must be deployed on the remote site to restore a File Analytics VM (FAVM).
The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.
To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.
By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.
Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.
Perform the following tasks on the remote site.
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cp /mnt/containers/config/common_config/cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660"
/dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The FAVM discovers the attached volume group and assigns it to the /dev/sdb device.
nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
/mnt/containers/config/common_config/cvm_bck.config
nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
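Taken together, the steps above follow the flow in the commented sketch below. This is a condensed outline for orientation, assuming the configuration path and device names shown in the preceding commands; run the individual steps above rather than this sketch.

# Sketch of the reboot-based FAVM recovery flow described above.
sudo blkid                                                 # list the block devices before the change
cp /mnt/containers/config/common_config/cvm.config /tmp    # preserve the cluster configuration
sudo systemctl stop monitoring                             # stop the File Analytics monitoring service
docker stop $(docker ps -q)                                # stop all running File Analytics containers
sudo systemctl stop docker                                 # stop the Docker daemon
sudo umount /mnt                                           # release the old volume group mount
sudo reboot                                                # on reboot, the restored volume group appears as /dev/sdb
# After the reboot, back up the existing cvm.config, restore the copy from /tmp,
# and run reset_password.py for the local and Prism admin credentials as shown above.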
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cp /mnt/containers/config/common_config/cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo /sbin/iscsiadm -m node -u
nutanix@favm$ sudo /sbin/iscsiadm -m node -o delete
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660"
/dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The output does not show the /dev/sdb device.
nutanix@favm$ sudo cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
nutanix@favm$ sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal data_services_IP_address:3260
Clicking the Nutanix cluster name in Prism displays cluster details, including the data services IP address. The output displays the restored iSCSI target from step 2.
nutanix@favm$ sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660"
/dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
/dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
/mnt/containers/config/common_config/cvm_bck.config
nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
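For orientation, the iSCSI-based flow above condenses to the commented sketch below. The data services IP address and iqn_name are placeholders, as in the steps themselves; run the individual steps above rather than this sketch.

# Sketch of the iSCSI-based FAVM recovery flow described above.
cp /mnt/containers/config/common_config/cvm.config /tmp    # preserve the cluster configuration
sudo systemctl stop monitoring                             # stop the File Analytics monitoring service
docker stop $(docker ps -q)                                # stop all running File Analytics containers
sudo systemctl stop docker                                 # stop the Docker daemon
sudo umount /mnt                                           # release the old volume group mount
sudo /sbin/iscsiadm -m node -u                             # log out of the existing iSCSI session
sudo /sbin/iscsiadm -m node -o delete                      # remove the stale iSCSI node records
sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal data_services_IP_address:3260
sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
sudo reboot                                                # after the reboot, the restored volume group appears as /dev/sdb
# Then back up the existing cvm.config, restore the copy from /tmp,
# and run reset_password.py for the local and Prism admin credentials as shown above.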
Product Release Date: 2022-09-07
Last updated: 2022-11-04
File Analytics provides data and statistics on the operations and contents of a file server.
When deployed, File Analytics adds a File Analytics VM (FAVM) to the Files cluster. A single File Analytics VM supports all file servers in the cluster; however, you must enable File Analytics separately for each file server. File Analytics protects its data on the FAVM, which is kept in a separate volume group.
The File Analytics web console consists of the following display features:
Main menu bar : The main menu bar appears at the top of every page of the File Analytics web console. The main menu bar includes the following display features:
Meet the following requirements prior to deploying File Analytics.
Ensure that you have performed the following tasks and your Files deployment meets the following specifications.
Open the required ports, and ensure that your firewall allows bi-directional Internet Control Message Protocol (ICMP) traffic between the FAVM and CVMs.
The Port Reference provides detailed port information for Nutanix products and services, including port sources and destinations, service descriptions, directionality, and protocol requirements.
In addition to meeting the File Analytics network requirements, ensure that you meet the Nutanix Files port requirements described in the Port Reference.
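As a quick, informal check that ICMP traffic is allowed between the FAVM and a Controller VM, you can ping a CVM from the FAVM. The address below is a placeholder for one of your CVM IP addresses.

nutanix@favm$ ping -c 3 cvm_ip_address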
File Analytics has the following limitations.
Overview of administrative processes for File Analytics.
As an admin, you have the required permissions for performing File Analytics administrative tasks. To add a file server admin user, see Managing Roles in the Nutanix Files Guide. The topics in this chapter describe the basics of administering your File Analytics environment. For advanced administrative options, refer to the File Analytics Options chapter.
Prism Element supports role-based access control (RBAC) that allows you to configure and provide customized access to the users based on their assigned roles.
From the Prism Element dashboard, you can assign a set of predefined built-in roles (system roles) to users or user groups. File Analytics supports the following built-in roles, which are defined by default:
Follow this procedure to deploy the File Analytics server.
Steps for enabling File Analytics after deployment or disablement.
Follow these steps to enable File Analytics after disabling the application.
Follow the steps as indicated to disable File Analytics.
File Analytics is disabled on the server. Enable File Analytics to start collecting data again or Delete File Analytics Data.
Do the following to launch File Analytics.
To update a File Analytics VM (FAVM), refer to the sizing guidelines in the File Analytics release notes and follow the steps in the VM Management topic of the Prism Web Console Guide.
Remove a File Analytics VM (FAVM) by disabling it and deleting it from the cluster in Prism.
Follow the steps as indicated to update authentication credentials for LDAP or Active Directory.
Manage the audit data of deleted shares and exports.
By default, File Analytics retains deleted share and export data. The dashboard widgets do not account for data of deleted shares and exports. The deleted marker appears next to deleted shares and exports in audit trails. The Manage Share/Export Audit data window displays a list of deleted shares and exports.
Follow the directions as indicated to delete audit data for the deleted share or export.
Steps for updating the password of a File Analytics VM (FAVM).
nutanix@favm$ sudo passwd nutanix
Changing password for user nutanix.
Old Password:
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
The password must meet the following complexity requirements:
Perform File Analytics upgrades using the Life Cycle Manager feature in Prism Element.
Before you proceed with the FA upgrade, ensure you meet the following:
Refer to the File Analytics release notes for compatibility details. You can upgrade both AOS and Files through Prism Element; see AOS Upgrade in the Prism Web Console Guide.
To upgrade File Analytics, perform inventory and updates using the Life Cycle Manager (LCM); see the Life Cycle Manager Guide for instructions on performing inventory and updates.
During the upgrade process, File Analytics takes a snapshot of the volume group (VG) that contains File Analytics data. If issues occur during an upgrade, File Analytics restores the FAVM to the pre-upgrade state. If the volume group is protected and is part of a protection domain, File Analytics creates a snapshot and sets the expiry time to 30 days. If the volume group is not protected, File Analytics creates a snapshot and deletes it after completing the upgrade successfully. If any errors occur, the system keeps the snapshot for 30 days so that you can troubleshoot the issue.
Upgrade File Analytics at a dark site using the Life-Cycle Manager (LCM).
The Dashboard tab displays data on the operational trends of a file server.
The Dashboard tab is the opening screen that appears after launching File Analytics for a specific file server. The dashboard displays widgets that present data on file trends, distribution, and operations.
Tile Name | Description | Intervals |
---|---|---|
Capacity trend | Displays capacity trends for the file server, including capacity added, capacity removed, and net changes. Clicking an event period in the widget displays the Capacity Trend Details view. | Last 7 days, last 30 days, or last 1 year. |
Data age | Displays the percentage of data by age. Data age determines the data heat: hot, warm, and cold. | Default data heat intervals; see Configuring Data Heat Levels. |
Permission denials | Displays users who have had excessive permission denials and the number of denials. Clicking a user displays audit details; see Audit Trails - Users for more. | [user id], [number of permission denials] |
File distribution by size | Displays the number of files by file size. Provides trend details for the top 5 files. | Less than 1 MB, 1–10 MB, 10–100 MB, 100 MB–1 GB, greater than 1 GB. |
File distribution by type | Displays the space taken up by various applications and file types. The file extension determines the file type. See the File Types table for more details. | MB or GB |
File distribution by type details view | Displays a trend graph of the top 5 file types. File distribution details include file type, current space used, current number of files, and change in space for the last 7 or 30 days. Clicking View Details displays the File Distribution by Type view. | Daily size trend for top 5 files (GB), file type (see the File Types table), current space used (GB), current number of files (numeric), change in last 7 or 30 days (GB). |
Top 5 active users | Lists the users who have accessed the most files and the number of operations each user performed for the specified period. When there are more than 5 active users, the more link provides details on the top 50 users. Clicking a user name displays the audit view for the user; see Audit Trails - Users for more. | 24 hours, 7 days, 1 month, or 1 year. |
Top 5 accessed files | Lists the 5 most frequently accessed files. Clicking more provides details on the top 50 files. Clicking a file name displays the audit view details for the file; see Audit Trails - Files for more. | 24 hours, 7 days, 1 month, or 1 year. |
Files operations | Displays the distribution of operation types for the specified period, including a count for each operation type and the total sum of all operations. Operations include create, delete, read, write, rename, permission changed, set attribute, symlink, permission denied, and permission denied (file blocking). Clicking an operation displays the File Operation Trend view. | 24 hours, 7 days, 1 month, or 1 year. |
Clicking an event period in the Capacity Trend widget displays the Capacity Trend Details view for that period. The view includes three tabs: Share/Export, Folder, and Category. Each tab includes columns detailing entity details: name, net capacity change, capacity added, and capacity removed.
Column | Description |
---|---|
Name | Name of share/export, folder, or category. |
Net capacity change | The total difference between capacity at the beginning and the end of the specified period. |
Share name (for folders only) | The name of the share or export that the folder belongs to. |
Capacity added | Total added capacity for the specified period. |
Capacity removed | Total removed capacity for the specified period. |
Clicking View Details for the File Distribution by Type widget displays granular details of file distribution; see the File Types table for details.
Column | Description |
---|---|
File type | Name of file type |
Current space used | Space capacity occupied by the file type |
Current number of files | Number of files for the file type |
Change (in last 30 days) | The increase in capacity over a 30-day period for the specified file type |
Category | Supported File Type |
---|---|
Archives | .cab, .gz, .rar, .tar, .z, .zip |
Audio | .aiff, .au, .mp3, .mp4, .wav, .wma |
Backups | .bak, .bkf, .bkp |
CD/DVD images | .img, .iso, .nrg |
Desktop publishing | .qxd |
Email archives | .pst |
Hard drive images | .tib, .gho, .ghs |
Images | .bmp, .gif, .jpg, .jpeg, .pdf, .png, .psd, .tif, .tiff |
Installers | .msi, .rpm |
Log Files | .log |
Lotus Notes | .box, .ncf, .nsf, .ns2, .ns3, .ns4, .ntf |
MS Office documents | .accdb, .accde, .accdt, .accdr, .doc, .docx, .docm, .dot, .dotx, .dotm, .xls, .xlsx, .xlsm, .xlt, .xltx, .xltm, .xlsb, .xlam, .ppt, .pptx, .pptm, .potx, .potm, .ppam, .ppsx, .ppsm, .mdb |
System files | .bin, .dll, .exe |
Text files | .csv, .pdf, .txt |
Video | .avi, .mpg, .mpeg, .mov, .m4v |
Disk image | .hlog, .nvram, .vmdk, .vmx, .vmxf, .vmtm, .vmem, .vmsn, .vmsd |
Clicking an operation type in the File Operations widget displays the File Operation Trend view. The File Operation Trend view breaks down the specified period into smaller intervals, and displays the number of occurrences of the operation during each interval.
Category | Description |
---|---|
Operation type | A drop-down option to specify the operation type. See Files Operations in the Dashboard Widgets table for a list of operation types. |
Last (time period) | A drop-down option to specify the period for the file operation trend. |
File operation trend graph | The x-axis displays shorter intervals for the specified period. The y-axis displays the number of operations trend over the extent of the intervals. |
The Health dashboard displays dynamically updated health information about each file server component.
The Health dashboard includes the following details:
The Data Age widget in the dashboard provides details on data heat.
Share-level data is displayed to provide details on share capacity trends. There are three levels of data heat:
You can configure the definitions for each level of data heat rather than using the default values. See Configuring Data Heat Levels.
Update the values that constitute different data heat levels.
Data panes in the Anomalies tab display data and trends for configured anomalies.
The Anomalies tab provides options for creating anomaly policies and displays dashboards for viewing anomaly trends.
You can configure anomalies for the following operations:
Define anomaly rules by specifying the following conditions:
Meeting the lower operation threshold triggers an anomaly.
Consider a scenario where you have 1,000 files, an operation count threshold of 10, and an operation percentage threshold of 10%. The count threshold takes precedence because it is the lower threshold: 10% of 1,000 files is 100 operations, which is greater than the count threshold of 10.
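The same comparison, expressed as a small shell sketch; the variable names are illustrative only and are not part of File Analytics.

# Illustrative sketch: the lower of the two thresholds is the one that triggers the anomaly.
total_files=1000
count_threshold=10
percent_threshold=10
percent_as_count=$(( total_files * percent_threshold / 100 ))    # 100 operations
effective_threshold=$(( count_threshold < percent_as_count ? count_threshold : percent_as_count ))
echo "An anomaly triggers after ${effective_threshold} operations"    # prints 10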
Pane Name | Description | Values |
---|---|---|
Anomaly Trend | Displays the number of anomalies per day or per month. | Last 7 days, Last 30 days, Last 1 year |
Top Users | Displays the users with the most anomalies and the number of anomalies per user. | Last 7 days, Last 30 days, Last 1 year |
Top Folders | Displays the folders with the most anomalies and the number of anomalies per folder. | Last 7 days, Last 30 days, Last 1 year |
Operation Anomaly Types | Displays the percentage of occurrences per anomaly type. | Last 7 days, Last 30 days, Last 1 year |
Clicking an anomaly bar in the Anomaly Trend graph displays the Anomaly Details view.
Column | Description |
---|---|
Anomaly Type | The configured anomaly type. Anomaly types not configured do not show up in the table. |
Total User Count | The number of users that have performed the operation causing the specified anomaly during the specified time range. |
Total Folder Count | The number of folders in which the anomaly occurred during the specified time range. |
Total Operation Count | Total number of anomalies for the specified anomaly type that occurred during the specified time range. |
Time Range | The time range for which the total user count, total folder count, and total operation count are specified. |
Column | Description |
---|---|
Username or Folders | Indicates the entity for the operation count. Selecting the Users tab indicates operation count for specific users, and selecting the Folders tab indicates the operation count for specific folders. |
Operation count | The total number of operations causing anomalies for the selected user or folder during the time period for the bar in the Anomaly Trend graph. |
Steps for configuring anomaly rules.
To create an anomaly rule, do the following.
File Analytics uses a Simple Mail Transfer Protocol (SMTP) server to send anomaly alerts.
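If anomaly alert emails do not arrive, one quick way to confirm that the FAVM can reach the SMTP server is a basic connection test from the FAVM, for example with curl. The server name and port below are placeholders for your environment.

nutanix@favm$ curl -v telnet://smtp.example.com:25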
Use audit trails to look up operation data for a specific user, file, folder, or client.
The Audit Trails tab includes Files, Folders, Users, and Client IP options for specifying the audit type. Use the search bar to specify the entity for the audit (user, folder, file, or client IP).
The results table presents details for entities that match the search criteria. Clicking the entity name (or client IP number) takes you to the Audit Trails dashboard for the target entity.
Audit a user, file, client, or folder.
When you search by user in the Audit Trails tab, search results display the following information in a table.
Clicking View Audit displays the Audit Details page, which shows the following audit information for the selected user.
The Results table provides granular details of the audit results. The following data is displayed for every event.
Click the gear icon for options to download the data as an XLS, CSV, or JSON file.
Dashboard details for folder audits.
The following information displays when you search by folder in the Audit Trails tab.
The Audit Details page shows the following audit information for the selected folder.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for file audits.
When you search by file in the Audit Trails tab, the following information displays:
The Audit Details page shows the following audit information for the selected file.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for options to download the data as a CSV file.
Dashboard details for client IP Audit Trails.
When you search by client IP in the Audit Trails tab, search results display the following information in a table.
The Audit Details page shows the following audit information for the selected client.
The Results table provides granular details of the audit results. File Analytics displays the following data for every event.
Click the gear icon for an option to download the data as a CSV file.
Ransomware protection for your file server.
File Analytics scans files for ransomware in real time and notifies you in the event of a ransomware attack once you configure email notifications.
Using a curated list of over 250 signatures that frequently appear in ransomware files, the Nutanix Files file blocking mechanism identifies and blocks files with ransomware extensions from carrying out malicious operations. You can modify the list by manually adding or removing signatures in Nutanix Files; see "File Blocking" in the Nutanix Files User Guide.
File Analytics also monitors shares for self-service restore (SSR) policies and identifies shares that do not have SSR enabled in the ransomware dashboard. You can enable SSR through the ransomware dashboard.
The ransomware dashboard includes panes for managing ransomware protection and self-service restore (SSR).
The ransomware dashboard includes two main sections:
Enable ransomware protection on your file server.
Configure ransomware protection on file servers.
Do the following to add a signature to the blocked extension list.
Enable self-service restore on shares identified by File Analytics.
File Analytics scans shares for SSR policies.
Generate a report for entities on the file server.
Create a report with custom attribute values or use one of the File Analytics pre-canned report templates. To create a custom report, specify the entity, attributes (and operators for some attributes), attribute values, column headings, and the number of columns. Pre-canned reports define most of the attributes and headings based on the entity and template that you choose.
The Reports dashboard displays a table of previously generated reports. You can rerun existing reports rather than creating a new template. After creating a report, you can download it as a JSON or CSV file.
The reports dashboard includes options to create, view, and download reports.
The Reports dashboard includes options to create a report, download reports as JSON or CSV files, rerun reports, and delete reports.
The reports table includes columns for the report name, status, last run, and actions.
Clicking Create a new report takes you to the report creation screen, which includes Report builder and Pre-canned Reports Templates tabs. The tabs include report options and filters for report configuration.
Both tabs include the following elements:
Entity | Attributes (filters) | Operator | Value | Column |
---|---|---|---|---|
Events | event_date | | (date) | |
Events | Event_operation | N/A | | |
Files | Category | | | |
Files | Extensions | N/A | (type in value) | |
Files | Deleted | N/A | Last (number of days from 1 to 30) days | |
Files | creation_date | | (date) | |
Files | access_date | | (date) | |
Files | Size | | (number) (file size) | |
Folders | Deleted | N/A | Last (number of days from 1 to 30) days | |
Folders | creation_date | | (date) | |
Users | last_event_date | | (date) | |
Pre-canned report templates are available for the Events, Files, and Users entities; each template defines its own set of columns based on the entity and template that you choose.
Create a custom report by defining the entity, attribute, filters, and columns.
Use one of the pre-canned File Analytics templates for your report.
You can get more insight into the usage and contents of files on your system by configuring and updating File Analytics features and settings. Some options include scanning the files on your file server on demand, updating data retention, and configuring data protection.
The data retention period determines how long File Analytics retains event data.
Follow the steps as indicated to configure data retention.
Once enabled, File Analytics scans the metadata of all files and shares on the system. You can perform an on-demand scan of shares in your file system.
Deny users, file extensions, and client IP addresses.
File Analytics uses the file category configuration to classify file extensions.
The capacity widget in the dashboard uses the category configuration to calculate capacity details.
Configure File Analytics disaster recovery (DR) using Prism Element.
File Analytics supports only asynchronous disaster recovery. File Analytics does not support NearSync or Metro Availability.
Create an async protection domain, configure a protection domain schedule, and configure remote site mapping. The remote site must have a configuration symmetric to the primary site, and File Analytics must be deployed on the remote site to restore a File Analytics VM (FAVM).
The Data Protection section in the Prism Web Console Guide provides more detail on the disaster recovery process.
To set up disaster recovery for File Analytics, create an async protection domain, configure a protection domain schedule, and configure remote site mapping.
By default, the File Analytics volume group resides on the same container that hosts vDisks for Nutanix Files.
Recover a File Analytics VM (FAVM) after a planned or unplanned migration to the remote site.
Perform the following tasks on the remote site.
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cp /mnt/containers/config/common_config/cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660"
/dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The FAVM discovers the attached volume group and assigns it to the /dev/sdb device.
nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
/mnt/containers/config/common_config/cvm_bck.config
nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
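Taken together, the steps above follow the flow in the commented sketch below. This is a condensed outline for orientation, assuming the configuration path and device names shown in the preceding commands; run the individual steps above rather than this sketch.

# Sketch of the reboot-based FAVM recovery flow described above.
sudo blkid                                                 # list the block devices before the change
cp /mnt/containers/config/common_config/cvm.config /tmp    # preserve the cluster configuration
sudo systemctl stop monitoring                             # stop the File Analytics monitoring service
docker stop $(docker ps -q)                                # stop all running File Analytics containers
sudo systemctl stop docker                                 # stop the Docker daemon
sudo umount /mnt                                           # release the old volume group mount
sudo reboot                                                # on reboot, the restored volume group appears as /dev/sdb
# After the reboot, back up the existing cvm.config, restore the copy from /tmp,
# and run reset_password.py for the local and Prism admin credentials as shown above.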
Deploy a File Analytics VM (FAVM) after a planned or unplanned (disaster) migration to the remote site.
To perform disaster recovery, deploy and enable File Analytics on the remote site. Restore the data using a snapshot of the volume group from the primary FAVM.
nutanix@favm$ sudo blkid
nutanix@favm$ cp /mnt/containers/config/common_config/cvm.config /tmp
nutanix@favm$ sudo systemctl stop monitoring
nutanix@favm$ docker stop $(docker ps -q)
nutanix@favm$ sudo systemctl stop docker
nutanix@favm$ sudo umount /mnt
nutanix@favm$ sudo /sbin/iscsiadm -m node -u
nutanix@favm$ sudo /sbin/iscsiadm -m node -o delete
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660"
/dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
The output does not show the /dev/sdb device.
nutanix@favm$ sudo cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1991-05.com.redhat:8ef967b5b8f
nutanix@favm$ sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal data_services_IP_address:3260
Clicking the Nutanix cluster name in Prism displays cluster details, including the data services IP address. The output displays the restored iSCSI target from step 2.
nutanix@favm$ sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
nutanix@favm$ sudo reboot
nutanix@favm$ sudo blkid
/dev/sr0: UUID="2019-06-11-12-18-52-00" LABEL="cidata" TYPE="iso9660"
/dev/sda1: LABEL="_master-x86_64-2" UUID="b1fb6e26-a782-4cf7-b5de-32941cc92722" TYPE="ext4"
/dev/sdb: UUID="30749ab7-58e7-437e-9a09-5f6d9619e85b" TYPE="ext4"
nutanix@favm$ mv /mnt/containers/config/common_config/cvm.config \
/mnt/containers/config/common_config/cvm_bck.config
nutanix@favm$ mv /tmp/cvm.config /mnt/containers/config/common_config/
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --local_update
nutanix@favm$ sudo python /opt/nutanix/analytics/bin/reset_password.py --user_type=prism \
--password='new password' --prism_user=admin --prism_password='Prism admin password'
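For orientation, the iSCSI-based flow above condenses to the commented sketch below. The data services IP address and iqn_name are placeholders, as in the steps themselves; run the individual steps above rather than this sketch.

# Sketch of the iSCSI-based FAVM recovery flow described above.
cp /mnt/containers/config/common_config/cvm.config /tmp    # preserve the cluster configuration
sudo systemctl stop monitoring                             # stop the File Analytics monitoring service
docker stop $(docker ps -q)                                # stop all running File Analytics containers
sudo systemctl stop docker                                 # stop the Docker daemon
sudo umount /mnt                                           # release the old volume group mount
sudo /sbin/iscsiadm -m node -u                             # log out of the existing iSCSI session
sudo /sbin/iscsiadm -m node -o delete                      # remove the stale iSCSI node records
sudo /sbin/iscsiadm --mode discovery --type sendtargets --portal data_services_IP_address:3260
sudo /sbin/iscsiadm --mode node --targetname iqn_name --portal data_services_IP_address:3260,1 --login
sudo reboot                                                # after the reboot, the restored volume group appears as /dev/sdb
# Then back up the existing cvm.config, restore the copy from /tmp,
# and run reset_password.py for the local and Prism admin credentials as shown above.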