Confirming the SCSI-2 Reservations are Happening
If VMware is not configured according to best practice (ATS enabled), we may see SCSI-2 reservations in our logs. To check whether that is happening:
-
Run the following command on Penguin Fuse from the date directory in question for the array. It returns the number of SCSI-2 reservations created each hour:
tgrep -c 'vol.pr_cache inserting registration' core*
quelyn@i-9000a448:/logs/del-valle.k12.tx.us/dvisd-pure01-ct1/2015_10_27$ tgrep -c 'vol.pr_cache inserting registration' core*
core.log-2015102700.gz:1875
core.log-2015102701.gz:1798
core.log-2015102702.gz:1827
core.log-2015102703.gz:1817
core.log-2015102704.gz:1860
core.log-2015102705.gz:1812
core.log-2015102706.gz:1818
core.log-2015102707.gz:2577
core.log-2015102708.gz:8181
core.log-2015102709.gz:15131
core.log-2015102710.gz:21826
core.log-2015102711.gz:19140
core.log-2015102712.gz:12044
core.log-2015102713.gz:13451
core.log-2015102714.gz:22995
core.log-2015102715.gz:33136
core.log-2015102716.gz:18587
core.log-2015102717.gz:5900
core.log-2015102718.gz:7324
core.log-2015102719.gz:2541
core.log-2015102720.gz:2213
core.log-2015102721.gz:1850
core.log-2015102722.gz:1807
core.log-2015102723.gz:1851
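With 24 hourly counts it can help to pull out the busiest hours. The following is a minimal sketch that assumes only the "file:count" output format shown above; the function name top_hours and its default of 5 are illustrative and not part of any existing tool.

```shell
# top_hours: read "file:count" lines on stdin (the format printed by the
# tgrep -c command above) and print the N busiest files, highest count first.
# The function name and the default N=5 are illustrative choices.
top_hours() {
    local n="${1:-5}"
    # Split on ":" and sort numerically, descending, on the count field.
    sort -t: -k2,2nr | head -n "$n"
}

# Example, run from the date directory:
#   tgrep -c 'vol.pr_cache inserting registration' core* | top_hours 3
```

In the sample output above, this would surface the 15:00, 14:00 and 10:00 logs as the three busiest hours.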
-
Running this command without '-c' shows which LUNs are being locked (the number following "vol" in each line). This can produce many lines of output, so you may want to run it against one log file at a time:
quelyn@i-9000a448:/logs/del-valle.k12.tx.us/dvisd-pure01-ct1/2015_10_27$ tgrep 'vol.pr_cache inserting registration' core*
core.log-2015102700.gz:Oct 26 23:18:51.394 7FB1ACCF3700 I vol.pr_cache inserting registration, seq 3097137672 vol 69674 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.431 7FB1AD5F5700 I vol.pr_cache inserting registration, seq 3097137673 vol 69673 i_t 20000025B5BB000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.457 7FB1AE378700 I vol.pr_cache inserting registration, seq 3097137674 vol 69662 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.483 7FB1AE378700 I vol.pr_cache inserting registration, seq 3097137675 vol 69661 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.503 7FB1A73FC700 I vol.pr_cache inserting registration, seq 3097137676 vol 69676 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.522 7FB1AF7FD700 I vol.pr_cache inserting registration, seq 3097137677 vol 69667 i_t 20000025B5BB000E-spc2-0 res_type 15
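Because the raw output can be long, it may be easier to tally the lines per volume and initiator. This is a rough sketch that assumes the log line layout shown above (a "vol" keyword followed by the ID, and an "i_t" keyword followed by the initiator value); the function name count_by_vol is hypothetical.

```shell
# count_by_vol: read matching log lines on stdin and print
# "count vol i_t" rows, busiest first. Fields are located by the
# "vol" and "i_t" keywords rather than by column position, so the
# "core.log-...:" filename prefix does not matter.
count_by_vol() {
    awk '{
        vol = ""; it = ""
        for (i = 1; i < NF; i++) {
            if ($i == "vol") vol = $(i + 1)
            if ($i == "i_t") it  = $(i + 1)
        }
        if (vol != "") seen[vol " " it]++
    }
    END { for (k in seen) print seen[k], k }' | sort -k1,1nr -k2,2n
}

# Example:
#   tgrep 'vol.pr_cache inserting registration' core.log-2015102715.gz | count_by_vol
```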
-
You can then run the pslun command to determine the volume name:
quelyn@i-9000a448:/logs/del-valle.k12.tx.us/dvisd-pure01-ct1/2015_10_27$ pslun
Volume Name pslun Name
-------------------------------- ----------
PURE-STR-LUN01 pslun69648
PURE-STR-LUN02 pslun69660
PURE-STR-LUN03 pslun69661
PURE-STR-LUN04 pslun69662
PURE-STR-LUN05 pslun69664
PURE-STR-LUN06 pslun69665
PURE-STR-LUN07 pslun69666
PURE-STR-LUN08 pslun69667
PURE-STR-LUN09 pslun69668
PUR-STR-LUN10 pslun69669
PUR-STR-LUN11 pslun69670
PURE-STR-LUN12 pslun69671
PURE-STR-LUN13 pslun69672
PURE-STR-LUN14 pslun69673
PURE-STR-LUN15 pslun69674
PURE-STR-LUN16 pslun69675
PURE-STR-LUN17 pslun69676
PURE-STR-LUN18 pslun69677
PURE-STR-LUN19 pslun69678
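To look up a single vol ID from the log lines, the pslun listing can be filtered. This sketch assumes the pslun name is the string "pslun" followed by the vol ID, which is what the sample outputs above suggest; the function name name_for_vol and the saved-file path are illustrative.

```shell
# name_for_vol: print the volume name for a vol ID, given a file that
# holds saved pslun output. Assumes the last column is "pslun<vol ID>"
# and the first column is the volume name, as in the listing above.
name_for_vol() {
    local vol="$1" listing="$2"
    awk -v want="pslun${vol}" '$NF == want { print $1 }' "$listing"
}

# Example:
#   pslun > /tmp/pslun.txt
#   name_for_vol 69674 /tmp/pslun.txt
```

Against the listing above, vol 69674 corresponds to PURE-STR-LUN15.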
-
To determine which hosts are creating the SCSI-2 Reservations, we will need a VMware bundle. The customer can send this to us via FTP.
-
Once the bundle is uploaded, prepare it for analysis as described in KB: Retrieving Customer Logs from the FTP
-
Run the VMware script that jhop created against the bundles to check the global configuration of VAAI ATS (see Wiki: VMware vSphere Overview and Troubleshooting):
/home/jhop/python/Mr_VMware.py
-
If we have confirmed that VAAI ATS is enabled globally and we are still seeing SCSI-2 reservations, we will want to check each volume individually. Please proceed to the next section:
Identifying datastore ATS Configuration on VMware ESXi
Step 1:
Since we only care about datastores (not Raw Device Mappings (RDMs)) on Pure Storage, we will find our applicable LUNs in the "esxcfg-scsidevs_-m.txt" file under the "commands" folder in a VMware Support Bundle. Below is an example of what a line from there will look like:
There are several things that we want to identify from this output. The first is the "NAA Identifier": anything starting with "naa.624a937" is a Pure Storage LUN. Once we have a Pure Storage LUN, we want to take note of its "VMFS UUID" (e.g. 53c80075-7ddcc5ba-7d03-0025b5000080). We focus on this instead of the "User-Friendly Name" because customers can choose any name they want for that field, whereas the VMFS UUID is a uniquely generated ID assigned when the VMFS datastore is created, so it reliably identifies the datastore on that LUN.
Step 2:
Once we have this information, the next step is to find the "vmkfstools" text file that contains the file system information for this device. It is in the same "commands" folder you are already in. An example of the file name is as follows:
vmkfstools_-P--v-10-vmfsvolumes53c80075-7ddcc5ba-7d03-0025b5000080.txt
Notice that our "VMFS UUID" is contained in the file name. We can now search this file for the "Mode" the datastore is running in. If configured properly, the line will look as follows:
Mode: public ATS-only
If the datastore is not configured properly it will look as follows:
Mode: public
If the datastore shows a mode of just "public" (without "ATS-only"), then we know that this datastore is misconfigured and we will receive an excessive number of SCSI-2 reservations from the ESXi hosts. This means that locking tasks are not being offloaded to the FlashArray.
If the customer has a lot of LUNs the process above can take a while, so it is best to script it. I have listed below a simple one-liner that you can use instead of going through each LUN one by one:
grep "naa.624a937" esxcfg-scsidevs_-m.txt | awk '{print $3}' > Pure-LUNs.txt;while read f;do cat vmkfs*$f.txt |grep -e "Mode:" -e "naa.624a937";echo;done < Pure-LUNs.txt
NOTE: This command can be copied and pasted as-is and used for any ESXi host that is 5.0 or higher, as long as you are in the "commands" folder of the ESXi host you want to verify.
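When there are many datastores, it can be quicker to list only the ones that are not in ATS-only mode. The function below is a sketch of that idea, not a replacement for the one-liner. It assumes the same file names and column layout the one-liner relies on (VMFS UUID in the third column of esxcfg-scsidevs_-m.txt, and one vmkfs*<UUID>.txt file per datastore); the function name list_non_ats is hypothetical.

```shell
# list_non_ats: print the VMFS UUID of every Pure Storage datastore whose
# vmkfstools file has a "Mode:" line that lacks "ATS-only".
# Pass the bundle's "commands" folder as $1 (defaults to the current dir).
list_non_ats() {
    local dir="${1:-.}" uuid f mode
    # Column 3 of esxcfg-scsidevs_-m.txt is taken to be the VMFS UUID,
    # the same assumption the one-liner above makes.
    grep 'naa\.624a937' "$dir/esxcfg-scsidevs_-m.txt" | awk '{print $3}' |
    while read -r uuid; do
        for f in "$dir"/vmkfs*"$uuid".txt; do
            [ -e "$f" ] || continue                     # no file for this UUID
            mode=$(grep -m1 'Mode:' "$f") || continue   # no Mode line: skip
            case "$mode" in
                *ATS-only*) ;;                          # configured properly
                *) echo "$uuid" ;;                      # plain "public": report
            esac
        done
    done
}

# Example, from inside the "commands" folder:
#   list_non_ats .
```

Datastores with no matching vmkfstools file or no "Mode:" line are skipped rather than reported, so an empty result should be cross-checked against the one-liner output if anything looks off.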
Resolution
Once we have the misconfigured LUNs identified the customer can use the VMware KB listed below to resolve the issue:
Link to VMware KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033665
Follow the steps outlined in the "Changing an ATS-only volume to public" section. To turn ATS-only back on, change the "0" they set to a "1" in the command they provide. It is important that the customer read all of the steps, including the notes and caveats, before proceeding.
Alternatively, and with much less of a headache, the customer can create a new LUN on the FlashArray and mount it to the applicable ESXi host(s). Once the new VMFS datastore is created, they can verify that ATS is properly configured on it. Once they have confirmed that the new datastore has ATS enabled, they can migrate the virtual machines from the misconfigured datastore to the new one. After everything has been moved off the old datastore and they have confirmed that all is working well, they can destroy the old LUN. This is much easier to do and is typically what should be recommended as the first step.
If there are any questions, please reach out to a colleague or a Support Escalations team member for assistance.