varyonvg and varyoffvg scripts in AIX

Introduction

There will be times when you have to vary on or vary off one or more volume groups (VGs). For example, if you are upgrading IBM® AIX® from version 6.1 to version 7.1, you also need to upgrade Subsystem Device Driver Path Control Module (SDDPCM). To upgrade SDDPCM, you need to vary off all the application and database volume groups that use storage area network (SAN) disks, for example IBM System Storage® SAN Volume Controller (SVC) or IBM System Storage DS8000®. The assumption here is that the root volume group (rootvg) is using an internal Small Computer System Interface (SCSI) disk. This article presents two scripts: one that varies on VGs and one that varies them off. Because they also mount and unmount the file systems in those VGs, these scripts can save you a lot of time. In this article, I also discuss how to solve problems when the umount, varyoffvg, and varyonvg commands fail.

varyonvg command

If you are already familiar with the varyonvg and varyoffvg commands, skip ahead to the varyonvg.ksh script section. The varyonvg command varies on a volume group and makes it available for use; you can then create file systems for your applications on it. Here is how you can use this command to vary on a VG:

=> varyonvg chrisvg

After a VG has been varied on, its hdisks show as active in the lspv output:


=> lspv|grep chrisvg
hdisk7         00xyw6oi3b6f069b                   chrisvg         active
hdisk8         00xyz6oi3b6f05ed                   chrisvg         active
hdisk9         00xyz6oi3b6f038a                   chrisvg         active

You can vary on a VG in read-only mode by using the -r option:

=> varyonvg -r chrisvg

Varying on a VG with this option prevents the following activities:

  • Write operations to logical volumes
  • Logical Volume Manager (LVM) metadata updates
  • Synchronization of stale partitions

If the majority of the disks in a VG are not available, the varyonvg command might fail. For example, if SAN problems prevent AIX from accessing most of the SAN disks in the VG, the command fails and AIX displays a list of all the physical volumes and their status. To vary on the VG in this case, use the force option, -f. This option varies on the VG and puts every disk that cannot be brought to an active state into a removed state. At least one disk in the VG must be available for this to work. You can display the state of each disk by using the lsvg -p command:


=> lsvg -p chrisvg
chrisvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk7            active            799         19          00..00..00..00..19
hdisk8            active            799         19          00..00..00..00..19
hdisk9            active            799         19          00..00..00..00..19

As you can see from the output above, the PV STATE column shows active for all the disks, which means that all the disks in this VG are available for use.
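After a forced vary on, any disk that could not be brought online shows a PV STATE other than active. On a VG with many disks, you might want to filter the lsvg -p output for just those disks. The following is a minimal sketch, not an AIX command: the function name pvs_not_active is hypothetical, and it assumes the column layout shown above, with the PV name in the first column and the PV state in the second.

```shell
# pvs_not_active: hypothetical helper, not part of AIX.
# Reads "lsvg -p <vg>" output on stdin and prints "<pv> <state>" for every
# physical volume whose PV STATE column is anything other than "active".
pvs_not_active() {
    awk '
        NF < 2          { next }   # skip blank or one-word lines
        $1 ~ /:$/       { next }   # skip the "chrisvg:" title line
        $1 == "PV_NAME" { next }   # skip the column header
        $2 != "active"  { print $1, $2 }
    '
}
```

For example, lsvg -p chrisvg | pvs_not_active prints nothing when every disk in chrisvg is active.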

varyoffvg command

The varyoffvg command varies off a volume group and makes it unavailable for use. Here is how you can use this command to vary off a VG:


=> varyoffvg chrisvg
                    

After a VG has been varied off, its hdisks no longer show as active in the lspv output:


=> lspv|grep chrisvg 
hdisk7         00xyw6oi3b6f069b                    chrisvg         
hdisk8         00xyz6oi3b6f05ed                    chrisvg         
hdisk9         00xyz6oi3b6f038a                    chrisvg    
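If you want a script to confirm the result of varyonvg or varyoffvg, you can count the active disks of the VG in the lspv output. This is a minimal sketch; the function name active_pv_count is hypothetical, and it assumes the lspv layout shown above, where the third column is the VG name and the fourth column, when present, is active.

```shell
# active_pv_count VG: hypothetical helper, not part of AIX.
# Reads lspv output on stdin and prints how many disks that belong to the
# volume group VG are shown as "active".
active_pv_count() {
    awk -v vg="$1" '
        $3 == vg && $4 == "active" { n++ }
        END { print n + 0 }
    '
}
```

With the example above, lspv | active_pv_count chrisvg would be expected to print 3 after varyonvg chrisvg and 0 after varyoffvg chrisvg.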
               

The varyoffvg command fails if any file system in the VG is still mounted. If the umount command fails for some file systems, refer to the Problems with unmounting file systems section for information about dealing with such problems.

For more information about LVM, refer to the IBM Redbooks® publication AIX Logical Volume Manager, from A to Z: Introduction and Concepts.

varyonvg.ksh script

When you run the varyonvg.ksh script, it uses the lsvg command to display a list of all the VGs except rootvg and then prompts you for input. You can enter one VG, or several VGs separated by a pipe symbol (|), and then press Enter. For each VG, the script varies it on and then mounts all of its file systems, which I find with the lsvgfs command (for example, lsvgfs VG1 lists all the file systems in VG1). After you run this script to vary on the volume groups, you have to start all the applications and databases that use these volume groups. You may have to contact the application group and the database administrator and ask them to start their applications and databases after the volume groups have been varied on.
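The following is not the author's varyonvg.ksh, but a minimal, hypothetical sketch of the steps just described, assuming AIX, root authority, and the lsvg, varyonvg, lsvgfs, and mount commands. The function names run_step, vary_on_list, and main, and the separator line, are illustrative choices.

```shell
# Hypothetical sketch of a varyonvg.ksh-style script; not the author's code.
# Assumes AIX, root authority, and lsvg, varyonvg, lsvgfs and mount in PATH.
# Uses only POSIX sh syntax, so ksh runs it unchanged.

SEP='----------------------------------------------------------------------'

# run_step TEXT COMMAND [ARG...]: run COMMAND with its standard output
# discarded (errors still reach stderr), then print "TEXT successful" or
# "TEXT UNSUCCESSFUL". Returns 1 if COMMAND failed.
run_step() {
    text=$1
    shift
    if "$@" >/dev/null; then
        printf '%s successful\n' "$text"
    else
        printf '%s UNSUCCESSFUL\n' "$text"
        return 1
    fi
}

# vary_on_list "VG1|VG2|...": vary on each VG, then mount its file systems
vary_on_list() {
    old_ifs=$IFS
    IFS='| '                  # split on "|", ignoring spaces around it
    set -f                    # do not expand wildcards in the VG names
    set -- $1
    set +f
    IFS=$old_ifs
    printf '%s\n' "$SEP"
    for vg in "$@"; do
        if run_step "varyonvg $vg" varyonvg "$vg"; then
            # Sorting puts a parent mount point (/a) before its children (/a/b)
            for fs in $(lsvgfs "$vg" | LC_ALL=C sort); do
                run_step "mount $fs" mount "$fs" || true
            done
        fi
        printf '%s\n' "$SEP"
    done
}

# main: show all VGs except rootvg, prompt for a list, and process it
main() {
    echo "The following is a list of current VGs:"
    lsvg | grep -vx rootvg
    echo
    echo 'You can specify one VG or a list of VGs to be varied on (separated by a pipe "|") as shown below.'
    echo 'VG1|VG2|VG3'
    read -r answer
    vary_on_list "$answer"
}
```

To use it on AIX, you would save it as varyonvg.ksh with #!/bin/ksh as its first line, add a final line that calls main, and run it as root. Skipping the mounts when varyonvg fails is a design choice of this sketch, since they would most likely fail as well; sorting the lsvgfs output is another, so that a nested mount point such as /a/b is mounted after /a.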

When you run the ./varyonvg.ksh script, the following output is displayed.


The following is a list of current VGs: 
myVG1
myVG2
myVG3
myVG4
myVG5  
               


You can specify one VG or a list of VGs to be varied on (separated by a pipe "|") as shown below.
            
VG1|VG2|VG3 

myVG1|myVG2|myVG3 
----------------------------------------------------------------------
varyonvg myVG1 successful
mount /myVG1/FS1 successful
mount /myVG1/FS2 successful
----------------------------------------------------------------------
varyonvg myVG2 successful
mount /myVG2/FS1 successful
mount /myVG2/FS2 successful
----------------------------------------------------------------------
varyonvg myVG3 successful
mount /myVG3/FS1 successful
mount /myVG3/FS2 successful
----------------------------------------------------------------------
         

Consider the following two lines from the output above:



varyonvg myVG3 successful 
mount /myVG3/FS1 successful 
              

The word successful in the varyonvg message means that myVG3 was varied on successfully; in the mount message, it means that the file system /myVG3/FS1 was mounted successfully. If either command fails, its message shows UNSUCCESSFUL (in all caps) instead of successful. Note that a file system name does not normally contain the name of its VG, as /myVG3/FS1 does here. I named it this way only to make it easy to see that, for each VG, the script first varies it on and then mounts all of its file systems before it moves to the next VG. If the varyonvg or mount command fails, investigate why it failed, fix the problem, and then rerun the script. You can rerun the script at any time: it tries to vary on every VG you specify, even one that is already varied on, and to mount every file system, even one that is already mounted. In that case, you only get a message indicating that the VG was already varied on or that the file system was already mounted.
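Because every step reports either successful or UNSUCCESSFUL, a long run is easy to check afterward if you save its output, for example with ./varyonvg.ksh | tee /tmp/varyonvg.out (the file name is only an example). A minimal sketch of such a check follows; the function name summarize_run is hypothetical, and because varyoffvg.ksh reports its steps in the same form, it applies to that script's output too.

```shell
# summarize_run [FILE...]: hypothetical helper. Reads saved script output
# (from the named files, or stdin) and prints the number of successful and
# UNSUCCESSFUL steps, followed by each failed step on its own line.
summarize_run() {
    awk '
        / successful *$/   { ok++ }
        / UNSUCCESSFUL *$/ { bad++; failed[bad] = $0 }
        END {
            printf "%d successful, %d UNSUCCESSFUL\n", ok, bad
            for (i = 1; i <= bad; i++) print "  " failed[i]
        }
    ' "$@"
}
```

Running summarize_run /tmp/varyonvg.out lists only the steps that you need to investigate before you rerun the script.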

varyoffvg.ksh script

When you run the varyoffvg.ksh script, it uses the lsvg -o command to display a list of the active (varied-on) VGs, except rootvg, and then prompts you for input. You can enter one VG, or several VGs separated by a pipe symbol (|), and then press Enter. For each VG, the script unmounts all of its file systems, which I find with the lsvgfs command (for example, lsvgfs VG1 lists all the file systems in VG1), and then varies it off. Before you run this script to vary off volume groups, you have to stop all the applications and databases that use these volume groups. You may have to contact the application group and the database administrator and ask them to stop their applications and databases before you can unmount the file systems and vary off any of the volume groups.
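The following is not the author's varyoffvg.ksh, but a minimal, hypothetical sketch of the steps just described, assuming AIX, root authority, and the lsvg, lsvgfs, umount, and varyoffvg commands. The function names run_step, vary_off_list, and main are illustrative choices.

```shell
# Hypothetical sketch of a varyoffvg.ksh-style script; not the author's code.
# Assumes AIX, root authority, lsvg, lsvgfs, umount and varyoffvg in PATH,
# and that the applications and databases using these VGs are stopped.

SEP='----------------------------------------------------------------------'

# run_step TEXT COMMAND [ARG...]: run COMMAND with its standard output
# discarded (errors still reach stderr), then print "TEXT successful" or
# "TEXT UNSUCCESSFUL". Returns 1 if COMMAND failed.
run_step() {
    text=$1
    shift
    if "$@" >/dev/null; then
        printf '%s successful\n' "$text"
    else
        printf '%s UNSUCCESSFUL\n' "$text"
        return 1
    fi
}

# vary_off_list "VG1|VG2|...": unmount each VG's file systems, then vary it off
vary_off_list() {
    old_ifs=$IFS
    IFS='| '                  # split on "|", ignoring spaces around it
    set -f                    # do not expand wildcards in the VG names
    set -- $1
    set +f
    IFS=$old_ifs
    printf '%s\n' "$SEP"
    for vg in "$@"; do
        # Reverse sorting unmounts children (/a/b) before their parent (/a)
        for fs in $(lsvgfs "$vg" | LC_ALL=C sort -r); do
            run_step "umount $fs" umount "$fs" || true
        done
        # varyoffvg fails by itself if a file system is still mounted
        run_step "varyoffvg $vg" varyoffvg "$vg" || true
        printf '%s\n' "$SEP"
    done
}

# main: show the active VGs except rootvg, prompt for a list, and process it
main() {
    echo "The following is a list of active VGs:"
    lsvg -o | grep -vx rootvg
    echo
    echo 'You can specify one VG or a list of VGs to be varied off (separated by a pipe "|") as shown below.'
    echo 'VG1|VG2|VG3'
    read -r answer
    vary_off_list "$answer"
}
```

As with any script of this kind, on AIX you would save it with #!/bin/ksh as its first line, add a final line that calls main, and run it as root. The sketch still attempts varyoffvg after a failed unmount, so that failure is also reported in the same successful/UNSUCCESSFUL form.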

Here is the output from a run of this script:


=> ./varyoffvg.ksh
    


The following is a list of active VGs:
myVG1
myVG2
myVG3 
myVG4
myVG5
          

    


You can specify one VG or a list of VGs to be varied off (separated by a pipe "|") as shown below.
           
VG1|VG2|VG3 

myVG1|myVG2|myVG3 
----------------------------------------------------------------------
umount /myVG1/FS1 successful
umount /myVG1/FS2 successful
varyoffvg myVG1 successful
----------------------------------------------------------------------
umount /myVG2/FS1 successful
umount /myVG2/FS2 successful
varyoffvg myVG2 successful
----------------------------------------------------------------------
umount /myVG3/FS1 successful
umount /myVG3/FS2 successful
varyoffvg myVG3 successful
----------------------------------------------------------------------
           

Consider the following two lines from the output above:


umount /myVG3/FS1 successful 
varyoffvg myVG3 successful 
        

The word successful in the umount message means that the file system /myVG3/FS1 was unmounted successfully, and in the varyoffvg message it means that the VG was varied off successfully. If either command fails, its message shows UNSUCCESSFUL (in all caps) instead of successful. In that case, you have to investigate why it failed. A failed varyoffvg is often the result of a failed umount, because varyoffvg fails while any file system in the VG is still mounted. Refer to the Problems with unmounting file systems section on how to resolve these problems.

Problems with unmounting file systems

Suppose that the umount command fails to unmount the /myVG3/FS1 file system. The umount command usually fails because the file system is currently being used by one or more processes. In this case, you have to determine the owner of the file system and the owners of the processes that are using it. You can then contact the owners and ask them to stop the application or database, or to let you kill the processes, so that you can unmount the file system. Run the following command to find the owner of the file system; the -d option makes ls show the mount point directory itself instead of listing its contents.


=> ls -ld /myVG3/FS1
drwxr-xr-x    1 cnarcouz     staff     58594159 Jan  8 2014  /myVG3/FS1
               

The output of the command tells you that the user ID of the file system owner is cnarcouz.

You can use the following command to get the full name of the owner of this user ID:


=> grep cnarcouz /etc/passwd 
cnarcouz:!:999:222:Christopher Narcouzi:/home/cnarcouz:/bin/ksh 
                

As you can see from the output of this command, Christopher Narcouzi (which is me in this case) is the owner of the user ID cnarcouz and also the owner of the file system /myVG3/FS1.
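The two lookups above can be combined into small helpers. This is a minimal sketch with hypothetical function names: fs_owner uses ls -ld so that ls describes the mount point directory itself rather than listing its contents, and full_name compares the user ID with the whole first field of each /etc/passwd entry, so it cannot match an entry that merely contains the string, as grep cnarcouz could. It assumes the user is defined in /etc/passwd rather than in a directory service.

```shell
# fs_owner PATH: hypothetical helper; prints the user ID that owns PATH.
# -d makes ls describe a directory itself instead of listing its contents.
fs_owner() {
    ls -ld "$1" | awk '{ print $3 }'
}

# full_name USER [PASSWD_FILE]: hypothetical helper; prints the comment
# (GECOS) field, usually the full name, of the entry whose first field is
# exactly USER. PASSWD_FILE defaults to /etc/passwd.
full_name() {
    awk -F: -v u="$1" '$1 == u { print $5 }' "${2:-/etc/passwd}"
}
```

For the example above, full_name "$(fs_owner /myVG3/FS1)" would print the comment field for cnarcouz, Christopher Narcouzi.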

If the user ID contains, for example, tsm (IBM Tivoli® Storage Manager, TSM) or db2 (IBM DB2® database), the file system is most likely owned by TSM or DB2. Sometimes the file system name itself contains tsm or db2, which also gives you an idea of its owner. So far you have found the owner of the file system, but you still have to find the owners of the processes that are using it, which is the next step.

To find the processes that are using a file system, and their owners, use the fuser command. Here is an example of the fuser command run against the /home directory, with its output:


=> fuser -xuc /home
/home:  9834761c(daemon) 8635421c(root) 7346581c(root) 5689214(root) 3768915e(tsm)
                

As the output shows, several processes are using /home. Some process IDs are followed by a letter, such as “c” or “e”, that tells you how the process is using the file. The following list describes the letters:

  • c – Uses the file as the current directory.
  • e – Uses the file as a program’s executable object.
  • r – Uses the file as the root directory.
  • s – Uses the file as a shared library (or other loadable object).

That was a general example of fuser. In our case, run fuser against the file system that failed to unmount:


=> fuser -xuc /myVG3/FS1
                

This command displays all the processes using the file system, along with the user ID that owns each process. If the user ID is tsm, for example, you need to contact the TSM administrator to stop TSM. Sometimes the owner is shown as root, but this can be misleading, especially if you know that the file system is an application file system and not an OS file system such as /opt. Some applications require root access to start, and application teams are usually given sudo permission to start them. In this case, fuser shows root as the owner of the process, but you know that this is not really true because the file system is an application file system. You then have to find out who really owns the process: is it an application process, a database process, or a user process? Suppose, for example, that one of the processes using this file system is 9834576. Use the following command to find the real owner of the process:


=> ps -ef | grep 9834576
root  9834576  3648729   0   Oct 15      -  6:21 /home/cnarcouz/myappl.ksh
               

This means that the file system is being used by the process running /home/cnarcouz/myappl.ksh. Next, find out who owns this script, which points you to the real owner of the process. Run the following command to determine the user ID of its owner:


=> ls -l /home/cnarcouz/myappl.ksh
-rwxr-xr-x    1 cnarcouz     staff     58594159 Jan  8 2014  /home/cnarcouz/myappl.ksh

From the above output, you can tell that /home/cnarcouz/myappl.ksh is owned by the user ID cnarcouz.

You can then run the following command to find the owner of the user ID cnarcouz:


=> grep cnarcouz /etc/passwd 
cnarcouz:!:999:222:Christopher Narcouzi:/home/cnarcouz:/bin/ksh 
                

The above output tells you that the owner of the user ID is Christopher Narcouzi. Contact him, let him know that you are upgrading the system, and ask him to stop his process or whether it is fine to kill it.
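When fuser reports many processes, repeating the ps -ef | grep lookup for each one gets tedious, and grep can also match its own command line or any process whose parent PID is the number you searched for. A minimal sketch with hypothetical function names: fuser_pids keeps only the leading digits of each word, so it accepts either bare process IDs or tokens such as 9834761c(daemon), and show_fs_procs asks ps about each PID directly. It assumes a ps that accepts the POSIX -o field list and the -p option.

```shell
# fuser_pids: hypothetical helper. Reads fuser output on stdin and prints one
# process ID per line. A token such as 9834761c(daemon) becomes 9834761, and
# words that do not start with a digit, such as /home:, are dropped.
fuser_pids() {
    tr ' \t' '\n\n' | sed -n 's/^\([0-9][0-9]*\).*/\1/p'
}

# show_fs_procs FS: hypothetical helper. For every process that fuser reports
# as using the file system FS, print its owner, PID, parent PID and command.
show_fs_procs() {
    # Anything fuser writes to stderr (file name, letters, users) is discarded
    fuser -c "$1" 2>/dev/null | fuser_pids | while read -r pid; do
        # -p selects exactly this PID, unlike "ps -ef | grep PID"
        ps -o user=,pid=,ppid=,args= -p "$pid"
    done
}
```

Running show_fs_procs /myVG3/FS1 prints one line per process, with the owner in the first column and the command in the last, which is the information the steps above use to identify the real owner.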

Sometimes the path, such as /home/cnarcouz/myappl.ksh here, contains the name of the application or the database, such as TSM or DB2. In that case, you know that you have to contact the TSM or DB2 administrator and ask the administrator to stop TSM or DB2. When fuser shows many processes using the file system, you can use the following command to kill all of them:


=> fuser -xuck /myVG3/FS1
                

The k option kills all the processes that are using the /myVG3/FS1 file system. Before you decide to kill them, make sure that doing so does not affect anything else, and get authorization from the application group first. After running this command, you can check whether all the processes have been killed:

=> fuser -xuc /myVG3/FS1
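Because fuser -k kills every process it finds without asking, you might prefer a wrapper that shows you the list first and requires you to type yes. This is a minimal sketch; the function name kill_fs_users is hypothetical, and the confirmation prompt is a design choice of the sketch, not a fuser feature. It does not replace getting the application group's authorization.

```shell
# kill_fs_users FS: hypothetical helper. Shows the processes using the file
# system FS, asks for a typed "yes", and only then runs fuser -xuck.
# Returns 1 without killing anything if the answer is not exactly "yes".
kill_fs_users() {
    fs=$1
    echo "Processes using $fs:"
    fuser -xuc "$fs"
    printf 'Kill ALL of these processes? Type yes to confirm: '
    read -r answer
    if [ "$answer" != yes ]; then
        echo "Nothing was killed."
        return 1
    fi
    fuser -xuck "$fs"
    sleep 1                   # give the processes a moment to exit
    echo "Processes still using $fs:"
    fuser -xuc "$fs"
}
```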

You may have to use kill -9 to kill some processes, but be aware of the consequences. If you use kill -9 on TSM, you could leave the TSM database in an unclean state, and running TSM backups could be terminated abruptly. The TSM administrator will then have to deal with this and fix it. I don’t like to use kill -9 on application processes; I prefer that the application groups stop their applications in this case. Before you use kill -9 on an application process, inform the application group that this command could abruptly kill their process in the middle of a database update, leaving the database in an unclean state, and ask whether they can fix any problems that may arise. Get their agreement before you kill the process this way.

In summary, the best approach is always to stop all the applications and databases first, so that you don’t run into file systems that can’t be unmounted because they are in use. If you still run into problems, you now have some ideas on how to deal with them.

Christopher Narcouzi