Monday, August 17, 2015

Exadata Patching


PATCHING EXADATA





Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)

A. It is recommended that Exadata systems with Data Guard configured use the "Standby
First" patching approach.
B. Patching should never be interrupted by a dropped connection. It is therefore
recommended that you run it under VNC or the screen utility.
C. Before patching cells in a rolling manner, you must check the asmdeactivationoutcome
and asmModeStatus grid disk attributes and make sure that the grid disks on all cells are
online and can safely be deactivated (see the example below).
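
For example, a quick check run as root on each cell (the same CellCLI commands appear later in this post):

– list the relevant attributes for every grid disk:
# cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome"
– any grid disk that is NOT safe to deactivate shows up here (no rows expected):
# cellcli -e "LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'"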





First, an Oracle Exadata patch has 3 different components that should be patched. As we know, the Exadata rack has several different components, like the Cisco switch, KVM, Power Distribution Unit, etc., but we are only responsible for patching the Database Servers (usually referred to as compute nodes), the Storage Servers (usually referred to as cell nodes) and the InfiniBand switches.

We can divide the patches into 3 different parts:
Storage Server Patch
Database Server Patch
Infiniband Switches Patch


Order of Patching

This is an overview of the order in which you would usually patch all the components. You may also want to wait a few days between components, so that if you start facing a bug or error you know which component was patched most recently.

InfiniBand Switches
   - Spine
   - Leaf
Storage Server Software
   - Cell nodes
   - Database Minimal Pack on compute nodes
Database Bundle Patch
   - Grid Home (if applicable)
   - Oracle Home
CRS MLR patches (11.2.0.1.0)
   - Grid Home
   - Oracle Home (if applicable)





Before starting, I would like to share two documents from My Oracle Support (aka Metalink). These notes should be the first place you go before patching an Exadata environment.

Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions (Doc ID 888828.1)
- This is for the second and third generations (V2 and X2) of Oracle Exadata, which use Sun hardware.

Database Machine and Exadata Storage Server 11g Release 1 (11.1) Supported Versions (Doc ID 835032.1)
- This is for the first generation (V1) of Oracle Exadata, which uses HP hardware.

Oracle usually updates these documents for every patch that is released, so always check them for the latest information.




-- The Oracle Database software on Exadata is updated using standard OPatch and the Oracle Universal Installer.
-- Running Exadata with different storage server software versions is supported, but should be limited to rolling patching scenarios.
-- Storage server updates require access to an Unbreakable Linux Network (ULN) based repository.



Platinum covers Exadata storage software and firmware patching, but the customer must perform
database patching.


A. Dependency issues found during yum updates require rolling back to a previous release before retrying.
B. Bundle patches applied using opatch auto (can) roll back only the database or the grid infrastructure home.
C. Failed OS patches on database servers (cannot) be rolled back.
D. Failed storage cell patches are (not) rolled back to the previous release automatically.
E. Database server OS updates can be rolled back using opatch auto -rollback.
F. Dependency issues found during yum updates should (not) be ignored using the force option.



--  Firmware levels are maintained automatically.
--  Cell patches are maintained across all cell components and are independent of database patches.



Exadata Database Server Patching using the DB Node Update Utility (Doc ID 1553103.1)
Exadata YUM Repository Population and Linux Database Server Updating (Doc ID 1473002.1)
Exadata YUM Repository Population, One-Time Setup Configuration and YUM upgrades (Doc ID 1556257.1) - this is an older version of 1473002.1, with additional manual steps, helping to understand more details.
Patch 16432033 - Using ISO Image with a Local Repository README
Quarterly Full Stack Download Patch For Oracle Exadata (Jul 2013)
Quarterly Full Stack Download Patch For Oracle Exadata (Oct 2013)




It is mandatory to know which components/tools run on which servers on Exadata.
Here is the list:

DCLI -> storage cells and compute nodes; executes CellCLI (and other) commands on multiple storage servers at once (see the example after this list)

ASM -> compute nodes -- this is basically the ASM instance

RDBMS -> compute nodes -- this is the database software

MS -> storage cells; provides a Java interface to the CellCLI command-line interface, as well as an interface for Enterprise Manager plug-ins

RS -> storage cells; RS (Restart Server) is a set of processes responsible for managing and restarting other processes

CellCLI -> storage cells; used to run storage commands

Cellsrv -> storage cells; receives and unpacks iDB messages transmitted over the InfiniBand interconnect and examines the metadata contained in the messages

Diskmon -> compute nodes; in Exadata, diskmon is responsible for:
- Handling storage cell failures and I/O fencing
- Monitoring the Exadata Server state on all storage cells in the cluster (heartbeat)
- Broadcasting intra-database IORM (I/O Resource Manager) plans from the databases to the storage cells
- Monitoring the control messages from database and ASM instances to the storage cells
- Communicating with the other diskmons in the cluster
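
As an example of dcli usage, here is a minimal sketch (assuming root ssh equivalence and the cell_group file used elsewhere in this post) that checks the cell services on every storage server in one shot:

# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list cell attributes name, cellsrvStatus, msStatus, rsStatus"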
 

The following strategy should be used for applying patches on Exadata:
Review the patch README file (know what you are doing)
Run the Exachk utility before the patch application (check the system and know its current state; see the example after this list)
Automate the patch application process (automate it to be fast and to minimize problems)
Apply the patch
Run the Exachk utility again after the patch application
Verify the patch (does it fix the problem or deliver what it is supposed to?)
Check the performance of the system (is there any abnormal performance decrease?)
Test the failback procedure (you may need to fail back in Production, who knows)
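
As part of recording the current state of the system, a minimal sketch (assuming the dbs_group and cell_group files used elsewhere in this post exist and root ssh equivalence is in place) that captures the image versions before patching; repeat it afterwards to compare:

# dcli -g dbs_group -l root imageinfo > /tmp/imageinfo_dbnodes_$(date +%Y%m%d).txt
# dcli -g cell_group -l root imageinfo > /tmp/imageinfo_cells_$(date +%Y%m%d).txt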




###########################################################




Exadata Database Server Patch (compute nodes):

This is the kind of patch we are used to working with as Oracle DBAs. These patches are released specifically for Exadata, so you should not upgrade a database to a version that is not supported on Exadata; always check the documentation to confirm that the release you want has been published for Exadata. The patch software already contains the updates for both the Oracle Database and Oracle Clusterware.

All the patches provided as bundle patches are cumulative, and they may also include the most recent Patch Set Update (PSU) and/or Critical Patch Update (CPU).

a) So, for example, if you are on 11.2.0.2.0 BP 1 (Bundle Patch 1) and would like to move to BP 7 (Bundle Patch 7), you can upgrade directly, without applying BPs 2, 3, 4, 5 and 6.
b) As the Bundle Patches include the latest PSU and CPU, it is not necessary to apply a separate PSU/CPU in the Exadata environment, because they are already included in the Bundle Patches. Always check the README file to see which PSU/CPU is or is not included in the Bundle Patch (see the sketch below for checking what is already applied to a home).
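
To see which bundle patch is already applied to a given home before choosing the target BP, a minimal sketch using standard OPatch (the exact description text varies by release, so adjust the grep if needed):

$ $ORACLE_HOME/OPatch/opatch lsinventory | grep -i "bundle patch"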

– Bundle Patches Bug List:
11.2.0.2 Patch Bundles for Oracle Exadata Database Machine (Doc ID. 1314319.1)
11.2.0.1 Patch Bundles for Oracle Exadata Database Machine (Doc ID. 1316026.1)



When you need to apply a bundle patch to the Oracle Homes on Exadata, you can use the oplan utility.
The oplan utility generates instructions for applying patches, as well as instructions for rollback, and it does so for all the nodes in the cluster. Note that oplan does not support Data Guard. Oplan is available since release 11.2.0.2. It basically eases the patching process, because without it you need to read the README files and extract the instructions yourself.
It is used as follows:
As the Oracle software owner (Grid or RDBMS), execute oplan:
$ORACLE_HOME/oplan/oplan generateApplySteps <bundle patch location>
It will create the patch instructions for you in HTML and text formats:
$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/InstallInstructions.html
$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/InstallInstructions.txt
Then choose the apply strategy according to your needs and follow the patching instructions to apply the patch to the target.
That's it.
If you want to roll back the patch, execute the following (replacing the bundle patch location):
$ORACLE_HOME/oplan/oplan generateRollbackSteps <bundle patch location>
Again, choose the rollback strategy according to your needs and follow the instructions to roll back the patch from the target.




#####  Exadata Patching - DB Node Update


Copy the dbnodeupdate.sh utility and patch 16432033 from szur0023pap to the new folders; the required patches are stored there.
Alternatively, download them from MOS and transfer them to the nodes.

# scp username@szur0023pap:/sbclocal/san/DBaaS/Exadata/dbnodeupdate.zip /u01/patches/system/dbnodeupdate
# scp username@szur0023pap:/sbclocal/san/DBaaS/Exadata/16784347/Infrastructure/ExadataStorageServer/11.2.3.2.1/p16432033_112321_Linux-x86-64.zip /u01/patches/system/16432033


Unpack the dbnodeupdate.sh utility
# unzip dbnodeupdate.zip


Make sure you have the latest version of dbnodeupdate.sh. If you don't, stop here and get the latest version from MOS (Doc ID 1553103.1)!
# cd /u01/patches/system/dbnodeupdate
# ./dbnodeupdate.sh -V
dbnodeupdate.sh, version 2.17


 Do not unpack patch 16432033
# cd /u01/patches/system/16432033
# ls -l
-rw-r--r-- 1 root root 1216696525 Jun 27 08:33 p16432033_112321_Linux-x86-64.zip


Before you start patching the node, verify the status of the cluster resources as follows:
# /u01/app/11.2.0.3/grid/bin/crsctl stat res -t


Check the current image version running on the node
# imageinfo


Run the dbnodeupdate.sh script in check/verify mode, using the zip file as the "repository".
# cd /u01/patches/system/dbnodeupdate
# ./dbnodeupdate.sh -u -v -l /u01/patches/system/16432033/p16432033_112321_Linux-x86-64.zip


The recommended procedure to update a DB node is to use the DB Node Update Utility (aka dbnodeupdate.sh), as documented in Exadata Database Server Patching using the DB Node Update Utility (Doc ID 1553103.1).
If there are no errors in the preparation and verification step above, proceed to the actual upgrade. This will perform a backup using the dbserver_backup.sh script, followed by patching the system.

# cd /u01/patches/system/dbnodeupdate
# ./dbnodeupdate.sh -u -s -l /u01/patches/system/16432033/p16432033_112321_Linux-x86-64.zip



Now the system will reboot.
Note the last statement of the previous command, just before the reboot. Once the system is back up, this last step has to be executed:
# cd /u01/patches/system/dbnodeupdate/
# ./dbnodeupdate.sh -c


Patching is now completed, congratulations!
You can now check the image version and the status of the cluster
# imageinfo
# /u01/app/11.2.0.3/grid/bin/crsctl stat res -t






Exadata Patching - Compute Node


Patching on the database servers can be performed serially, or in parallel using the DCLI utility.

The dbnodeupdate.sh utility is used to perform the database server patching.


What does the dbnodeupdate.sh utility do? (A few manual spot checks are sketched after the list below.)

· Stop/unlock/disable CRS for host restart
· Perform LVM snapshot backup of the / filesystem
· Mount the yum ISO image and configure yum
· Apply OS updates via yum
· Relink all Oracle homes for the RDS protocol
· Lock the GI home and enable CRS upon host restart
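
A few of these steps can be spot-checked manually before running the utility; a minimal sketch (the Grid Home path below is the one used in this environment, so treat it as an example):

– current image version and status:
# imageinfo
– free space in /u01 for staging and backups:
# df -h /u01
– state of the cluster stack before dbnodeupdate.sh stops it:
# /u01/app/11.2.0.3/grid/bin/crsctl check crs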

 

Patching:


Step 1: Go to the database server patch directory


#cd /19625719/Infrastructure/ExadataDBNodeUpdate/3.60


Step 2:  Download the latest dbnodeupdate.sh script and replace the existing one with it.

You can go through MOS note 1553103.1 for the latest dbnodeupdate.sh utility.


Step 3: Execute pre-requisites check on DB Server

#./dbnodeupdate.sh -u -l /19625719/Infrastructure/11.2.3.3.1/ExadataDatabaseServer/p18876946_112331_Linux-x86-64.zip -v

If the pre-requisites check fails, fix the problem first and then re-execute it.


Step 4: Start patching on the DB node after the pre-requisites check completes successfully

#./dbnodeupdate.sh -u -l  /19625719/Infrastructure/11.2.3.3.1/ExadataDatabaseServer/p18876946_112331_Linux-x86-64.zip


Step 5: Closely monitor the log file for any errors and perform the next steps based on the instructions given by the dbnodeupdate.sh utility.

Log file location /var/log/cellos/dbnodeupdate.log


Step 6: After the reboot, execute the command below to continue, as instructed by the dbnodeupdate.sh utility

#./dbnodeupdate.sh -c


How does it work?


-----Pre steps start
· Collect configuration details
· Validate system details; this checks best practices and known issues
· Check free space in /u01
· Back up the yum configuration files
· Clean up the yum cache
· Prepare the update
· Perform yum package dependency checks
· Give an overview of the prerequisite checks before starting the upgrade, including the existing image version and the image version being upgraded to
-----Pre steps complete

· Acceptance to continue the upgrade procedure

-----Start upgrade
· Verify GI and DBs are shut down
· Un-mount and re-mount /boot
· Perform the file system backup
· Verify and update yum.conf
· Stop OSWatcher
· Clean up the yum cache
· Prepare for the update
· Perform the yum update; the node is expected to reboot when finished
· Finish all the post steps
· Reboot the system automatically
· After the reboot, run "./dbnodeupdate.sh -c" to complete the upgrade
-----Finish upgrade

-----Post steps start
· Collect system configuration details
· Verify GI and DBs are shut down
· Verify firmware updates/validations
· If the node reboots during this execution, re-run './dbnodeupdate.sh -c' after the node restarts
· Start ExaWatcher
· Re-link all homes
· Unlock /u01/app/11.2.0.3/grid
· Re-link /u01/app/11.2.0.3/grid
· Re-link /u01/app/oracle/product/11.2.0.3/dbhome_1
· Execute /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -patch
· Start the stack
· Enable the stack to start at reboot
-----Finished post steps

After completing the first node, we can carry out the same tasks on the second node if patching is being executed serially (see the verification sketch below).
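
A minimal sketch for confirming that every database server ended up on the same image (assuming the dbs_group file used elsewhere in this post):

# dcli -g dbs_group -l root "imageinfo | grep -E 'Image version|Image status'"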


Rollback:


Only two steps are needed to roll back the patch to the previous version on a database server.

Step 1 Execute #./dbnodeupdate.sh -r

Step 2 #./dbnodeupdate.sh -c





###########################################################



./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -cleanup


HOW ROLLING PATCH WORKS:

Since version 11.2.1.3.1, Oracle has provided the "Cell Rolling Apply" feature in order to simplify the installation and, of course, to reduce downtime. Be aware that some patches may be required on the compute nodes BEFORE using the -rolling option on the cell nodes.

Below is a brief overview of how it works (a sketch of the equivalent manual commands follows the loop):

Preparation of cell node X
Loop
    Turn all grid disks offline on cell node X
    Patch cell node X
    Bring the grid disks back online on cell node X
End Loop
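
For reference, a minimal sketch of the equivalent manual checks and commands on a single cell (the same commands appear later in this post; run them as root on the cell):

– check that every grid disk can be deactivated safely (no rows expected):
# cellcli -e "LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'"
– take the grid disks offline, patch the cell, then bring them back:
# cellcli -e "ALTER GRIDDISK ALL INACTIVE"
# cellcli -e "ALTER GRIDDISK ALL ACTIVE"
– wait until every grid disk reports asmmodestatus ONLINE before moving to the next cell:
# cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmmodestatus" | grep -v ONLINE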



The Storage Server patch is responsible for keeping the cell nodes up to date and fixing possible problems, and it includes patches for the different components of the Storage Server, like the kernel, firmware, operating system, etc.
These patches are usually launched from a database server (typically compute node 1) using dcli and ssh to remotely patch each cell node.
Every Storage Server patch comes with what Oracle calls the "Database Minimal Pack" or "Database Convenience Pack". These packs are included as a .zip file inside the storage patch software and must be applied on all the database servers (compute nodes), because they include the kernel, firmware and operating system patches needed on that side.
Applying the patch on the Storage Servers will also change the Linux version on each cell, but the Database Minimal Pack will NOT change the Linux version on the compute nodes.


SOME USEFUL COMMANDS:

— Rolling Patch Example
./patchmgr -cells <cells group> -patch_check_prereq -rolling
./patchmgr -cells <cells group> -patch -rolling

– Non-Rolling Patch Example
./patchmgr -cells <cells group> -patch_check_prereq
./patchmgr -cells <cells group> -patch

– Shows the image and kernel versions and the status of the patch;
– can be run on the database servers and on the cells.
imageinfo

– Logs to check
/var/log/cellos/validations.log
/var/log/cellos/vldrun*.log

– File with all the cell node names, used as the <cells group> argument
cat /opt/oracle.SupportTools/onecommand/cell_group





Cell Node patching :

Exadata Storage Server Patching - Some details
Exadata Storage Server Patching

●● Exadata Storage Server patch is applied to all cell nodes.
●● Patching is launched from compute node 1 and will use dcli/ssh to remotely patch each cell node.

●● The Exadata Storage Server patch zip also contains the Database Minimal Pack or Database Convenience Pack, which is applied to all compute nodes.
This patch is copied to each compute node and run locally.

●● Applying the storage software on the cell nodes will also change the Linux version, while applying the Database Minimal Pack
on the compute nodes does NOT change the Linux version.
To upgrade Linux on the compute nodes, follow MOS Note: 1284070.1

A non-rolling patch apply is much faster because you are applying the patch on all the cell nodes simultaneously, and there
is NO exposure to a single disk failure. Please note, this requires a full outage.

In the case of a rolling patch apply, database downtime is not required, but the patch application time is much higher. Major risk:
exposure to disk failure while a cell is offline; ASM high redundancy reduces this exposure.

Grid disks offline >>> Patch Cel01 >>> Grid disks online
Grid disks offline >>> Patch Cel02 >>> Grid disks online
Grid disks offline >>> Patch Cel..n>>> Grid disks online

Rolling patch application can be a risky affair, so please be apprised of the following:
Do not use the -rolling option to patchmgr for a rolling update or rollback without first applying the required fixes on the database hosts.
./patchmgr -cells cell_group -patch_check_prereq -rolling >>> Make sure this is successful and review the spool file carefully.
./patchmgr -cells cell_group -patch -rolling

Non-rolling Patching Command:
./patchmgr -cells cell_group -patch_check_prereq
./patchmgr -cells cell_group -patch


How to Verify Cell Node is Patched Successfully

# imageinfo

Output of this command gives some good information, including Kernel Minor Version.

Active Image Version: 11.2.2.3.1.110429.1
Active Image Status: Success

If you get anything other than success in "Active Image Status", you need to look at validations.log and vldrun*.log.
The image status is marked as failure when a failure is reported in one or more validations.

Check the /var/log/cellos/validations.log and /var/log/cellos/vldrun*.log files for any failures.

If a specific validation failed, the log will indicate where to look for the additional logs for that validation (see the grep sketch below).
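
A quick way to scan those logs for problems (a simple sketch; the exact messages vary by release):

# grep -i fail /var/log/cellos/validations.log
# grep -il fail /var/log/cellos/vldrun*.log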




Exadata Patching - Cell Server


Cell storage patching is done with the patchmgr utility, which can patch in a rolling as well as a non-rolling fashion.

Syntax: ./patchmgr -cells cell_group -patch [-rolling] [-ignore_alerts] [-smtp_from "addr" -smtp_to "addr1 addr2 addr3 ..."]

Here addr is the sending e-mail address used to send the status of the patching, and addr1, addr2, addr3 are the receiving e-mail addresses (see the example below).
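
For example, a rolling run with e-mail notification (the addresses here are placeholders):

# ./patchmgr -cells ~/cellgroup -patch -rolling -smtp_from "exadata@example.com" -smtp_to "dba-team@example.com"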

Step-1  First, note down the current image version of the cell by executing

#imageinfo

Step-2  Go to the cell patch directory where the patch has been copied


/19625719/Infrastructure/11.2.3.3.1/ExadataStorageServer_InfiniBandSwitch/patch_11.2.3.3.1.140708

Step-3
  Reset the server to a known state using the following command

./patchmgr -cells cell_group -reset_force

Step-4  Clean up any previous patchmgr utility runs using the following command

./patchmgr -cells cell_group -cleanup

Step-5 Verify that the cells meet prerequisite checks using the following command



(Rolling)

./patchmgr -cells ~/cellgroup -patch_check_prereq -rolling 

or


(Non-rolling)


./patchmgr -cells ~/cellgroup -patch_check_prereq 

Here the cellgroup file contains the IPs of all the cell servers, one per line (see the example below).
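
A minimal sketch for creating that file (the entries below are just examples; list one cell management IP or hostname per line for your own environment):

# cat > ~/cellgroup <<EOF
cel01
cel02
cel03
EOF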


Step-6  The output should not contain any errors. If there are errors, resolve them first and then re-execute the command above.

Step-7 
Patch cell server

(Rolling)


./patchmgr -cells ~/cellgroup -patch -rolling

or


(Non-rolling)


./patchmgr -cells ~/cellgroup -patch

Step-8
Check the patchmgr.stdout log file for any errors.


How does it work?


The entire patching activity is done automatically by the patchmgr utility:
· To ensure a good backup exists, the USB recovery media is recreated
· Check that the cells have ssh equivalence for the root user
· Initialize files, check space and the state of the cell services
· Copy and extract the prerequisite check archive to the cells
· Check the prerequisites on each cell
· Copy the patch to each cell
· Execute the plug-in check for Patch Check Prereq
· Initiate the patch on the cell
· Reboot the cell
· Execute the plug-in check for Patching
· Finalize the patch
· Reboot the cell
· Check the state of the patch
· Execute the plug-in check for Post Patch
· Done
After completion of the patching you can check the image version; it should have changed to the new version

#imageinfo
#imagehistory


Rollback



Step-1
Disable the writeback flash cache (you can refer to Oracle Doc ID 1500257.1; a sketch for checking the current mode follows).
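
Before deciding whether this step applies, you can check the current flash cache mode on the cells; a minimal sketch (the full disable procedure is in Doc ID 1500257.1):

# dcli -g ~/cellgroup -l root "cellcli -e list cell attributes name, flashCacheMode"
– if this returns WriteThrough, there is nothing to disable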

Step-2  Check rollback pre-requisites

(Rolling)


./patchmgr -cells ~/cellgroup -rollback_check_prereq -rolling -ignore_alerts

or


(non-rolling) 


./patchmgr -cells ~/cellgroup -rollback_check_prereq -ignore_alerts  

Step-3  
Perform the rollback

(Rolling)
 


./patchmgr -cells ~/cellgroup -rollback -rolling -ignore_alerts  


or 


(Non-Rolling)


./patchmgr -cells ~/cellgroup -rollback -ignore_alerts 

Step-4  Clean up the cells using the -cleanup option to clean up all the temporary patch or rollback files on the cells

./patchmgr -cells ~/cellgroup -cleanup




###########################################################





Most Exadata machines have 2 leaf InfiniBand switches and 1 spine switch. The two leaves are usually in positions U20 and U24, and the spine is located in the very bottom position, U1. Some Half and Quarter Rack Exadata machines also have a spine switch, but usually only the Full Rack ships with one.
In order to patch the switches, you must apply the patch to the SPINE switch first, rebooting it and waiting for it to come back online BEFORE proceeding to the next switches (the leaves).

The InfiniBand switch patches are NOT cumulative; it is necessary to upgrade version by version. For example, to go from version 1.0.1-1 to 1.3.3-2, I must first upgrade 1.0.1-1 to 1.1.3-2 and then 1.1.3-2 to 1.3.3-2.



Exadata Patching - Infiniband Switch


Here are the bullet points for patching the IB switches on an Exadata Database Machine.


Syntax: #patchmgr -ibswitches [ibswitch_list_file] <-upgrade | -downgrade> [-ibswitch_precheck] [-force]

Here:

ibswitch_list_file - contains the IPs of all the IB switches
-upgrade - to upgrade the switch
-downgrade - to downgrade the switch
-ibswitch_precheck - to check the prerequisites

The patchmgr utility is available in the storage server patch directory (see the sketch below for creating the switch list file).
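
A minimal sketch for creating the switch list file referenced in the command below (the switch names are just examples; run this from compute node 1 as root):

# cat > ibswitches <<EOF
sw-ib2
sw-ib3
EOF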


Patching


#./patchmgr -ibswitches ibswitches -upgrade -ibswitch_precheck



How does it work?
· Disable the Subnet Manager
· Copy the firmware to the switch
· Check the minimal firmware version required for the upgrade
· Verify there is enough space in /tmp and /
· Verify there is enough free memory to start the upgrade
· Verify the host details in /etc/hosts, /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1
· Verify the NTP server
· Pre-upgrade validation
· Start the upgrade
· Load the firmware
· Disable the Subnet Manager
· Verify that /conf/configvalid is set to 1
· Set SMPriority to 5
· Reboot the switch
· Restart the Subnet Manager
· Start the post-update validation
· Confirmation: the InfiniBand switch is at the target patching level
· Verify the host details in /etc/hosts, /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1
· Verify the NTP server
· Firmware verification on the InfiniBand switch
· Post-check validation on the IB switch
· Final confirmation: switch updated to 2.1.3_4 (firmware version)

Once it completes on one switch, it will start to upgrade the next available switch, and at the end it will give the overall status of the upgrade.




###########################################################


Applying patch to Exadata Box

Hi everyone! I'm here to post about applying patches on an Exadata machine. As a best practice we will apply the QFSP (Quarterly Full Stack Patch) for Exadata Jan/2014. The patch apply is almost fully automatic, so if the prereqs are addressed correctly you will have no bad surprises and your Exadata environment will be patched successfully. At my job, our team applied it recently without any issue.
The patch number is 17816100 [Quarterly Full Stack Download Patch For Oracle Exadata (Jan 2014 - 11.2.3.3.0)], which is 3.6G. This patch covers most of the Exadata Database Machine components, which are: databases, dbnodes, storage servers, InfiniBand switches and PDUs (Power Distribution Units). Our databases are already patched to version 11.2.0.3.21, and at the end of this patching the image version for the db and cell nodes should be 11.2.3.3.0, as we are moving from image 11.2.3.2.1.
You should carefully read all the READMEs and notes regarding this patch, as there is a complete list of prereqs and things to analyze. Although the db and cell nodes will all end up with the same image version, in our case the InfiniBand switch upgrade was optional according to the compatibility matrix, but to keep things simple we upgraded them too. The PDU upgrade is optional and is the easiest one.
Now let's get hands-on and begin with the PDUs. This upgrade costs you no outage and is as simple as upgrading the firmware on your home network router. Just navigate to the PDU from your browser and hit "Net Configuration". Scroll down to "Firmware Upgrade" and select the MKAPP_Vx.x.dl file to upgrade. After the PDU firmware has been upgraded, it will prompt for the HTML interface to be upgraded as well, so you then select the HTML_Vx.x.dl file. Do that on all of the PDUs and you are done with it. Piece of cake.
Now let's proceed to the cell upgrade. As we use the rolling upgrade strategy (no outage), all of the database homes must have patch 17854520 applied; otherwise, the DBs may hang or crash. The utility used to patch the cells and the InfiniBand switches is patchmgr (which should be executed as root). You can also run a precheck for the upgrade with this utility, as shown below:
# ./patchmgr -cells cell_group -patch_check_prereq -rolling
It is recommended to raise the disk_repair_time attribute on the disk groups so that ASM does not drop the disks while a cell is offline. Also, according to the Oracle docs, it is recommended to reset the cells if this is the first time those cell images are being upgraded; do this one cell at a time and then initiate the cell upgrade. patchmgr should be executed from a dbnode. A sketch of raising the repair time follows.
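A minimal sketch (run as the Grid user against the ASM instance; the DATA disk group name and the 8.5h value are examples only, and you should set the attribute back to its original value after patching):

$ sqlplus -s / as sysasm <<EOF
ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '8.5h';
EOF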
# ./patchmgr -cells cel01 -reset_force
# ./patchmgr -cells cel02 -reset_force
# ./patchmgr -cells cel03 -reset_force
# ./patchmgr -cells cell_group -patch -rolling
After finishing the cell upgrade successfully, go to the InfiniBand switch upgrade precheck and execute the patchmgr utility as listed below:
# ./patchmgr -ibswitches -upgrade -ibswitch_precheck
To continue with the IB switch upgrade, just remove the precheck parameter:
# ./patchmgr -ibswitches -upgrade
When you are done with the InfiniBand switches and the cell nodes, you move on to upgrading the database nodes. For this upgrade you use the dbnodeupdate.sh utility. This will upgrade the dbnode kernel and all of the dependent packages. Pay attention: if you have any third-party packages installed, you must upgrade them manually after the upgrade. In our environment the kernel will be upgraded to Oracle Linux 5.9 (kernel-2.6.39-400.126.1.el5uek). dbnodeupdate.sh is fully automatic and it will disable and bring down CRS for the node. You must run it as root and, as a best practice, do it one node at a time.
To perform a precheck, run it with the -v parameter at the end:
# ./dbnodeupdate.sh -u -l $PATCH_17816100/Infrastructure/ExadataStorageServer/11.2.3.3.0/p17809253_112330_Linux-x86-64.zip -v
Now, to start the upgrade on the dbnode, execute it without the -v parameter:
# ./dbnodeupdate.sh -u -l $PATCH_17816100/Infrastructure/ExadataStorageServer/11.2.3.3.0/p17809253_112330_Linux-x86-64.zip
After the machine reboots, confirm the upgrade executing:
# ./dbnodeupdate.sh -c
Perform these steps on all the remaining dbnodes and you are done. The whole Exadata machine is patched; run imageinfo on all dbnodes and storage servers to confirm the new image. On the IB switches run the version command to confirm it:
# dcli -g all_group -l root imageinfo
db01:
db01: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
db01: Image version: 11.2.3.3.0.131014.1
db01: Image activated: 2014-03-29 10:30:56 -0300
db01: Image status: success
db01: System partition on device: /dev/mapper/VGExaDb-LVDbSys1
db01:
db02:
db02: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
db02: Image version: 11.2.3.3.0.131014.1
db02: Image activated: 2014-03-30 10:23:58 -0300
db02: Image status: success
db02: System partition on device: /dev/mapper/VGExaDb-LVDbSys1
db02:
cel01:
cel01: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
cel01: Cell version: OSS_11.2.3.3.0_LINUX.X64_131014.1
cel01: Cell rpm version: cell-11.2.3.3.0_LINUX.X64_131014.1-1
cel01:
cel01: Active image version: 11.2.3.3.0.131014.1
cel01: Active image activated: 2014-03-28 23:42:33 -0300
cel01: Active image status: success
cel01: Active system partition on device: /dev/md6
cel01: Active software partition on device: /dev/md8
cel01:
cel01: In partition rollback: Impossible
cel01:
cel01: Cell boot usb partition: /dev/sdm1
cel01: Cell boot usb version: 11.2.3.3.0.131014.1
cel01:
cel01: Inactive image version: 11.2.3.1.0.120304
cel01: Inactive image activated: 2012-05-21 18:00:09 -0300
cel01: Inactive image status: success
cel01: Inactive system partition on device: /dev/md5
cel01: Inactive software partition on device: /dev/md7
cel01:
cel01: Boot area has rollback archive for the version: 11.2.3.1.0.120304
cel01: Rollback to the inactive partitions: Possible
cel02:
cel02: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
cel02: Cell version: OSS_11.2.3.3.0_LINUX.X64_131014.1
cel02: Cell rpm version: cell-11.2.3.3.0_LINUX.X64_131014.1-1
cel02:
cel02: Active image version: 11.2.3.3.0.131014.1
cel02: Active image activated: 2014-03-29 00:46:13 -0300
cel02: Active image status: success
cel02: Active system partition on device: /dev/md6
cel02: Active software partition on device: /dev/md8
cel02:
cel02: In partition rollback: Impossible
cel02:
cel02: Cell boot usb partition: /dev/sdm1
cel02: Cell boot usb version: 11.2.3.3.0.131014.1
cel02:
cel02: Inactive image version: 11.2.3.1.0.120304
cel02: Inactive image activated: 2012-05-21 18:01:07 -0300
cel02: Inactive image status: success
cel02: Inactive system partition on device: /dev/md5
cel02: Inactive software partition on device: /dev/md7
cel02:
cel02: Boot area has rollback archive for the version: 11.2.3.1.0.120304
cel02: Rollback to the inactive partitions: Possible
cel03:
cel03: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
cel03: Cell version: OSS_11.2.3.3.0_LINUX.X64_131014.1
cel03: Cell rpm version: cell-11.2.3.3.0_LINUX.X64_131014.1-1
cel03:
cel03: Active image version: 11.2.3.3.0.131014.1
cel03: Active image activated: 2014-03-29 01:51:22 -0300
cel03: Active image status: success
cel03: Active system partition on device: /dev/md6
cel03: Active software partition on device: /dev/md8
cel03:
cel03: In partition rollback: Impossible
cel03:
cel03: Cell boot usb partition: /dev/sdm1
cel03: Cell boot usb version: 11.2.3.3.0.131014.1
cel03:
cel03: Inactive image version: 11.2.3.1.0.120304
cel03: Inactive image activated: 2012-05-21 18:01:28 -0300
cel03: Inactive image status: success
cel03: Inactive system partition on device: /dev/md5
cel03: Inactive software partition on device: /dev/md7
cel03:
cel03: Boot area has rollback archive for the version: 11.2.3.1.0.120304
cel03: Rollback to the inactive partitions: Possible
sw-ib2 # version
SUN DCS 36p version: 2.1.3-4
Build time: Aug 28 2013 16:25:57
SP board info:
Manufacturing Date: 2011.05.08
Serial Number: "NCD6I0106"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
sw-ib3 # version
SUN DCS 36p version: 2.1.3-4
Build time: Aug 28 2013 16:25:57
SP board info:
Manufacturing Date: 2011.05.11
Serial Number: "NCD6Q0110"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
Docs:
• Exadata 11.2.3.3.0 release and patch (16278923) (Doc ID 1487339.1)
• Exadata Database Server Patching using the DB Node Update Utility (Doc ID 1553103.1)
• Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)
• Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)






###########################################################


Upgrading Exadata to 11.2.0.3 and Applying BP14


Upgrading Exadata to 11.2.0.3 and Applying Bundle Patch 14

In this blog, I'll walk you through the abbreviated steps to apply Bundle Patch 14 on our Exadata X2-2 Quarter Rack.  I say "abbreviated" because I'm simply going to bullet all the steps - this is no substitute for reading the various README files.

For BP14, I'm going to apply all patches in a rolling upgrade fashion.  The nodes in our Exadata are:

- cm01dbm01 (Compute node 1)
- cm01dbm02 (Compute node 2)
- cm01cel01 (Cell 1)
- cm01cel02 (Cell 2)
- cm01cel03 (Cell 3)

Preparation

1) Downloaded p13551280_112030_Linux-x86-64.zip from MOS

2) Transferred p13551280_112030_Linux-x86-64.zip to our first compute node, cm01dbm01

3) Unzipped p13551280_112030_Linux-x86-64.zip

4) Read the various README.txt files

5) Login to each storage server, compute node, Infiniband switch, and validate the current versions using "imageinfo"

Patch Contents

Bundle Patch 14 (13551280) contains the latest software versions for the entire Exadata technology stack.  The patch contents are split into 3 sections:

* Infrastructure
- Includes patches for Exadata Storage Server nodes, version 11.2.2.4.2
- InfiniBand switches, version 1.3.3-2
- PDUs, firmware version 1.04
* Database
- Oracle RDBMS 11.2.0.3
- Grid Infrastructure , 11.2.0.3
- OPatch 11.2.0.1.9
- OPlan 11.2.0.2.7
* Systems Management
- EM Agent, 11.1.0.1.0
- EM Plugins for InfiniBand Switches, Cisco switches, PDUs, KVMs, ILOMs
- OMS patches for any/all OMS homes monitoring Exadata targets (11.1.0.1.0)

Patching Storage Servers

1) Transfer the 13551280/Infrastructure/ExadataStorageServer/11.2.2.4.2 contents to storage cell cm01cel01:/tmp (our first cell) and unzip the zip file

2) Read MOS note 1388400.1 and do the following:
- "# cellcli -e list griddisk where diskType=FlashDisk".  Make sure we don't have any Flash Grid disks, which we didn't.
- "# cellcli -e list physicaldisk attributes name, status, slotNumber".  Make sure no duplicate disks exists with the same slot number.  In our case, we didn't have any.
- "# cellcli -e list physicaldisk".  Make sure they're all normal.
- "# grep -in 'Failed to parse the command' $CELLTRACE/ms-odl.trc*".  Make sure we don't have any flash disk population errors.  We didn't.
- Since our current cell version image is > 11.2.2.2.x, we skipped steps 3a and 3b.
- Transfer validatePhysicalDisks from MOS note to /tmp and run it. It should look like this:

[root@cm01cel01 patch_11.2.2.4.2.111221]# /tmp/validatePhysicalDisks 
[SUCCESS] CellCLI output and MegaCLI output are consistent.
[root@cm01cel01 patch_11.2.2.4.2.111221]# 

- Ensure database tier hosts are > 11.2.0.1 to support rolling upgrades.  In our case, they are.
3) Validate that all physical disks have valid physicalInsertTime:

[root@cm01cel01 patch_11.2.2.4.2.111221]#  cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null'
[root@cm01cel01 patch_11.2.2.4.2.111221]# 

4) Verify that no duplicate slotNumbers exist.  This was done per MOS note 1388400.1, step 2

5) Obtain LO and serial console access for cell
- Login to cm01cel01-ilom as root
- Type "start /SP/console"
- Login to console as root

6) Check version of ofa by doing "rpm -qa|grep ofa".  Ours was higher than the minimum version, so we're OK

7) Since we're doing this in rolling fashion, ensure the grid disks are all offline:

[root@cm01cel01 ~]#  cellcli -e "LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'"
[root@cm01cel01 ~]# 
[root@cm01cel01 ~]# cellcli -e "ALTER GRIDDISK ALL INACTIVE"
GridDisk DATA_CD_00_cm01cel01 successfully altered
GridDisk DATA_CD_01_cm01cel01 successfully altered
GridDisk DATA_CD_02_cm01cel01 successfully altered
GridDisk DATA_CD_03_cm01cel01 successfully altered
GridDisk DATA_CD_04_cm01cel01 successfully altered
GridDisk DATA_CD_05_cm01cel01 successfully altered
GridDisk DATA_CD_06_cm01cel01 successfully altered
GridDisk DATA_CD_07_cm01cel01 successfully altered
GridDisk DATA_CD_08_cm01cel01 successfully altered
GridDisk DATA_CD_09_cm01cel01 successfully altered
GridDisk DATA_CD_10_cm01cel01 successfully altered
GridDisk DATA_CD_11_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_02_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_03_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_04_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_05_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_06_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_07_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_08_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_09_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_10_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_11_cm01cel01 successfully altered
GridDisk RECO_CD_00_cm01cel01 successfully altered
GridDisk RECO_CD_01_cm01cel01 successfully altered
GridDisk RECO_CD_02_cm01cel01 successfully altered
GridDisk RECO_CD_03_cm01cel01 successfully altered
GridDisk RECO_CD_04_cm01cel01 successfully altered
GridDisk RECO_CD_05_cm01cel01 successfully altered
GridDisk RECO_CD_06_cm01cel01 successfully altered
GridDisk RECO_CD_07_cm01cel01 successfully altered
GridDisk RECO_CD_08_cm01cel01 successfully altered
GridDisk RECO_CD_09_cm01cel01 successfully altered
GridDisk RECO_CD_10_cm01cel01 successfully altered
GridDisk RECO_CD_11_cm01cel01 successfully altered
[root@cm01cel01 ~]# 
[root@cm01cel01 ~]# cellcli -e "LIST GRIDDISK WHERE STATUS != 'inactive'"
[root@cm01cel01 ~]# 

8) Shutdown cell services

[root@cm01cel01 ~]# sync
[root@cm01cel01 ~]# sync
[root@cm01cel01 ~]# shutdown -F -r now

Broadcast message from root (ttyS0) (Sat Feb 11 19:57:42 2012):

The system is going down for reboot NOW!
audit(1329008264.759:2153236): audit_pid=0 old=7383 by auid=4294967295
type=1305 audit(1329008264.850:2153237): auid=4294967295 op=remove rule key="time-change" list=4 res=1

9) Since we're doing this in rolling fashion, activate all disks and check grid disk attributes.  I'm wondering if steps 7 and 8 were actually required, but I believe they were to ensure we had a healthy disk status:

[root@cm01cel01 ~]# cellcli -e 'list griddisk attributes name,asmmodestatus'
DATA_CD_00_cm01cel01     OFFLINE
DATA_CD_01_cm01cel01     OFFLINE
DATA_CD_02_cm01cel01     OFFLINE
(wait)
[root@cm01cel01 ~]# cellcli -e 'list griddisk attributes name,asmmodestatus' \
> |grep -v ONLINE
[root@cm01cel01 ~]# 

10) Ensure network configuration is consistent with cell.conf by running "/opt/oracle.cellos/ipconf -verify"

[root@cm01cel01 ~]# /opt/oracle.cellos/ipconf -verify
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks
[root@cm01cel01 ~]# 

11) Prep for patchmgr - ensure that root has user equivalence by running dcli commands below:

[root@cm01cel01 ~]# dcli -g cell_group -l root 'hostname -i'
cm01cel01: 172.16.1.12
cm01cel02: 172.16.1.13
cm01cel03: 172.16.1.14
[root@cm01cel01 ~]# 

12) Check pre-requisites by running "./patchmgr -cells ~/cell_group -patch_check_prereq -rolling" from patch stage location:

[root@cm01cel01 patch_11.2.2.4.2.111221]# ./patchmgr -cells ~/cell_group \
> -patch_check_prereq -rolling

[NOTICE] You will need to patch this cell by starting patchmgr from some other cell or database host.
20:10-11-Feb:2012        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
20:10-11-Feb:2012        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
20:10-11-Feb:2012        :Working: DO: Check space and state of Cell services on target cells. Up to 1 minute ...
20:10-11-Feb:2012        :SUCCESS: DONE: Check space and state of Cell services on target cells.
20:10-11-Feb:2012        :Working: DO: Copy and extract the prerequisite archive to all cells. Up to 1 minute ...
20:10-11-Feb:2012        :SUCCESS: DONE: Copy and extract the prerequisite archive to all cells.
20:10-11-Feb:2012        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
20:11-11-Feb:2012        :SUCCESS: DONE: Check prerequisites on all cells.

[root@cm01cel01 patch_11.2.2.4.2.111221]# 

13) Check ASM disk group repair time.  I'm leaving mine at 3.6 hours:

 1  select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a
  2* where dg.group_number=a.group_number and a.name='disk_repair_time'
SQL> /

NAME
------------------------------
VALUE
--------------------------------------------------------------------------------
DATA_CM01
3.6h

DBFS_DG
3.6h

RECO_CM01
3.6h


14) Make sure you're not running the patch from the LO or serial console - but stay logged into the LO console to monitor things in case something goes wrong:

[root@cm01cel01 ~]#  echo $consoletype
pty
[root@cm01cel01 ~]# 

15) Apply the patch in rolling fashion - note that this will patch cm01cel02 and cm01cel03, since I'm launching it from cm01cel01.  After it's done, we'll have to patch cm01cel01.  I should have launched this from a compute node; for some reason I always forget =)

[root@cm01cel01 patch_11.2.2.4.2.111221]# ./patchmgr -cells ~/cell_group -patch -rolling
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using
<< output truncated >>

16) Validate cm01cel02 and cm01cel03 using "imageinfo":

[root@cm01cel02 ~]# imageinfo

Kernel version: 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64
Cell version: OSS_11.2.2.4.2_LINUX.X64_111221
Cell rpm version: cell-11.2.2.4.2_LINUX.X64_111221-1

Active image version: 11.2.2.4.2.111221
Active image activated: 2012-02-11 20:58:06 -0500
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

17) Validate Grid disks are active and in the correct state, on both cm01cel02 and cm01cel03, using "cellcli -e 'list griddisk attributes name,status,asmmodestatus'"

18) Check /var/log/cellos/validations.log on both cm01cel02 and cm01cel03

19) From cm01dbm01 (first compute node), un-staged the Infrastructure patch in /tmp/patch_11.2.2.4.2.111221

20) Create cell_group file containing only "cm01cel01"

21) Check user-equivalence by doing below:

[root@cm01dbm01 patch_11.2.2.4.2.111221]#  dcli -g cell_group -l root 'hostname -i'
cm01cel01: 172.16.1.12
[root@cm01dbm01 patch_11.2.2.4.2.111221]# p

22) Run "./patchmgr -cells cell_group -patch_check_prereq -rolling"

23) Patch cm01cel01 by doing "./patchmgr -cells cell_group -patch -rolling":

[root@cm01dbm01 patch_11.2.2.4.2.111221]# ./patchmgr -cells cell_group -patch -rolling
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using

24) Check Grid Disk and ASM status on cm01cel01 using "cellcli -e 'list griddisk attributes name,status,asmmodestatus'"

25) Check imageinfo and /var/log/cellos/validations.log on cm01cel01

26) Cleanup using "./patchmgr -cells cell_group -cleanup" (from cm01dbm01 - it will cleanup on all 3 cells)

27) Login to cm01cel01 to check InfiniBand.  As a side-note, we should have patched IBs first according to something very far down in the README, but luckily our IB versions are in good shape:

[root@cm01cel01 oracle.SupportTools]# ./CheckSWProfile.sh -I cm01sw-ib2,cm01sw-ib3
Checking if switch cm01sw-ib2 is pingable...
Checking if switch cm01sw-ib3 is pingable...
Use the default password for all switches? (y/n) [n]: n
Use same password for all switches? (y/n) [n]: y
Enter admin or root password for All_Switches:
Confirm password:
[INFO] SUCCESS Switch cm01sw-ib2 has correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS Switch cm01sw-ib2 has correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 

[INFO] SUCCESS Switch cm01sw-ib3 has correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS Switch cm01sw-ib3 has correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 

[INFO] SUCCESS All switches have correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS All switches have correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 for non-spine and 8 for spine switches
[root@cm01cel01 oracle.SupportTools]# 

28) Apply the minimal pack to database tier hosts (Section 6 of the README.txt).  Start by starting an LO console by SSH-ing into cm01dbm01-ilom and doing "start /SP/console"

28) Check imagehistory by running "# imagehistory".  We're in good shape, since we recently applied BP 13

29) Stop dbconsole for each database on cm01dbm01 (and cm01dbm02)

30) Stop cluster using /u01/app/11.2.0/grid/bin/crsctl stop cluster -f -all

31) Stop OSW by running "/opt/oracle.oswatcher/osw/stopOSW.sh"

32) Set memory settings in /etc/security/limits.conf.  On this step, since we'd setup hugepages, the previous values calculated by "let -i x=($((`cat /proc/meminfo | grep 'MemTotal:' | awk '{print $2}'` * 3 / 4))); echo $x" are commented out, and this is OK.

33) SCP db_patch_11.2.2.4.2.111221.zip from /tmp/patch_11.2.2.4.2.111221 to /tmp on cm01dbm01 and cm01dbm02.  When we apply these patches, we'll be applying from an SSH session on each Database tier host

34) Unzip /tmp/db_patch_11.2.2.4.2.111221.zip and go to /tmp/db_patch_11.2.2.4.2.111221 directory

35) Run "./install.sh -force" on cm01dbm02.  This will take a little while ...

36) While this is running, repeat above on cm01dbm01

37) On cm01dbm02 (first node patched), check imageinfo.  It should look like below:

[root@cm01dbm02 ~]# /usr/local/bin/imageinfo

Kernel version: 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64
Image version: 11.2.2.4.2.111221
Image activated: 2012-02-11 23:26:55 -0500
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

[root@cm01dbm02 ~]# 

38) Verify the ofa rpm by running "rpm -qa | grep ofa", comparing against kernel version.  It should look like this:

[root@cm01dbm02 ~]# rpm -qa | grep ofa
ofa-2.6.18-238.12.2.0.2.el5-1.5.1-4.0.53
[root@cm01dbm02 ~]# uname -a
Linux cm01dbm02.centroid.com 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@cm01dbm02 ~]# 


39) Verify the controller cache is on using "/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0".  You should see this:

Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU

40) Run "/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll | grep 'FW Package Build'" and ensure is says "FW Package Build: 12.12.0-0048"

41) Reboot the server (cm01dbm02) after running "crsctl stop crs"

42) Repeat steps 37-41 on cm01dbm01 when it's back up



Patching InfiniBand Switches

1) Login to cm01sw-ib2 as root

2) Check the version of the software - on our switches, we're already at  1.3.3.2 so we didn't actually need to do anything:

[root@cm01sw-ib2 ~]# version
SUN DCS 36p version: 1.3.3-2
Build time: Apr  4 2011 11:15:19
SP board info:
Manufacturing Date: 2010.08.21
Serial Number: "NCD4V1753"
Hardware Revision: 0x0005
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
[root@cm01sw-ib2 ~]# 

3) Validate on cm01sw-ib3

Patching PDUs

1) Go to 13551280/Infrastructure/SunRackIIPDUMeteringUnitFirmware/1.04 and unzip the zip file

2) Transfer the *DL files to laptop

3) Login to PDUA (http://cm01-pdua.centroid.com/, in our case)

4) Click on Network Configuration and login as admin

5) Go down to Firmware Upgrade and Choose MKAPP_V1.0.4.DL and click Submit

6) When done, update the HTML DL file.  It seems to "hang" for a very long time, but eventually ...

7) Repeat on cm01-pdub.centroid.com

Upgrade GI Home and RDBMS Home from 11.2.0.2 to 11.2.0.3

Prior to patching the latest BP14 updates to 11.2.0.3, we need to get our GI and RDBMS Homes updated to 11.2.0.3 by following MOS note 1373255.1.  There are a couple of sections of steps required:

- Prepare environments
- Install and Upgrade GI to 11.2.0.3
- Install 11.2.0.3 database software
- Upgrade databases to 11.2.0.3.  In our case, this includes dwprd, dwprod, and visx cluster database
- Do some post upgrade steps

1) Download 11.2.0.3 from https://updates.oracle.com/ARULink/PatchDetails/process_form?patch_num=10404530 and transfer to /u01/stg on cm01dbm01.

2) Since we're already at BP13, we don't need to apply 12539000 

3) Run Exachk to validate that the cluster is ready to patch.  MOS Document 1070954.1 contains details.  Download exachk_213_bundle.zip, unzip it, and run:

[oracle@cm01dbm01 ~]$ cd /u01/stg/exachk/
[oracle@cm01dbm01 exachk]$ ls
collections.dat  exachk_213_bundle.zip        exachk_dbm_121311_115203-public.html  ExachkUserGuide.pdf  readme.txt  UserGuide.txt
exachk  ExachkBestPracticeChecks.xls  Exachk_Tool_How_To.pdf      exachk.zip   rules.dat
[oracle@cm01dbm01 exachk]$ ./exachk

Our Exachk run showed a couple of issues, and we fixed the following:

- Set processes initialization parameter to 200 for both ASM instances
- Set cluster_interconnects to appropriate interface for dwprd and dwprod1
- Cleaned up audit dest files and trace/trm file for ASM instances, both nodes
- Set filesystemio_options=setall on all instances
- When done, bounced cluster using "crsctl stop cluster -f -all", followed by "crsctl start cluster -all"

4) Validate readiness of CRS by running cluvfy. Go to <stage>/grid, login as grid, and run this:

[grid@cm01dbm01 grid]$ ./runcluvfy.sh stage -pre crsinst -upgrade \
> -src_crshome /u01/app/11.2.0/grid \
> -dest_crshome /u01/app/11.2.0.3/grid \
> -dest_version 11.2.0.3.0 \
> -n cm01dbm01,cm01dbm02 \
> -rolling \
> -fixup -fixupdir /home/grid/fixit

- Failed on kernel parameters because grid didn't have access to /etc/sysctl.conf.  Ignore it.
- Failed on bondeth0 and some VIP stuff - ignore it.  I think this is a cluvfy bug


5) Create new GI Homes for 11.2.0.3.  Example below from cm01dbm01, but do this on both nodes:

[root@cm01dbm01 ~]# mkdir -p /u01/app/11.2.0.3/grid/
[root@cm01dbm01 ~]# chown grid /u01/app/11.2.0.3/grid
[root@cm01dbm01 ~]# chgrp -R oinstall /u01/app/11.2.0.3
[root@cm01dbm01 ~]# 

6) Unzip all the 10404530 software

7) No need to update the OPatch software here - that is handled along with the BP14 stuff from patch 13513783 (see next section)

8) Disable AMM in favor of ASMM for ASM instance. Follow the steps in the 1373255.1 document.  In our case, our SPFILE is actually a data file in $GI_HOME/dbs/DBFS_DG instead of the ASM disk group - I'm thinking about moving it with spmove and asmcmd, but I think I'll hold off for now.  When done it should look like below for both instances:


SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
+ASM2

SQL> show sga

Total System Global Area 1319473152 bytes
Fixed Size     2226232 bytes
Variable Size  1283692488 bytes
ASM Cache    33554432 bytes
SQL> 

9) Bounce databases and ASM and validate __shared_pool_size and __large_pool_size, along with values changed above.  Again, refer to the output above


10) Validate cluster interconnects.  This is how it should look - they need to be manually set:

SQL> select inst_id, name, ip_address from gv$cluster_interconnects
  2  /

   INST_ID NAME     IP_ADDRESS
---------- --------------- ----------------
2 bondib0    192.168.10.2
1 bondib0    192.168.10.1

SQL> create pfile='/tmp/asm.ora' from spfile;

File created.

SQL> !cat /tmp/asm.ora|grep inter
*.cluster_interconnects='192.168.10.1'
+ASM1.cluster_interconnects='192.168.10.1'
+ASM2.cluster_interconnects='192.168.10.2'

SQL> 

11) Shutdown visx, dwprd, and dwprod databases.  I'm going to shutdown everything for now for simplicity's sake

12) Login as "grid" to cm01dbm01 and unset ORACLE_HOME, ORACLE_BASE, and ORACLE_SID.  Get a VNC session established so we can launch the installer.  Run "./runInstaller" and follow the instructions in 1373255.1.  It will fail on VIP, node connectivity, and patch 12539000 checks, but I'm crossing my fingers and assuming this is a bug.

(insert deep breath ...)

Things did install/upgrade fine; it took about 45 minutes.  The post-install CVU step failed with the same errors as the pre-CVU stage (network/VIP stuff), but this is OK.

13) Stop CRS on both nodes

14) Relink GI oracle executable with RDS

[grid@cm01dbm01 ~]$ dcli -g ./dbs_group ORACLE_HOME=/u01/app/11.2.0.3/grid \
> make -C /u01/app/11.2.0.3/grid/rdbms/lib -f ins_rdbms.mk \
> ipc_rds ioracle


15) Start CRS on both nodes

16) Login as oracle on cm01dbm01 and start a VNC session.  At this point, the 11.2.0.3 software has already been installed.

17) Unset ORACLE_HOME, ORACLE_BASE, and ORACLE_SID, and launch runInstaller.  The pre-req checks will fail on subnet and VIP details, as above - choose to ignore.

18) When installation completes, link oracle with RDS:

dcli -l oracle -g ~/dbs_group ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1 \
          make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_rds ioracle

19) Copy OPatch from the 11.2.0.2 directory to the 11.2.0.3 directory - at the same time, might as well do the GI home as "grid".  When this is done, we can move on to the bundle patch



Patching Compute Nodes to 11.2.0.3 BP 14 (13513783)


1) Go to 13551280/Database/11.2.0.3 on first node and unzip p13513783_112030_Linux-x86-64.zip

2) Extract OPatch from 13551280/Database/OPatch on both cm01dbm01 and cm01dbm02, RDBMS and GI Homes.  For example:

[root@cm01dbm01 OPatch]# unzip p6880880_112000_Linux-x86-64.zip -d /u01/app/11.2.0/grid/
Archive:  p6880880_112000_Linux-x86-64.zip
replace /u01/app/11.2.0/grid/OPatch/docs/FAQ? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

3) Make sure GI/OPatch files are owned by grid:oinstall and RDBMS/OPatch files are owned by oracle

4) Check inventory for RDBMS home (both nodes):

[oracle@cm01dbm01 ~]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch lsinventory -detail -oh /u01/app/oracle/product/11.2.0.3/dbhome_1/

5) Check inventory for GI home (both nodes):

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch lsinventory -detail -oh /u01/app/11.2.0/grid/

6) Set ownership of the patch location to oracle:oinstall:

[root@cm01dbm01 11.2.0.3]# chown -R oracle:oinstall 13513783/
[root@cm01dbm01 11.2.0.3]# 

7) Check for patch conflicts in GI home.  Login as "grid" and run the below.  You will see some conflicts:

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13513783/

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13540563/

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13513982/

8) Check for patch conflicts on the RDBMS home, as oracle:

[oracle@cm01dbm01 11.2.0.3]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch  prereq CheckConflictAgainstOHWithDetail -phBaseDir ./13513783/13513783/

[oracle@cm01dbm01 11.2.0.3]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch  prereq CheckConflictAgainstOHWithDetail -phBaseDir ./13513783/13540563/custom/server/13540563

9) Login as root and add opatch directory for GI home to path
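For example - assuming the GI home path that opatch auto uses below:

# Put the GI home's OPatch on root's PATH so "opatch auto" resolves
export PATH=$PATH:/u01/app/11.2.0.3/grid/OPatch
which opatch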

10) Patch by running the opatch auto command below - it patches both the GI and RDBMS homes in one pass, so don't try to patch them individually:

[root@cm01dbm01 11.2.0.3]# opatch auto ./13513783/
Executing /usr/bin/perl /u01/app/11.2.0.3/grid/OPatch/crs/patch112.pl -patchdir . -patchn 13513783 -paramfile /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
opatch auto log file location is /u01/app/11.2.0.3/grid/OPatch/crs/../../cfgtoollogs/opatchauto2012-02-13_00-34-53.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/ocm/bin/ocm.rsp
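If an OCM response file doesn't exist yet, it can be generated ahead of time with the emocmrsp utility that ships under OPatch - a sketch as "oracle", assuming the RDBMS home path above (running it with no arguments prompts interactively as well):

# Generate an OCM response file to feed to opatch auto
cd /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/ocm/bin
./emocmrsp -no_banner -output /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/ocm/bin/ocm.rsp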

11) The above succeeded on the first patch but failed on the second, so I'm going to patch manually.  First, as oracle, ensure ORACLE_HOME is the new 11.2.0.3 home and run this:

[oracle@cm01dbm01 ~]$ srvctl stop home -o $ORACLE_HOME -s /tmp/x.status -n cm01dbm01
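The status file written here comes in handy later - once the home is patched, the same set of resources can be brought back with srvctl start home, e.g.:

# Restart everything that was running out of this home before the patch (run after patching is complete)
srvctl start home -o $ORACLE_HOME -s /tmp/x.status -n cm01dbm01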

12) Unlock crs by running this:

[root@cm01dbm01 11.2.0.3]# /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -unlock
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params

13) Apply the first GI patch:

[grid@cm01dbm01 ~]$ /u01/app/11.2.0.3/grid/OPatch/opatch napply -oh /u01/app/11.2.0.3/grid/ -local /u01/stg/13551280/Database/11.2.0.3/13513783/13540563/

14) Apply second GI patch:

[grid@cm01dbm01 ~]$ /u01/app/11.2.0.3/grid/OPatch/opatch napply -oh /u01/app/11.2.0.3/grid/ -local /u01/stg/13551280/Database/11.2.0.3/13513783/13513982/

15) Login as oracle (database owner) on cm01dbm01 and run pre-script:

[oracle@cm01dbm01 scripts]$ pwd
/u01/stg/13551280/Database/11.2.0.3/13513783/13540563/custom/server/13540563/custom/scripts
[oracle@cm01dbm01 scripts]$ ./prepatch.sh -dbhome /u01/app/oracle/product/11.2.0.3/dbhome_1/
./prepatch.sh completed successfully.
[oracle@cm01dbm01 scripts]$ 

16) Apply BP patch to RDBMS home on cm01dbm01:

[oracle@cm01dbm01 13513783]$ pwd
/u01/stg/13551280/Database/11.2.0.3/13513783
[oracle@cm01dbm01 13513783]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch napply -oh /u01/app/oracle/product/11.2.0.3/dbhome_1 -local ./13513783/

[oracle@cm01dbm01 13513783]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch napply -oh /u01/app/oracle/product/11.2.0.3/dbhome_1 -local ./13540563/custom/server/13540563

17) Run post DB script:

[oracle@cm01dbm01 13513783]$ ./13540563/custom/server/13540563/custom/scripts/postpatch.sh -dbhome /u01/app/oracle/product/11.2.0.3/dbhome_1/
Reading /u01/app/oracle/product/11.2.0.3/dbhome_1//install/params.ora..

18) Run post scripts as root:

[root@cm01dbm01 11.2.0.3]# cd /u01/app/oracle/product/11.2.0
[root@cm01dbm01 11.2.0]# cd /u01/app/11.2.0.3/grid/
[root@cm01dbm01 grid]# cd rdbms/install/
[root@cm01dbm01 install]# ./rootadd_rdbms.sh 
[root@cm01dbm01 install]# cd ../../crs/install
[root@cm01dbm01 install]# ./rootcrs.pl -patch
Using configuration parameter file: ./crsconfig_params

20) Repeat steps 11-18 on cm01dbm02, except in this case we need to first apply the first GI patch

21) At this point, we've got GI completely upgraded to 11.2.0.3 and a patched 11.2.0.3 home for our RDBMS tier, but our databases still live on 11.2.0.2.  Let's go on to the next section




Upgrading databases to 11.2.0.3 and Applying CPU BP bundle (see 1373255.1)

1) Start all databases on 11.2.0.2 and make sure they're healthy.  One thing I screwed up during the GI installation was putting in the wrong ASMDBA/ASMOPER/ASMADMIN groups.  This made it impossible for the database instances to start after things were patched (on 11.2.0.2).  I worked around it by adding "oracle" to all the "asm" groups (i.e. made its group membership look like grid's) - see the sketch below.  I'll fix this properly later.
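The workaround boiled down to making oracle's secondary group membership match grid's - a sketch as root on both nodes, assuming the standard Exadata group names (asmdba/asmoper/asmadmin may differ on your install):

# Check current membership, append the ASM groups, and verify
id oracle
usermod -a -G asmdba,asmoper,asmadmin oracle
id oracle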

2) Run the upgrade prep tool for each database (NEW_HOME/rdbms/admin/utlu112i.sql).  Note that this took quite a long time on our Oracle EBS R12 database
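A minimal way to run it per database, as oracle with ORACLE_HOME/ORACLE_SID still pointing at the running 11.2.0.2 instance - the spool location is just an example:

# Pre-upgrade information tool from the NEW 11.2.0.3 home, spooled for review
sqlplus / as sysdba <<'EOF'
spool /tmp/utlu112i_dwprd1.log
@/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/admin/utlu112i.sql
spool off
EOF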

3) Set cluster_interconnects to the correct InfiniBand IP for each database.  Below is an example from one of the 3 databases, but they look the same on all (a sketch of setting the parameter follows the output):

SQL> select inst_id, name, ip_address from gv$cluster_interconnects;

   INST_ID NAME     IP_ADDRESS
---------- --------------- ----------------
1 bondib0    192.168.10.1
2 bondib0    192.168.10.2

SQL> 
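Where it isn't set (or points at the wrong interface), a sketch of setting it per instance as sysdba - IPs from the output above, dwprd used as the example database, and it only takes effect at the next restart:

# Pin each database instance to its InfiniBand IP (one pair of statements per database)
sqlplus / as sysdba <<'EOF'
alter system set cluster_interconnects='192.168.10.1' scope=spfile sid='dwprd1';
alter system set cluster_interconnects='192.168.10.2' scope=spfile sid='dwprd2';
EOF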


4) I don't have any Data Guard environments or listener_networks set up, so I can skip those sections of the README.txt

5) Launch dbua from the 11.2.0.3 home to upgrade the first database (see below).  It complained about a few underscore parameters and dictionary statistics, but I chose to ignore them and move on.
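It has to be the 11.2.0.3 dbua - a minimal sketch of launching it from the VNC session as oracle:

# Make sure the new home is first on the PATH, then launch the upgrade assistant GUI
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1
export PATH=$ORACLE_HOME/bin:$PATH
dbua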

6) After my first database was upgraded, I validated things by running "srvctl status database", checking /etc/oratab, checking V$VERSION, etc.

7) Repeat steps 5 and 6 for the remaining databases.

8) On each database that was upgraded, a couple of underscore parameters need to be reset - see below:

SYS @ dwprd1> alter system set "_lm_rcvr_hang_allow_time"=140 scope=both;

System altered.

Elapsed: 00:00:00.04
SYS @ dwprd1> alter system set "_kill_diagnostics_timeout"=140 scope=both;

System altered.

Elapsed: 00:00:00.01
SYS @ dwprd1> alter system set "_file_size_increase_increment"=2143289344 scope=both 
  2  ;

System altered.

Elapsed: 00:00:00.00

9) Apply Exadata Bundle Patch.  See below:

/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/admin
[oracle@cm01dbm01 admin]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Feb 13 13:28:55 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> @catbundle.sql exa apply

- Make sure no ORA- errors exist in logs in /u01/app/oracle/cfgtoollogs/catbundle
- Check DBA_REGISTRY
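A quick way to do both checks from sqlplus - the EXA bundle should show up as the most recent APPLY row in the registry history:

# Confirm the bundle registered and the components are still VALID
sqlplus / as sysdba <<'EOF'
set lines 200
col comments format a40
select action_time, action, version, bundle_series, comments
  from dba_registry_history
 order by action_time;
select comp_name, version, status from dba_registry;
EOF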

10) Start applications and test

11) Start EM dbconsole for each database on first compute node
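For each database, something along these lines as oracle - ORACLE_UNQNAME is needed by dbconsole, dwprd is just the example here, and the 11.2.0.3 home's bin is assumed to be on the PATH:

# Start and verify Database Control for one database
export ORACLE_SID=dwprd1
export ORACLE_UNQNAME=dwprd
emctl start dbconsole
emctl status dbconsole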


Finishing Up

1) Validate all your databases, CRS/GI components, etc.
2) Validate all ASM grid disks and cell status (see the sketch below)
3) Clean up staged patches from /tmp and/or other locations
4) Clean up the 11.2.0.2 GI and RDBMS homes and ensure that all initialization parameters point to the right locations
5) Fix group membership
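For item 2, a quick sweep across the storage servers with cellcli via dcli - assuming a cell_group file listing the cell hostnames:

# Cell services and grid disk health across all cells
dcli -g cell_group -l root cellcli -e "list cell attributes name,cellsrvStatus,msStatus,rsStatus"
dcli -g cell_group -l root cellcli -e "list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome"

All grid disks should show active/ONLINE before calling it done.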