Monday, August 17, 2015

Exadata Patching


PATCHING EXADATA





Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)

A. It is recommended that Exadata systems with Data Guard configured use the "Standby
First" patching approach.
B. Patching should never be interrupted due to a connection drop. It is therefore
recommended that you use VNC or the screen utility.
C. Before patching cells in a rolling manner, you must check the asmdeactivationoutcome and
asmmodestatus grid disk attributes and make sure that all grid disks on all cells are online and
can safely be deactivated; see the check below.
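For item C, a minimal check could look like this (a sketch, assuming the usual cell_group file and root SSH equivalence across the cells):

# dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome"

Every grid disk should report asmmodestatus=ONLINE and asmdeactivationoutcome=Yes before a rolling cell patch is started.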





First, the Oracle Exadata patch has 3 different components that should be patched. As we know, the Exadata rack has several other components, like the Cisco switch, KVM, Power Distribution Units, etc., but we are only responsible for patching the Database Servers (usually referred to as compute nodes), the Storage Servers (usually referred to as cell nodes) and the InfiniBand switches.

We can divide the patches into 3 different parts:
Storage Server Patch
Database Server Patch
Infiniband Switches Patch


Order of Patching

This is an overview of the order in which you should usually patch the components. You may also want to wait a few days before moving on to the next component, so that if a bug or error shows up you know which component was patched last.

InfiniBand Switches

Spine
Leaf
Storage Server Software

Cell nodes
Database Minimal Pack in compute nodes
Database Bundle Patch

Grid Home (if applicable)
Oracle Home
CRS MLR patches (11.2.0.1.0)

Grid Home
Oracle Home (if applicable)





Before starting, I would like to share and note here two documents from My Oracle Support, aka Metalink. These notes should be the first place you go to review before patching an Exadata environment.

Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions (Doc ID. 888828.1)
- This is for the second and third generation (V2 and X2) of Oracle Exadata, using Sun hardware.

Database Machine and Exadata Storage Server 11g Release 1 (11.1) Supported Versions (Doc ID. 835032.1)
- This is for the first generation (V1) of Oracle Exadata, using HP hardware.

Oracle usually updates these documents for every patch that is released, so always check them for the most recent information.




-- The Oracle Database software on Exadata is updated using standard OPatch and the Oracle Universal Installer.
-- Running Exadata with different storage server software versions is supported, but should be limited to rolling patching scenarios.
-- Database server (OS) updates are performed with yum and require access to an Unbreakable Linux Network (ULN) based repository, or a local repository built from it.



Platinum covers Exadata storage software and firmware patching, but the customer must perform
database patching.


A. Dependency issues found during yum updates require rolling back to a previous release before retrying.
B. Bundle patches applied using opatch auto (can) roll back only the database or the grid infrastructure home.
C. Failed OS patches on database servers (cannot) be rolled back.
D. Failed storage cell patches are (not) rolled back to the previous release automatically.
E. Database server OS updates can be rolled back using opatch auto -rollback.
F. Dependency issues found during yum updates should (not) be ignored using the force option.



--  Firmware levels are maintained automatically
--  Cell patches are maintained across all cell components and are independent of database patches



Exadata Database Server Patching using the DB Node Update Utility (Doc ID 1553103.1)
Exadata YUM Repository Population and Linux Database Server Updating (Doc ID 1473002.1)
Exadata YUM Repository Population, One-Time Setup Configuration and YUM upgrades (Doc ID 1556257.1) - this is an older version of 1473002.1, with additional manual steps, helping to understand more details.
Patch 16432033 - Using ISO Image with a Local Repository README
Quarterly Full Stack Download Patch For Oracle Exadata (Jul 2013)
Quarterly Full Stack Download Patch For Oracle Exadata (Oct 2013)




It is mandatory to know which components/tools are running on which servers in Exadata.
Here is the list:

DCLI -> storage cells and compute nodes; executes CellCLI commands on multiple storage servers (see the dcli example after this list)

ASM -> compute nodes -- it is basically the ASM instance

RDBMS -> compute nodes -- it is the database software

MS -> storage cells; provides a Java interface to the CellCLI command line interface, as well as an interface for Enterprise Manager plug-ins

RS -> storage cells; Restart Server (RS) is a set of processes responsible for managing and restarting other processes

Cellcli -> storage cells; used to run storage commands

Cellsrv -> storage cells; it receives and unpacks iDB messages transmitted over the InfiniBand interconnect and examines the metadata contained in the messages

Diskmon -> compute nodes; in Exadata, diskmon is responsible for:
- Handling of storage cell failures and I/O fencing
- Monitoring of Exadata Server state on all storage cells in the cluster (heartbeat)
- Broadcasting intra-database IORM (I/O Resource Manager) plans from databases to storage cells
- Monitoring of the control messages from database and ASM instances to storage cells
- Communicating with other diskmons in the cluster
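As an illustration of dcli tying these pieces together, the following one-liner checks the CELLSRV/MS/RS services on every cell (a sketch, assuming a cell_group file and root SSH equivalence):

# dcli -g cell_group -l root "cellcli -e list cell attributes name,cellsrvStatus,msStatus,rsStatus"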
 

The following strategy should be used for applying patches on Exadata:
Review the patch README file (know what you are doing)
Run the Exachk utility before the patch application (check the system and know its current state; see the example after this list)
Automate the patch application process (automate it to be fast and to minimize problems)
Apply the patch
Run the Exachk utility again -- after the patch application
Verify the patch (does it fix the problem or deliver what it is supposed to?)
Check the performance of the system (is there any abnormal performance decrease?)
Test the failback procedure (as you may need to fail back in Production, who knows)
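For example, a pre-patch Exachk run can be as simple as the following (a sketch; the staging directory is just an example, and the exachk bundle is downloaded per MOS note 1070954.1):

$ cd /u01/stg/exachk
$ ./exachk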




###########################################################




Exadata Database Server Patch (compute nodes):

This is the kind of patch that we, as Oracle DBAs, are used to working with. These patches are released specifically for Exadata, so you should not upgrade a database to a version that is not supported on Exadata; always check the documentation to confirm that the release you want has already been published. The patch software already contains upgrades for the Oracle Database and Oracle Clusterware.

All the patches provided as bundle patches are cumulative, and they may also include the most recent Patch Set Update (PSU) and/or Critical Patch Update (CPU).

a) So, for example, if you are on 11.2.0.2.0 BP 1 (Bundle Patch 1) and would like to go to BP 7 (Bundle Patch 7), you can upgrade directly, without applying BPs 2, 3, 4, 5 and 6.
b) As the bundle patches include the latest PSU and CPU, it is not necessary to apply a PSU/CPU separately in the Exadata environment, because it is already included in the bundle patch. Always check the README file to see which PSU/CPU is or is not included in the bundle patch.

– Bundle Patches Bug List:
11.2.0.2 Patch Bundles for Oracle Exadata Database Machine (Doc ID. 1314319.1)
11.2.0.1 Patch Bundles for Oracle Exadata Database Machine (Doc ID. 1316026.1)



When you need to apply a bundle patch to the Oracle homes on Exadata, you can use the oplan utility.
The oplan utility generates instructions for applying patches, as well as instructions for rollback, for all the nodes in the cluster. Note that oplan does not support Data Guard. oplan is supported since release 11.2.0.2. It basically eases the patching process, because without it you would need to read the README files and extract the instructions yourself.
It is used as follows:
As the Oracle software owner (Grid or RDBMS), execute oplan:
$ORACLE_HOME/oplan/oplan generateApplySteps <bundle patch location>
It will create the patch instructions in HTML and text formats:
$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/InstallInstructions.html 
$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/InstallInstructions.txt
Then, choose the apply strategy according to your needs and follow the patching instructions to apply the patch to the target.
That's it. If you want to roll back the patch, execute the following (replacing the bundle patch location):
$ORACLE_HOME/oplan/oplan generateRollbackSteps <bundle patch location>
Again, choose the rollback strategy according to your needs and follow the instructions to roll back the patch from the target.




#####  Exadata Patching - DB Node Update


Copy the dbnodeupdate.sh utility and patch 16432033 from szur0023pap to the new folders. The required patches are stored there.
Alternatively, download them from MOS and transfer them to the nodes.

# scp username@szur0023pap:/sbclocal/san/DBaaS/Exadata/dbnodeupdate.zip /u01/patches/system/dbnodeupdate
# scp username@szur0023pap:/sbclocal/san/DBaaS/Exadata/16784347/Infrastructure/ExadataStorageServer/11.2.3.2.1/p16432033_112321_Linux-x86-64.zip /u01/patches/system/16432033


Unpack the dbnodeupdate.sh utility
# unzip dbnodeupdate.zip


Make sure you have the latest version of dbnodeupdate.sh. If you don't, then stop here and get the latest version from MOS by following the above link!
# cd /u01/patches/system/dbnodeupdate
# ./dbnodeupdate.sh -V
dbnodeupdate.sh, version 2.17


 Do not unpack patch 16432033
# cd /u01/patches/system/16432033
# ls -l
-rw-r--r-- 1 root root 1216696525 Jun 27 08:33 p16432033_112321_Linux-x86-64.zip


Before you start patching the node, verify the status of the cluster resources as follows:
# /u01/app/11.2.0.3/grid/bin/crsctl stat res -t


Check the current image version running on the node
# imageinfo


Run the dbnodeupdate.sh script in check/verify mode, using the zip file as the "repository".
# cd /u01/patches/system/dbnodeupdate
# ./dbnodeupdate.sh -u -v -l /u01/patches/system/16432033/p16432033_112321_Linux-x86-64.zip


The recommended procedure to update a DB node is to use the DB Node Update Utility (aka dbnodeupdate.sh), as documented in Exadata Database Server Patching using the DB Node Update Utility (Doc ID 1553103.1).
If there are no errors in the above preparation and verification step, proceed with the actual upgrade. This will perform a backup using the dbserver_backup.sh script, followed by patching the system.

# cd /u01/patches/system/dbnodeupdate
# ./dbnodeupdate.sh -u -s -l /u01/patches/system/16432033/p16432033_112321_Linux-x86-64.zip



Now the system will reboot.
Note the last statement of the previous command, just before the reboot. Once the system is back, execute this last step:
# cd /u01/patches/system/dbnodeupdate/
# ./dbnodeupdate.sh -c


Patching is now completed, congratulations!
You can now check the image version and the status of the cluster
# imageinfo
# /u01/app/11.2.0.3/grid/bin/crsctl stat res -t






Exadata Patching - Compute Node


Patching on the database servers can be performed serially, or in parallel using the dcli utility.

The dbnodeupdate.sh utility is used to perform the database server patching.
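Before starting, it is handy to confirm the current image on every database server in one shot (a sketch, assuming a dbs_group file listing the compute nodes and root SSH equivalence):

# dcli -g dbs_group -l root "imageinfo | grep 'Image version'"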


What does the dbnodeupdate.sh utility do?

·                     Stop/unlock/disable CRS for host restart
·                     Perform LVM snapshot backup of / filesystem
·                     Mount yum ISO image and configure yum
·                     Apply OS updates via yum
·                     Relink all Oracle homes for RDS protocol
·                     Lock GI home and enable CRS upon host restart

 

Patching:


Step 1: Go to the database server patch directory


#cd /19625719/Infrastructure/ExadataDBNodeUpdate/3.60


Step 2:  Download the latest dbnodeupdate.sh script and replace the old one with it

You can go through MOS note 1553103.1 for the latest dbnodeupdate.sh utility.


Step 3: Execute the prerequisites check on the DB server

#./dbnodeupdate.sh -u -l /19625719/Infrastructure/11.2.3.3.1/ExadataDatabaseServer/p18876946_112331_Linux-x86-64.zip -v

If the prerequisites check fails, fix the problem first and then re-execute it.


Step 4: Start patching the DB node after the prerequisites check completes successfully

#./dbnodeupdate.sh -u -l  /19625719/Infrastructure/11.2.3.3.1/ExadataDatabaseServer/p18876946_112331_Linux-x86-64.zip


Step 5: Closely monitor the log file for any errors and perform the next steps based on the instructions given by the dbnodeupdate.sh utility.

Log file location: /var/log/cellos/dbnodeupdate.log
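For example, from a second session on the node being patched:

# tail -f /var/log/cellos/dbnodeupdate.log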


Step 6: After the reboot, execute the command below to continue, as instructed by the dbnodeupdate.sh utility

#./dbnodeupdate.sh -c


How does it work?


-----Pre steps starts
·                     Collecting configuration details
·                     Validating system details. It will check best practices and known issues
·                     Check free space in /u01
·                     Backup yum configuration files
·                     Cleaning up the yum cache
·                     Preparing update
·                     Performing yum package dependency checks
·                     It will give an overview of the prerequisites check before starting the upgrade, including the existing image version and the image version being upgraded to
-----Pre steps completes

·                     Acceptance to continue upgrade procedure

-----Start upgrade
·                     Verifying GI and DB's are shut down
·                     Un-mount and mount /boot
·                     Performing file system backup
·                     Verifying and updating yum.conf
·                     Stop OSwatcher
·                     Cleaning up the yum cache
·                     Preparing for update 
·                     Performing yum update. Node is expected to reboot when finished
·                     Finish all the post steps
·                     Reboot system automatically
·                     After reboot run "./dbnodeupdate.sh -c" to complete the upgrade
-----Finish upgrade

-----Post Steps Starts
·                     Collect system configuration details
·                     Verifying GI and DB's are shutdown
·                     Verifying firmware updates/validations
·                     If the node reboots during this execution, re-run './dbnodeupdate.sh -c' after the node restarts
·                     Start ExaWatcher
·                     Re-linking all homes
·                     Unlocking /u01/app/11.2.0.3/grid
·                     Re-linking /u01/app/11.2.0.3/grid 
·                     Re-linking /u01/app/oracle/product/11.2.0.3/dbhome_1 
·                     Executing /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -patch
·                     Starts stack
·                     Enabling stack to start at reboot
-----Finished post steps

After completing the first node, we can carry out the same tasks on the second node if patching is being executed serially.


Rollback:


Only two steps are needed to roll back the patch to the previous version on a database server.

Step 1 Execute #./dbnodeupdate.sh -r

Step 2 #./dbnodeupdate.sh -c





###########################################################



./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -cleanup


HOW ROLLING PATCH WORKS:

Since version 11.2.1.3.1, Oracle has provided the “Cell Rolling Apply” feature in order to simplify the install and, of course, to reduce downtime. Be aware that some patches are sometimes required on the compute nodes BEFORE using the -rolling option on the cell nodes.

Below is a brief overview of how it works (see the CellCLI sketch after the loop):

Preparation of the cell node X
Loop
Turn all Grid Disks offline in the cell node X
Patch cell node X
Grid Disks online
End Loop
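Per cell, the offline/online handling corresponds roughly to the following CellCLI calls (an illustrative sketch only; patchmgr performs these steps for you):

# cellcli -e "ALTER GRIDDISK ALL INACTIVE"
(the cell is patched and rebooted)
# cellcli -e "ALTER GRIDDISK ALL ACTIVE"
# cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmmodestatus"

Wait until every grid disk reports asmmodestatus=ONLINE before moving on to the next cell.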



The Storage Server patch is responsible for keeping the cell nodes up to date and fixing possible problems; it includes patches for different components, like the kernel, firmware, operating system, etc., of the Storage Server.
These patches are usually launched from a database server (typically compute node 1), using dcli and ssh to remotely patch each cell node.
Every Storage Server patch comes with what Oracle calls the “Database Minimal Pack” or “Database Convenience Pack”. These packs are included as a .zip file inside the storage patch software and must be applied on all the database servers (compute nodes), because, like the storage patch, they include the necessary kernel, firmware, and operating system patches.
Applying the patch on the Storage Servers will also change their Linux version, but the Database Minimal Pack will NOT change the Linux version on the compute nodes.


SOME USEFUL COMMANDS:

— Rolling Patch Example
./patchmgr -cells <cells group> -patch_check_prereq -rolling
./patchmgr -cells <cells group> -patch -rolling

– Non-Rolling Patch Example
./patchmgr -cells <cells group> -patch_check_prereq
./patchmgr -cells <cells group> -patch

– Information about the image and kernel versions.
– Run on the database server; it also shows the status of the patch.
imageinfo

– Logs to check
/var/log/cellos/validations.log
/var/log/cellos/vldrun*.log

– File with all the cell nodes names, used in the <cells group> tag
cat /opt/oracle.SupportTools/onecommand/cell_group





Cell Node Patching:

Exadata Storage Server Patching - Some details
Exadata Storage Server Patching

●● Exadata Storage Server patch is applied to all cell nodes.
●● Patching is launched from compute node 1 and will use dcli/ssh to remotely patch each cell node.

●● Exadata Storage Server Patch zip also contains Database Minimal Pack or Database Convenience Pack, which are applied to all compute nodes.
This patch is copied to each compute node and run locally.

●● Applying the storage software on the cell nodes will also change the Linux version, while applying the Database Minimal Pack
on the compute nodes does NOT change the Linux version.
To upgrade Linux on the compute nodes, follow MOS Note 1284070.1.

A non-rolling patch apply is much faster because you are patching all the cell nodes simultaneously, and there is NO exposure to a single disk failure during the patch. Please note that this requires a full outage.

With a rolling patch apply, database downtime is not required, but the patch application time is much longer. The major risk is a disk failure while a cell's grid disks are offline; ASM high redundancy reduces that exposure.

Grid disks offline >>> Patch Cel01 >>> Grid disks online
Grid disks offline >>> Patch Cel02 >>> Grid disks online
Grid disks offline >>> Patch Cel..n>>> Grid disks online

Rolling patch application can be a risky affair, so please be apprised of the following:
Do not use the -rolling option of patchmgr for a rolling update or rollback without first applying the required fixes on the database hosts.
./patchmgr -cells cell_group -patch_check_prereq -rolling >>> Make sure this is successful and review the spool output carefully.
./patchmgr -cells cell_group -patch -rolling

Non-rolling Patching Command:
./patchmgr -cells cell_group -patch_check_prereq
./patchmgr -cells cell_group -patch


How to Verify Cell Node is Patched Successfully

# imageinfo

Output of this command gives some good information, including Kernel Minor Version.

Active Image Version: 11.2.2.3.1.110429.1
Active Image Status: Success

If you get anything in "Active Image Status" except success, then you need to look at validations.log and vldrun*.log.
The image status is marked as failure when there is a failure reported in one or more validations.

Check the /var/log/cellos/validations.log and /var/log/cellos/vldrun*.log files for any failures.

If a specific validation failed, then the log will indicate where to look for the additional logs for that validation.
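For example, a quick scan for failures (sketch):

# grep -i fail /var/log/cellos/validations.log /var/log/cellos/vldrun*.log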




Exadata Patching - Cell Server


Cell storage patching is done with the patchmgr utility, which can patch in either a rolling or a
non-rolling fashion.

Syntax: ./patchmgr -cells cell_group -patch [-rolling] [-ignore_alerts] [-smtp_from "addr" -smtp_to "addr1 addr2 addr3 ..."]

Here addr is the sending mail address used to send the status of the patching, and addr1, addr2, addr3 are the receiving mail addresses.
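For example (hypothetical mail addresses):

./patchmgr -cells cell_group -patch -rolling -smtp_from "exadata@example.com" -smtp_to "dba1@example.com dba2@example.com"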

Step-1  First, note down the current image version of the cell by executing:

#imageinfo

Step-2  Go to the cell patch directory where the patch has been copied


/19625719/Infrastructure/11.2.3.3.1/ExadataStorageServer_InfiniBandSwitch/patch_11.2.3.3.1.140708

Step-3  Reset the cells to a known state using the following command

./patchmgr -cells cell_group -reset_force

Step-4  Clean up any previous patchmgr utility runs using the following command

./patchmgr -cells cell_group -cleanup

Step-5 Verify that the cells meet prerequisite checks using the following command



(Rolling)

./patchmgr -cells ~/cellgroup -patch_check_prereq -rolling 

or


(Non-rolling)


./patchmgr -cells ~/cellgroup -patch_check_prereq 

Here the cellgroup file contains the IPs of all the cell servers.
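For example (hypothetical management IPs):

# cat ~/cellgroup
10.10.10.3
10.10.10.4
10.10.10.5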


Step-6  The output should not contain any errors. If there are errors, resolve them first, then re-execute the above command.

Step-7  Patch the cell servers

(Rolling)


./patchmgr -cells ~/cellgroup -patch -rolling

or


(Non-rolling)


./patchmgr -cells ~/cellgroup -patch

Step-8  Check the patchmgr.stdout file for any errors.


How does it work?


The entire patching activity is done automatically by the patchmgr utility.
·                     To ensure a good backup exists, the USB recovery media is recreated
·                     Check cells have ssh equivalence for root user
·                     Initialize files, check space and state of cell services
·                     Copy, extract prerequisite check archive to cells 
·                     Check prerequisites on cell
·                     Copy the patch to cell
·                     Execute plug-in check for Patch Check Prereq
·                     Initiate patch on cell
·                     Reboot the cell
·                     Execute plug-in check for Patching
·                     Finalize patch
·                     Reboot the cell
·                     Check the state of patch
·                     Execute plug-in check for Post Patch
·                     Done
After completion of the patching you can check the image version; it should have changed to the new version.

#imageinfo
#imagehistory


Rollback



Step-1  Disable the write-back flash cache (refer to Oracle Doc ID 1500257.1).
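You can confirm the current flash cache mode on every cell before and after the change (a sketch; the full disable procedure is in the referenced note):

# dcli -g ~/cellgroup -l root "cellcli -e list cell attributes flashcachemode"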

Step-2  Check rollback pre-requisites

(Rolling)


./patchmgr -cells ~/cellgroup -rollback_check_prereq -rolling -ignore_alerts

or


(non-rolling) 


./patchmgr -cells ~/cellgroup -rollback_check_prereq -ignore_alerts  

Step-3  Perform the rollback

(Rolling)
 


./patchmgr -cells ~/cellgroup -rollback -rolling -ignore_alerts  


or 


(Non-Rolling)


./patchmgr -cells ~/cellgroup -rollback -ignore_alerts 

Step-4  Clean up the cells using the -cleanup option to clean up all the temporary patch or rollback files on the cells

./patchmgr -cells ~/cellgroup -cleanup




###########################################################





Most Exadata machines have 2 leaf InfiniBand switches and 1 spine. The 2 leaves are usually in U20 and U24, and the spine is located in the very bottom position, U1. Some Half and Quarter Rack Exadata machines do have a spine switch, but usually only the Full Rack has one.
In order to patch the switches, you must apply the patch to the SPINE switch first, rebooting it and waiting for it to come back online BEFORE proceeding to the next switches (the leaves).

The InfiniBand switch patches are NOT cumulative; it is necessary to upgrade version by version. For example, to migrate from version 1.0.1-1 to 1.3.3-2, I must first upgrade 1.0.1-1 to 1.1.3-2 and then 1.1.3-2 to 1.3.3-2.



Exadata Patching - Infiniband Switch


Here we have listed the bullet points to patch the IB switches on an Exadata Database Machine.


Syntax: #./patchmgr -ibswitches [ibswitch_list_file] <-upgrade | -downgrade> [-ibswitch_precheck] [-force]

Here:
ibswitch_list_file - contains the IPs of all the IB switches
-upgrade - to upgrade the switch
-downgrade - to downgrade the switch
-ibswitch_precheck - to check the prerequisites

The patchmgr utility is available in the storage server patch directory.


Patching


#./patchmgr -ibswitches ibswitches -upgrade -ibswitch_precheck



How does it work?
·                     Disable Subnet Manager
·                     Copy firmware to switch
·                     Check minimal firmware version to upgrade it
·                     Verify enough space in /tmp and /
·                     Verify for free memory to start upgrade
·                     Verify host details in /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1
·                     Verify for NTP server
·                     Pre-upgrade validation
·                     Start upgrade
·                     Load firmware
·                     Disable Subnet Manager
·                     Verify that /conf/configvalid is set to 1
·                     Set SMPriority to 5
·                     Reboot switch
·                     Restart Subnet Manager 
·                     Start post-update validation
·                     Confirmation: InfiniBand switch is at the target patching level
·                     Verifying host details in /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1
·                     Verifying NTP Server
·                     Firmware verification on InfiniBand switch 
·                     Post-check validation on IBSwitch 
·                     Final Confirmation: Update switch to 2.1.3_4 (Firmware version)

Once it completes on one switch, it will start to upgrade the next available switch, and at the end it will give the overall status of the upgrade.




###########################################################


Applying patch to Exadata Box

Hi everyone! I’m here to post about applying a patch to an Exadata machine. As a best practice we will apply the QFSDP (Quarterly Full Stack Download Patch) for Exadata Jan/2014. The patch apply is totally automatic, so if the prereqs were addressed correctly you will have no bad surprises and your Exadata environment will be patched successfully. At my job, our team applied it recently without any issue.
The patch number is 17816100 [Quarterly Full Stack Download Patch For Oracle Exadata (Jan 2014 - 11.2.3.3.0)], which is about 3.6 GB. This patch covers most of the Exadata Database Machine components, which are: databases, db nodes, storage servers, InfiniBand switches, and PDUs (Power Distribution Units). Our databases are already patched to version 11.2.0.3.21, and at the end of this patching the image version for the db and cell nodes should be 11.2.3.3.0, as we are moving from image 11.2.3.2.1.
You should carefully read all the READMEs and notes regarding this patch, as there is a complete list of prereqs and things to analyze. Although the db and cell nodes will all end up with the same image version, in our case the InfiniBand switch upgrade was optional according to the compatibility matrix, but to keep things simple we upgraded them too. The PDU upgrade is optional and is the easiest one.
Now let’s get hands-on and begin with the PDUs. This upgrade costs you no outage and is as simple as upgrading the firmware of your home network router. Just navigate to your PDU from your browser and hit “Net Configuration”. Scroll down to “Firmware Upgrade” and select the file MKAPP_Vx.x.dl to upgrade. After the PDU firmware is upgraded, it will prompt for the HTML interface to be upgraded, so you then select the file HTML_Vx.x.dl. Do that on all of the PDUs and you are done with it. Piece of cake.
Now let’s proceed to the cell upgrade. As we use the rolling upgrade strategy (no outage), all of the database homes must have patch 17854520 applied; otherwise the DBs may hang or crash. The utility used to patch the cells and InfiniBand switches is patchmgr (which should be executed as root). You can also run a precheck for the upgrade with this utility, as shown below:
# ./patchmgr -cells cell_group -patch_check_prereq -rolling
It is recommended to increase the disk repair time on the disk groups so that the disks are not dropped during the rolling upgrade (see the SQL sketch after the commands below). Also, according to the Oracle docs, it is recommended to reset the cells if this is the first time their image is being upgraded. Do this one cell at a time and then initiate the cell upgrade. patchmgr should be executed from a db node.
# ./patchmgr -cells cel01 -reset_force
# ./patchmgr -cells cel02 -reset_force
# ./patchmgr -cells cel03 -reset_force
# ./patchmgr -cells cell_group -patch -rolling
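The disk repair time change mentioned above is a plain ASM disk group attribute; a sketch with a hypothetical disk group name and value (remember to set it back to the original value after the patching):

SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '8.5h';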
After finishing the cell upgrade successfully, go for the InfiniBand switch upgrade precheck and execute the patchmgr utility as listed below:
# ./patchmgr -ibswitches -upgrade -ibswitch_precheck
To continue with the IB switch upgrade, just remove the precheck parameter:
# ./patchmgr -ibswitches -upgrade
When you are done with the InfiniBand switches and the cell nodes, you move on to upgrading the database nodes. For this upgrade, you use the dbnodeupdate.sh utility. This will upgrade the db node kernel and all of the dependent packages. Note that if you have any third-party packages installed, you should upgrade them manually after the upgrade. In our environment, the kernel is upgraded to Oracle Linux 5.9 (kernel-2.6.39-400.126.1.el5uek). dbnodeupdate.sh is fully automatic and will disable and bring down CRS on the node. You must run it as root, and as a best practice do one node at a time.
To perform a precheck, run it with the -v parameter at the end:
# ./dbnodeupdate.sh -u -l $PATCH_17816100/Infrastructure/ExadataStorageServer/11.2.3.3.0/p17809253_112330_Linux-x86-64.zip -v
Now, to start the upgrade of the db node, execute it without the -v parameter:
# ./dbnodeupdate.sh -u -l $PATCH_17816100/Infrastructure/ExadataStorageServer/11.2.3.3.0/p17809253_112330_Linux-x86-64.zip
After the machine reboots, finish the upgrade by executing:
# ./dbnodeupdate.sh -c
Perform these steps on all the remaining db nodes and you are done. The whole Exadata machine is patched. Run imageinfo on all db nodes and storage servers to confirm the new image; on the IB switches, run the version command to confirm it:
# dcli -g all_group -l root imageinfo
db01:
db01: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
db01: Image version: 11.2.3.3.0.131014.1
db01: Image activated: 2014-03-29 10:30:56 -0300
db01: Image status: success
db01: System partition on device: /dev/mapper/VGExaDb-LVDbSys1
db01:
db02:
db02: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
db02: Image version: 11.2.3.3.0.131014.1
db02: Image activated: 2014-03-30 10:23:58 -0300
db02: Image status: success
db02: System partition on device: /dev/mapper/VGExaDb-LVDbSys1
db02:
cel01:
cel01: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
cel01: Cell version: OSS_11.2.3.3.0_LINUX.X64_131014.1
cel01: Cell rpm version: cell-11.2.3.3.0_LINUX.X64_131014.1-1
cel01:
cel01: Active image version: 11.2.3.3.0.131014.1
cel01: Active image activated: 2014-03-28 23:42:33 -0300
cel01: Active image status: success
cel01: Active system partition on device: /dev/md6
cel01: Active software partition on device: /dev/md8
cel01:
cel01: In partition rollback: Impossible
cel01:
cel01: Cell boot usb partition: /dev/sdm1
cel01: Cell boot usb version: 11.2.3.3.0.131014.1
cel01:
cel01: Inactive image version: 11.2.3.1.0.120304
cel01: Inactive image activated: 2012-05-21 18:00:09 -0300
cel01: Inactive image status: success
cel01: Inactive system partition on device: /dev/md5
cel01: Inactive software partition on device: /dev/md7
cel01:
cel01: Boot area has rollback archive for the version: 11.2.3.1.0.120304
cel01: Rollback to the inactive partitions: Possible
cel02:
cel02: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
cel02: Cell version: OSS_11.2.3.3.0_LINUX.X64_131014.1
cel02: Cell rpm version: cell-11.2.3.3.0_LINUX.X64_131014.1-1
cel02:
cel02: Active image version: 11.2.3.3.0.131014.1
cel02: Active image activated: 2014-03-29 00:46:13 -0300
cel02: Active image status: success
cel02: Active system partition on device: /dev/md6
cel02: Active software partition on device: /dev/md8
cel02:
cel02: In partition rollback: Impossible
cel02:
cel02: Cell boot usb partition: /dev/sdm1
cel02: Cell boot usb version: 11.2.3.3.0.131014.1
cel02:
cel02: Inactive image version: 11.2.3.1.0.120304
cel02: Inactive image activated: 2012-05-21 18:01:07 -0300
cel02: Inactive image status: success
cel02: Inactive system partition on device: /dev/md5
cel02: Inactive software partition on device: /dev/md7
cel02:
cel02: Boot area has rollback archive for the version: 11.2.3.1.0.120304
cel02: Rollback to the inactive partitions: Possible
cel03:
cel03: Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
cel03: Cell version: OSS_11.2.3.3.0_LINUX.X64_131014.1
cel03: Cell rpm version: cell-11.2.3.3.0_LINUX.X64_131014.1-1
cel03:
cel03: Active image version: 11.2.3.3.0.131014.1
cel03: Active image activated: 2014-03-29 01:51:22 -0300
cel03: Active image status: success
cel03: Active system partition on device: /dev/md6
cel03: Active software partition on device: /dev/md8
cel03:
cel03: In partition rollback: Impossible
cel03:
cel03: Cell boot usb partition: /dev/sdm1
cel03: Cell boot usb version: 11.2.3.3.0.131014.1
cel03:
cel03: Inactive image version: 11.2.3.1.0.120304
cel03: Inactive image activated: 2012-05-21 18:01:28 -0300
cel03: Inactive image status: success
cel03: Inactive system partition on device: /dev/md5
cel03: Inactive software partition on device: /dev/md7
cel03:
cel03: Boot area has rollback archive for the version: 11.2.3.1.0.120304
cel03: Rollback to the inactive partitions: Possible
sw-ib2 # version
SUN DCS 36p version: 2.1.3-4
Build time: Aug 28 2013 16:25:57
SP board info:
Manufacturing Date: 2011.05.08
Serial Number: "NCD6I0106"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
sw-ib3 # version
SUN DCS 36p version: 2.1.3-4
Build time: Aug 28 2013 16:25:57
SP board info:
Manufacturing Date: 2011.05.11
Serial Number: "NCD6Q0110"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
Docs:
• Exadata 11.2.3.3.0 release and patch (16278923) (Doc ID 1487339.1)
• Exadata Database Server Patching using the DB Node Update Utility (Doc ID 1553103.1)
• Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)
• Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)






###########################################################


Upgrading Exadata to 11.2.0.3 and Applying BP14


Upgrading Exadata to 11.2.0.3 and Applying Bundle Patch 14

In this blog, I'll walk you through the abbreviated steps to apply Bundle Patch 14 on our Exadata X2-2 Quarter Rack.  I say "abbreviated" because I'm simply going to bullet all the steps - this is no substitute for reading the various README files.

For BP14, I'm going to apply all patches in a rolling upgrade fashion.  The nodes in our Exadata are:

- cm01dbm01 (Compute node 1)
- cm01dbm02 (Compute node 2)
- cm01cel01 (Cell 1)
- cm01cel02 (Cell 2)
- cm01cel03 (Cell 3)

Preparation

1) Downloaded p13551280_112030_Linux-x86-64.zip from MOS

2) Transferred p13551280_112030_Linux-x86-64.zip to our first compute node, cm01dbm01

3) Unzipped p13551280_112030_Linux-x86-64.zip

4) Read the various README.txt files

5) Login to each storage server, compute node, Infiniband switch, and validate the current versions using "imageinfo"

Patch Contents

Bundle Patch 14 (13551280) contains the latest software versions for the entire Exadata technology stack.  The patch contents are split into 3 sections:

* Infrastructure
- Includes patches for Exadata Storage Server nodes, version 11.2.2.4.2
- InfiniBand switches, version 1.3.3-2
- PDUs, firmware version 1.04
* Database
- Oracle RDBMS 11.2.0.3
- Grid Infrastructure , 11.2.0.3
- OPatch 11.2.0.1.9
- OPlan 11.2.0.2.7
* Systems Management
- EM Agent, 11.1.0.1.0
- EM Plugins for InfiniBand Switches, Cisco switches, PDUs, KVMs, ILOMs
- OMS patches for any/all OMS homes monitoring Exadata targets (11.1.0.1.0)

Patching Storage Servers

1) Transfer 13551280/Infrastructure/ExadataStorageServer/11.2.2.4.2 contents to storage cell cm01cel01:/tmp, our first cell and unzip the zip file

2) Read MOS note 1388400.1 and do the following:
- "# cellcli -e list griddisk where diskType=FlashDisk".  Make sure we don't have any Flash Grid disks, which we didn't.
- "# cellcli -e list physicaldisk attributes name, status, slotNumber".  Make sure no duplicate disks exists with the same slot number.  In our case, we didn't have any.
- "# cellcli -e list physicaldisk".  Make sure they're all normal.
- "# grep -in 'Failed to parse the command' $CELLTRACE/ms-odl.trc*".  Make sure we don't have any flash disk population errors.  We didn't.
- Since our current cell version image is > 11.2.2.2.x, we skipped steps 3a and 3b.
- Transfer validatePhysicalDisks from MOS note to /tmp and run it. It should look like this:

[root@cm01cel01 patch_11.2.2.4.2.111221]# /tmp/validatePhysicalDisks 
[SUCCESS] CellCLI output and MegaCLI output are consistent.
[root@cm01cel01 patch_11.2.2.4.2.111221]# 

- Ensure database tier hosts are > 11.2.0.1 to support rolling upgrades.  In our case, they are.
3) Validate that all physical disks have valid physicalInsertTime:

[root@cm01cel01 patch_11.2.2.4.2.111221]#  cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null'
[root@cm01cel01 patch_11.2.2.4.2.111221]# 

4) Verify that no duplicate slotNumbers exist.  This was done per MOS note 1388400.1, step 2

5) Obtain LO and serial console access for cell
- Login to cm01cel01-ilom as root
- Type "start /SP/console"
- Login to console as root

6) Check version of ofa by doing "rpm -qa|grep ofa".  Ours was higher than the minimum version, so we're OK

7) Since we're doing this in rolling fashion, confirm the grid disks can be deactivated and then take them all offline:

[root@cm01cel01 ~]#  cellcli -e "LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'"
[root@cm01cel01 ~]# 
[root@cm01cel01 ~]# cellcli -e "ALTER GRIDDISK ALL INACTIVE"
GridDisk DATA_CD_00_cm01cel01 successfully altered
GridDisk DATA_CD_01_cm01cel01 successfully altered
GridDisk DATA_CD_02_cm01cel01 successfully altered
GridDisk DATA_CD_03_cm01cel01 successfully altered
GridDisk DATA_CD_04_cm01cel01 successfully altered
GridDisk DATA_CD_05_cm01cel01 successfully altered
GridDisk DATA_CD_06_cm01cel01 successfully altered
GridDisk DATA_CD_07_cm01cel01 successfully altered
GridDisk DATA_CD_08_cm01cel01 successfully altered
GridDisk DATA_CD_09_cm01cel01 successfully altered
GridDisk DATA_CD_10_cm01cel01 successfully altered
GridDisk DATA_CD_11_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_02_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_03_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_04_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_05_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_06_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_07_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_08_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_09_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_10_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_11_cm01cel01 successfully altered
GridDisk RECO_CD_00_cm01cel01 successfully altered
GridDisk RECO_CD_01_cm01cel01 successfully altered
GridDisk RECO_CD_02_cm01cel01 successfully altered
GridDisk RECO_CD_03_cm01cel01 successfully altered
GridDisk RECO_CD_04_cm01cel01 successfully altered
GridDisk RECO_CD_05_cm01cel01 successfully altered
GridDisk RECO_CD_06_cm01cel01 successfully altered
GridDisk RECO_CD_07_cm01cel01 successfully altered
GridDisk RECO_CD_08_cm01cel01 successfully altered
GridDisk RECO_CD_09_cm01cel01 successfully altered
GridDisk RECO_CD_10_cm01cel01 successfully altered
GridDisk RECO_CD_11_cm01cel01 successfully altered
[root@cm01cel01 ~]# 
[root@cm01cel01 ~]# cellcli -e "LIST GRIDDISK WHERE STATUS != 'inactive'"
[root@cm01cel01 ~]# 

8) Shutdown cell services

[root@cm01cel01 ~]# sync
[root@cm01cel01 ~]# sync
[root@cm01cel01 ~]# shutdown -F -r now

Broadcast message from root (ttyS0) (Sat Feb 11 19:57:42 2012):

The system is going down for reboot NOW!
audit(1329008264.759:2153236): audit_pid=0 old=7383 by auid=4294967295
type=1305 audit(1329008264.850:2153237): auid=4294967295 op=remove rule key="time-change" list=4 res=1

9) Since we're doing this in rolling fashion, activate all disks and check grid disk attributes.  I'm wondering if steps 7 and 8 were actually required, but I believe they were to ensure we had a healthy disk status:

[root@cm01cel01 ~]# cellcli -e 'list griddisk attributes name,asmmodestatus'
DATA_CD_00_cm01cel01     OFFLINE
DATA_CD_01_cm01cel01     OFFLINE
DATA_CD_02_cm01cel01     OFFLINE
(wait)
[root@cm01cel01 ~]# cellcli -e 'list griddisk attributes name,asmmodestatus' \
> |grep -v ONLINE
[root@cm01cel01 ~]# 

10) Ensure network configuration is consistent with cell.conf by running "/opt/oracle.cellos/ipconf -verify"

[root@cm01cel01 ~]# /opt/oracle.cellos/ipconf -verify
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks
[root@cm01cel01 ~]# 

11) Prep for patchmgr - ensure that root has user equivalence by running dcli commands below:

[root@cm01cel01 ~]# dcli -g cell_group -l root 'hostname -i'
cm01cel01: 172.16.1.12
cm01cel02: 172.16.1.13
cm01cel03: 172.16.1.14
[root@cm01cel01 ~]# 

12) Check pre-requisites by running "./patchmgr -cells ~/cell_group -patch_check_prereq -rolling" from patch stage location:

[root@cm01cel01 patch_11.2.2.4.2.111221]# ./patchmgr -cells ~/cell_group \
> -patch_check_prereq -rolling

[NOTICE] You will need to patch this cell by starting patchmgr from some other cell or database host.
20:10-11-Feb:2012        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
20:10-11-Feb:2012        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
20:10-11-Feb:2012        :Working: DO: Check space and state of Cell services on target cells. Up to 1 minute ...
20:10-11-Feb:2012        :SUCCESS: DONE: Check space and state of Cell services on target cells.
20:10-11-Feb:2012        :Working: DO: Copy and extract the prerequisite archive to all cells. Up to 1 minute ...
20:10-11-Feb:2012        :SUCCESS: DONE: Copy and extract the prerequisite archive to all cells.
20:10-11-Feb:2012        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
20:11-11-Feb:2012        :SUCCESS: DONE: Check prerequisites on all cells.

[root@cm01cel01 patch_11.2.2.4.2.111221]# 

13) Check ASM disk group repair time.  I'm leaving mine at 3.6 hours:

 1  select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a
  2* where dg.group_number=a.group_number and a.name='disk_repair_time'
SQL> /

NAME
------------------------------
VALUE
--------------------------------------------------------------------------------
DATA_CM01
3.6h

DBFS_DG
3.6h

RECO_CM01
3.6h


14) Make sure you're not running the patch from the LO or serial console, but stay logged into the LO console to monitor things in case something goes wrong:

[root@cm01cel01 ~]#  echo $consoletype
pty
[root@cm01cel01 ~]# 

15) Apply the patch in rolling fashion - note that this will patch cm01cel02 and cm01cel03, since I'm launching it from cm01cel01.  After it's done, we'll have to patch cm01cel01.  I should have launched this from a compute node, for some reason I always forget =)

[root@cm01cel01 patch_11.2.2.4.2.111221]# ./patchmgr -cells ~/cell_group -patch -rolling
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using
<< output truncated >>

16) Validate cm01cel02 and cm01cel03 using "imageinfo":

[root@cm01cel02 ~]# imageinfo

Kernel version: 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64
Cell version: OSS_11.2.2.4.2_LINUX.X64_111221
Cell rpm version: cell-11.2.2.4.2_LINUX.X64_111221-1

Active image version: 11.2.2.4.2.111221
Active image activated: 2012-02-11 20:58:06 -0500
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

17) Validate Grid disks are active and in the correct state, on both cm01cel02 and cm01cel03, using "cellcli -e 'list griddisk attributes name,status,asmmodestatus'"

18) Check /var/log/cellos/validations.log on both cm01cel02 and cm01cel03

19) From cm01dbm01 (first compute node), unzip/stage the Infrastructure patch into /tmp/patch_11.2.2.4.2.111221

20) Create cell_group file containing only "cm01cel01"
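For example:

# echo cm01cel01 > cell_group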

21) Check user-equivalence by doing below:

[root@cm01dbm01 patch_11.2.2.4.2.111221]#  dcli -g cell_group -l root 'hostname -i'
cm01cel01: 172.16.1.12
[root@cm01dbm01 patch_11.2.2.4.2.111221]#

22) Run "./patchmgr -cells cell_group -patch_check_prereq -rolling"

23) Patch cm01cel01 by doing "./patchmgr -cells cell_group -patch -rolling":

[root@cm01dbm01 patch_11.2.2.4.2.111221]# ./patchmgr -cells cell_group -patch -rolling
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using

24) Check Grid Disk and ASM status on cm01cel01 using "cellcli -e 'list griddisk attributes name,status,asmmodestatus'"

25) Check imageinfo and /var/log/cellos/validations.log on cm01cel01

26) Cleanup using "./patchmgr -cells cell_group -cleanup" (from cm01dbm01 - it will cleanup on all 3 cells)

27) Login to cm01cel01 to check InfiniBand.  As a side-note, we should have patched IBs first according to something very far down in the README, but luckily our IB versions are in good shape:

[root@cm01cel01 oracle.SupportTools]# ./CheckSWProfile.sh -I cm01sw-ib2,cm01sw-ib3
Checking if switch cm01sw-ib2 is pingable...
Checking if switch cm01sw-ib3 is pingable...
Use the default password for all switches? (y/n) [n]: n
Use same password for all switches? (y/n) [n]: y
Enter admin or root password for All_Switches:
Confirm password:
[INFO] SUCCESS Switch cm01sw-ib2 has correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS Switch cm01sw-ib2 has correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 

[INFO] SUCCESS Switch cm01sw-ib3 has correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS Switch cm01sw-ib3 has correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 

[INFO] SUCCESS All switches have correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS All switches have correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 for non spine and 8 for spine switch
[root@cm01cel01 oracle.SupportTools]# 

28) Apply the minimal pack to database tier hosts (Section 6 of the README.txt).  Start by starting an LO console by SSH-ing into cm01dbm01-ilom and doing "start /SP/console"

29) Check imagehistory by running "# imagehistory".  We're in good shape, since we recently applied BP 13

30) Stop dbconsole for each database on cm01dbm01 (and cm01dbm02)

31) Stop cluster using /u01/app/11.2.0/grid/bin/crsctl stop cluster -f -all

32) Stop OSW by running "/opt/oracle.oswatcher/osw/stopOSW.sh"

33) Set memory settings in /etc/security/limits.conf.  On this step, since we'd set up hugepages, the previous values calculated by "let -i x=($((`cat /proc/meminfo | grep 'MemTotal:' | awk '{print $2}'` * 3 / 4))); echo $x" are commented out, and this is OK.

34) SCP db_patch_11.2.2.4.2.111221.zip from /tmp/patch_11.2.2.4.2.111221 to /tmp on cm01dbm01 and cm01dbm02.  When we apply these patches, we'll be applying from an SSH session on each Database tier host

35) Unzip /tmp/db_patch_11.2.2.4.2.111221.zip and go to /tmp/db_patch_11.2.2.4.2.111221 directory

36) Run "./install.sh -force" on cm01dbm02.  This will take a little while ...

37) While this is running, repeat the above on cm01dbm01

38) On cm01dbm02 (first node patched), check imageinfo.  It should look like below:

[root@cm01dbm02 ~]# /usr/local/bin/imageinfo

Kernel version: 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64
Image version: 11.2.2.4.2.111221
Image activated: 2012-02-11 23:26:55 -0500
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

[root@cm01dbm02 ~]# 

39) Verify the ofa rpm by running "rpm -qa | grep ofa", comparing against kernel version.  It should look like this:

[root@cm01dbm02 ~]# rpm -qa | grep ofa
ofa-2.6.18-238.12.2.0.2.el5-1.5.1-4.0.53
[root@cm01dbm02 ~]# uname -a
Linux cm01dbm02.centroid.com 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@cm01dbm02 ~]# 


40) Verify the controller cache is on using "/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0".  You should see this:

Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU

41) Run "/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll | grep 'FW Package Build'" and ensure it says "FW Package Build: 12.12.0-0048"

42) Reboot the server (cm01dbm02) after running "crsctl stop crs"

43) Repeat steps 38-42 on cm01dbm01 when it's back up



Patching InfiniBand Switches

1) Login to cm01sw-ib2 as root

2) Check the version of the software - on our switches, we're already at 1.3.3-2 so we didn't actually need to do anything:

[root@cm01sw-ib2 ~]# version
SUN DCS 36p version: 1.3.3-2
Build time: Apr  4 2011 11:15:19
SP board info:
Manufacturing Date: 2010.08.21
Serial Number: "NCD4V1753"
Hardware Revision: 0x0005
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
[root@cm01sw-ib2 ~]# 

3) Validate on cm01sw-ib3

Patching PDUs

1) Go to 13551280/Infrastructure/SunRackIIPDUMeteringUnitFirmware/1.04 and unzip the zip file

2) Transfer the *DL files to laptop

3) Login to PDUA (http://cm01-pdua.centroid.com/, in our case)

4) Click on Network Configuration and login as admin

5) Go down to Firmware Upgrade and Choose MKAPP_V1.0.4.DL and click Submit

6) When done, update the HTML DL file.  It seems to "hang" for a very long time, but eventually ...

7) Repeat on cm01-pdub.centroid.com

Upgrade GI Home and RDBMS Home from 11.2.0.2 to 11.2.0.3

Prior to applying the BP14 updates for 11.2.0.3, we need to get our GI and RDBMS Homes upgraded to 11.2.0.3 by following MOS note 1373255.1.  There are a couple of sections of steps required:

- Prepare environments
- Install and Upgrade GI to 11.2.0.3
- Install 11.2.0.3 database software
- Upgrade databases to 11.2.0.3.  In our case, this includes dwprd, dwprod, and visx cluster database
- Do some post upgrade steps

1) Download 11.2.0.3 from https://updates.oracle.com/ARULink/PatchDetails/process_form?patch_num=10404530 and transfer to /u01/stg on cm01dbm01.

2) Since we're already at BP13, we don't need to apply 12539000 

3) Run Exachk to validate that the cluster is ready to patch.  MOS Document 1070954.1 contains details.  Download exachk_213_bundle.zip, unzip it, and run:

[oracle@cm01dbm01 ~]$ cd /u01/stg/exachk/
[oracle@cm01dbm01 exachk]$ ls
collections.dat  exachk_213_bundle.zip        exachk_dbm_121311_115203-public.html  ExachkUserGuide.pdf  readme.txt  UserGuide.txt
exachk  ExachkBestPracticeChecks.xls  Exachk_Tool_How_To.pdf      exachk.zip   rules.dat
[oracle@cm01dbm01 exachk]$ ./exachk

Our Exachk run showed a couple of issues, and we fixed the following:

- Set processes initialization parameter to 200 for both ASM instances
- Set cluster_interconnects to appropriate interface for dwprd and dwprod1
- Cleaned up audit dest files and trace/trm file for ASM instances, both nodes
- Set filesystemio_options=setall on all instances
- When done, bounced cluster using "crsctl stop cluster -f -all", followed by "crsctl start cluster -all"

4) Validate readiness of CRS by running cluvfy. Go to <stage>/grid, login as grid, and run this:

[grid@cm01dbm01 grid]$ ./runcluvfy.sh stage -pre crsinst -upgrade \
> -src_crshome /u01/app/11.2.0/grid \
> -dest_crshome /u01/app/11.2.0.3/grid \
> -dest_version 11.2.0.3.0 \
> -n cm01dbm01,cm01dbm02 \
> -rolling \
> -fixup -fixupdir /home/grid/fixit

- Failed on kernel parameters because grid didn't have access to /etc/sysctl.conf.  Ignore it.
- Failed on bondeth0 and some VIP stuff - ignore it.  I think this is a cluvfy bug


5) Create new GI Homes for 11.2.0.3.  Example below from cm01dbm01, but do this on both nodes:

[root@cm01dbm01 ~]# mkdir -p /u01/app/11.2.0.3/grid/
[root@cm01dbm01 ~]# chown grid /u01/app/11.2.0.3/grid
[root@cm01dbm01 ~]# chgrp -R oinstall /u01/app/11.2.0.3
[root@cm01dbm01 ~]# 

6) Unzip all the 10404530 software

7) No need to update the OPatch software yet - I'll do this with the BP14 stuff from patch 13513783 (see next section)

8) Disable AMM in favor of ASMM for ASM instance. Follow the steps in the 1373255.1 document.  In our case, our SPFILE is actually a data file in $GI_HOME/dbs/DBFS_DG instead of the ASM disk group - I'm thinking about moving it with spmove and asmcmd, but I think I'll hold off for now.  When done it should look like below for both instances:


SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
+ASM2

SQL> show sga

Total System Global Area 1319473152 bytes
Fixed Size     2226232 bytes
Variable Size  1283692488 bytes
ASM Cache    33554432 bytes
SQL> 

9) Bounce databases and ASM and validate __shared_pool_size and __large_pool_size, along with values changed above.  Again, refer to the output above


10) Validate cluster interconnects.  This is how it should look - they need to be manually set:

SQL> select inst_id, name, ip_address from gv$cluster_interconnects
  2  /

   INST_ID NAME     IP_ADDRESS
---------- --------------- ----------------
2 bondib0    192.168.10.2
1 bondib0    192.168.10.1

SQL> create pfile='/tmp/asm.ora' from spfile;

File created.

SQL> !cat /tmp/asm.ora|grep inter
*.cluster_interconnects='192.168.10.1'
+ASM1.cluster_interconnects='192.168.10.1'
+ASM2.cluster_interconnects='192.168.10.2'

SQL> 

11) Shutdown visx, dwprd, and dwprod databases.  I'm going to shutdown everything for now for simplicity's sake

12) Login as "grid" to cm01dbm01 and unset ORACLE_HOME, ORACLE_BASE, and ORACLE_SID.  Get a VNC session established so we can launch the installer.  Run "./runInstaller" and follow the instructions in 1373255.1.  It will fail on VIP, node connectivity, and patch 12539000 but I'm crossing my fingers and assuming this is a bug.

(insert deep breath ...)

Things did install/upgrade fine, and it took about 45 minutes.  The post-install CVU step failed with the same errors as the pre-CVU stage (network/VIP stuff), but this is OK.

13) Stop CRS on both nodes

14) Relink GI oracle executable with RDS

[grid@cm01dbm01 ~]$ dcli -g ./dbs_group ORACLE_HOME=/u01/app/11.2.0.3/grid \
> make -C /u01/app/11.2.0.3/grid/rdbms/lib -f ins_rdbms.mk \
> ipc_rds ioracle


15) Start CRS on both nodes

16) Login as oracle on cm01dbm01 and start a VNC session.  At this point, the 11.2.0.3 software has already been installed.

17) Unset ORACLE_HOME, ORACLE_BASE, and ORACLE_SID, and launch runInstaller.  The pre-req checks will fail on subnet and VIP details, as above - choose to ignore.

18) When installation completes, link oracle with RDS:

dcli -l oracle -g ~/dbs_group ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1 \
          make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_rds ioracle

19) Copy OPatch from the 11.2.0.2 directory to the 11.2.0.3 directory - at the same time, might as well do the GI home as "grid".  When this is done, we can move on to the bundle patch
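A sketch of that copy, assuming the old homes are /u01/app/11.2.0/grid and /u01/app/oracle/product/11.2.0.2/dbhome_1:

[oracle@cm01dbm01 ~]$ cp -R /u01/app/oracle/product/11.2.0.2/dbhome_1/OPatch /u01/app/oracle/product/11.2.0.3/dbhome_1/
[grid@cm01dbm01 ~]$ cp -R /u01/app/11.2.0/grid/OPatch /u01/app/11.2.0.3/grid/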



Patching Compute Nodes to 11.2.0.3 BP 14 (13513783)


1) Go to 13551280/Database/11.2.0.3 on first node and unzip p13513783_112030_Linux-x86-64.zip

2) Extract OPatch from 13551280/Database/OPatch on both cm01dbm01 and cm01dbm02, RDBMS and GI Homes.  For example:

[root@cm01dbm01 OPatch]# unzip p6880880_112000_Linux-x86-64.zip -d /u01/app/11.2.0/grid/
Archive:  p6880880_112000_Linux-x86-64.zip
replace /u01/app/11.2.0/grid/OPatch/docs/FAQ? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

3) Make sure GI/OPatch files are owned by grid:oinstall and RDBMS/OPatch files are owned by oracle

4) Check inventory for RDBMS home (both nodes):

[oracle@cm01dbm01 ~]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch lsinventory -detail -oh /u01/app/oracle/product/11.2.0.3/dbhome_1/

5) Check inventory for GI home (both nodes):

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch lsinventory -detail -oh /u01/app/11.2.0/grid/

6) Set permissions to oracle:oinstall to patch location:

[root@cm01dbm01 11.2.0.3]# chown -R oracle:oinstall 13513783/
[root@cm01dbm01 11.2.0.3]# 

7) Check for patch conflicts in GI home.  Login as "grid" and run the below.  You will see some conflicts:

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13513783/

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13540563/

[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13513982/

8) Check for patch conflicts on the RDBMS home, as oracle:

[oracle@cm01dbm01 11.2.0.3]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch  prereq CheckConflictAgainstOHWithDetail -phBaseDir ./13513783/13513783/

[oracle@cm01dbm01 11.2.0.3]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch  prereq CheckConflictAgainstOHWithDetail -phBaseDir ./13513783/13540563/custom/server/13540563

9) Login as root and add opatch directory for GI home to path
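
For example (using the GI home shown in the opatch auto output below):

[root@cm01dbm01 ~]# export PATH=/u01/app/11.2.0.3/grid/OPatch:$PATH
[root@cm01dbm01 ~]# which opatch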

10) Patch by running the opatch auto command below, rather than trying to patch the GI and RDBMS homes individually:

[root@cm01dbm01 11.2.0.3]# opatch auto ./13513783/
Executing /usr/bin/perl /u01/app/11.2.0.3/grid/OPatch/crs/patch112.pl -patchdir . -patchn 13513783 -paramfile /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
opatch auto log file location is /u01/app/11.2.0.3/grid/OPatch/crs/../../cfgtoollogs/opatchauto2012-02-13_00-34-53.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/ocm/bin/ocm.rsp

11) The above succeeded on the first patch but failed on the second, so I'm going to patch manually.  First, as oracle, ensure ORACLE_HOME is the new 11.2.0.3 home and run this: 

[oracle@cm01dbm01 ~]$ srvctl stop home -o $ORACLE_HOME -s /tmp/x.status -n cm01dbm01

12) Unlock crs by running this:

[root@cm01dbm01 11.2.0.3]# /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -unlock
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params

13) Apply the first GI patch:

[grid@cm01dbm01 ~]$ /u01/app/11.2.0.3/grid/OPatch/opatch napply -oh /u01/app/11.2.0.3/grid/ -local /u01/stg/13551280/Database/11.2.0.3/13513783/13540563/

14) Apply second GI patch:

[grid@cm01dbm01 ~]$ /u01/app/11.2.0.3/grid/OPatch/opatch napply -oh /u01/app/11.2.0.3/grid/ -local /u01/stg/13551280/Database/11.2.0.3/13513783/13513982/

15) Login as oracle (database owner) on cm01dbm01 and run pre-script:

[oracle@cm01dbm01 scripts]$ pwd
/u01/stg/13551280/Database/11.2.0.3/13513783/13540563/custom/server/13540563/custom/scripts
[oracle@cm01dbm01 scripts]$ ./prepatch.sh -dbhome /u01/app/oracle/product/11.2.0.3/dbhome_1/
./prepatch.sh completed successfully.
[oracle@cm01dbm01 scripts]$ 

16) Apply BP patch to RDBMS home on cm01dbm01:

[oracle@cm01dbm01 13513783]$ pwd
/u01/stg/13551280/Database/11.2.0.3/13513783
[oracle@cm01dbm01 13513783]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch napply -oh /u01/app/oracle/product/11.2.0.3/dbhome_1 -local ./13513783/

[oracle@cm01dbm01 13513783]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch napply -oh /u01/app/oracle/product/11.2.0.3/dbhome_1 -local ./13540563/custom/server/13540563

17) Run post DB script:

[oracle@cm01dbm01 13513783]$ ./13540563/custom/server/13540563/custom/scripts/postpatch.sh -dbhome /u01/app/oracle/product/11.2.0.3/dbhome_1/
Reading /u01/app/oracle/product/11.2.0.3/dbhome_1//install/params.ora..

18) Run post scripts as root:

[root@cm01dbm01 11.2.0.3]# cd /u01/app/oracle/product/11.2.0
[root@cm01dbm01 11.2.0]# cd /u01/app/11.2.0.3/grid/
[root@cm01dbm01 grid]# cd rdbms/install/
[root@cm01dbm01 install]# ./rootadd_rdbms.sh 
[root@cm01dbm01 install]# cd ../../crs/install
[root@cm01dbm01 install]# ./rootcrs.pl -patch
Using configuration parameter file: ./crsconfig_params

19) Repeat steps 11-18 on cm01dbm02, except that on this node we also need to apply the first GI patch first

20) At this point, we've got GI completely upgraded to 11.2.0.3 and a patched 11.2.0.3 home for our RDBMS tier, but our databases still live on 11.2.0.2.  Let's go on to the next section




Upgrading databases to 11.2.0.3 and Applying CPU BP bundle (see 1373255.1)

1) Start all databases on 11.2.0.2 and make sure they're healthy.  One thing I screwed up during the GI installation was entering the wrong ASMDBA/ASMOPER/ASMADMIN groups.  This made it impossible for the database instances to start after things were patched (on 11.2.0.2).  I worked around it by adding "oracle" to all the "asm" groups (i.e. made its group membership look like grid's).  I'll fix this later.

2) Run the upgrade prep tool for each database (NEW_HOME/rdbms/admin/utlu112i.sql).  Note that this took quite a long time on our Oracle EBS R12 database
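
Something along these lines for each database (the spool file name is just an example):

[oracle@cm01dbm01 ~]$ sqlplus / as sysdba
SQL> spool /tmp/utlu112i_dwprd.log
SQL> @/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/admin/utlu112i.sql
SQL> spool off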

3) Set cluster_interconnects to correct InfiniBand IP.  Below is an example from one of the 3 databases, but they look the same on all:

SQL> select inst_id, name, ip_address from gv$cluster_interconnects;

   INST_ID NAME     IP_ADDRESS
---------- --------------- ----------------
1 bondib0    192.168.10.1
2 bondib0    192.168.10.2

SQL> 
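
To set the parameter explicitly per instance, something like the following can be used (SIDs and addresses follow the example above; adjust per database).  The change takes effect after the instances are restarted:

SQL> alter system set cluster_interconnects='192.168.10.1' scope=spfile sid='dwprd1';
SQL> alter system set cluster_interconnects='192.168.10.2' scope=spfile sid='dwprd2';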


4) I don't have any Data Guard environments or listener_networks setup, so I can skip these sections from the README.txt

5) Launch dbua from the 11.2.0.3 home to upgrade the first database.  It complained about a few underscore parameters and dictionary statistics, but I chose to ignore them and move on.

6) After my first database was upgraded, I validated things by running "srvctl status database", checking /etc/oratab, checking V$VERSION, etc.
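
For example:

[oracle@cm01dbm01 ~]$ srvctl status database -d dwprd

SQL> select banner from v$version;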

7) Repeat steps 5 and 6 for the remaining databases.

8) On each database that was upgraded, a couple of underscore parameters need to be reset - see below:

SYS @ dwprd1> alter system set "_lm_rcvr_hang_allow_time"=140 scope=both;

System altered.

Elapsed: 00:00:00.04
SYS @ dwprd1> alter system set "_kill_diagnostics_timeout"=140 scope=both;

System altered.

Elapsed: 00:00:00.01
SYS @ dwprd1> alter system set "_file_size_increase_increment"=2143289344 scope=both 
  2  ;

System altered.

Elapsed: 00:00:00.00

9) Apply Exadata Bundle Patch.  See below:

/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/admin
[oracle@cm01dbm01 admin]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Feb 13 13:28:55 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> @catbundle.sql exa apply

- Make sure no ORA- errors exist in logs in /u01/app/oracle/cfgtoollogs/catbundle
- Check DBA_REGISTRY

10) Start applications and test

11) Start EM dbconsole for each database on first compute node
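
For example - 11.2 dbconsole needs ORACLE_UNQNAME set; the database name here is just an example:

[oracle@cm01dbm01 ~]$ export ORACLE_SID=dwprd1
[oracle@cm01dbm01 ~]$ export ORACLE_UNQNAME=dwprd
[oracle@cm01dbm01 ~]$ $ORACLE_HOME/bin/emctl start dbconsole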


Finishing Up

1) Validate all your databases, CRS/GI components, etc.
2) Validate all ASM grid disks, Cell status
3) Cleanup staged patches from /tmp and/or other locations
4) Cleanup 11.2.0.2 GI and RDBMS Homes and ensure that all initialization parameters are pointing to the right spots
5) Fix group membership
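
For item 5, a hedged sketch - the exact group list depends on what was chosen at GI install time, so treat these group names as placeholders:

[root@cm01dbm01 ~]# id oracle
[root@cm01dbm01 ~]# usermod -G oinstall,dba,asmdba oracle    # -G replaces the full supplementary group list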



Tuesday, May 6, 2014

Exadata exam 1z0-027

                  

                              Exadata exam 1z0-027 


Exadata -- For Exadata Database Machine Admins -- Filtered Information



This is actually someone else's post. I have copied it since it covers exactly what is required for the exam.

This post will be based on information concerning Exadata Administrators. You will find information gathered on the way to being certified on Exadata.
The information is provided item by item, and I think
this post will be useful for Exadata Admins even if they already have field experience.
This post is unstructured; I mean the information provided in this document is not grouped into
subtitles, but it will present the whole picture about Exadata Administration when you read all of it.
The information provided in this document will be like snips from the Exadata knowledge.
In order to understand this document, you should already know the basic concepts of Exadata
(like storage indexes, smart scan, etc.)

Let's start:

You can make indexes invisible to increase the chance of using storage indexes.
You can load your data sorted on the filter columns to maximize the benefit of storage indexes.
Bind variables can be used with storage indexes.
Oracle Database QoS Management can offer recommendations for CPU bottlenecks. QoS
 Management cannot provide recommendations for Global Cache resource bottlenecks, and it
 cannot resolve I/O resource bottlenecks either.
For media based backups, it is recommended to allocate an equivalent number of channels and instances
 per tape drive. For this type of backup, the network cables between Exadata and the media servers
 should be connected through the Exadata database nodes. 
Bonding on Exadata can be used both for reliability and load balancing, but when I look at a production Exadata machine, which is an X3, I see that the bonding is configured for reliability.

cat /proc/net/bonding/bondeth0
Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007)
Bonding Mode: fault-tolerance (active-backup)
....
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100" 


active-backup or 1 -> Sets an active-backup policy for fault tolerance.
Transmissions are received and sent out via the first available bonded slave interface.
Another bonded slave interface is only used if the active bonded slave interface fails.


To guarantee proper cooling for an Exadata machine, perforated floor tiles should be placed
at the front, because the air flow is from front to back.

Creating multiple grid disks on a single disk in Exadata gives us multiple storage pools
 with different performance characteristics, and these pools can be assigned to different
 databases.

Here is some general information about the disk layout in Exadata:
Physical disk -> LUN -> Cell Disk (it's like a filesystem on the LUN) -> Grid Disk (it's like a partition)
Lastly, grid disks are served to the ASM diskgroups.
Note that we can create flash based ASM diskgroups, too. They may share space
with the Flash Cache on the flash disks.
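
The layout above is built with CellCLI; a rough sketch (the prefixes and size are just examples):

CellCLI> CREATE CELLDISK ALL HARDDISK
CellCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=DATA, size=400G
CellCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=RECO

The DATA grid disks are created first on purpose, so they land on the fastest (outermost) part of each disk.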
Things to consider for Exadata migrations:
Transportable Database method -> the source database should be 10.2.0.5 and little endian.
Data Pump method -> 10.2.0.5 is good, but it is time consuming.
Data Guard physical standby method -> the source platform must be Linux, Solaris x86 or Windows,
 and the source must be on 11.2.
ASM rebalance method -> an 11.2 Linux x86-64 database that uses ASM with a 4MB AU.
Transportable tablespaces method -> a big endian source >= 10.1, or a little endian source >= 10.1
 and < 11.2, is needed.
Logical standby method -> the source does not need to be on 11g. Logical standby is not
 supported for HP-UX to Linux migrations.

Also, here is some useful information about the methods:

Data Guard Physical Standby:
-------------------------------------------------
Source platform must be Linux, Solaris x86 or Windows (see Document 413484.1) and
source on 11.2
Source database in ARCHIVELOG mode
NOLOGGING operations to the database are not permitted, i.e. FORCE LOGGING is
 turned on at the database level

Transportable Database
-------------------------------------------------

Source system must be 10gR2 (10.2.0.5) and little endian
Suitable when stricter service level requirements rule out the longer downtime required by Oracle Data Pump

Transportable Tablespaces
-------------------------------------------------
Any source platform
EBS Release 12 with source database RDBMS release 10gR2 (10.2.0.5) or higher
EBS 11i with source database RDBMS release 11.2
Service levels cannot tolerate a significant outage.  Using transportable tablespaces instead
 of Oracle Data Pump export/import can significantly reduce the outage time for large (> 300 GB)
EBS databases.  Thorough testing will determine the precise outage time.
For a point of reference, tests on an Oracle Exadata Database Machine quarter rack with the
 Rapid Install Vision database (about 300 GB) took about 12 hours.  This time should remain
 about the same regardless of the amount of data in the database.  This is because the metadata
creation takes the longest time in the migration process and accounts for the bulk of time.

Oracle Data Pump
-------------------------------------------------
Any source platform
Source database is RDBMS release 10.2 or higher
To implement Oracle Exadata Database Machine best practices on the target
Service levels can tolerate a significant outage.
For a point of reference, tests on an Oracle Exadata Database Machine quarter rack with the
Rapid Install Vision database (about 300 GB) took about 24 hours (export - 7:42; import - 16:42)
 using Network Storage and no dump file copy (i.e. the export dump storage was mounted on the
source and the target).
Timings will vary depending on your system configuration and increase as the amount of data
increases.

In Exadata, Oracle Enterprise Manager agents must be deployed to the compute nodes. The Oracle
Exadata plug-in is deployed with the agent. Plug-ins allow you to monitor the key
components of the Exadata machine. There are several plug-ins for Grid Control and Cloud
Control (such as Avocent MergePoint Unity switch, Cisco switch, Oracle ILOM, InfiniBand switch, PDU). Note that a trap forwarder is required to catch Cisco switch and KVM traps due to
a port mismatch.
The agent communicates with the Storage Server and InfiniBand switch targets directly. The Oracle
 Exadata plug-in also monitors the other Database Machine components. The Oracle Enterprise Manager
12c agent collects data and communicates with the remote Enterprise Manager repository.

In Exadata X3, we have 512 MB flashlogs on the storage servers.
In compute nodes, we have raid 5 arrays;

Virtual Drives
Virtual drive : Target Id 0 ,VD name DBSYS
Size : 556.929 GB 
State : Optimal
RAID Level : 5

Exadata -> action plan to replace a flash disk when grid disks are created on it (a CellCLI sketch follows after the list):
Ref: Replacing FlashCards or FDOMs when Griddisks are created on FlashDisks
 (Doc ID 1545103.1)

If the flash card needs to be replaced, drop the disks used in +ASM for the FlashDisk
Drop the Flash Cache / Flash Log and delete the cell disks of type FlashDisk
Shutdown
Replace
Create the flash disks
Now create your grid disks back on DiskType: FlashDisk
Add the disks back to the diskgroup. (This is optional -> if you used 'force' when dropping
the disks from ASM, then Exadata auto management should automatically add these
 disks back into ASM.)
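
A hedged CellCLI sketch of the cell-side steps (the sizes and the grid disk prefix are placeholders; the ASM drop/add steps are done from the ASM instance):

CellCLI> DROP FLASHCACHE
CellCLI> DROP FLASHLOG
CellCLI> DROP CELLDISK ALL FLASHDISK

(power the cell off, replace the card, power it back on)

CellCLI> CREATE CELLDISK ALL FLASHDISK
CellCLI> CREATE FLASHLOG ALL SIZE=512M
CellCLI> CREATE FLASHCACHE ALL SIZE=300G
CellCLI> CREATE GRIDDISK ALL FLASHDISK PREFIX=FLASH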

Note that we have different failure types in Exadata. For example, a disk which is in predictive
 failure state must be replaced immediately. ASM will drop these kinds of disks automatically
from the associated disk group and start a rebalance operation.
For a flash disk in predictive failure state:
If the flash disk is used for flash cache, then flash cache will be disabled on this disk, thus
 reducing the effective flash cache size. If the flash disk is used for flash log, then flash log
will be disabled on this disk, thus reducing the effective flash log size. If the flash disk is used
 for grid disks, then Oracle ASM rebalance will automatically restore the data redundancy.

If you want to change a memory DIMM or do a similar activity, you just need
 to shut down the affected cell. The databases and the other cell servers will not be affected
 by this operation.
What you have to do is (see the sketch below):
Check asmdeactivationoutcome of the grid disks first.
Then inactivate all the grid disks on that cell and shut down the cell using shutdown -h now.
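
A minimal sketch of that sequence in CellCLI:

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome
CellCLI> ALTER GRIDDISK ALL INACTIVE
# shutdown -h now      (as root, once all grid disks report asmdeactivationoutcome = Yes)

and after the maintenance:

CellCLI> ALTER GRIDDISK ALL ACTIVE
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus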

The quarterly battery learn cycle is a normal maintenance activity; it is used for charging and calibrating the
disk controller batteries. While this kind of activity is happening, you can end up with performance degradation in terms
 of I/O, as this event may require the related device to be put into write-through mode.
 Keep that in mind and, if you can or need to, set the device back to write-back mode afterwards.

Configure and use Smart Flash Logs if you need better LGWR performance, as LGWR writes
 redo data to both flash and disk in parallel and considers the write done as soon as whichever
 of the two completes first.
Smart Flash Logs are a feature introduced with the 11.2.2.4 cell software. They are not for reading;
 they are used like a circular buffer for redo writes. Smart Flash Logs can enhance the performance
of an OLTP database. By default they occupy 512 MB per cell
 (32 MB per flash disk * 16 flash disks) in the flash cards, so Smart Flash Logs slightly reduce
 the size of the Flash Cache.
Note that Smart Flash Logs can be enabled or disabled per database using IORM
if needed.
Also, LGWR writes and controlfile I/O are automatically given high priority in IORM,
while DBWR I/O is managed automatically at normal priority.
The flash log, if needed, can be dropped using the drop flashlog command through the
cellcli utility residing on the storage servers.
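
For example, on a storage cell (the size is just an example):

CellCLI> LIST FLASHLOG DETAIL
CellCLI> DROP FLASHLOG
CellCLI> CREATE FLASHLOG ALL SIZE=512M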

Consider using CTAS for bulk data loading from external tables into Exadata. CTAS automatically
uses direct path load. Insert /*+ append */ can also be used for this kind of data loading; by
using the append hint, Oracle will use direct path loading for insert operations, too.
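
A small sketch (the table names are made up for illustration):

SQL> create table sales_fact parallel 8 nologging as
  2  select * from ext_sales_fact;

SQL> insert /*+ append */ into sales_fact select * from ext_sales_stage;
SQL> commit;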

cellip.ora is the configuration file that defines which cells the ASM instances connect to.
This file is located where ASM resides (the compute nodes), and it basically
tells ASM which cells are available. cellip.ora is located on every compute node, and its
 contents are like the following;

cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.10.10"
cell="192.168.10.11"
cell="192.168.10.12" 

This file should also be updated if you want to add an expansion rack to the storage grid.
In the default configuration, the DATA and RECO diskgroups are built on top of non-interleaving disks.
 For detailed information about interleaving, please see my following post: http://ermanarslan.blogspot.com.tr/2013/12/exadata-zbr-and-interleaving.html
Also, in the default configuration, we don't have any free space on the flash disks;

CellCLI> list celldisk where name='FD_15_cel01' detail
name: FD_15_cel01
comment:
creationTime: 2012-01-10T10:13:06+00:00
deviceName: /dev/sdy
devicePartition: /dev/sdy
diskType: FlashDisk
errorCount: 0
freeSpace: 0 
id: 8ddbd2c8-8446-4735-8948-d8aea5744b35
interleaving: none
lun: 5_3
size: 22.875G
status: normal

Note that you can use InfiniBand to connect an Exadata to an Exalogic. You can also use
 InfiniBand to connect an Exadata to an Oracle ZFS Storage ZS3, as the ZS3 has native InfiniBand
 connectivity.
Alternatively, a Sun ZFS Storage 7420 appliance can be connected to Exadata directly over InfiniBand. 
In addition, you can connect any media servers which have InfiniBand cards to Exadata via InfiniBand.
 Then you can connect those media servers to tape libraries to maximize the tape backup throughput.
In terms of tape backups, Oracle docs suggest a disk-to-disk-to-tape (D2D2T) strategy, which allows keeping old backups on tape while retaining new/fresh backups on disk for achieving fast recovery times.  You can consider the following Oracle slide as a good Exadata-tape backup scenario;


Uncommitted transactions and migrated rows can cause cell single block physical reads even if you
are doing a full table scan. Note that single block reads and multi block reads may benefit from the
 Flash Cache.
Also note that smart scan cannot be done against index organized tables and clustered tables.

When you need to apply a bundle patch to the Oracle Homes in Exadata, you can use the oplan utility.
The oplan utility generates instructions for applying patches, as well as instructions for rollback. It generates instructions for all the nodes in the cluster. Note that oplan does not support Data Guard.
 Oplan is supported since release 11.2.0.2. It basically eases the patching process, because
 without it you need to read the README files and extract your instructions yourself.
It is used as follows;
as the Oracle software owner (Grid or RDBMS), execute oplan;
$ORACLE_HOME/oplan/oplan generateApplySteps <bundle patch location>
It will create patch instructions for you in html and txt formats;
$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/InstallInstructions.html
$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/InstallInstructions.txt
Then choose the apply strategy according to your needs and follow the patching instructions to
 apply the patch to the target.
That's it. 
If you want to rollback the patch, execute the following (replacing the bundle patch location):
$ORACLE_HOME/oplan/oplan generateRollbackSteps <bundle patch location>
Again, choose the rollback strategy according to your needs and follow the instructions
 to rollback the patch from the target.

It is mandatory to know which components/tools run on which servers in Exadata;
Here is the list;

DCLI -> storage cells and compute nodes; executes cellcli commands on multiple storage servers
ASM -> compute nodes -- it is the ASM instance, basically
RDBMS -> compute nodes -- it is the database software
MS -> storage cells; provides a Java interface to the CellCLI command line interface, as well as providing an interface for Enterprise Manager plug-ins
RS -> storage cells; RS (Restart Server) is a set of processes responsible for managing and restarting other processes
CellCLI -> storage cells; used to run storage commands
CELLSRV -> storage cells; it receives and unpacks iDB messages transmitted over the InfiniBand interconnect and examines the metadata contained in the messages
Diskmon -> compute nodes; in Exadata, diskmon is responsible for: handling of storage cell failures and I/O fencing; monitoring of the Exadata Server state on all storage cells in the cluster (heartbeat); broadcasting intra-database IORM (I/O Resource Manager) plans from databases to storage cells; monitoring of the control messages from database and ASM instances to storage cells; and communicating with the other diskmons in the cluster. 

The following strategy should be used for applying patches in Exadata:
Review the patch README file (know what you are doing)
Run the Exachk utility before the patch application (check the system, know its current state)
Automate the patch application process (automate it to be fast and to minimize problems)
Apply the patch
Run the Exachk utility again -- after the patch application
Verify the patch (does it fix the problem or supply the related functionality?)
Check the performance of the system (is there any abnormal performance decrease?)
Test the failback procedure (as you may need to fail back in production, who knows)


Multiple grid disks can be created on a single cell disk, as you know. While creating multiple
grid disks on a single disk, you can end up having multiple disks with different performance characteristics. In order to have a more balanced disk layout, you can use the interleaving option or the
Intelligent Data Placement technology, which is based on ASM.

The internal InfiniBand network used in Exadata transmits iDB messages between the compute nodes
and the storage servers, as well as RAC interconnect traffic between the compute nodes
 in the cluster.

DBFS or an NFS filesystem can be used as a staging area for loading data from external tables.
 If choosing NFS to load data, the NFS share should be mounted on the preferred compute node.
DBFS can be created on DBFS_DG, as well as on a standard filesystem.
It can enhance performance if you need to bulk load data into your production system
residing on Exadata. By using DBFS created on ASM, you get parallelization in the
 storage layer, which enhances I/O performance.

The diagget.sh script can be used to gather diagnostic information, including software logs, trace files
and OS information.

Exadata storage servers have alerts defined on them by default. These alerts are based on predefined metrics.
We can also define new metric thresholds; these will persist across cell node reboots.

If you have an 11.1 database (little endian) and want to migrate it to Exadata with minimum downtime,
 you can upgrade it to 11.2 and use a Data Guard physical standby to minimize downtime, or alternatively
 you can use GoldenGate for this. Of course you can use Data Pump as well, but with Data Pump the
 downtime will be significant.

With the non-interleaving disk configuration in Exadata (which is the default), the first grid
 disks created using the CREATE GRIDDISK command will have the best performance, because they are
  placed on the outermost, fastest tracks of the disks. So a diskgroup created on the first created grid disks will have better performance than the
other diskgroups on the same Exadata machine.

If you want to do some administration work on the Exadata storage servers, like dropping a cell disk
and so on,
you can use the celladmin OS account on the storage servers for this kind of operation. You can
use cellcli (on each cell) or dcli (from one cell, to administer all of the cells) to execute your
 commands.
DCLI is a Python script. It is used to execute commands on the cells remotely without logging in to
each of them. (On first execution, create the ssh keys with: dcli -k -g mycells.)
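
For example (the cell group file name and cell hostname are just examples):

[celladmin@cel01 ~]$ dcli -k -g mycells
[celladmin@cel01 ~]$ dcli -g mycells -l celladmin cellcli -e "list cell attributes name, cellsrvStatus"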

Note that we have a firewall (iptables) configured with Oracle-supplied rules on the cell servers.

In OLTP systems,
the Exadata write-back flash cache can enhance performance, as it also caches database block writes.
The flash log is also useful for enhancing the performance of OLTP systems, as fast response time for the log writer is crucial in OLTP.
For big OLTP systems, the flash cache and flash log can enhance performance, and the High Capacity disks in Exadata can meet the storage size needs.

Note that IDP, iDB and the IORM manager can only be seen in an Exadata environment.

IORM is used for managing the Storage IO resources.

Here is a diagram for the description of the architecture of IORM. (reference: Centroid)


So we can manage our I/O resources based on categories, databases and consumer groups.
There is a hierarchy, as you see in the picture, and this hierarchy is used to distribute I/O.
IORM should be used if you have a lot of databases running on the Exadata machine.
IORM is a friend of consolidation projects, in my opinion.
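
A hedged CellCLI sketch of an inter-database plan (the database names and percentages are just examples, reusing names from the first part of this post; run it on each cell):

CellCLI> ALTER IORMPLAN objective=auto
CellCLI> ALTER IORMPLAN dbplan=((name=dwprd, level=1, allocation=70), (name=visx, level=1, allocation=30), (name=other, level=2, allocation=100))
CellCLI> LIST IORMPLAN DETAIL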

If you have configured ASR Manager in your environment, note that SNMP is used to transfer the notifications to ASR Manager, and these notifications are forwarded using SNMP from
 ASR Manager to Enterprise Manager. In addition, faults are transferred to Oracle securely using HTTPS.

The fault telemetry sent from the ASR Manager is in the format below.

Telemetry Data:

System-id: system serial number.
Host-id: hostname of the system sending the message.
Message-time: time the message was generated.
Product-name: model name of the server.


As known, Exachk is the utility to validate an Exadata environment. We check the system using this
utility from time to time (before patches, after patches, etc.). Also, it can be scheduled to run regularly.
To schedule exachk, we can create a cron job or we can create a job in Enterprise Manager.

Compression on Exadata can only be done on the compute nodes. Decompression, on the other hand,
 can be done on the compute nodes or on the storage cells. Decompression is done on the storage cells if the associated operation is based on Smart Scan. The same rule applies for encryption and decryption, too.

Here is a general information about compression types:
BASIC compression, introduced in Oracle 8 already and only recommended for Data Warehouse
OLTP compression, introduced in Oracle 11 and recommended for OLTP Databases as well
QUERY LOW compression (Exadata only), recommended for Data Warehouse with Load Time as a critical factor  --> HCC
QUERY HIGH compression (Exadata only), recommended for Data Warehouse with focus on Space Saving --> HCC
ARCHIVE LOW compression (Exadata only), recommended for Archival Data with Load Time as a critical factor --> HCC
ARCHIVE HIGH compression (Exadata only), recommended for Archival Data with maximum Space Saving --> HCC

If your index creation takes a long time on Exadata, consider the following information:
Cell single block physical reads can impact the performance of an index creation activity. Cell single block physical reads are like db file sequential reads on a traditional system.
Migrated and chained rows can cause cell single block read events on an Exadata machine.  Also, uncommitted rows encountered during a query are handled based on read consistency; to supply the consistency, the database nodes may require additional blocks. These blocks are sent by the cell servers to the database nodes. This activity can cause cell single block physical reads as well. If you have a lot of blocks in one of these conditions, then your index creation can take a long time.

When migrating to Exadata, you should consider the database type (OLTP or warehouse), the size of the source database, the version of the source database and the endian format of the source operating system. By analyzing these inputs you can choose an optimal migration method and strategy.

In Exadata Enterprise Manager monitoring, the communication flow from the ILOM of the storage servers to Enterprise Manager goes through the storage server's MS processes. ILOM sends data using SNMP to the MS process, and the MS process sends the data to Enterprise Manager using SNMP. Data is triggered and transferred through SNMP traps.
Based on the preset thresholds defined in ILOM, we can monitor the motherboard, memory, power, and network cards of the database nodes using Enterprise Manager. We can see the faults and alerts produced for these hardware components in Enterprise Manager.

Oracle Auto Service Request (ASR) is a secure, scalable, customer-installable software feature of warranty and Oracle Support Services that provides auto-case generation when common hardware component faults occur. ASR Manager can be installed on an external Oracle Linux or Oracle Solaris server. Also, you can use one of the Exadata db nodes for installing ASR Manager (not preferred).
ASR Manager communicates with Oracle using HTTPS.

In Exadata, some database work can be offloaded to the storage servers, as you know. Besides operations like full table scans, single row functions and simple comparison operators, some joins can be offloaded to storage. Column filtering, predicate filtering and virtual column filtering can be offloaded to the Exadata storage servers, as well.
If you want to see all the functions that can benefit from smart scan, you can use the following SQL (the OFFLOADABLE column indicates whether a function can be offloaded):
select name from v$sqlfn_metadata where offloadable = 'YES';
The output will be like;

>
<
>=
<=
=
!=
OPTTIS
OPTTUN
OPTTMI
OPTTAD
OPTTSU
OPTTMU
OPTTDI
OPTTNG
AVG
SUM
COUNT
MIN
MAX
OPTDESC
TO_NUMBER
TO_CHAR
NVL
CHARTOROWID
ROWIDTOCHAR
OPTTLK
OPTTNK
CONCAT
SUBSTR
LENGTH
INSTR
LOWER
UPPER
ASCII
CHR
SOUNDEX
ROUND
TRUNC
..
.....
.....

Full table scan and fast full index scan operations executed in parallel always generate smart scans. For a full table scan you will see the cell smart table scan event, and for a fast full index scan you will see the cell smart index scan event. Fast full index scan operations can be executed through smart scan because in a fast full index scan Oracle just reads the index blocks as they exist on storage; a bulk read is performed, and that lets Oracle use smart scan even though it is an index operation. Smart scan is performed during direct path read operations; parallel queries use direct path reads, and that's why they make use of smart scans. 
So, to put it all together: in order to have smart scans, we need to execute queries in parallel (or otherwise use direct path reads into the process memory), and we need to have the cell.smart_scan_capable attribute set to TRUE for our ASM diskgroups.
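
To check that attribute, something like this can be run when connected to the ASM instance:

SQL> select dg.name, a.value
  2  from v$asm_diskgroup dg, v$asm_attribute a
  3  where dg.group_number = a.group_number
  4  and a.name = 'cell.smart_scan_capable';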

We have Sun servers in Exadata; the compute nodes and storage servers are actually Sun servers. They have ILOM cards on them. These ILOM cards can be used to administer the servers remotely. For example, ILOM can be used to power on database servers or to open a remote console to a storage server.

Note that, if you want to check the status of all ports located in the InfiniBand switch, you can use Enterprise Manager or the ibqueryerrors.pl script located on the InfiniBand switch.

To properly shut down Exadata, we first need to stop the database and Grid Infrastructure services; we may use the crsctl stop cluster -all command for this. Then we shut down the database servers, then the Exadata storage servers, and lastly we power off the network switches and cut the power using the power switches on the PDUs.

After Exadata migrations, analysis is done to decide which indexes to drop. This is done to increase the chance of using smart scans for our queries, but care must be taken while dropping those indexes. For example, in an OLTP system we need fast response times for single block reads, so dropping an index may result in a negative performance impact for some of our OLTP queries. So it's better to drop an index only after analyzing the queries of the corresponding application. You can use invisible indexes to see the difference, and you can check execution plans to see whether Oracle wants to use a smart scan instead of an index access for a query. Based on your analysis, make the decision to drop the unnecessary indexes.
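
A small sketch of the invisible-index test (the index name is made up for illustration):

SQL> alter index sales_cust_ix invisible;
SQL> -- run or explain the application queries and compare plans / response times
SQL> alter index sales_cust_ix visible;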

As you know, Exadata contains compute nodes and storage nodes. For the storage nodes, you don't have the choice to use a different OS than Linux; the Exadata storage servers are Linux and will continue to be Linux. The Oracle Linux servers come with UEK kernels.
 On the other hand, you can choose Solaris 11 rather than Linux for the compute nodes. The operating system is selectable at install time.

Note that in Exadata we have different networks: the management network, the public (client) network and the InfiniBand network.

Following is a picture representing those networks; (it is for X4 actually, but it is useful)



SSH, for example, works over the management network, as it is a utility to manage the corresponding servers.  If you want to change the network that ssh listens on, you can use the sshd configuration file to do that.
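
For example, restricting sshd to the management interface address in /etc/ssh/sshd_config (the address is just a placeholder):

ListenAddress 10.10.10.101

# then restart sshd as root
service sshd restart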

The following inputs can be used for configuring the Exadata machine at install time:

Customer name
Application
Region
Timezone
Compute node OS
Database Machine prefix
Admin Network -> starting IP address for the pool, pool size, ending IP address for the pool, subnet mask, gateway
Admin Network -> database server admin name, storage server admin name, ILOM name
Client Network -> starting IP address for the pool, pool size, ending IP address for the pool, subnet mask, gateway, adapter speed (1GbE/10GbE Base-T or 10GbE SFP+ optical)
InfiniBand Network -> starting IP address for the pool, pool size, ending IP address for the pool, subnet mask, compute node private name
Backup / Data Guard Ethernet Network
OS configuration -> domain, DNS, NTP, Grid/ASM home OS user, ASM DBA group, ASM home OPER group, ASM home admin group, RDBMS home OS user, RDBMS DBA group, oinstall group, RDBMS home OPER group, base location for Grid and RDBMS --> you can set the userid/groupid of all users and groups
Home and Database -> inventory location, Grid home, DB home location, software install, DB name, DATA/RECO disk group names and redundancy (you can't change their sizes), block size, type DW or OLTP
Cell Alerting -> enable, email address
ASR configuration
OCM configuration
Grid Control agent

In order to actually have the Exadata storage servers send notifications via email (or alternatively SNMP), each of the servers has to be configured with the appropriate settings. This is done using the ALTER CELL command in CellCLI.

ALTER CELL smtpServer='mailserver', -
smtpFromAddr='exacel@blabla.com', -
smtpPwd='email_password', -
smtpToAddr='erm@blabla.com', -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail'
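
To verify the settings afterwards, something like the following can be used on each cell:

CellCLI> LIST CELL ATTRIBUTES smtpServer, smtpToAddr, notificationMethod, notificationPolicy
CellCLI> ALTER CELL VALIDATE MAIL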

Alerts may be stateful or stateless.
If an alert is based on a threshold, it gets cleared automatically when it no longer violates the threshold.
For example, a filesystem alert looks as follows;

CellCLI> list alerthistory detail
name: 4_1
alertMessage: "File system "/" is 82% full, which is above the 80% threshold.
Accelerated space reclamation has started.
This alert will be cleared when file system "/" becomes less than 75% full."

Also, alerts can be fired at the Critical, Warning and Informational levels.

We have storage indexes located in the physical memory of the storage servers. Storage indexes are maintained automatically by cellsrv and they are not persistent across reboots (as they are in memory). Cellsrv builds these indexes based on the filter columns of the offloaded queries.
In storage indexes, Oracle keeps minimum and maximum column values per storage region. By using storage indexes, Oracle can easily decide where to look in the storage for a given column value and skip the regions that cannot contain it.
A maximum of 8 columns per table are indexed per storage region, but different storage regions can have different columns indexed for the same table.

Storage servers are very sensitive environments in Exadata, so Oracle doesn't support a lot of activities on them. For example, we can change the root password of these servers, or we can set up ssh equivalence for the cellmonitor user (http://docs.oracle.com/cd/E11857_01/install.111/e12651/pisag.htm#CIHJGEHI).
But, for example, if we want to upgrade the storage server software, we need to use the patchmgr utility;
the patchmgr utility is the tool Exadata database administrators use to apply (or roll back) an update to the Oracle Exadata storage cells.
Another restriction is:
Oracle Exadata Storage Server Software and the operating system cannot be modified, and customers cannot install any additional software or agents on the Exadata storage servers.

Lastly, I will mention placing multiple Exadata machines in a system room / data center.
If you have multiple Exadata Database Machines, you need to place them side by side while ensuring the exhaust air of one rack does not enter the air inlet of another. If you have multiple clusters running on several Exadata machines, you can place the racks that are part of a common cluster together, side by side.