Monday, September 27, 2021

Oracle Cloud -- Applying Patch on OCI

 There are multiple ways to apply patch

1) Using Console 
2) Using dbcli 
3) For exacloud we can use dbaascli   or  exadbcpatchmulti  

In this Blog we will mainly discuss about applying patch using console and dbcli . This was tested by my friend i still need to test it . 


Using OCI  Console :

The one-off patches (now they are call interim patches) can be applied via the Console, API or even manually. To apply an interim patch manually, you can use the Opatch utility. The detailed steps are provided in the Applying one-off patches on Oracle Database 21c documentation. The patches can be applied in any order.

Here is how simple and easy it is:

1. For the database on which you want to apply the patches, just click its name to display details and under Resources, click Updates:

2. Click on “Apply a one-off patch“:

3. Then, in the Apply one-off patch dialog, enter the patch numbers. Use a comma-separated list to enter more than one patch. I did apply them one after each other. Paste the patch number and then click Apply.

While the patch is being applied, the database’s status displays as Updating:

A work request is created for the patching operation allowing us to monitor the progress of the operation.

If the operation completes successfully, the database’s status changes to Available:



Using dbcli 

Prerequisites
The /u01 directory has at least 15 GB of free space.
The Oracle Clusterware is up and running on the DB system.
All nodes of the DB system are up and running.
Before starting the patching, we need to upgrade the DBCLI utility. For that perform the below steps


Login to OPC user with Public IP/Private IP by using private key. Switch the user to root.
 
[opc@abdu1 ~]$ sudo su -
[root@abdu1 bin]# export PATH=/opt/oracle/dcs/bin:$PATH
  • Update the CLI by using the cliadm update-dbcli command.
[root@abdu1 bin]# cliadm update-dbcli
[root@abdu1 bin]# dbcli list-jobs
[root@abdu1 bin]# dbcli describe-component

  • Display the latest patch versions available in Object Storage by using the dbcli describe-latestpatch command.
[root@abdu1 bin]# dbcli describe-latestpatch
  • Now, we can perform the prechecks on the server components (Grid InfraStruture) by using the dbcli update-server command with -p argument. Note the Job Id.

[root@abdu1 bin]# dbcli update-server --precheck
[root@abdu1 bin]# dbcli list-jobs

  • Apply the patch by using the dbcli update-server command. Note the Job Id
 

[root@abdu1 log]# dbcli update-server

[root@abdu1 log]#

[root@abdu1 log]# dbcli describe-job -i fa0996a6-7f99-48b2-92dc-5c1fb8e20481
[root@abdu1 log]# dbcli list-jobs
[root@abdu1 log]# dbcli describe-component

  • Now, we can start the precheck for the database home by using the dbcli update-dbhome command with -p argument . Note the Job Id.

[root@abdu1 log]# dbcli list-dbhomes
[root@abdu1 log]# dbcli update-dbhome -i c177404f-d22e-46b5-95ac-28ed5b2da50d -p
[root@abdu1 log]# dbcli list-jobs


  • Apply the patch by using the dbcli update-dbhome command. Note the Job Id.

[root@abdu1 log]# dbcli update-dbhome -i c177404f-d22e-46b5-95ac-28ed5b2da50d

[root@abdu1 log]#

 [root@abdu1 log]# dbcli describe-job -i 1de24101-07e0-4031-835d-de912c35656f
[root@abdu1 log]#
 
[root@abdu1 log]# dbcli list-jobs
[root@abdu1 log]# dbcli describe-job -i 1de24101-07e0-4031-835d-de912c35656f
 [root@abdu1 log]# dbcli describe-component
 
[root@abdu1 log]# dbcli list-databases




Exacloud patching using exadbcpatchmulti

1)  list patches:
/var/opt/oracle/exapatch/exadbcpatchmulti -list_patches -oh=sphw1-oz9nd1:/u02/app/oracle/product/19.0.0.0/dbhome_10
 
2)  take tar backup of onbuatc on both nodes in /acfs02/JULY_PSU_DBBACKUP:

node1:
nohup tar -pcvzf /acfs02/JULY_PSU_DBBACKUP/db19000_home_10_sphw1-oz9nd1_$(date "+%m-%d-%Y:%R:%S").tar.gz  /u02/app/oracle/product/19.0.0.0/dbhome_10 > /acfs02/JULY_PSU_DBBACKUP/db19000_dbhome_10_sphw1-oz9nd1.log &
 
node2:
nohup tar -pcvzf /acfs02/JULY_PSU_DBBACKUP/db19000_home_10_sphw1-oz9nd2_$(date "+%m-%d-%Y:%R:%S").tar.gz  /u02/app/oracle/product/19.0.0.0/dbhome_10 > /acfs02/JULY_PSU_DBBACKUP/db19000_dbhome_10_sphw1-oz9nd2.log &
 

3)  perform prechecks as root user:(do not perform prechecks at the same time on both nodes   

node1:
/var/opt/oracle/exapatch/exadbcpatchmulti -precheck_async 32895426 -instance1=sphw1-oz9nd1:/u02/app/oracle/product/19.0.0.0/dbhome_10
node2:
/var/opt/oracle/exapatch/exadbcpatchmulti -precheck_async 32895426 -instance1=sphw1-oz9nd2:/u02/app/oracle/product/19.0.0.0/dbhome_10
 
If any conflicting patch is there, we need to first check why and when that patch is applied, if it is removable and latest version of it available in 
support, we can rollback that conflicting patch rollback patch in rolling fashion:
on node1:
->make db down 
->rollback command: 
/u02/app/oracle/product/19.0.0.0/dbhome_10/OPatch/opatch rollback -id 33144001
->up the db nd check services
->perform same on n2
and 
try running precheck again on both nodes one after the other.
 
 
4) Put blackout for ONBUATC on both nodes for 6-8 hrs:
 


5)  Apply patch as root user:
nohup /var/opt/oracle/exapatch/exadbcpatchmulti -apply_async 32895426 -instance1=10.18.160.252:/u02/app/oracle/product/19.0.0.0/dbhome_10 &
node2:
nohup /var/opt/oracle/exapatch/exadbcpatchmulti -apply_async 32895426 -instance1=10.18.160.243:/u02/app/oracle/product/19.0.0.0/dbhome_10 &
(-run_datasql=1    --> to be added only for primary dbs not for standbys)
 
node1:
6) check 'list patches', patch must be applied
opatch lspatches
 


7)  now check db service nd pdb services
srvctl status database -d PRDFIKC_phx2fn
srvctl status service -d PRDFIKC_phx2fn
 

8)  proceed patch apply on node 2 and check services post patching



Below is  out of Box Patching Method Used . 

1) List Patch Images 
[root@hostname1 ~]# dbaascli cswlib showImages

2) Create New Home 
[root@hostname1 ~]# dbaascli dbhome create --version 19000

3) Patch Pre check on New Home 
[root@hostname1 ~]# dbaascli dbhome patch --oracleHome /u02/app/oracle/product/19.0.0.0/dbhome_12 --targetVersion 19.11.0.0.0 --executePrereqs

4) Patch New Home 
[root@hostname1 ~]# dbhome patch --oracleHome /u02/app/oracle/product/19.0.0.0/dbhome_12 --targetVersion 19.11.0.0.0 --skipDatapatch

5) Move Database to New Home
[root@hostname1 ~]# dbaascli database move --dbname testdb01 --ohome /u02/app/oracle/product/19.0.0.0/dbhome_12

6) Apply data patch
[oracle@hostname1 OPatch]$  $ORACLE_HOME/OPatch/datapatch -verbose

7) Purge Old Home .
[root@hostname1 ~]# dbaascli dbhome purge



References : 

https://docs.oracle.com/en/cloud/paas/exadata-cloud/csexa/apply-patch.html#GUID-50BDEF7D-A30E-4B32-BAE7-486538413E2D
https://docs.oracle.com/en-us/iaas/Content/Database/Tasks/patchingDatabase.htm
https://docs.oracle.com/en-us/iaas/Content/Database/Tasks/exapatching.htm
https://docs.oracle.com/en-us/iaas/Content/Database/Tasks/examanagingpatching.htm
https://docs.oracle.com/en-us/iaas/Content/Database/Tasks/patchingDB.htm
https://docs.oracle.com/en/cloud/cloud-at-customer/exadata-cloud-at-customer/exacc/patching.html


Friday, September 24, 2021

Oracle PGA_AGGREGATE_LIMIT dependency on Processes parameter

 Intention to write Blog  is recently we faced scenario where PGA_AGGREGATE_LIMIT value was automatically increasing when it been set lower then desired value . 

The initialization parameter PGA_AGGREGATE_LIMIT has been introduced since Oracle Database 12.1.0.1. It is used to put a hard limit on PGA memory usage. If PGA usage exceeds the PGA_AGGREGATE_LIMIT value defined, Oracle Database aborts or terminates the sessions or processes that are consuming the most  PGA memory.

From Oracle 12.1.0.2,  By default,  PGA_AGGREGATE_LIMIT parameter is set to the greater of 2 GB, 200% of the PGA_AGGREGATE_TARGET value, or 3 MB times the value of the PROCESSES parameter. However, it will not exceed 120% of the physical memory size minus the total SGA size.“


Since Oracle Database 12.2.0.1, the default value calculation has been adjusted again as below:

> If MEMORY_TARGET is set, then PGA_AGGREGATE_LIMIT defaults to the MEMORY_MAX_TARGET value.

> If MEMORY_TARGET is not set, then PGA_AGGREGATE_LIMIT defaults to 200% of PGA_AGGREGATE_TARGET.

> If MEMORY_TARGET is not set, and PGA_AGGREGATE_TARGET is explicitly set to 0, then the value of PGA_AGGREGATE_LIMIT is set to 90% of the      physical memory size minus the total SGA size.

> In all cases, the default PGA_AGGREGATE_LIMIT is at least 2GB and at least 3MB times the PROCESSES parameter (and at least 5MB times the      PROCESSES parameter for an Oracle RAC instance).



Reference:

Limiting Process Size with Database Parameter PGA_AGGREGATE_LIMIT (Doc ID 1520324.1)

Swapping While Sum of SGA Usage And PGA_AGGREGATE_LIMIT Is Larger Than The Amount Of Physical Memory (Doc ID 2273931.1)

ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 : after rac database restore



When Performing a RAC 2 node Production Database Restore with disk based backup, by duplicating to a single instance database, Restoration went successful.
Once Restore has completed, i have tried to open the database with RESETLOGS option. But end with the below error, So thought of sharing this little information.


##########################
#   Errors 
##########################
RESETLOGS after incomplete recovery UNTIL CHANGE 13115861554203
ORA-38856 signalled during: alter database open resetlogs…
ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled

##########################
#   Command Used  
##########################
Alter Database open ResetLogs;


##########################
#   Informations  
##########################
This seems to be Bug 4355382 and its expected while doing RAC Restore/Recovery.



##########################
#  Solution 
##########################
  ADD _no_recovery_through_resetlogs Parameter and set it to TRUE. 

I have added _no_recovery_through_resetlogs=TRUE parameter to our PFILE and brought up the database to Mount Stage.
Now opened the database with RESETLOGS options and it worked.
This parameter tells oracle not to do any recovery while doing this resetlogs operation.
After Opening the database, the parameter can be removed from the pfile.

Tuesday, September 14, 2021

Oracle - Sql ID Signature , FORCE_MATCHING_SIGNATURE , EXACT_MATCHING_SIGNATURE ,Adaptive Cursor Sharing

 

In this Blog we will try to get some insight on below topics which are all interlinked to understand hard parsing 

1) How different sql id having same signature , FORCE_MATCHING_SIGNATURE and EXACT_MATCHING_SIGNATURE are related 
2) What is bind peeking and Bind Seeking 
3) Adaptive Cursor Sharing  ,  is_bind_sensitive and is_bind_aware
5) Literals  vs bind variables 
6) Cursor sharing parameter 
7) Usefull sql statements 
8) What is skew data 
 


Sql performance depends on 

1) Cursore sharing  parameter  
2) Use of Bind Variables 
3) If data is skewed 
4) Optimizer mode parameter 
5) Whether histograms  statistics are gathered . 



########################################
Sqlid signature , FORCE_MATCHING_SIGNATURE and EXACT_MATCHING_SIGNATURE 
########################################

Use of bind variables in SQL statements is a must to make the transaction processing application scalable. 

To find the queries that don’t use bind variable and are parsed each time they are executed we may see Signature 

Starting with Oracle 10gR2, two interesting columns – FORCE_MATCHING_SIGNATURE and EXACT_MATCHING_SIGNATURE – were added in a number of tables and views. We could see them in V$SQL, V$SQLAREA, DBA_HIST_SQLSTAT and DBA_HIST_ACTIVE_SESS_HISTORY data dictionaries.  As you see they are present all over the place – shared pool, ASH, AWR and Statspack. That means we have good choice of source to look for problematic sql’s.

EXACT_MATCHING_SIGNATURE means, a signature is calculated on the normalized sql text. The normalization includes the removal of white space and uppercasing of all the non-literal strings.

FORCE_MATCHING_SIGNATURE means, a signature used as if CURSOR_SHARING set to FORCE. It seems to be another hash value calculated from SQL statement.

As noted before FORCE_MATCHING_SIGNATURE is calculated from sql text as if CURSOR_SHARING set to FORCE – that forces sql statements as to share cursors by replacing constants with bind variables, so all statements that differ only by the constants share the same cursor.  

Let’s have a look at the FORCE_MATCHING_SIGNATURE value for the same sql’s.
When Cursor_sharing=force ==> force_matching_signature applies
When Cursor_sharing=exact ==> exact_matching_signature applies




########################################
Bind peeking and Bind Seeking 
########################################

Sharing the same execution plan for different bind variables is not always optimal as far as different bind variables can generate different  data volume. This is why Oracle introduces bind variable peeking feature which allows Oracle to peek at the bind variable value and give it the best execution plan possible. However, bind variable peeking occurs only at hard parse time which means as far as 
the query is not hard parsed it will share the same execution plan that corresponds to the last hard parsed bind variable. 

Bind Variable Peeking was introduced in Oracle 9i. Prior to Bind Variable Peeking the Oracle optimizer did not know the value being  used in the statement and could end up choosing a poor execution plan. Bind Variable Peeking will look at the value of the bind variable the very first time the statement is executed and generate an execution plan. The cached execution plan will be optimal based on the first 
value sent to the bind variable. Subsequent executions of this statement will use the same execution plan regardless of the value of the bind variable. 

Using Bind Variable Peeking is good if the bind variable is selecting from a column which is not skewed. This means the initial  execution plan for the statement will be optimal for all values of the bind variable. For example, a bind variable for the emp.id is a good idea as the value is unique in the table. The initial execution plan will be good for all statements against id. On the other hand, using  a bind variable for emp.deptid could pose problems. Let say there are two departments. Department 20 has 3 employees and department 21 has  10,000 employees. As the emp.deptid data is skewed, the initial execution and execution plan of the statement may not be optimal for subsequent executions of the statement. Looking at select name from emp where deptid=:deptid, with deptid set to 20 returns 3 rows. 

If this was the initial statement, the optimizer would create an execution plan which would use an index. If deptid is then set to 21, where 10,000 rows are returned, the optimizer will still use the initial plan and use the index. Using the index in this case is not optimal  as a large percentage of the table will be returned. A full table scan would be better. So you see the problem with bind variable peeking. 
Oracle 11g overcomes the Bind Variable Peeking problem with the new Adaptive Cursor Sharing feature. Due to the Bind Peeking problem, some developers will purposely use literal values, for fields with highly skewed data, to avoid the Bind Variable Peeking problem.

When they use literal values it forces Oracle to create a single cursor with its own execution plan. This ensures the query will be  executed optimally.




########################################
Adaptive Cursor Sharing  ,  is_bind_sensitive and is_bind_aware
########################################

In order to avoid such situation Oracle introduces in its 11gR2 release, Adaptive Cursor Sharing allowing Oracle to adapt itself to  the bind variable when necessary without having to wait for a hard parse of the query.

In v$sql this is indicated by the columns is_bind_sensitive and is_bind_aware. The former indicates that a particular sql_id is a candidate  for adaptive cursor sharing, whereas the latter means that Oracle acts on the information it has gathered about the cursor and alters the  execution plan.

Problematic is that adaptive cursor sharing can only lead to an improved plan after the SQL statement has tanked at least once. 

You can bypass the initial monitoring by supplying the BIND_AWARE hint: it instructs the database that the query is bind sensitive and adaptive cursor sharing should be used from the very first execution onwards. A prerequisite for the hint to be used is that the bind variables only appear in the WHERE clause and an applicable histogram is available. The hint may improve the performance but 
you should be aware that it’s rarely the answer in the case of generic static statements, which we describe below. 

The NO_BIND_AWARE hint does exactly the opposite: it disables bind-aware cursor sharing.



########################################
Literals  vs bind variables 
########################################

A literal means the values being compared in the SQL statement are hard coded in the statement itself.
An example of a literal value would be, select name from emp where id=10.
The use of literal values will cause many unique statements to be cached as each literal value causes the statement to be different. 

This will cause more space to be used in the Share Pool. With the use of bind variables the statement remains the same, therefore there is only one statement cached as opposed to many

A bind variable is created and set to a particular value, which is then used by a SQL statement.
This allows the developer to assign any value to the bind variable and the SQL statement will use the new value. For example,  select name from emp where id=:id. The :id is the bind variable and the values being passed into the statement can change as the developer needs.



########################################
Cursor Sharing
########################################

The cursor_sharing parameter can be set to one of three values, FORCE, EXACT or SIMILAR. This parameter is really telling the Oracle  server process how to handle statements which have literal values. If the parameter is set to FORCE or SIMILAR the Oracle server process  will strip out the literal values in the SQL statements and generate system generated bind variables. With FORCE or SIMILAR all statements will go through the Bind Variable Peeking process. At first I though both SIMILAR and FORCE will expose and amplify the Bind Peeking problem

In my testing I determined that SIMILAR does not expose the Bind Peeking problem, but FORCE does. 
If the parameter is set to EXACT, the  Oracle server processes the query as it is and generates an execution plan based on the query. With EXACT literal values are maintained. 

Here are the tests I performed. Notice that both SIMILAR and EXACT act the same when literal and bind variables are used.

According to MOSC Doc 11657468.8, adaptive cursor sharing can be disabled by setting hidden parameters:
 
alter system set "cursor_sharing"=exact scope=both;
alter system set "_optimizer_extended_cursor_sharing_rel"=none scope=both;
alter system set "_optimizer_extended_cursor_sharing"=none scope= both;
alter system set "_optimizer_adaptive_cursor_sharing"=false scope= both;



########################################
Skew data
########################################
Skewed columns are columns in which the data is not evenly distributed among the rows.
For example, suppose:
You have a table order_lines with 100,000,000 rows
The table has a column named customer_id
You have 1,000,000 distinct customers

Some (very large) customers can have hundreds of thousands or millions of order lines.
In the above example, the data in order_lines.customer_id is skewed. On average, you'd expect each distinct customer_id to have 100 order lines
(100 million rows divided by 1 million distinct customers). But some large customers have many, many more than 100 order lines.

This hurts performance because Oracle bases its execution plan on statistics. So, statistically speaking, Oracle thinks it can access order
lines based on a non-unique index on customer_id and get only 100 records back, which it might then join to another table or whatever using 
a NESTED LOOP operation.

But, then when it actually gets 1,000,000 order lines for a particular customer, the index access and nested loop join are hideously slow. 

It would have been far better for Oracle to do a full table scan and hash join to the other table.
So, when there is skewed data, the optimal access plan depends on which particular customer you are selecting!

Oracle lets you avoid this problem by optionally gathering "histograms" on columns, so Oracle knows which values have lots of rows and 
which have only a few. That gives the Oracle optimizer the information it needs to generate the best plan in most cases.




########################################
Useful sql queries 
########################################

-- Converting Sql id to Signature 
SET SERVEROUTPUT ON
DECLARE
 SQLTEXT   CLOB;
 SIG       NUMBER;
BEGIN
 SELECT SQL_TEXT
 INTO SQLTEXT
 FROM DBA_HIST_SQLTEXT
 WHERE SQL_ID = '80xyxfffu6awb';
 SIG := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE (SQLTEXT, FALSE);
 DBMS_OUTPUT.PUT_LINE ('SIGNATURE=' || SIG);
END;
/
 

-- If you want to find Bind variable for SQL_ID
col VALUE_STRING for a50  
SELECT NAME,POSITION,DATATYPE_STRING,VALUE_STRING FROM gv$sql_bind_capture WHERE sql_id='&sql_id';


-- checking Adaptive Cursor sharing 
   select
    hash_value,
   sql_id,
   child_number,
   range_id,
   low,
   high,
   predicate
from
   v$sql_cs_selectivity;
 
select
   hash_value,
   sql_id,
   child_number,
   bucket_id,
   count
from
   v$sql_cs_histogram;
 
select
  sql_id,
  hash_value,
  plan_hash_value,
  is_bind_sensitive,
  is_bind_aware,
  sql_text
from
   v$sql;
 
select
   hash_value,
   sql_id,
   child_number,
   bind_set_hash_value,
   peeked,
   executions,
   rows_processed,
   buffer_gets
   cpu_time
from
   v$sql_cs_statistics; 


  col exact_matching_signature for 99999999999999999999999999
  col sql_text for a50
  select sql_id,force_matching_signature, exact_matching_signature, SQL_TEXT from v$sqlarea where UPPER(sql_text) like ‘%DUMMY%’ order by UPPER(sql_text);
 
 col force_matching_signature for 99999999999999999999999999
 select * from (select force_matching_signature, count(*) "Count" from v$sqlarea group by force_matching_signature order by 2 desc) where rownum<=3;
 
 select sql_fulltext from v$sql where force_matching_signature=7415896326081021278 and rownum=1;
 



########################################
References : 
########################################
Adaptive Cursor Sharing Overview [ID 740052.1] 


Tuesday, September 7, 2021

Exadata Disk replacement

 
In this  blog  We will be concentrating on disk  replacement strategy for Exadata . However before proceeding on disk replacement we will also try to get some insight on   Asm redundancy levels in Exadata and failover groups . 



When we created these ASM disk groups, we specified normal redundancy. With Exadata, external redundancy is not an option - you either need to use normal or high redundancy. With normal redundancy, each extent is mirrored to a different cell, and with high redundancy, they are mirrored via ASM to two additional cells. Specifically, extents are mirrored to partner disks in different failure groups.

On an Exadata Quarter Rack, Oracle only supports normal redundancy because there are only three failure groups, one for each cell. While this certainly seems like it should be enough failure groups to support three extent mirrors, the challenges comes with Oracle's requirement on OCR and voting disks. 

With high redundancy ASM disk groups, Oracle requires five failure groups. Since Oracle cannot guarantee that you won't place your OCR/voting disks on this ASM disk group, it simply prevents you from creating it. This being said, the syntax above for creating a high redundancy ASM disk group is valid for the Half Rack and Full Rack configurations.

In the Exadata environment, the failure group for disks on each cell node is cellnode itself. In this way, the blocks on the disk in the cell node is also present on another cellnode, so even if the cellnode is turned off, there will be no problem.



Before we proceed we need to understand what is Cell Disk  and Grid Disk 

The Exadata Storage Server contains 12 physical disks.
There is one to one relationship between a physical disk and the cell disk. One physical disk corresponds to a single cell disk.
Then a Grid disk comprises of many cell disks.
And a Grid disk corresponds to one ASM disk.
An ASM diskgroup comprises of many ASM disks.


For disk maintenance activity asmdeactivationoutcome  for  grid disk  and DISK_REPAIR_TIME for diskgroup is important .


 Below are 2 files that make cell nodes visible to Database node ( Compute Node ) 
 cellinit.ora -- decides which network takes storage traffic.
 cellip.ora - list of cells, new cells can be added dynamically without shutdown


 Below is hierarchy in which disk are created 
 Physical disk / Cell disk is allocated to Cell 
 Grid disk is created on top of Cell Disk 
 Asm  disk is allocated using Grid Disk 


Each Cell comes with 12 SAS Harddisks (600 GB each with High Performance resp. 2 TB each with High Capacity). The picture below shows a Cell with the 12 Harddisks on the front:
Also each Cell has 4 Flashcards built in that are divided into 4 Flashdisks each, summarizing to 16 Flashdisks in each Cell that deliver by default 384 GB Flash Cache.  


1) Physical Disks
Physical Disks can be of the type Harddisk or of the type Flashdisk. You cannot create or drop them. The only administrative task on that layer can be to turn the LED at the front of the Cell on before you replace a damaged Harddisk to be sure you pull out the right one, with a command like
CellCLI> alter physicaldisk  serviceled on



2) Luns
Luns are the second layer of abstraction. They have been introduced, because the first two Harddisks in every Cell are different than the other 10 in so far as they contain the Operating System (Oracle Enterprise Linux). About 30 GB have been carved out of the first 2 Harddisks for that purpose. We have 2 of them for redundancy  – the Cell can still operate if only one of the first 2 Harddisks fails. If we investigate the first 2 LUNs, we see the mirrored OS Partitions. Jo 
As an Administrator, you do not need to do anything on the Lun Layer except looking at it with commands like
CellCLI> list lun



3) Celldisks
Celldisks are the third layer of abstraction. It was introduced to enable interleaving in the first place.
CellCLI> create celldisk all harddisk interleaving='normal_redundancy'
CellCLI> create celldisk all harddisk


4) Griddisks
Griddisks are the fourth layer of abstraction, and they will be the Candidate Disks to build your ASM diskgroups from.
With interleaving specified at the Celldisk layer, this is different: The Griddisks are then being created from outer and inner parts of the Harddisk, leading to equal performance of the Griddisks and also then of the later created Diskgroups.


5) Asm  diskgroups 

Asm diskgroups are created using Grid disk from step 4 



++++++++++++++++++++++++++++++++++++++++++++++++++++
Strategy for Disk  replacement depends on  below scenario
++++++++++++++++++++++++++++++++++++++++++++++++++++
1) If disk is   failed /total failure 
2) Disk is slow performance or  Predictive Failures
3)  if   poor performance disk to be replace is flash disk 



 
 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Flash Disk Replacement due to poor performance in Exadata 
++++++++++++++++++++++++++++++++++++++++++++++++++++
 
To identify a poor performance flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS= 'warning - poor performance' DETAIL
 
Recommended Action  
The flash disk has entered poor performance status. A white cell locator LED has been lit to help locate the affected cell. 
Please replace the flash disk.
If the flash disk is used for flash cache, then flash cache will be disabled on this disk thus reducing the effective flash cache size. 
If the flash disk is used for flash log, then flash log will be disabled on this disk thus reducing the effective flash log size.
If the flash disk is used for grid disks, then Oracle ASM rebalance will automatically restore the data redundancy.
Sun Oracle Exadata Storage Server is equipped with four PCIe cards. Each card has four flash disks (FDOMs) for a total of 16 flash disks. 
The 4 PCIe cards are present on PCI slot numbers 1, 2, 4, and 5. 
The PCIe cards are not hot-pluggable such that Exadata Cell must be powered down before replacing the flash disks or cards.
Hence DataCenter Team replaced a flash disk in co-ordination with us (DBA) because the flash disk was in poor performance status.


[1] Verify the existing disk_repair_time attribute for all diskgroups

SQL> select dg.name,a.value from 
v$asm_diskgroup dg, v$asm_attribute a 
where dg.group_number=a.group_number and
a.name='disk_repair_time';


[2] The default disk_repair_time is 3.6 hours only so better to adjust.

 SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';





1. Shut down the cell.
The following procedure describes how to power down Exadata Cell.Run the following command to check if there are offline disks 
on other cells that are mirrored with disks on this cell:

CellCLI > LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'

If any grid disks are returned, then it is not safe to take the storage server offline because proper Oracle ASM disk group redundancy 
will not be intact. Taking the storage server offline when one or more grid disks are in this state will cause Oracle ASM to dismount 
the affected disk group, causing the databases to shut down abruptly.
Inactivate all the grid disks when Oracle Exadata Storage Server is safe to take offline using the following command:

CellCLI> ALTER GRIDDISK ALL INACTIVE

The preceding command will complete once all disks are inactive and offline. Depending on the storage server activity,
 it may take several minutes for this command to complete.
Verify all grid disks are INACTIVE to allow safe storage server shut down by running the following command.

CellCLI> LIST GRIDDISK

If all grid disks are INACTIVE, then the storage server can be shutdown without affecting database availability.


Stop the cell services using the following command:

CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
Shut down the cell.

2. Replace the failed flash disk based on the PCI number and FDOM number.

3. Power up the cell. The cell services will be started automatically.

4.Bring all grid disks are online using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE



5. Verify that all grid disks have been successfully put online using the following command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
        Wait until asmmodestatus from SYNCING to ONLINE for all grid disks. 
 
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus


      
6. Verify FlashCache 

CellCLI> list flashcache
orclcel05_FLASHCACHE warning - degraded

If the flashcache state is not normal after the flash disk change, drop flashcache, flashlog, and flashdisk drop and recreate in the same order.

Exadata X6 systems do not require this step.

CellCLI> drop flashcache
Flash cache orclcel05_FLASHCACHE successfully dropped
 
CellCLI> drop flashlog
Flash log orclcel05_FLASHLOG successfully dropped
 
CellCLI> drop celldisk all flashdisk force


CellCLI> create celldisk all flashdisk
 
CellCLI> create flashlog all
Flash log orclcel05_FLASHLOG successfully created
 
CellCLI> create flashcache all
Flash cache orclcel05_FLASHCACHE successfully created





Oracle ASM synchronization is only complete when all grid disks show attribute asmmodestatus=ONLINE. Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server. If synchronization is not complete, then the check performed on another storage server will fail.

The new flash disk will be automatically used by the system. If the flash disk is used for flash cache, then the effective cache size will increase. If the flash disk is used for grid disks, then the grid disks will be recreated on the new flash disk. If those gird disks were part of an Oracle ASM disk group, then they will be added back to the disk group and the data will be rebalanced on them based on the disk group redundancy and asm_power_limit parameter.

Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:

The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm
The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.

The rebalance operation may have failed. Check the GV$ASM_OPERATION.ERROR view to determine if the rebalance operation failed.

Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations will be queued.






++++++++++++++++++++++++++++++++++++++++++++++++++++
Replacing Failed disk in Exadata 
++++++++++++++++++++++++++++++++++++++++++++++++++++

Hard Disks on the cell node are automatically dropped at the time of damage, and the disks of the asm disk groups will also be deleted. 
After the disk drop operation and after the disk changes, rebalance process will automatically start in ASM disk groups.

 
1) 
CellCLI> List alerthistory 
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status like failed DETAIL



2) This query should return no rows indicating the disk is no longer in the ASM diskgroup configuration.    If this returns any other value, then contact the SR owner for further guidance.

SQL> select group_number,path,header_status,mount_status,mode_status,name from V$ASM_DISK where path like 'Í_05_abcdecel02';



3) The ALTER PHYSICALDISK command sets the Service Action Required LED on or off.

CellCLI> ALTER PHYSICALDISK n:n DROP FOR REPLACEMENT
CellCLI> alter physicaldisk 20:11 drop for replacement




4)   If Drop is Hung  in Step 3  proceed with Step 4  

CellCLI> alter griddisk <griddisk_name_from_alertlog> flush nowait;
NOTE: Wait for sometimes and monitor the cell's alert.log file.

Check following too to see if flushing dirty blocks from flash cache to specific grid disk(s) cached is decreasing :
CellCLI> list metriccurrent attributes name,metricvalue where name like 'FC_BY_DIRTY';
NOTE: Need to wait and monitor the cell's alert.log file for flush completed.
 
CellCLI> alter griddisk <griddisk_name_from_alertlog> cancel flush;




5) Replace damage disk 

CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome

NOTE: If it is showing this ->> "normal - dropped for replacement", then proceed to replace the disk.

alter physicaldisk 28:5 serviceled off




 
6)  Disk addition   back  to Asm 

If the new griddisks were not automatically added back into the ASM diskgroup configuration, then locate the disks with group_number=0, and add them back in manually using "alter diskgroup <name> add disk <path> re-balance power 10;" command:

If replacement time crosses ASM FAST MIRROR RESYNC period 

SQL> select path,header_status from v$asm_disk where group_number=0;
PATH                                               HEADER_STATU
-------------------------------------------------- ------------
o/192.168.9.10/DBFS_DG_CD_05_abcdcel02        FORMER
o/192.168.9.10/DATA_Q1_CD_05_abcdcel02        FORMER
o/192.168.9.10/RECO_Q1_CD_05_abcdcel02        FORMER

SQL> alter diskgroup dbfs_dg add disk 'o/192.168.9.10/DBFS_DG_CD_05_abcdcel02' rebalance power 10;
SQL> alter diskgroup data_q1 add disk 'o/192.168.9.10/DATA_Q1_CD_05_abcdcel02' rebalance power 10;
SQL> alter diskgroup reco_q1 add disk 'o/192.168.9.10/RECO_Q1_CD_05_abcdcel02' rebalance power 10;







++++++++++++++++++++++++++++++++++++++++++++++++++++
Below 2 sql was used for  checking disk statistics 
++++++++++++++++++++++++++++++++++++++++++++++++++++

COL cv_cellname       HEAD CELLNAME         FOR A20
COL cv_cellversion    HEAD CELLSRV_VERSION  FOR A20
COL cv_flashcachemode HEAD FLASH_CACHE_MODE FOR A20
PROMPT Show Exadata cell versions from V$CELL_CONFIG....
SELECT 
    disktype
  , cv_cellname
  , status
  , ROUND(SUM(physicalsize/1024/1024/1024)) total_gb
  , ROUND(AVG(physicalsize/1024/1024/1024)) avg_gb
  , COUNT(*) num_disks
  , SUM(CASE WHEN predfailStatus  = 'TRUE' THEN 1 END) predfail
  , SUM(CASE WHEN poorPerfStatus  = 'TRUE' THEN 1 END) poorperf
  , SUM(CASE WHEN wtCachingStatus = 'TRUE' THEN 1 END) wtcacheprob
  , SUM(CASE WHEN peerFailStatus  = 'TRUE' THEN 1 END) peerfail
  , SUM(CASE WHEN criticalStatus  = 'TRUE' THEN 1 END) critical
FROM (
    SELECT /*+ NO_MERGE */
        c.cellname cv_cellname
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/name/text()')                          AS VARCHAR2(20)) diskname
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/diskType/text()')                      AS VARCHAR2(20)) diskType          
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/luns/text()')                          AS VARCHAR2(20)) luns              
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/makeModel/text()')                     AS VARCHAR2(50)) makeModel         
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalFirmware/text()')              AS VARCHAR2(20)) physicalFirmware  
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalInsertTime/text()')            AS VARCHAR2(30)) physicalInsertTime
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalSerial/text()')                AS VARCHAR2(20)) physicalSerial    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalSize/text()')                  AS VARCHAR2(20)) physicalSize      
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/slotNumber/text()')                    AS VARCHAR2(30)) slotNumber        
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/status/text()')                        AS VARCHAR2(20)) status            
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/id/text()')                            AS VARCHAR2(20)) id                
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/key_500/text()')                       AS VARCHAR2(20)) key_500           
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/predfailStatus/text()')                AS VARCHAR2(20)) predfailStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/poorPerfStatus/text()')                AS VARCHAR2(20)) poorPerfStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/wtCachingStatus/text()')               AS VARCHAR2(20)) wtCachingStatus   
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/peerFailStatus/text()')                AS VARCHAR2(20)) peerFailStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/criticalStatus/text()')                AS VARCHAR2(20)) criticalStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errCmdTimeoutCount/text()')            AS VARCHAR2(20)) errCmdTimeoutCount
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errHardReadCount/text()')              AS VARCHAR2(20)) errHardReadCount  
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errHardWriteCount/text()')             AS VARCHAR2(20)) errHardWriteCount 
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errMediaCount/text()')                 AS VARCHAR2(20)) errMediaCount     
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errOtherCount/text()')                 AS VARCHAR2(20)) errOtherCount     
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errSeekCount/text()')                  AS VARCHAR2(20)) errSeekCount      
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/sectorRemapCount/text()')              AS VARCHAR2(20)) sectorRemapCount  
    FROM
        v$cell_config c
      , TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(c.confval), '/cli-output/physicaldisk'))) v   
    WHERE 
        c.conftype = 'PHYSICALDISKS'
)
GROUP BY
    cv_cellname
  , disktype
  , status
ORDER BY
    disktype
  , cv_cellname
/




COL cellname            HEAD CELLNAME       FOR A20
COL celldisk_name       HEAD CELLDISK       FOR A30
COL physdisk_name       HEAD PHYSDISK       FOR A30
COL griddisk_name       HEAD GRIDDISK       FOR A30
COL asmdisk_name        HEAD ASMDISK        FOR A30
BREAK ON asm_diskgroup SKIP 1 ON asm_disk
PROMPT Showing Exadata disk topology from V$ASM_DISK and V$CELL_CONFIG....
WITH
  pd AS (
    SELECT /*+ MATERIALIZE */
        c.cellname
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/name/text()')                          AS VARCHAR2(100)) name
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/diskType/text()')                      AS VARCHAR2(100)) diskType          
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/luns/text()')                          AS VARCHAR2(100)) luns              
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/makeModel/text()')                     AS VARCHAR2(100)) makeModel         
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalFirmware/text()')              AS VARCHAR2(100)) physicalFirmware  
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalInsertTime/text()')            AS VARCHAR2(100)) physicalInsertTime
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalSerial/text()')                AS VARCHAR2(100)) physicalSerial    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/physicalSize/text()')                  AS VARCHAR2(100)) physicalSize      
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/slotNumber/text()')                    AS VARCHAR2(100)) slotNumber        
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/status/text()')                        AS VARCHAR2(100)) status            
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/id/text()')                            AS VARCHAR2(100)) id                
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/key_500/text()')                       AS VARCHAR2(100)) key_500           
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/predfailStatus/text()')                AS VARCHAR2(100)) predfailStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/poorPerfStatus/text()')                AS VARCHAR2(100)) poorPerfStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/wtCachingStatus/text()')               AS VARCHAR2(100)) wtCachingStatus   
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/peerFailStatus/text()')                AS VARCHAR2(100)) peerFailStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/criticalStatus/text()')                AS VARCHAR2(100)) criticalStatus    
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errCmdTimeoutCount/text()')            AS VARCHAR2(100)) errCmdTimeoutCount
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errHardReadCount/text()')              AS VARCHAR2(100)) errHardReadCount  
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errHardWriteCount/text()')             AS VARCHAR2(100)) errHardWriteCount 
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errMediaCount/text()')                 AS VARCHAR2(100)) errMediaCount     
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errOtherCount/text()')                 AS VARCHAR2(100)) errOtherCount     
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/errSeekCount/text()')                  AS VARCHAR2(100)) errSeekCount      
      , CAST(EXTRACTVALUE(VALUE(v), '/physicaldisk/sectorRemapCount/text()')              AS VARCHAR2(100)) sectorRemapCount  
    FROM
        v$cell_config c
      , TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(c.confval), '/cli-output/physicaldisk'))) v  -- gv$ isn't needed, all cells should be visible in all instances
    WHERE 
        c.conftype = 'PHYSICALDISKS'
),
 cd AS (
    SELECT /*+ MATERIALIZE */
        c.cellname 
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/name/text()')                              AS VARCHAR2(100)) name
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/comment        /text()')                   AS VARCHAR2(100)) disk_comment
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/creationTime   /text()')                   AS VARCHAR2(100)) creationTime
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/deviceName     /text()')                   AS VARCHAR2(100)) deviceName
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/devicePartition/text()')                   AS VARCHAR2(100)) devicePartition
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/diskType       /text()')                   AS VARCHAR2(100)) diskType
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/errorCount     /text()')                   AS VARCHAR2(100)) errorCount
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/freeSpace      /text()')                   AS VARCHAR2(100)) freeSpace
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/id             /text()')                   AS VARCHAR2(100)) id
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/interleaving   /text()')                   AS VARCHAR2(100)) interleaving
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/lun            /text()')                   AS VARCHAR2(100)) lun
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/physicalDisk   /text()')                   AS VARCHAR2(100)) physicalDisk
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/size           /text()')                   AS VARCHAR2(100)) disk_size
      , CAST(EXTRACTVALUE(VALUE(v), '/celldisk/status         /text()')                   AS VARCHAR2(100)) status
    FROM
        v$cell_config c
      , TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(c.confval), '/cli-output/celldisk'))) v  -- gv$ isn't needed, all cells should be visible in all instances
    WHERE 
        c.conftype = 'CELLDISKS'
),
 gd AS (
    SELECT /*+ MATERIALIZE */
        c.cellname 
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/name/text()')                               AS VARCHAR2(100)) name
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/asmDiskgroupName/text()')                   AS VARCHAR2(100)) asmDiskgroupName 
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/asmDiskName     /text()')                   AS VARCHAR2(100)) asmDiskName
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/asmFailGroupName/text()')                   AS VARCHAR2(100)) asmFailGroupName
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/availableTo     /text()')                   AS VARCHAR2(100)) availableTo
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/cachingPolicy   /text()')                   AS VARCHAR2(100)) cachingPolicy
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/cellDisk        /text()')                   AS VARCHAR2(100)) cellDisk
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/comment         /text()')                   AS VARCHAR2(100)) disk_comment
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/creationTime    /text()')                   AS VARCHAR2(100)) creationTime
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/diskType        /text()')                   AS VARCHAR2(100)) diskType
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/errorCount      /text()')                   AS VARCHAR2(100)) errorCount
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/id              /text()')                   AS VARCHAR2(100)) id
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/offset          /text()')                   AS VARCHAR2(100)) offset
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/size            /text()')                   AS VARCHAR2(100)) disk_size
      , CAST(EXTRACTVALUE(VALUE(v), '/griddisk/status          /text()')                   AS VARCHAR2(100)) status
    FROM
        v$cell_config c
      , TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(c.confval), '/cli-output/griddisk'))) v  -- gv$ isn't needed, all cells should be visible in all instances
    WHERE 
        c.conftype = 'GRIDDISKS'
),
 lun AS (
    SELECT /*+ MATERIALIZE */
        c.cellname 
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/cellDisk         /text()')              AS VARCHAR2(100)) cellDisk      
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/deviceName       /text()')              AS VARCHAR2(100)) deviceName    
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/diskType         /text()')              AS VARCHAR2(100)) diskType      
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/id               /text()')              AS VARCHAR2(100)) id            
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/isSystemLun      /text()')              AS VARCHAR2(100)) isSystemLun   
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/lunAutoCreate    /text()')              AS VARCHAR2(100)) lunAutoCreate 
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/lunSize          /text()')              AS VARCHAR2(100)) lunSize       
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/physicalDrives   /text()')              AS VARCHAR2(100)) physicalDrives
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/raidLevel        /text()')              AS VARCHAR2(100)) raidLevel
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/lunWriteCacheMode/text()')              AS VARCHAR2(100)) lunWriteCacheMode
      , CAST(EXTRACTVALUE(VALUE(v), '/lun/status           /text()')              AS VARCHAR2(100)) status        
    FROM
        v$cell_config c
      , TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(c.confval), '/cli-output/lun'))) v  -- gv$ isn't needed, all cells should be visible in all instances
    WHERE 
        c.conftype = 'LUNS'
)
 , ad  AS (SELECT /*+ MATERIALIZE */ * FROM v$asm_disk)
 , adg AS (SELECT /*+ MATERIALIZE */ * FROM v$asm_diskgroup)
SELECT 
    adg.name                        asm_diskgroup
  , ad.name                         asm_disk
  , gd.name                         griddisk_name
  , cd.name                         celldisk_name
  , pd.cellname
  , SUBSTR(cd.devicepartition,1,20) cd_devicepart
  , pd.name                         physdisk_name
  , SUBSTR(pd.status,1,20)          physdisk_status
  , lun.lunWriteCacheMode
--  , SUBSTR(cd.devicename,1,20)      cd_devicename
--  , SUBSTR(lun.devicename,1,20)     lun_devicename
--    disktype
FROM
    gd
  , cd
  , pd
  , lun
  , ad
  , adg
WHERE
    ad.group_number = adg.group_number (+)
AND gd.asmdiskname = ad.name (+)
AND cd.name = gd.cellDisk (+)
AND pd.id = cd.physicalDisk (+)
AND cd.name = lun.celldisk (+)
--GROUP BY
--    cellname
--  , disktype
--  , status
ORDER BY
--    disktype
    asm_diskgroup
  , asm_disk
  , griddisk_name
  , celldisk_name
  , physdisk_name
  , cellname
/






++++++++++++++++++++++++++++++++++++++++++++++++++++
References : 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues (Doc ID 761868.1)

Things to Check in ASM When Replacing an ONLINE disk from Exadata Storage Cell (Doc ID 1326611.1)

Exadata ALTER PHYSICALDISK N:N DROP FOR REPLACEMENT is hung (Doc ID 2574663.1)

How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure) (Doc ID 1390836.1)

How to Replace a Hard Drive in an Exadata Storage Server (Hard Failure) (Doc ID 1386147.1)

Drop Online Disk: Things to Check in ASM When Replacing an ONLINE disk from Exadata Storage Cell (Doc ID 1326611.1)

Understanding ASM Capacity and Reservation of Free Space in Exadata (Doc ID 1551288.1)

Flash disk poor performance status (Doc ID 1206015.1)

Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)

Understanding ASM Capacity and Reservation of Free Space in Exadata (Doc ID 1551288.1)

Thursday, September 2, 2021

RMAN-20011: target database incarnation is not current in recovery catalog

 
After some research, we came across Note 412113.1 on http://metalink.oracle.com, which pointed towards the possibility of some non-prod instance getting registered with RMAN catalog with the same DBID as production’s.

In addition, Notes 1076158.6 and 1061206.6 were also *probably* relevant, but the relevance was not striking

SQL> select db_key,DBID,name,current_incarnation from rc_database_incarnation order by 1;


So, in reality, what had happened was that the reincarnation of DR PROD instance had been setup in the rman catalog.

To undo this, we reset the database incarnation to the catalog using the RESET INCARNATION TO DBKEY and then did a manual resync of catalog using the rman RESYNC CATALOG command.


If backups are falling  due to said reason 

Reset PROD to previous incarnation
 1. Check the latest incarnation key for production database in rc_database_incarnation
  select DBID,NAME,DBINC_KEY,  RESETLOGS_CHANGE#, RESETLOGS_TIME   
  from rc_database_incarnation
  where dbid=2284119847;

       ==> Check the DBINC_KEY corresponding to the current incarnation of PROD database
    
2. Connect to the production database with RMAN
     rman catalog <un/pw@catalog_db> target /

     RMAN> reset database to incarnation <dbinc_key>; <<< From step 1
     Or 
     RMAN>    RESET DATABASE;
RMAN> unregister database;
   RMAN> register database;


    RMAN> resync catalog;
    RMAN> list incarnation; => Now the production DB incarnation should be the current one





Reference : 
Target Database Incarnation Is Not Current In Recovery Catalog RMAN-20011 (Doc ID 412113.1)
RMAN restore database fails with RMAN-20011, Target Incarnation Is Not Current (Doc ID 394493.1)