Saturday, November 26, 2022

Oracle Rac database Rman Backup using multiple Instances -- Node affinity


Recently we had requirement to share rman backup across  nodes .   Came across Oracle article that facilities requirement using  Node affinity


In some cluster database configurations, some nodes of the cluster have faster access to some datafiles than to other datafiles. RMAN automatically detects this affinity, which is known as node affinity awareness.



To use node affinity, configure RMAN channels on the nodes of the cluster that have affinity to the datafiles you want to back up. For example, use the syntax:

-- setup for a parallel backup

RMAN> CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 4;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/passwd@inst1';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/passwd@inst1';
RMAN> CONFIGURE CHANNEL 3 DEVICE TYPE DISK CONNECT 'sys/passwd@inst2';
RMAN> CONFIGURE CHANNEL 4 DEVICE TYPE DISK CONNECT 'sys/passwd@inst2' ;



CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt; 
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';




You can manually override the automatic node affinity by specifying which channels should back up which datafiles. For example:

BACKUP
  # channel 1 gets datafile 1
  (DATAFILE 1 CHANNEL ORA_SBT_TAPE_1)
  # channel 2 gets datafiles 2-4
  (DATAFILE 2,3,4 CHANNEL ORA_SBT_TAPE_2)
  # channel 3 gets datafiles 5-10
  (DATAFILE 5,6,7,8,9,10 CHANNEL ORA_SBT_TAPE_3); 





References : 

https://docs.oracle.com/cd/B10500_01/rac.920/a96596/backup.htm





Sunday, November 6, 2022

Oracle Database DBMS_Scheduler job to run on preferred node in RAC -- instance_stickiness

 

We observed  that Rac 1st  node seems highly loaded  though we had database service spread across  instances . Checking further  came across dba_scheduler_jobs feature   instance_stickiness   introduced back in 11g but not known much . 


If instance_stickiness is set to FALSE, each instance of the job runs on the first instance available. For environments other than Oracle RAC, this attribute is not useful because there is only one instance


dba_scheduler_jobs.instance_id will be NULL unless you have explicitly set it with set_attribute.

Instance Stickiness value true means job will run on same node until node is extremely overloaded or not available. False means job will run on any available node. Its default value is true. Job will continue on that on which its first run or started in RAC case. No use in Standalone server.


BEGIN
  dbms_scheduler.create_job(
     job_name => 'TEST_JOB'
    ,job_type => 'PLSQL_BLOCK'
    ,job_action => 'begin null; end; '
    ,start_date => TIMESTAMP'2020-07-24 04:15:01 US/Eastern' 
     ,repeat_interval => 'FREQ=DAILY'
    ,enabled => TRUE
    ,comments => 'scheduler for LATE FEE CALULATION FOR LOANS');


  dbms_scheduler.set_attribute(
                              'TEST_JOB',
                              'INSTANCE_ID',
                              '1'
                             );
END;
/



--Change the attribute value with DBMS set attribute procedure

--select Preferred instance run the job is 1
exec dbms_scheduler.set_attribute(name => 'TEST_JOB' ,attribute=>'INSTANCE_ID', value=>'1');

--Disable the instance stickness attribute.
exec dbms_scheduler.set_attribute(name => 'TEST_JOB' ,attribute=>'INSTANCE_STICKINESS', value=>FALSE);



--Check the job condition and status with following query

 select OWNER,JOB_NAME,INSTANCE_STICKINESS,INSTANCE_ID from dba_scheduler_jobs where JOB_NAME like 'TEST_JOB';




Using Job Class and Database Service : 

This can also be achieved using   service defined in Job class 

The Oracle 10g scheduler allows jobs to be linked with job classes, which in turn can be linked to services to allows jobs to run on specific nodes in a RAC environment. To support our requirements we might create a job classes as follows.

--Create the service on two nodes.
srvctl add service -d ORCL -s BATCH_SERVICE -r RAC1,RAC2
srvctl add service -d ORCL -s BATCH_SERVICE -preferred RAC1 -available RAC2



SQL> BEGIN
DBMS_SCHEDULER.create_job_class(
job_class_name => 'BATCH_JOB_CLASS',
service        => 'BATCH_SERVICE');
END;
/

-- Create job by using that job class
Begin
DBMS_SCHEDULER.CREATE_JOB (
   job_name  => 'TEST_JOB',
   job_type  => 'PLSQL_BLOCK',
   job_action => 'Begin dbms_output.put_line(''Hello'') end;',
   number_of_arguments  => 0,
   start_date      => Sysdate,
   repeat_interval  => 'freq=daily;byhour=9,21;byminute=0;bysecond=0;',
   job_class      => 'BATCH_JOB_CLASS',
   enabled     => 'TRUE'
   );
END;
/




Reference : 

How To Run A Job On Specific Node Using DBMS_SCHEDULER (Doc ID 472535.1)

Doc ID 2214476.1

Monday, October 31, 2022

Reading Unix Compressed files without uncompressing them using Zcat, Zless, Zgrep, Zdiff

 

Mostly we have situation that we  need to  see content  of compressed  files without actually  uncompressing it . This  can be achieved using Z commands 



Below are options using z commands 

Viewing the compressed file with zcat.
Paging the compressed file with zless / zmore.
Searching inside the compressed file with zgrep / zegrep.
Comparison of file using zdiff / zcmp



Below are examples : 

$ zcat big-file.txt.gz 
[Note: View the file without uncompressing it]


zcat big-file.txt.gz > big-file.txt
[Note: Uncompress the file]



$ zcat filename.gz | more
$ zcat filename.gz | less

(or)

$ zless filename.gz
$ zmore filename.gz



$ zgrep -i less test-file.txt.gz



$ zdiff file1.txt.gz file2.txt.gz


Wednesday, October 26, 2022

Oracle Database Health Check with DBMS_HM -- HEALTH CHECK MONITOR

 


Was  trying to explore options  for  Block corruption checks came across this  old gem DBMS_HM  ,  and want to  document  it .  Though it is old utility introduced in 11g 





Health check run in the two mode :

1) Online Mode: Health check run while database is in open or mounted mode.
2) Offline Mode: Health check run while database is in nomount mode.Only the Redo Integrity Check and the DB Structure Integrity Check can be used in DB-offline mode.





Types of health check : 

1) DB Structure Integrity Check: This option  check datafiles if there is corruption or inaccessible situtation.if database is open this check logfile and datafile if nomount mode  only checkhed controlfile.
2) Data Block Integrity Check: This check detects disk image block corruptions such as checksum failures and logical inconsistencies within the block.
3) Redo Integrity Check: This check scans  redo log for accessibility and corruption, as well as the archive logs, if available.
4) Undo Segment Integrity Check: This check finds logical undo corruptions.
5) Transaction Integrity Check: This check is identical to the Undo Segment Integrity Check but it checks only one specific transaction.
6) Dictionary Integrity Check: This check integrity of core dictionary objects, such as tab$ and col$



Sample : 

1) DB Structure Integrity Check: 
SQL> BEGIN
DBMS_HM.RUN_CHECK('Dictionary Integrity Check', 'hk_run');
END;
/


2)  Data Block Integrity Health Check

BEGIN
DBMS_HM.RUN_CHECK (
check_name   => ‘Data Block Integrity Check’,
run_name     => ‘datablockint’,
input_params => ‘BLC_DF_NUM=4;BLC_BL_NUM=111’);
END;
/

3) Transaction Health Check: 
BEGIN
DBMS_HM.RUN_CHECK (
check_name   => ‘Transaction Integrity Check’,
run_name     => ‘transacitoncheck’,
input_params => ‘TXN_ID=7.33.2’);
END;
/


4) Undo Segment Health Check: 
BEGIN
DBMS_HM.RUN_CHECK (
check_name   => ‘Undo Segment Integrity Check’,
run_name     => 'undosegmentcheck',
input_params => ‘USN_NUMBER=1’);
END;
/




 Health Monitor Views:


V$HM_CHECK – lists all Health checks
V$HM_CHECK_PARAM - lists information about Health checks, input parameters and defaults. Join CHECK_ID with V$HM_CHECK.ID.
V$HM_FINDING – Information about findings of Health Monitor runs.
V$HM_INFO – Information about runs, finding and recommendations.
V$HM_RECOMMENDATION – information about any recommendation from Health check runs.
V$HM_RUN – Information about checker runs like name, mode, time of run, etc.


SQL>SELECT c.name check_name, p.name parameter_name, p.type,
p.default_value, p.description
FROM v$hm_check_param p, v$hm_check c
WHERE p.check_id = c.id and c.internal_check = ‘N’
ORDER BY c.name;

SQL>select * from V$HM_RECOMMENDATION;
SET LONG 100000
SET LONGCHUNKSIZE 1000
SET PAGESIZE 1000
SET LINESIZE 512
SELECT DBMS_HM.get_run_report('undosegmentcheck') FROM dual;





Viewing Reports Using the ADRCI Utility

adrci>>
adrci> show hm_run

Sunday, October 9, 2022

Creating Awr Snapshot in Oracle Active Dataguard Standby Database -- UMF TOPOLOGY

 

The feature uses the Remote Management Framework which comes with a New Oracle built-in user called SYS$UMF. This user is locked by default and should be unlocked before configuring the RMF.

Since Oracle Database 12c Release 2 (12.2), Automatic Workload Repository (AWR) data can be captured for Active Data Guard (ADG) standby databases. This feature enables analyzing any performance-related issues for ADG standby databases

AWR snapshots for ADG standby databases are called remote snapshots. A database node, called destination, is responsible for storing snapshots that are collected from remote ADG standby database nodes, called sources.

Optionally we  can also  use statspack following below documents 

Installing and Using Standby Statspack (Doc ID 454848.1)
Installing and Using Standby Statspack in 12c Multitenant Database (Doc ID 2020285.1)



Step 1: Unlock SYS$UMF user and change its password using SYS user. 

SYS$UMF account is used to administer Remote Management Framework, including the remote Automatic Workload Repository (AWR).

SQL>alter user sys$umf identified by oracle account unlock;




Step 2: Create the database link from primary to the standby database and standby database to a primary.

Primary database DB_UNIQUE_NAME: Primary
Standby database DB_UNIQUE_NAME: Standby


SQL>  create database link primary_to_standby connect to "SYS$UMF" identified by "oracle" using 'Standby';

Database link created.

SQL> create database link standby_to_primary connect to "SYS$UMF" identified by "oracle" using 'Primary';

Database link created.




Step 3: Now we need to configure database to add topology. 

Each database name must be assigned a unique name. Default name is db_unique_name. In my case dbupgrade and std_dbupgarde.


DBMS_UMF.CONFIGURE_NODE procedure configures a node that needs to be registered with the RMF topology. This procedure must be executed on the node that needs to be configured.

DBMS_UMF.CONFIGURE_NODE(
   node_name          IN VARCHAR2 DEFAULT NULL,
   dblink_to_target   IN VARCHAR2 DEFAULT NULL);



On primary :

 alter system set "_umf_remote_enabled"=TRUE scope=BOTH;

SQL> exec dbms_umf.configure_node('Primary');

PL/SQL procedure successfully completed.



On standby :

SQL>exec dbms_umf.configure_node ('Standby','standby_to_primary');

PL/SQL procedure successfully completed.





Step 4: Create RMF topology

DBMS_UMF.CREATE_TOPOLOGY procedure creates the RMF topology and designates the node on which it is executed as the destination node for that topology.

DBMS_UMF.CREATE_TOPOLOGY( topology_name IN VARCHAR2);

SQL> exec DBMS_UMF.create_topology ('Topology_1');

PL/SQL procedure successfully completed.




Step 5: Check DBA_UMF_REGISTRATION and dba_umf_topology view

SQL> select * from dba_umf_topology;

TOPOLOGY_NAME TARGET_ID TOPOLOGY_VERSION TOPOLOGY
--------------- ---------- ---------------- --------
Topology_1 4040060753   8 ACTIVE


SQL> col node_name format a20
SQL> select * from DBA_UMF_REGISTRATION;

TOPOLOGY_NAME      NODE_NAME      NODE_ID  NODE_TYPE AS_SO AS_CA  STATE
-------------------- -------------------- ---------- ---------- ----- -----  -----
Topology_1      Primary 4040060753       0 FALSE FALSE  OK





Step 6: Register the standby database with topology

DBMS_UMF.REGISTER_NODE function and procedure register a node with the RMF topology. This procedure and function must be executed only on the destination node in the RMF topology.

Syntax

DBMS_UMF.REGISTER_NODE(
   topology_name          IN  VARCHAR2,
   node_name              IN  VARCHAR2,
   dblink_to_node         IN  VARCHAR2 DEFAULT NULL,
   dblink_from_node       IN  VARCHAR2 DEFAULT NULL,
   as_source              IN  VARCHAR2 DEFAULT 'TRUE',
   as_candidate_target    IN  VARCHAR2 DEFAULT 'FALSE');

SQL> exec DBMS_UMF.register_node ('Topology_1', 'Standby', 'primary_to_standby', 'standby_to_primary', 'FALSE','FALSE');

PL/SQL procedure successfully completed.




Step 7: Enable AWR service on the remote node

DBMS_WORKLOAD_REPOSITORY.REGISTER_REMOTE_DATABASE

procedure registers a remote database in the Automatic Workload Repository (AWR) using the Remote Management Framework (RMF).

DBMS_WORKLOAD_REPOSITORY.REGISTER_REMOTE_DATABASE(
   node_name      IN VARCHAR2,
   topology_name  IN VARCHAR2 DEFAULT NULL);


SQL> exec DBMS_WORKLOAD_REPOSITORY.register_remote_database(node_name=>'Standby');

PL/SQL procedure successfully completed.




Step 8: Now again verify in dba_umf_registration view

SQL> select * from dba_umf_registration;

TOPOLOGY_NAME NODE_NAME    NODE_ID  NODE_TYPE AS_SO AS_CA STATE
--------------- --------------- ---------- ---------- ----- ----- --------------------
Topology_1 Primary 4040060753     0 FALSE FALSE OK
Topology_1 Standby 1978111757     0 FALSE FALSE OK




Step 9: Create snapshot using RMF in the primary database for the remote database.


function and procedure create a remote snapshot using the Remote Management Framework (RMF). The function returns the snapshot ID.

DBMS_WORKLOAD_REPOSITORY.CREATE_REMOTE_SNAPSHOT(
   node_id     IN NUMBER,
   flush_level IN VARCHAR2 DEFAULT 'BESTFIT');



SQL> exec dbms_workload_repository.create_remote_snapshot('Standby');

PL/SQL procedure successfully completed.




Step 10 : Create AWR report from a standby database

Note: NODE ID generated above consider as DBID for a standby database.

After specifying AWR report type HTML it will ask for DBID, give dbid associated with nodeid in dba_umf_registration.




Step 11 : After Role Switchover 

Everything should be set now, you need to be aware after a switchover this procedure might not work . We can make the role change by executing the following procedure on the candidate destination

SQL> EXEC DBMS_UMF.SWITCH_DESTINATION(topology name, force_switch=>FALSE);


BUG 21046490 - UMF-ADG: FAILED TO TAKE SNAPSHOTS AFTER SWITCHOVERS IN ADG[This section is not visible to customers.]


Per non-published Bug 28930258, if in a hurry and has not constructed a topology, then check the value of _remote_awr_enabled on the current primary database.

SELECT b.ksppstvl "Session Value",
c.ksppstvl "Instance Value"
FROM sys.x$ksppi a,
sys.x$ksppcv b,
sys.x$ksppsv c
WHERE
a.indx = b.indx
AND
a.indx = c.indx
AND a.ksppinm = '_remote_awr_enabled';

If it is equal to TRUE, then simply set it to FALSE.

ALTER SYSTEM SET "_remote_awr_enabled" = FALSE SCOPE = BOTH SID = '*';

This will cause AWR snapshots to be generated automatically again.




-- Run in case want to DROP the topology

SQL>  exec DBMS_UMF.drop_topology('NAME-OF-TOPOLOGY');



Known  Errors : 

We need to run at least two to get the begin_snap and end_snap.
If you encounter "ORA-13516: AWR Operation failed: Remote source not registered for AWR" then manually switch a few (2-3) logfiles on primary:
alter system switch logfile;



If you encounter "ORA-15766: already registered in an RMF topology" unregister the node as below and then rerun "DBMS_UMF.register_node":
exec DBMS_UMF.unregister_node ('Topology_1', 'stby');
PL/SQL procedure successfully completed.
If you encounter "ORA-13519: Database id (1730117407) exists in the workload repository" unregister the remote database as below and then rerun "DBMS_WORKLOAD_REPOSITORY.register_remote_database":

exec DBMS_WORKLOAD_REPOSITORY.unregister_remote_database('stby','Topology_1',TRUE);



Views : 

set line 132
col topology_name format a15
col node_name format a15
select * from dba_umf_topology;
select * from dba_umf_registration;
select * from dba_umf_service;
select * from dba_umf_link;



References : 

https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/gathering-database-statistics.html#GUID-E1369092-DA6B-4CF4-B286-69D4A3BDFA1E

How to Generate AWRs in Active Data Guard Standby Databases (Doc ID 2409808.1)




Tuesday, September 27, 2022

Troubleshooting Oracle Rac Crs/Cluster Startup Issues


We need to check what  component  are started and what are pending based on that we need to  take approach . 




Known Issues  / Issues faced : 

1)  ora.crsd remains in INTERMEDIATE and commands like crsctl hang and Logical corruption check failed (Doc ID 2075966.1)

2) CRSD Failing to Start or in INTERMEDIATE State Due to OLR corruption. (Doc ID 2809968.1) 

3) CRS is not starting up on the cluster node (Doc ID 2445492.1)  --  Missing execute permission  on <GRID_HOME>/srvm/mesg directory

4)  Crs activeversion is different on both nodes after patching/upgrade 

5)  Permission of Voting/Ocr disk was changed 

6)  ohasd process was no started due to missing /etc/systemd/system/ohasd.service 

7)  Asm startup issues faced  due to  below 

-->  asm disk were not shared between nodes 
--> asm_diskstring parameters were somehow removed 



7) asm not started due to afd filter  issue  . failed to access AFD label disk  after reboot 
    This has been documented in my another Blog  below 

https://abdul-hafeez-kalsekar.blogspot.com/2021/10/how-to-install-and-configure-asm-filter.html







Solutions We can try wherever applicable  : 

1) If Ohas process is started   already , try to start asm manually    which should bring up crs 


2)  Start  crsd process alone  : 

crsctl check crs
crsctl check cluster -all
crsctl start res ora.crsd -init
or
Execute /etc/init.d/init.crs start



3) Start ohasd  manually .   : 

nohup  sh  /etc/init.d/init.ohasd run  & 
ps -ef | grep -i d.bin 
ps -ef | grep -i ohasd 

RHEL 7 onwards, it uses systemd rather than initd for starting or restarting processes and runs them as a service.
[root@pmyws01 ~]# cat /etc/systemd/system/ohasd.service
[Unit]
Description=Oracle High Availability Services
After=syslog.target

[Service]
ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
Restart=always

[Install]
WantedBy=multi-user.target
[root@pmyws01 ~]# systemctl daemon-reload
[root@pmyws01 ~]# systemctl enable ohasd.service
[root@pmyws01 ~]# systemctl start ohasd.service
[root@pmyws01 ~]# systemctl status ohasd.service



4) Try running   /u01/app/19.0.0/grid/crs/install/rootcrs.sh -postpatch 





      Handy  commands

     crsctl get cluster name
     crsctl get cluster configuration
     crsctl enable crs  
     crsctl disable crs
     crsctl config crs
     crsctl query css votedisk
     crsctl start cluster -all | -n nodename
     crsctl stop cluster -all | -n nodename
     crsctl start cluster 
     crsctl check cluster -all | -n nodename
     crsctl query crs activeversion
     crsctl query crs activeversion -f
     crsctl query crs softwareversion 
      crsctl query crs softwarepatch
crsctl start crs [-excl [-nocrs] [-cssonly]] | [-wait | -waithas | -nowait] | [-noautostart]
crsctl start res ora.crsd -init
crsctl start crs -excl -nocrs
crsctl start crs -wait
crsctl stop rollingpatch
$GRID_HOME/bin/kfod op=patches

/u01/grid/19.0.0.0/bin/srvctl status cha
crsctl modify resource ora.chad -attr "ENABLED=1" -unsupported      (to enable cha if  'ora.chad' is disabled ) 

grep "OCR MASTER" $GRID_HOME/log/<nodename>/crsd/crsd.log





Must Read  : Top 5 Grid Infrastructure Startup Issues (Doc ID 1368382.1)

Issue #1: CRS-4639: Could not contact Oracle High Availability Services, ohasd.bin not running or ohasd.bin is running but no init.ohasd or other processes
Issue #2: CRS-4530: Communications failure contacting Cluster Synchronization Services daemon, ocssd.bin is not running
Issue #3: CRS-4535: Cannot communicate with Cluster Ready Services, crsd.bin is not running
Issue #4: Agent or mdnsd.bin, gpnpd.bin, gipcd.bin not running
Issue #5: ASM instance does not start, ora.asm is OFFLINE
 



Logs to Check :

1) Cluster Alert Log 
2)  deamon  logs 
3) os messages log 
4) oragent log 

Most clusterware daemons/processes pid and output file are in <ORACLE_BASE>/crsdata/<node>/output

Most clusterware daemons/processes logs are in <ORACLE_BASE>/diag/crs/<node>/crs/trace




References : 

1) Troubleshooting CRSD Start up Issue (Doc ID 1323698.1)
2)  Redaht7/Oracle Linux7 + ORA11g : ohasd fails to start(Doc ID 1959008.1)
3) Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
4) How To Gather & Backup ASM/ACFS Metadata In A Formatted Manner version 10.1, 10.2, 11.1, 11.2, 12.1, 12.2, 18.x and 19.x (Doc ID 470211.1)






Case Study 1: OHASD does not start
 
As ohasd.bin is responsible to start up all other cluserware processes directly or indirectly, it needs to start up properly for the rest of the stack to come up. If ohasd.bin is not up, when checking its status, CRS-4639 (Could not contact Oracle High Availability Services) will be reported; and if ohasd.bin is already up, CRS-4640 will be reported if another start up attempt is made; if it fails to start, the following will be reported:
 
 
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
 
 
Automatic ohasd.bin start up depends on the following:
 
1. OS is at appropriate run level:
 
OS need to be at specified run level before CRS will try to start up.
 
To find out at which run level the clusterware needs to come up:
 
cat /etc/inittab|grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Note: Oracle Linux 6 (OL6) or Red Hat Linux 6 (RHEL6) has deprecated inittab, rather, init.ohasd will be configured via upstart in /etc/init/oracle-ohasd.conf, however, the process ""/etc/init.d/init.ohasd run" should still be up. Oracle Linux 7 (and Red Hat Linux 7) uses systemd to manage start/stop services (example: /etc/systemd/system/oracle-ohasd.service)
 
Above example shows CRS suppose to run at run level 3 and 5; please note depend on platform, CRS comes up at different run level.
 
To find out current run level:
 
who -r


OHASD.BIN will spawn four agents/monitors to start resource:
 
  oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd etc
  orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs etc
  cssdagent / cssdmonitor: responsible for ora.cssd(for ocssd.bin) and ora.cssdmonitor(for cssdmonitor itself)
 
If ohasd.bin can not start any of above agents properly, clusterware will not come to healthy 





Case Study 2: . OLR is corrupted
 
In this case, the daemon log will show messages like (this is a case that ora.ctssd fails to start):
 
2012-07-22 00:15:16.565: [ default][1]clsvactversion:4: Retrieving Active Version from local storage.
2012-07-22 00:15:16.575: [    CTSS][1]clsctss_r_av3: Invalid active version [] retrieved from OLR. Returns [19].
2012-07-22 00:15:16.585: [    CTSS][1](:ctss_init16:): Error [19] retrieving active version. Returns [19].
2012-07-22 00:15:16.585: [    CTSS][1]ctss_main: CTSS init failed [19]
2012-07-22 00:15:16.585: [    CTSS][1]ctss_main: CTSS daemon aborting [19].
2012-07-22 00:15:16.585: [    CTSS][1]CTSS daemon aborting
 
 
 
The solution is to restore a good copy of OLR note 1193643.1   
 






. Enable further tracing

Execute the following steps as root:

B1. List all crsd modules:

# $GRID_HOME/bin/crsctl lsmodules crs


B2. Find out the current trace level for all crsd modules - the output can be used to to revert back to original trace level once the issue is solved:

# $GRID_HOME/bin/crsctl get log crs all


B3. Set trace level to 5 for all modules:

# $GRID_HOME/bin/crsctl set log crs all:5



Alternatively trace level for each modules can be set individually:


# $GRID_HOME/bin/crsctl set log crs CRSPE:5

Note: The module name is case sensitive


B4. Once the issue is solved, set trace level back to original value for all crs modules with output from B2

# $GRID_HOME/bin/crsctl set log crs AGENT:1




Take pstack

When crsd.bin dumps, it will store a thread stack in $GRID_HOME/log/<nodename>/crsd/crsdOUT.log

Most of the time, there may not be a time window to take pstack as crsd.bin aborts very quick; but in case crsd.bin stays up for a short while, take a few pstack at interval of 30 seconds as root:

Find out pid of crsd.bin: ps -ef| grep crsd.bin

Take pstack:

  AIX        : /bin/procstack <pid-of-crsd.bin>
  Linux        : /usr/bin/pstack <pid-of-crsd.bin>
  Solaris    : /usr/bin/pstack <pid-of-crsd.bin>





Sunday, September 4, 2022

Starting Oracle Cluster Health Advisor (CHA) and CRSD manually



It just that we come across situation where we see some of rac process down .Below are  2 of process that can be started manually .  



Starting CHA  manually 

chad stands for the Cluster Health Advisor (CHA) daemon which is is part of the Oracle Autonomous Health Framework(AHF), 
It continuously monitors cluster nodes and Oracle RAC databases for performance and availability issues.

Oracle Cluster Health Advisor runs as a highly available cluster resource, ochad, on each node in the cluster. Each Oracle Cluster Health Advisor daemon (ochad) monitors the operating system on the cluster node and optionally, each Oracle Real Application Clusters (Oracle RAC) database instance on the node.

 The ochad daemon receives operating system metric data from the Cluster Health Monitor and gets Oracle RAC database instance metrics from a memory-mapped file. The daemon does not require a connection to each database instance. This data, along with the selected model, is used in the Health Prognostics Engine of Oracle Cluster Health Advisor for both the node and each monitored database instance in order to analyze their health multiple times a minute.
 

It is sometimes found that  CHA process is down and we have to start it manually 

crsctl stat res -t  
crsctl status res ora.chad
crsctl stat res ora.chad -t 
srvctl start cha
chactl status 
srvctl status cha 




Starting Crsd  Manually : 
 

$GRID_HOME/bin/crsctl stat res -t -init
$GRID_HOME/bin/crsctl start res ora.crsd -init



If pid file does not exist, $GRID_HOME/log/$HOST/agent/ohasd/orarootagent_root/orarootagent_root.log will have similar like the following:
 
 
2010-02-14 17:40:57.927: [ora.crsd][1243486528] [check] PID FILE doesn't exist.
..
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Creating PID [30269] file for home /ocw/grid host racnode1 bin crs to /ocw/grid/crs/init/
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Error3 -2 writing PID [30269] to the file []
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Failed to record pid for CRSD
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Terminating process
2010-02-14 17:41:57.927: [ default][1092499776] CRSD exiting on stop request from clsdms_thdmai
 
The solution is to create a dummy pid file ($GRID_HOME/crs/init/$HOST.pid) manually as grid user with "touch" command and restart resource ora.crsd



Reference : 

Troubleshooting CRSD Start up Issue (Doc ID 1323698.1)



Tuesday, August 16, 2022

Awr in Oracle Multitenant/ Pluggable Database Pdb/Cdb - awr_pdb_autoflush_enabled ,AWR_SNAPSHOT_TIME_OFFSET , AWR_PDB_MAX_PARALLEL_SLAVES


Since we were exploring enabling awr snapshots at  pdb level ,  we came across   below 3 parameters  that control  awr   generation at pluggable database level .

AWR_PDB_AUTOFLUSH_ENABLED
Awr_snapshot_time_offset
AWR_PDB_MAX_PARALLEL_SLAVES




AWR_PDB_AUTOFLUSH_ENABLED

1. AWR Snapshots and reports can be created only at the container database (CDB) level in 
   Oracle 12c R1 (12.1.0.1.0 / 12.1.0.2.0)
2. AWR Snapshots and reports can be created at the container database (CDB) level as well as pluggable database (PDB) level 
   in Oracle 12c R2 (12.2.0.1.0)
3. By default, AWR Snapshots and reports can be generated only at the container database (CDB) level


The default value of AWR_PDB_AUTOFLUSH_ENABLED is false. Thus, by default, automatic AWR snapshots are disabled for all the PDBs in a CDB.

When you change the value of AWR_PDB_AUTOFLUSH_ENABLED in the CDB root, the new value takes effect in all the PDBs in the CDB.

You can also change the value of AWR_PDB_AUTOFLUSH_ENABLED in any of the individual PDBs in a CDB, and the value that is set for each individual PDB will be honored. This enables you to enable or disable automatic AWR snapshots for individual PDBs.



for specified PDB

alter session set container=PDBtest;
alter system set awr_pdb_autoflush_enabled=true;



for all pdbs and CDB

alter session set container=CDB$ROOT;
alter system set awr_pdb_autoflush_enabled=true;



set interval and retention period by PDB or CDB$ROOT level. for all PDBSs and CDB we have to run script separately. for 30 days retention and 60 minutes interval script can be below.
 
alter session set container=PDB1;  
execute dbms_workload_repository.modify_snapshot_settings(interval => 60, retention=>64800);  
 


SQL> set lines 100
SQL> select * from cdb_hist_wr_control;

 SQL> SELECT con_id, to_char(end_interval_time, 'HH24:MI:SS') AS snap_time
  2  FROM cdb_hist_snapshot
  3  WHERE end_interval_time > to_timestamp('2020-10-02 11:45','YYYY-MM-DD HH24:MI')
  4  ORDER BY snap_time, con_id;


select * from awr_pdb_snapshot ; 






Awr_snapshot_time_offset

You might have observed those spikes on the top of every hour when all the AWR snapshots are taken and to  avoid thiswe  want snapshots taken on hour bases but within 5 minutes difference for every database


The parameter is specified in seconds. Normally, you set it to a value less than 3600. If you set the special value 1000000 (1,000,000), you get an automatic mode, in which the offset is based on the database name.

The automatic mode is an effective way of getting a reasonable distribution of offset times when you have a very large number of instances running on the same node.



SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> SHOW CON_NAME

CON_NAME
------------------------------
CDB$ROOT



SQL> show parameter AWR_SNAPSHOT_TIME_OFFSET

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
awr_snapshot_time_offset             integer     0

SQL> alter system set AWR_SNAPSHOT_TIME_OFFSET=1000000 scope=both;





AWR_PDB_MAX_PARALLEL_SLAVES

From version 18c onward, with the dynamic initialization parameter AWR_PDB_MAX_PARALLEL_SLAVES, you can specify the maximum number of background processes that the database engine can concurrently use to take automatic PDB-level snapshots. Valid values go from 1 to 30; the default is 10. Even though this initialization parameter doesn’t affect the automatic snapshots in the root container, it can only be set in the root container. The following examples illustrate the behaviour with three PDBs and an interval of 15 minutes:

AWR_PDB_MAX_PARALLEL_SLAVES = 1: only one PDB-level snapshot is taken at the same time as the snapshot in the root container

AWR_PDB_MAX_PARALLEL_SLAVES = 10: 10 PDB-level snapshots are taken at the same time as the snapshot in the root container



Known issue : 

If   _cursor_stats_enabled  is set  , sql statistics wont be generated on PDB level and pdb awr  will have sql details missing in it . 



 

References : 

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/refrn/AWR_PDB_AUTOFLUSH_ENABLED.html#GUID-08FA21BC-8FB1-4C51-BEEA-139C734D17A7

https://it.inf.unideb.hu/oracle/refrn/AWR_SNAPSHOT_TIME_OFFSET.html#GUID-90CD8379-DCB2-4681-BB90-0A32C6029C4E

Sunday, August 7, 2022

Troubleshooting a datapatch issue in Oracle Pluggable database using -verbose -debug

 

We had situation  where  datapatch was not applied to PDB .  Oracle Suggested to apply datapatch using  verbose  mode and  issue  was captured in log 


./dataoatch  -verbose -debug 



We can apply  datapatch on selective PDB using below 

./datapatch -verbose -pdbs PDB1,PDB2



It is always advisable to  check PDB violations after applying datapatch in pluggable environment . 

select name,cause,message from pdb_plug_in_violations;
select name,cause,type,action from pdb_plug_in_violations where status <> 'RESOLVED';



Below are other options 

$ORACLE_HOME/OPatch/datapatch -rollback all -force --pdb pdb1 
$ORACLE_HOME/OPatch/datapatch  -apply 7777 -force - bundle_series DBRU   --pdb pdb1 



Sql Used to verify : 

SET LINESIZE 500
SET PAGESIZE 1000
SET SERVEROUT ON
SET LONG 2000000
COLUMN action_time FORMAT A20
COLUMN action FORMAT A10
COLUMN status FORMAT A10
COLUMN description FORMAT A40
COLUMN source_version FORMAT A10
COLUMN target_version FORMAT A10
alter session set "_exclude_seed_cdb_view"=FALSE;
 select CON_ID,
        TO_CHAR(action_time, 'YYYY-MM-DD') AS action_time,
        PATCH_ID,
        PATCH_TYPE,
        ACTION,
        DESCRIPTION,
        SOURCE_VERSION,
        TARGET_VERSION
   from CDB_REGISTRY_SQLPATCH
  order by CON_ID, action_time, patch_id;


select patch_id, patch_uid, target_version, status, description, action_time
from dba_registry_sqlpatch
where action = 'APPLY';  


select * from OPATCH_XML_INV;




Reference: 

After running datapatch, PDB plugin or cloned db returns violations shown in PDB_PLUG_IN_VIOLATION (Doc ID 1635482.1)

Datapatch User Guide (Doc ID 2680521.1)


Friday, August 5, 2022

Oracle Cloud Exacc Basics Key-Words / Terms of OCI

 

Before we start  to Support  Oci  we need to Understand  Basic Terminology   Of OCI .  


bare metal host

Oracle Cloud Infrastructure provides you control of the physical host (“bare metal”) machine. Bare metal compute instances run directly on bare metal servers without a hypervisor. When you provision a bare metal compute instance, you maintain sole control of the physical CPU, memory, and network interface card (NIC). You can configure and utilize the full capabilities of each physical machine as if it were hardware running in your own data center. You do not share the physical machine with any other tenants.


regions and availability domains

Oracle Cloud Infrastructure is physically hosted in regions and availability domains. A region is a localized geographic area, and an availability domain is one or more data centers located within a region. A region is composed of one or more availability domains. Oracle Cloud Infrastructure resources are either region-specific, such as a virtual cloud network, or availability domain-specific, such as a compute instance.

Availability domains are isolated from each other, fault tolerant, and very unlikely to fail simultaneously or be impacted by the failure of another availability domain. When you configure your cloud services, use multiple availability domains to ensure high availability and to protect against resource failure. Be aware that some resources must be created within the same availability domain, such as an instance and the storage volume attached to it.
For more details see Regions and Availability Domains.


realm

A realm is a logical collection of regions. Realms are isolated from each other and do not share any data. Your tenancy exists in a single realm and has access to the regions that belong to that realm. Oracle Cloud Infrastructure currently offers a realm for commercial regions and two realms for government cloud regions: FedRAMP authorized and IL5 authorized.


Console

The simple and intuitive web-based user interface you can use to access and manage Oracle Cloud Infrastructure.



tenancy

When you sign up for Oracle Cloud Infrastructure, Oracle creates a tenancy for your company, which is a secure and isolated partition within Oracle Cloud Infrastructure where you can create, organize, and administer your cloud resources.


compartments

Compartments allow you to organize and control access to your cloud resources. A compartment is a collection of related resources (such as instances, virtual cloud networks, block volumes) that can be accessed only by certain groups that have been given permission by an administrator. A compartment should be thought of as a logical group and not a physical container. When you begin working with resources in the Console, the compartment acts as a filter for what you are viewing.

When you sign up for Oracle Cloud Infrastructure, Oracle creates your tenancy, which is the root compartment that holds all your cloud resources. You then create additional compartments within the tenancy (root compartment) and corresponding policies to control access to the resources in each compartment. When you create a cloud resource such as an instance, block volume, or cloud network, you must specify to which compartment you want the resource to belong.

Ultimately, the goal is to ensure that each person has access to only the resources they need.



security zones

A security zone is associated with a compartment. When you create and update cloud resources in a security zone, Oracle Cloud Infrastructure validates these operations against security zone policies. If any policy is violated, then the operation is denied. Security zones let you be confident that your resources comply with Oracle security principles.
virtual cloud network (VCN)

A virtual cloud network is a virtual version of a traditional network—including subnets, route tables, and gateways—on which your instances run. A cloud network resides within a single region but includes all the region's availability domains. Each subnet you define in the cloud network can either be in a single availability domain or span all the availability domains in the region (recommended). You need to set up at least one cloud network before you can launch instances. You can configure the cloud network with an optional internet gateway to handle public traffic, and an optional IPSec connection or FastConnect to securely extend your on-premises network.


instance

An instance is a compute host running in the cloud. An Oracle Cloud Infrastructure compute instance allows you to utilize hosted physical hardware, as opposed to the traditional software-based virtual machines, ensuring a high level of security and performance.


image

The image is a template of a virtual hard drive that defines the operating system and other software for an instance, for example, Oracle Linux. When you launch an instance, you can define its characteristics by choosing its image. Oracle provides a set of platform images you can use. You can also save an image from an instance that you have already configured to use as a template to launch more instances with the same software and customizations.


shape

In Compute, the shape specifies the number of CPUs and amount of memory allocated to the instance. Oracle Cloud Infrastructure offers shapes to fit various computing requirements. See the list of compute shapes.

In Load Balancing, the shape determines the load balancer's total pre-provisioned maximum capacity (bandwidth) for ingress plus egress traffic. Available shapes include 100 Mbps, 400 Mbps, and 8000 Mbps.


key pair

A key pair is an authentication mechanism used by Oracle Cloud Infrastructure. A key pair consists of a private key file and a public key file. You upload your public key to Oracle Cloud Infrastructure. You keep the private key securely on your computer. The private key is private to you, like a password.
Key pairs can be generated according to different specifications. Oracle Cloud Infrastructure uses two types of key pairs for specific purposes:

Instance SSH Key pair: This key pair is used to establish secure shell (SSH) connection to an instance. When you provision an instance, you provide the public key, which is saved to the instance's authorized key file. To log on to the instance, you provide your private key, which is verified with the public key.
API signing key pair: This key pair is in PEM format and is used to authenticate you when submitting API requests. Only users who will be accessing Oracle Cloud Infrastructure via the API need this key 
pair.

 



block volume

A block volume is a virtual disk that provides persistent block storage space for Oracle Cloud Infrastructure instances. Use a block volume just as you would a physical hard drive on your computer, for example, to store data and applications. You can detach a volume from one instance and attach it to another instance without loss of data.


Object Storage

Object Storage is a storage architecture that allow you to store and manage data as objects. Data files can be of any type and up to 10 TiB in size. Once you upload data to Object Storage it can be accessed from anywhere. Use Object Storage when you want to store a very large amount of data that does not change very frequently. Some typical use cases for Object Storage include data backup, file sharing, and storing unstructured data like logs and sensor-generated data.


bucket

A bucket is a logical container used by Object Storage for storing your data and files. A bucket can contain an unlimited number of objects.
Oracle Cloud Identifier (OCID)
Every Oracle Cloud Infrastructure resource has an Oracle-assigned unique ID called an Oracle Cloud Identifier (OCID). This ID is included as part of the resource's information in both the Console and API.




References:

https://docs.oracle.com/en-us/iaas/Content/GSG/Concepts/concepts.htm#:~:text=you%20are%20viewing.-,When%20you%20sign%20up%20for%20Oracle%20Cloud%20Infrastructure%2C%20Oracle%20creates,the%20resources%20in%20each%20compartment