Sunday, January 29, 2023

Oracle database 19c Dynamic sequence cache - LAST_VALUE for sequences is incorrectly calculated


Recently, after migrating to 19c, the application team started complaining that sequence values were not being calculated properly.

It was later found that this was due to the dynamic sequence cache feature introduced in 19c.

This issue is reported to be fixed in 19.13.


If the application calls the sequence's NEXTVAL frequently, a small cache can cause performance problems.

The default sequence cache is 20; I still recommend starting with a cache of 200 when creating the sequence.
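
For example, a minimal sketch of creating a sequence with a larger cache, and of resizing an existing one (the sequence name app_txn_seq is illustrative):

CREATE SEQUENCE app_txn_seq
  START WITH 1
  INCREMENT BY 1
  CACHE 200
  NOORDER;

-- For an existing sequence the cache can be changed in place:
ALTER SEQUENCE app_txn_seq CACHE 200;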


The feature is enabled by default and requires no additional setup by a DBA or end user. It resizes sequence caches dynamically based on the rate of consumption of sequence numbers. As a result, sequence cache sizes can grow well beyond the cache size specified in the sequence DDL.

If a sequence is used across nodes in a RAC setting, this can result in large gaps in sequence numbers seen across instances.

If you see large gaps in sequence numbers on 19.10 or later, and your application is sensitive to such gaps, then you can disable this feature with the following setting:

_dynamic_sequence_cache = FALSE;
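
A hedged sketch of how this hidden parameter would typically be set: underscore parameters must be double-quoted in ALTER SYSTEM, and if the parameter is not dynamically modifiable on your version, use SCOPE=SPFILE and restart the instances. Confirm with Oracle Support before changing hidden parameters.

SQL> ALTER SYSTEM SET "_dynamic_sequence_cache" = FALSE SCOPE=BOTH SID='*';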


References :

19c LAST_VALUE for sequences is incorrectly calculated when next cache of numbers required (Doc ID 2797098.1)

Sequence dynamic cache resizing feature (Doc ID 2790985.1)





Saturday, January 21, 2023

Gathering Information for Troubleshooting Oracle Physical Standby Database / Data Guard


Writing this blog to keep all scripts handy for handling Data Guard issues.



Script to Collect Data Guard Physical and Active Standby Diagnostic Information for Version 10g and above (Including RAC) (Doc ID 1577406.1)

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/script-to-collect-data-guard-physical_21.html


Script to Collect Data Guard Primary Site Diagnostic Information [ID 241374.1]

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/script-to-collect-dataguard-primary.html





Script to Collect Data Guard Primary Site Diagnostic Information for Version 10g and above (Including RAC). (Doc ID 1577401.1)

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/script-to-collect-data-guard-primary.html


Script to Collect Data Guard Physical Standby Diagnostic Information [ID 241438.1]

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/script-to-collect-data-guard-physical.html




SRDC - Collect Data Guard Diagnostic Information (Doc ID 2219763.1)

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/srdc-collect-data-guard-diagnostic.html






Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) (Doc ID 1064487.1)

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/script-to-collect-log-file-sync.html




Checking log transport lag and log transport errors:

The SQL used is documented in the separate blog below for easier reference:

https://abdul-hafeez-kalsekar-tuning.blogspot.com/2023/01/monitoring-oracle-standby-log-transport.html



Improving log transport:

1) Using the multithreaded log writer (LGWR) process for SYNC log transport
2) Increasing the number of archiver processes for ASYNC log transport (see the sketch below)
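
A minimal sketch of the second option, assuming ASYNC redo transport is in use and that eight archiver processes suit the workload (log_archive_max_processes is dynamically modifiable):

SQL> ALTER SYSTEM SET log_archive_max_processes=8 SCOPE=BOTH SID='*';
SQL> SHOW PARAMETER log_archive_max_processes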



Handling a huge archive gap or missing archive logs:

1) Applying an SCN-based incremental backup on the standby (see the sketch below)
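
A hedged outline of the SCN-based roll-forward, assuming disk backups; the SCN placeholder, paths and tag are illustrative, so follow the relevant MOS note for your version:

-- On the standby: find the SCN to roll forward from
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
SQL> SELECT MIN(FHSCN) FROM X$KCVFH;      -- use the lower of the two values

-- On the primary: take an incremental backup starting from that SCN
RMAN> BACKUP INCREMENTAL FROM SCN <scn> DATABASE FORMAT '/backup/stby_incr_%U' TAG 'STBY_ROLLFWD';

-- On the standby: catalog the transferred pieces and recover, then restart managed recovery
RMAN> CATALOG START WITH '/backup/stby_incr';
RMAN> RECOVER DATABASE NOREDO;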



Logs to check:

1) Primary and DR site alert logs

2) Primary and DR site Data Guard broker logs

3) Primary site LGWR trace file




DGMGRL Commands : 

Spool dgconfig.log 

DGMGRL> SHOW INSTANCE VERBOSE sales1;
DGMGRL> SHOW INSTANCE sales1;
DGMGRL> show configuration verbose;
DGMGRL> show configuration verbose "tststby";
DGMGRL> show database prod1;
DGMGRL> show database prod1dr;
DGMGRL> show database prod1 statusreport;
DGMGRL> show database prod1 inconsistentProperties;
DGMGRL> show database prod1 inconsistentlogxptProps;
DGMGRL> show database prod1 logxptstatus;
DGMGRL> show database prod1 latestlog;
DGMGRL> show fast_start failover;

DGMGRL> show database STYDB InconsistentProperties 

DGMGRL>show database STYDB InconsistentLogXptProps

 

DGMGRL>show database PRIMDB statusreport
DGMGRL>show database PRIMDB sendQentries
DGMGRL>show database STYDB recvqentries
DGMGRL>show database PRIMDB topwaitevents

DGMGRL> edit configuration set property tracelevel=support;
DGMGRL> edit database PRIMDB set property LogArchiveTrace=8191;
DGMGRL> edit database STYDB set property LogArchiveTrace=8191;

DGMGRL> edit configuration reset property tracelevel ;
DGMGRL> edit database PRIMDB reset property logarchivetrace;
DGMGRL> edit database STYDB reset property logarchivetrace;

DGMGRL> VALIDATE DATABASE 'chennai';
DGMGRL> VALIDATE DATABASE VERBOSE standby;
DGMGRL> VALIDATE NETWORK CONFIGURATION FOR ALL;


EVENT


To determine which resource is constraining asynchronous transport, use krsb stats, which can be enabled by setting event 16421 on both the primary and standby databases:

alter session set events '16421 trace name context forever, level 3';


To disable krsb stats, set event 16421 to level 1:

alter session set events '16421 trace name context forever, level 1';



Checking Data Guard parameters:


set linesize 500 pages 0
col value for a90
col name for a50
 select name, value
from v$parameter
where name in ('db_name','db_unique_name','log_archive_config', 'log_archive_dest_1','log_archive_dest_2',
               'log_archive_dest_state_1','log_archive_dest_state_2', 'remote_login_passwordfile',
               'log_archive_format','log_archive_max_processes','fal_server','db_file_name_convert',
                     'log_file_name_convert', 'standby_file_management');




SELECT db_param.NAME,
       db_param.VALUE,
       db_db.db_unique_name,
       db_db.database_role
FROM   v$parameter db_param,
       v$database  db_db
WHERE  db_param.NAME IN ( 'db_file_name_convert',
                          'db_name',
                          'db_unique_name',
                          'fal_client',
                          'fal_server',
                          'local_listener',
                          'log_archive_config',
                          'log_archive_dest_1',
                          'log_archive_dest_2',
                          'log_archive_dest_3',
                          'log_archive_dest_state_1',
                          'log_archive_dest_state_2',
                          'log_archive_dest_state_3',
                          'log_file_name_convert',
                          'standby_archive_dest',
                          'standby_file_management',                        
                          'remote_login_passwordfile',
                          'log_archive_format'
                        )
ORDER BY db_param.NAME;



############# Data Guard redo apply rate ##########

watch -n 5 ./guard_apply_takip.sh 


#!/bin/bash
date;
hostname;
export ORACLE_HOME=/oracle/product/11.2.0
export ORACLE_SID=TESTDB

echo "*******************************************************************************************************************"
sqlplus -S / as sysdba  <<EOF
 set lines 200
 set feedback off
 set pages 0 
 col comments for a20
 col start_time for a30
 col item for a30
 col units for a15
 col last_apply_time for a30
 select to_char(R.START_TIME,'DD.MON.YYYY HH24:MI')start_time , 
        R.ITEM, R.UNITS, R.SOFAR , 
        to_char(R.TIMESTAMP,'DD.MON.YYYY HH24:MI:SS') last_apply_time
    from v\$recovery_progress R
    where start_time = (select max(start_time) from v\$recovery_progress) order by start_time, item;
 select process, status, thread#, sequence#, block#, blocks from v\$managed_standby order by 1;
EOF

echo ""
echo "*******************************************************************************************************************"
top -cbn 1 | head -n 40





Other Views : 



###############  Gather the per-log redo generation rate  ###############

SQL> alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
SQL> select thread#, sequence#, blocks*block_size/1024/1024 MB, (next_time-first_time)*86400 sec, (blocks*block_size/1024/1024)/((next_time-first_time)*86400) "MB/s" from v$archived_log 
where ((next_time-first_time)*86400<>0) 
and first_time between to_date('2015/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS') 
and to_date('2015/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS') 
and dest_id=1 order by first_time;
 


##########  To Check Lag  ################ 

col NAME format a10
 select NAME,TIME,UNIT,COUNT,LAST_TIME_UPDATED from V$STANDBY_EVENT_HISTOGRAM where
 name like '%lag' and count >0 order by LAST_TIME_UPDATED;



###############  To check the speed of log apply on the Data Guard standby  ########


  set lines 120 pages 99
 alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
 select START_TIME, ITEM, SOFAR, UNITS from gv$recovery_progress;







 Reference : 
   

https://docs.oracle.com/en/database/oracle/oracle-database/21/haovw/tune-and-troubleshoot-oracle-data-guard.html#GUID-30CD6E1C-1CE2-4BB6-A404-896D5C06ECCE

How to Generate AWRs in Active Data Guard Standby Databases (Doc ID 2409808.1)

Thursday, December 15, 2022

Gathering Concurrent Statistics in Oracle Database -- Speeding up Statistics gather

 
We planned to spread the statistics gathering load across RAC instances and came across the Concurrent Statistics feature.

Please note there are a few reported issues of high CPU usage with this feature.




Don't forget to fulfil the prerequisites:

1) The job_queue_processes parameter must be set to a value other than 0:
SQL> show parameter job_queue_processes

2) Deactivate parallel_adaptive_multi_user:
 ALTER SYSTEM SET parallel_adaptive_multi_user=FALSE;


3) An active resource plan is required; it is used to control the resources consumed by the concurrent statistics jobs.

SQL> show parameter resource_manager_plan





Check the current value of the CONCURRENT global preference using the query below:

SELECT DBMS_STATS.get_prefs('CONCURRENT') FROM dual;





Turn on CONCURRENT statistics (optionally also set the DEGREE preference so each statistics job can itself run in parallel):

exec dbms_stats.set_global_prefs('DEGREE', DBMS_STATS.AUTO_DEGREE);

BEGIN
  DBMS_STATS.set_global_prefs (
    pname  => 'CONCURRENT',
    pvalue => 'ALL');
END;
/

In 12cR1 the permitted values are MANUAL, AUTOMATIC, ALL and OFF, controlling whether you want concurrent statistics for manual commands, for the automatic job, for both, or not at all.

Instead of ALL (which applies to both manual and automatic statistics collection), we can also set the values below:
MANUAL - only for manual statistics collection
AUTOMATIC - only for automatic statistics collection
OFF - to switch the feature off


If the global preference value is already set to ALL, you need to grant the privileges below to the user performing the statistics gathering.
These grants are not in place by default, so users will hit errors under concurrent statistics without them.

SQL> GRANT CREATE JOB, MANAGE SCHEDULER, MANAGE ANY QUEUE TO testuser;
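
With the CONCURRENT preference set and these grants in place, a manual gather fans the work out across objects automatically. A minimal sketch, reusing the TESTUSER schema from the grant above and letting Oracle pick the degree of parallelism:

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'TESTUSER',
    degree  => DBMS_STATS.AUTO_DEGREE);
END;
/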




Be very careful with the concurrent statistics feature: once activated, most of your users will no longer be able to gather statistics without the privileges above, even on their own objects:

 
ERROR AT line 1:
ORA-20000: Unable TO gather STATISTICS concurrently, insufficient PRIVILEGES
ORA-06512: AT "SYS.DBMS_STATS", line 24281
ORA-06512: AT "SYS.DBMS_STATS", line 24332
ORA-06512: AT line 1




Checking for concurrent statistics jobs:

     col COMMENTS FOR a50
     SELECT job_name, state, comments
     FROM dba_scheduler_jobs
     WHERE job_class LIKE 'CONC%';



 

 
Checking Global Preference : 



SET LINESIZE 150
COLUMN autostats_target FORMAT A20
COLUMN cascade FORMAT A25
COLUMN degree FORMAT A10
COLUMN estimate_percent FORMAT A30
COLUMN method_opt FORMAT A25
COLUMN no_invalidate FORMAT A30
COLUMN granularity FORMAT A15
COLUMN publish FORMAT A10
COLUMN incremental FORMAT A15
COLUMN stale_percent FORMAT A15
SELECT DBMS_STATS.GET_PREFS('AUTOSTATS_TARGET') AS autostats_target,
       DBMS_STATS.GET_PREFS('CASCADE') AS cascade,
       DBMS_STATS.GET_PREFS('DEGREE') AS degree,
       DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT') AS estimate_percent,
       DBMS_STATS.GET_PREFS('METHOD_OPT') AS method_opt,
       DBMS_STATS.GET_PREFS('NO_INVALIDATE') AS no_invalidate,
       DBMS_STATS.GET_PREFS('GRANULARITY') AS granularity,
       DBMS_STATS.GET_PREFS('PUBLISH') AS publish,
       DBMS_STATS.GET_PREFS('INCREMENTAL') AS incremental,
       DBMS_STATS.GET_PREFS('STALE_PERCENT') AS stale_percent ,
       DBMS_STATS.get_prefs('CONCURRENT')  AS CONCURRENT  
FROM   dual;




col spare4 format a40 head 'VALUE'
select sname,spare4
from sys.optstat_hist_control$
order by 1
/


Checking Table level preference :

select * from dba_tab_stat_prefs where table_name = '&&TABLE';
 



References : 

Oracle Concurrent (CONCURRENT) Collect Statistics (Doc ID 1555451.1)

https://docs.oracle.com/database/121/TGSQL/tgsql_stats.htm#TGSQL428


Saturday, November 26, 2022

Oracle Rac database Rman Backup using multiple Instances -- Node affinity


Recently we had a requirement to spread RMAN backup load across nodes. We came across an Oracle article that facilitates this requirement using node affinity.


In some cluster database configurations, some nodes of the cluster have faster access to some datafiles than to other datafiles. RMAN automatically detects this affinity, which is known as node affinity awareness.



To use node affinity, configure RMAN channels on the nodes of the cluster that have affinity to the datafiles you want to back up. For example, use the syntax:

-- setup for a parallel backup

RMAN> CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 4;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/passwd@inst1';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/passwd@inst1';
RMAN> CONFIGURE CHANNEL 3 DEVICE TYPE DISK CONNECT 'sys/passwd@inst2';
RMAN> CONFIGURE CHANNEL 4 DEVICE TYPE DISK CONNECT 'sys/passwd@inst2' ;



CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt; 
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
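
With either persistent configuration in place, a plain backup command allocates the configured channels and spreads the work across the instances. A minimal sketch (in practice a dedicated backup user with the SYSBACKUP privilege is preferable to connecting as SYS):

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;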




You can manually override the automatic node affinity by specifying which channels should back up which datafiles. For example:

BACKUP
  # channel 1 gets datafile 1
  (DATAFILE 1 CHANNEL ORA_SBT_TAPE_1)
  # channel 2 gets datafiles 2-4
  (DATAFILE 2,3,4 CHANNEL ORA_SBT_TAPE_2)
  # channel 3 gets datafiles 5-10
  (DATAFILE 5,6,7,8,9,10 CHANNEL ORA_SBT_TAPE_3); 





References : 

https://docs.oracle.com/cd/B10500_01/rac.920/a96596/backup.htm





Sunday, November 6, 2022

Oracle Database DBMS_Scheduler job to run on preferred node in RAC -- instance_stickiness

 

We observed that the first RAC node seemed highly loaded even though we had database services spread across instances. Checking further, we came across the dba_scheduler_jobs attribute instance_stickiness, introduced back in 11g but not widely known.


If instance_stickiness is set to FALSE, each run of the job starts on the first instance available. For environments other than Oracle RAC, this attribute is not useful because there is only one instance.


dba_scheduler_jobs.instance_id will be NULL unless you have explicitly set it with set_attribute.

An instance_stickiness value of TRUE means the job keeps running on the same node until that node is extremely overloaded or unavailable; FALSE means the job can run on any available node. The default is TRUE, so in a RAC environment the job sticks to the instance where it first ran. The attribute has no effect on a standalone server.


BEGIN
  dbms_scheduler.create_job(
     job_name => 'TEST_JOB'
    ,job_type => 'PLSQL_BLOCK'
    ,job_action => 'begin null; end; '
    ,start_date => TIMESTAMP'2020-07-24 04:15:01 US/Eastern' 
     ,repeat_interval => 'FREQ=DAILY'
    ,enabled => TRUE
    ,comments => 'scheduler for LATE FEE CALCULATION FOR LOANS');


  dbms_scheduler.set_attribute(
                              'TEST_JOB',
                              'INSTANCE_ID',
                              '1'
                             );
END;
/



--Change the attribute values with the DBMS_SCHEDULER.SET_ATTRIBUTE procedure

--Set the preferred instance for the job to instance 1
exec dbms_scheduler.set_attribute(name => 'TEST_JOB' ,attribute=>'INSTANCE_ID', value=>'1');

--Disable the instance stickiness attribute
exec dbms_scheduler.set_attribute(name => 'TEST_JOB' ,attribute=>'INSTANCE_STICKINESS', value=>FALSE);



--Check the job configuration and status with the following query

 select OWNER,JOB_NAME,INSTANCE_STICKINESS,INSTANCE_ID from dba_scheduler_jobs where JOB_NAME like 'TEST_JOB';
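
To confirm which instance each run actually executed on, the run history can be checked as well (a minimal sketch using the standard scheduler history view):

select job_name, instance_id, status, actual_start_date
from dba_scheduler_job_run_details
where job_name = 'TEST_JOB'
order by actual_start_date desc;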




Using a job class and database service:

This can also be achieved using a service defined in a job class.

Since Oracle 10g, the scheduler allows jobs to be linked with job classes, which in turn can be linked to services, allowing jobs to run on specific nodes in a RAC environment. To support our requirement we might create a job class as follows.

--Create the service on two nodes (older -r syntax or the newer -preferred/-available syntax; use one of the two)
srvctl add service -d ORCL -s BATCH_SERVICE -r RAC1,RAC2
srvctl add service -d ORCL -s BATCH_SERVICE -preferred RAC1 -available RAC2
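
After adding it, the service would typically be started and verified before creating the job class (a minimal sketch using standard srvctl commands):

srvctl start service -d ORCL -s BATCH_SERVICE
srvctl status service -d ORCL -s BATCH_SERVICE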



SQL> BEGIN
DBMS_SCHEDULER.create_job_class(
job_class_name => 'BATCH_JOB_CLASS',
service        => 'BATCH_SERVICE');
END;
/

-- Create a job using that job class
BEGIN
DBMS_SCHEDULER.CREATE_JOB (
   job_name             => 'TEST_JOB',
   job_type             => 'PLSQL_BLOCK',
   job_action           => 'begin dbms_output.put_line(''Hello''); end;',
   number_of_arguments  => 0,
   start_date           => SYSDATE,
   repeat_interval      => 'freq=daily;byhour=9,21;byminute=0;bysecond=0;',
   job_class            => 'BATCH_JOB_CLASS',
   enabled              => TRUE
   );
END;
/




Reference : 

How To Run A Job On Specific Node Using DBMS_SCHEDULER (Doc ID 472535.1)

Doc ID 2214476.1

Monday, October 31, 2022

Reading Unix Compressed files without uncompressing them using Zcat, Zless, Zgrep, Zdiff

 

Often we need to see the contents of compressed files without actually uncompressing them. This can be achieved using the z commands.



Below are options using z commands 

Viewing the compressed file with zcat.
Paging the compressed file with zless / zmore.
Searching inside the compressed file with zgrep / zegrep.
Comparing files with zdiff / zcmp.



Below are examples : 

$ zcat big-file.txt.gz 
[Note: View the file without uncompressing it]


zcat big-file.txt.gz > big-file.txt
[Note: Uncompress the file]



$ zcat filename.gz | more
$ zcat filename.gz | less

(or)

$ zless filename.gz
$ zmore filename.gz



$ zgrep -i less test-file.txt.gz



$ zdiff file1.txt.gz file2.txt.gz


Wednesday, October 26, 2022

Oracle Database Health Check with DBMS_HM -- HEALTH CHECK MONITOR

 


While exploring options for block corruption checks I came across this old gem, DBMS_HM, and want to document it, even though it is an old utility introduced in 11g.





Health checks run in two modes:

1) Online mode: the health check runs while the database is open or mounted.
2) Offline mode: the health check runs while the database is in NOMOUNT mode. Only the Redo Integrity Check and the DB Structure Integrity Check can be used in offline mode.





Types of health check:

1) DB Structure Integrity Check: checks datafiles for corruption or inaccessibility. With the database open it also checks log files and datafiles; in NOMOUNT mode only the control file is checked.
2) Data Block Integrity Check: detects disk image block corruptions such as checksum failures and logical inconsistencies within the block.
3) Redo Integrity Check: scans the redo log for accessibility and corruption, as well as the archive logs, if available.
4) Undo Segment Integrity Check: finds logical undo corruptions.
5) Transaction Integrity Check: identical to the Undo Segment Integrity Check but checks only one specific transaction.
6) Dictionary Integrity Check: checks the integrity of core dictionary objects, such as tab$ and col$.



Sample : 

1) DB Structure Integrity Check:
SQL> BEGIN
DBMS_HM.RUN_CHECK('DB Structure Integrity Check', 'hk_run');
END;
/


2) Data Block Integrity Health Check:

BEGIN
DBMS_HM.RUN_CHECK (
check_name   => 'Data Block Integrity Check',
run_name     => 'datablockint',
input_params => 'BLC_DF_NUM=4;BLC_BL_NUM=111');
END;
/

3) Transaction Health Check:
BEGIN
DBMS_HM.RUN_CHECK (
check_name   => 'Transaction Integrity Check',
run_name     => 'transactioncheck',
input_params => 'TXN_ID=7.33.2');
END;
/


4) Undo Segment Health Check:
BEGIN
DBMS_HM.RUN_CHECK (
check_name   => 'Undo Segment Integrity Check',
run_name     => 'undosegmentcheck',
input_params => 'USN_NUMBER=1');
END;
/




 Health Monitor Views:


V$HM_CHECK – lists all Health checks
V$HM_CHECK_PARAM - lists information about Health checks, input parameters and defaults. Join CHECK_ID with V$HM_CHECK.ID.
V$HM_FINDING – Information about findings of Health Monitor runs.
V$HM_INFO – Information about runs, finding and recommendations.
V$HM_RECOMMENDATION – information about any recommendation from Health check runs.
V$HM_RUN – Information about checker runs like name, mode, time of run, etc.


SQL> SELECT c.name check_name, p.name parameter_name, p.type,
p.default_value, p.description
FROM v$hm_check_param p, v$hm_check c
WHERE p.check_id = c.id and c.internal_check = 'N'
ORDER BY c.name;

SQL>select * from V$HM_RECOMMENDATION;
SET LONG 100000
SET LONGCHUNKSIZE 1000
SET PAGESIZE 1000
SET LINESIZE 512
SELECT DBMS_HM.get_run_report('undosegmentcheck') FROM dual;





Viewing Reports Using the ADRCI Utility

$ adrci
adrci> show hm_run
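
Once the run name is known, the report can also be generated and viewed directly from ADRCI. A minimal sketch, reusing the 'undosegmentcheck' run name from the examples above (command availability may vary slightly by release):

adrci> create report hm_run undosegmentcheck
adrci> show report hm_run undosegmentcheck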

Sunday, October 9, 2022

Creating Awr Snapshot in Oracle Active Dataguard Standby Database -- UMF TOPOLOGY

 

The feature uses the Remote Management Framework (RMF), which comes with a new Oracle built-in user called SYS$UMF. This user is locked by default and should be unlocked before configuring the RMF.

Since Oracle Database 12c Release 2 (12.2), Automatic Workload Repository (AWR) data can be captured for Active Data Guard (ADG) standby databases. This feature enables analyzing performance-related issues on ADG standby databases.

AWR snapshots for ADG standby databases are called remote snapshots. A database node, called the destination, is responsible for storing snapshots that are collected from remote ADG standby database nodes, called sources.

Optionally, we can also use Statspack by following the documents below:

Installing and Using Standby Statspack (Doc ID 454848.1)
Installing and Using Standby Statspack in 12c Multitenant Database (Doc ID 2020285.1)



Step 1: Unlock SYS$UMF user and change its password using SYS user. 

SYS$UMF account is used to administer Remote Management Framework, including the remote Automatic Workload Repository (AWR).

SQL>alter user sys$umf identified by oracle account unlock;




Step 2: Create database links from the primary to the standby database and from the standby to the primary.

Primary database DB_UNIQUE_NAME: Primary
Standby database DB_UNIQUE_NAME: Standby


SQL>  create database link primary_to_standby connect to "SYS$UMF" identified by "oracle" using 'Standby';

Database link created.

SQL> create database link standby_to_primary connect to "SYS$UMF" identified by "oracle" using 'Primary';

Database link created.




Step 3: Now we need to configure each database node so it can be added to the topology.

Each node must be assigned a unique name; the default is the db_unique_name. In this walkthrough the nodes are simply named 'Primary' and 'Standby'.


DBMS_UMF.CONFIGURE_NODE procedure configures a node that needs to be registered with the RMF topology. This procedure must be executed on the node that needs to be configured.

DBMS_UMF.CONFIGURE_NODE(
   node_name          IN VARCHAR2 DEFAULT NULL,
   dblink_to_target   IN VARCHAR2 DEFAULT NULL);



On primary :

 alter system set "_umf_remote_enabled"=TRUE scope=BOTH;

SQL> exec dbms_umf.configure_node('Primary');

PL/SQL procedure successfully completed.



On standby :

SQL>exec dbms_umf.configure_node ('Standby','standby_to_primary');

PL/SQL procedure successfully completed.





Step 4: Create RMF topology

DBMS_UMF.CREATE_TOPOLOGY procedure creates the RMF topology and designates the node on which it is executed as the destination node for that topology.

DBMS_UMF.CREATE_TOPOLOGY( topology_name IN VARCHAR2);

SQL> exec DBMS_UMF.create_topology ('Topology_1');

PL/SQL procedure successfully completed.




Step 5: Check DBA_UMF_REGISTRATION and dba_umf_topology view

SQL> select * from dba_umf_topology;

TOPOLOGY_NAME TARGET_ID TOPOLOGY_VERSION TOPOLOGY
--------------- ---------- ---------------- --------
Topology_1 4040060753   8 ACTIVE


SQL> col node_name format a20
SQL> select * from DBA_UMF_REGISTRATION;

TOPOLOGY_NAME      NODE_NAME      NODE_ID  NODE_TYPE AS_SO AS_CA  STATE
-------------------- -------------------- ---------- ---------- ----- -----  -----
Topology_1      Primary 4040060753       0 FALSE FALSE  OK





Step 6: Register the standby database with topology

DBMS_UMF.REGISTER_NODE function and procedure register a node with the RMF topology. This procedure and function must be executed only on the destination node in the RMF topology.

Syntax

DBMS_UMF.REGISTER_NODE(
   topology_name          IN  VARCHAR2,
   node_name              IN  VARCHAR2,
   dblink_to_node         IN  VARCHAR2 DEFAULT NULL,
   dblink_from_node       IN  VARCHAR2 DEFAULT NULL,
   as_source              IN  VARCHAR2 DEFAULT 'TRUE',
   as_candidate_target    IN  VARCHAR2 DEFAULT 'FALSE');

SQL> exec DBMS_UMF.register_node ('Topology_1', 'Standby', 'primary_to_standby', 'standby_to_primary', 'FALSE','FALSE');

PL/SQL procedure successfully completed.




Step 7: Enable AWR service on the remote node

DBMS_WORKLOAD_REPOSITORY.REGISTER_REMOTE_DATABASE

procedure registers a remote database in the Automatic Workload Repository (AWR) using the Remote Management Framework (RMF).

DBMS_WORKLOAD_REPOSITORY.REGISTER_REMOTE_DATABASE(
   node_name      IN VARCHAR2,
   topology_name  IN VARCHAR2 DEFAULT NULL);


SQL> exec DBMS_WORKLOAD_REPOSITORY.register_remote_database(node_name=>'Standby');

PL/SQL procedure successfully completed.




Step 8: Now again verify in dba_umf_registration view

SQL> select * from dba_umf_registration;

TOPOLOGY_NAME NODE_NAME    NODE_ID  NODE_TYPE AS_SO AS_CA STATE
--------------- --------------- ---------- ---------- ----- ----- --------------------
Topology_1 Primary 4040060753     0 FALSE FALSE OK
Topology_1 Standby 1978111757     0 FALSE FALSE OK




Step 9: Create snapshot using RMF in the primary database for the remote database.


The DBMS_WORKLOAD_REPOSITORY.CREATE_REMOTE_SNAPSHOT function and procedure create a remote snapshot using the Remote Management Framework (RMF). The function returns the snapshot ID.

DBMS_WORKLOAD_REPOSITORY.CREATE_REMOTE_SNAPSHOT(
   node_id     IN NUMBER,
   flush_level IN VARCHAR2 DEFAULT 'BESTFIT');



SQL> exec dbms_workload_repository.create_remote_snapshot('Standby');

PL/SQL procedure successfully completed.




Step 10: Create an AWR report for the standby database

Note: the NODE_ID generated above is used as the DBID for the standby database.

After specifying the AWR report type (HTML), the report script asks for a DBID; supply the DBID associated with the NODE_ID in dba_umf_registration.
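
A minimal sketch of pulling the report on the destination (primary) node, assuming the standard awrrpti.sql script; when prompted for the DBID, enter the standby's NODE_ID from dba_umf_registration:

SQL> @?/rdbms/admin/awrrpti.sql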




Step 11 : After Role Switchover 

Everything should be set now, but be aware that after a switchover this setup might not work. We can make the role change by executing the following procedure on the candidate destination:

SQL> EXEC DBMS_UMF.SWITCH_DESTINATION('<topology name>', force_switch=>FALSE);


BUG 21046490 - UMF-ADG: FAILED TO TAKE SNAPSHOTS AFTER SWITCHOVERS IN ADG


Per unpublished Bug 28930258, if you are in a hurry and have not constructed a topology, check the value of _remote_awr_enabled on the current primary database:

SELECT b.ksppstvl "Session Value",
       c.ksppstvl "Instance Value"
FROM   sys.x$ksppi  a,
       sys.x$ksppcv b,
       sys.x$ksppsv c
WHERE  a.indx = b.indx
AND    a.indx = c.indx
AND    a.ksppinm = '_remote_awr_enabled';

If it is equal to TRUE, then simply set it to FALSE.

ALTER SYSTEM SET "_remote_awr_enabled" = FALSE SCOPE = BOTH SID = '*';

This will cause AWR snapshots to be generated automatically again.




-- Run only if you want to drop the topology

SQL>  exec DBMS_UMF.drop_topology('NAME-OF-TOPOLOGY');



Known  Errors : 

We need to take at least two snapshots to get a begin_snap and an end_snap.
If you encounter "ORA-13516: AWR Operation failed: Remote source not registered for AWR" then manually switch a few (2-3) logfiles on primary:
alter system switch logfile;



If you encounter "ORA-15766: already registered in an RMF topology", unregister the node as below and then rerun DBMS_UMF.register_node:

exec DBMS_UMF.unregister_node ('Topology_1', 'stby');

If you encounter "ORA-13519: Database id (1730117407) exists in the workload repository", unregister the remote database as below and then rerun DBMS_WORKLOAD_REPOSITORY.register_remote_database:

exec DBMS_WORKLOAD_REPOSITORY.unregister_remote_database('stby','Topology_1',TRUE);



Views : 

set line 132
col topology_name format a15
col node_name format a15
select * from dba_umf_topology;
select * from dba_umf_registration;
select * from dba_umf_service;
select * from dba_umf_link;



References : 

https://docs.oracle.com/en/database/oracle/oracle-database/19/tgdba/gathering-database-statistics.html#GUID-E1369092-DA6B-4CF4-B286-69D4A3BDFA1E

How to Generate AWRs in Active Data Guard Standby Databases (Doc ID 2409808.1)




Tuesday, September 27, 2022

Troubleshooting Oracle Rac Crs/Cluster Startup Issues


We need to check which components have started and which are still pending, and take the appropriate approach based on that.




Known Issues  / Issues faced : 

1)  ora.crsd remains in INTERMEDIATE and commands like crsctl hang and Logical corruption check failed (Doc ID 2075966.1)

2) CRSD Failing to Start or in INTERMEDIATE State Due to OLR corruption. (Doc ID 2809968.1) 

3) CRS is not starting up on the cluster node (Doc ID 2445492.1)  --  Missing execute permission  on <GRID_HOME>/srvm/mesg directory

4) CRS active version is different on the nodes after patching/upgrade

5) Permissions on the voting/OCR disks were changed

6) The ohasd process was not started due to a missing /etc/systemd/system/ohasd.service

7) ASM startup issues faced due to the following:

--> ASM disks were not shared between nodes
--> the asm_diskstring parameter was somehow removed



8) ASM not started due to an AFD filter issue (failed to access AFD-labeled disks after reboot).
    This has been documented in another blog of mine, below:

https://abdul-hafeez-kalsekar.blogspot.com/2021/10/how-to-install-and-configure-asm-filter.html







Solutions we can try, wherever applicable:

1) If the OHAS process is already started, try to start ASM manually, which should bring up CRS.


2) Start the CRSD process alone:

crsctl check crs
crsctl check cluster -all
crsctl start res ora.crsd -init
or
Execute /etc/init.d/init.crs start



3) Start ohasd manually:

nohup  sh  /etc/init.d/init.ohasd run  & 
ps -ef | grep -i d.bin 
ps -ef | grep -i ohasd 

From RHEL 7 onwards, the OS uses systemd rather than init for starting or restarting processes, and runs them as services.
[root@pmyws01 ~]# cat /etc/systemd/system/ohasd.service
[Unit]
Description=Oracle High Availability Services
After=syslog.target

[Service]
ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1
Type=simple
Restart=always

[Install]
WantedBy=multi-user.target
[root@pmyws01 ~]# systemctl daemon-reload
[root@pmyws01 ~]# systemctl enable ohasd.service
[root@pmyws01 ~]# systemctl start ohasd.service
[root@pmyws01 ~]# systemctl status ohasd.service



4) Try running /u01/app/19.0.0/grid/crs/install/rootcrs.sh -postpatch





Handy commands:

crsctl get cluster name
crsctl get cluster configuration
crsctl enable crs
crsctl disable crs
crsctl config crs
crsctl query css votedisk
crsctl start cluster -all | -n nodename
crsctl stop cluster -all | -n nodename
crsctl start cluster
crsctl check cluster -all | -n nodename
crsctl query crs activeversion
crsctl query crs activeversion -f
crsctl query crs softwareversion
crsctl query crs softwarepatch
crsctl start crs [-excl [-nocrs] [-cssonly]] | [-wait | -waithas | -nowait] | [-noautostart]
crsctl start res ora.crsd -init
crsctl start crs -excl -nocrs
crsctl start crs -wait
crsctl stop rollingpatch
$GRID_HOME/bin/kfod op=patches

/u01/grid/19.0.0.0/bin/srvctl status cha
crsctl modify resource ora.chad -attr "ENABLED=1" -unsupported      (to enable CHA if 'ora.chad' is disabled)

grep "OCR MASTER" $GRID_HOME/log/<nodename>/crsd/crsd.log





Must Read  : Top 5 Grid Infrastructure Startup Issues (Doc ID 1368382.1)

Issue #1: CRS-4639: Could not contact Oracle High Availability Services, ohasd.bin not running or ohasd.bin is running but no init.ohasd or other processes
Issue #2: CRS-4530: Communications failure contacting Cluster Synchronization Services daemon, ocssd.bin is not running
Issue #3: CRS-4535: Cannot communicate with Cluster Ready Services, crsd.bin is not running
Issue #4: Agent or mdnsd.bin, gpnpd.bin, gipcd.bin not running
Issue #5: ASM instance does not start, ora.asm is OFFLINE
 



Logs to check:

1) Cluster alert log
2) Daemon logs
3) OS messages log
4) oraagent log

Most clusterware daemons/processes pid and output file are in <ORACLE_BASE>/crsdata/<node>/output

Most clusterware daemons/processes logs are in <ORACLE_BASE>/diag/crs/<node>/crs/trace




References : 

1) Troubleshooting CRSD Start up Issue (Doc ID 1323698.1)
2) RedHat 7 / Oracle Linux 7 + ORA 11g: ohasd fails to start (Doc ID 1959008.1)
3) Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
4) How To Gather & Backup ASM/ACFS Metadata In A Formatted Manner version 10.1, 10.2, 11.1, 11.2, 12.1, 12.2, 18.x and 19.x (Doc ID 470211.1)






Case Study 1: OHASD does not start
 
As ohasd.bin is responsible for starting up all other clusterware processes directly or indirectly, it needs to start up properly for the rest of the stack to come up. If ohasd.bin is not up, when checking its status, CRS-4639 (Could not contact Oracle High Availability Services) will be reported; and if ohasd.bin is already up, CRS-4640 will be reported if another startup attempt is made; if it fails to start, the following will be reported:
 
 
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
 
 
Automatic ohasd.bin start up depends on the following:
 
1. OS is at appropriate run level:
 
The OS needs to be at the specified run level before CRS will try to start up.
 
To find out at which run level the clusterware needs to come up:
 
cat /etc/inittab|grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Note: Oracle Linux 6 (OL6) or Red Hat Linux 6 (RHEL6) has deprecated inittab; instead, init.ohasd is configured via upstart in /etc/init/oracle-ohasd.conf, however, the process "/etc/init.d/init.ohasd run" should still be up. Oracle Linux 7 (and Red Hat Linux 7) uses systemd to manage start/stop services (example: /etc/systemd/system/oracle-ohasd.service).
 
The above example shows CRS is supposed to run at run levels 3 and 5; note that depending on the platform, CRS comes up at a different run level.
 
To find out current run level:
 
who -r


OHASD.BIN will spawn four agents/monitors to start resources:
 
  oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd etc
  orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs etc
  cssdagent / cssdmonitor: responsible for ora.cssd(for ocssd.bin) and ora.cssdmonitor(for cssdmonitor itself)
 
If ohasd.bin cannot start any of the above agents properly, the clusterware will not come up healthy.





Case Study 2: OLR is corrupted
 
In this case, the daemon log will show messages like (this is a case that ora.ctssd fails to start):
 
2012-07-22 00:15:16.565: [ default][1]clsvactversion:4: Retrieving Active Version from local storage.
2012-07-22 00:15:16.575: [    CTSS][1]clsctss_r_av3: Invalid active version [] retrieved from OLR. Returns [19].
2012-07-22 00:15:16.585: [    CTSS][1](:ctss_init16:): Error [19] retrieving active version. Returns [19].
2012-07-22 00:15:16.585: [    CTSS][1]ctss_main: CTSS init failed [19]
2012-07-22 00:15:16.585: [    CTSS][1]ctss_main: CTSS daemon aborting [19].
2012-07-22 00:15:16.585: [    CTSS][1]CTSS daemon aborting
 
 
 
The solution is to restore a good copy of the OLR, per note 1193643.1.
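
A hedged sketch of that restore, assuming an OLR backup exists under $GRID_HOME/cdata (the node name and backup file name are illustrative; follow note 1193643.1 for the exact steps on your version):

# as root, with the clusterware stack stopped on this node
$GRID_HOME/bin/crsctl stop crs -f
$GRID_HOME/bin/ocrconfig -local -restore $GRID_HOME/cdata/racnode1/backup_20120722_001234.olr
$GRID_HOME/bin/ocrcheck -local
$GRID_HOME/bin/crsctl start crs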
 






Enable further tracing

Execute the following steps as root:

B1. List all crsd modules:

# $GRID_HOME/bin/crsctl lsmodules crs


B2. Find out the current trace level for all crsd modules - the output can be used to revert back to the original trace level once the issue is solved:

# $GRID_HOME/bin/crsctl get log crs all


B3. Set trace level to 5 for all modules:

# $GRID_HOME/bin/crsctl set log crs all:5



Alternatively trace level for each modules can be set individually:


# $GRID_HOME/bin/crsctl set log crs CRSPE:5

Note: The module name is case sensitive


B4. Once the issue is solved, set trace level back to original value for all crs modules with output from B2

# $GRID_HOME/bin/crsctl set log crs AGENT:1




Take pstack

When crsd.bin dumps, it will store a thread stack in $GRID_HOME/log/<nodename>/crsd/crsdOUT.log

Most of the time, there may not be a time window to take a pstack as crsd.bin aborts very quickly; but if crsd.bin stays up for a short while, take a few pstacks at 30-second intervals as root:

Find out pid of crsd.bin: ps -ef| grep crsd.bin

Take pstack:

  AIX        : /bin/procstack <pid-of-crsd.bin>
  Linux        : /usr/bin/pstack <pid-of-crsd.bin>
  Solaris    : /usr/bin/pstack <pid-of-crsd.bin>





Sunday, September 4, 2022

Starting Oracle Cluster Health Advisor (CHA) and CRSD manually



It is just that we sometimes come across situations where some RAC processes are down. Below are two processes that can be started manually.



Starting CHA  manually 

ora.chad stands for the Cluster Health Advisor (CHA) daemon, which is part of the Oracle Autonomous Health Framework (AHF).
It continuously monitors cluster nodes and Oracle RAC databases for performance and availability issues.

Oracle Cluster Health Advisor runs as a highly available cluster resource, ochad, on each node in the cluster. Each Oracle Cluster Health Advisor daemon (ochad) monitors the operating system on the cluster node and optionally, each Oracle Real Application Clusters (Oracle RAC) database instance on the node.

 The ochad daemon receives operating system metric data from the Cluster Health Monitor and gets Oracle RAC database instance metrics from a memory-mapped file. The daemon does not require a connection to each database instance. This data, along with the selected model, is used in the Health Prognostics Engine of Oracle Cluster Health Advisor for both the node and each monitored database instance in order to analyze their health multiple times a minute.
 

It is sometimes found that the CHA process is down and we have to start it manually:

crsctl stat res -t  
crsctl status res ora.chad
crsctl stat res ora.chad -t 
srvctl start cha
chactl status 
srvctl status cha 




Starting CRSD manually:
 

$GRID_HOME/bin/crsctl stat res -t -init
$GRID_HOME/bin/crsctl start res ora.crsd -init



If the pid file does not exist, $GRID_HOME/log/$HOST/agent/ohasd/orarootagent_root/orarootagent_root.log will show messages similar to the following:
 
 
2010-02-14 17:40:57.927: [ora.crsd][1243486528] [check] PID FILE doesn't exist.
..
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Creating PID [30269] file for home /ocw/grid host racnode1 bin crs to /ocw/grid/crs/init/
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Error3 -2 writing PID [30269] to the file []
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Failed to record pid for CRSD
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Terminating process
2010-02-14 17:41:57.927: [ default][1092499776] CRSD exiting on stop request from clsdms_thdmai
 
The solution is to create a dummy pid file ($GRID_HOME/crs/init/$HOST.pid) manually as the grid user with the "touch" command and restart the resource ora.crsd.
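
A minimal sketch of that workaround, assuming the grid software owner and hostname-based pid file naming (adjust paths to your environment):

# as the grid software owner
touch $GRID_HOME/crs/init/$(hostname).pid
chmod 644 $GRID_HOME/crs/init/$(hostname).pid

# as root, restart the resource
$GRID_HOME/bin/crsctl start res ora.crsd -init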



Reference : 

Troubleshooting CRSD Start up Issue (Doc ID 1323698.1)