Sunday, September 4, 2022

Starting Oracle Cluster Health Advisor (CHA) and CRSD manually



It just that we come across situation where we see some of rac process down .Below are  2 of process that can be started manually .  



Starting CHA  manually 

chad stands for the Cluster Health Advisor (CHA) daemon which is is part of the Oracle Autonomous Health Framework(AHF), 
It continuously monitors cluster nodes and Oracle RAC databases for performance and availability issues.

Oracle Cluster Health Advisor runs as a highly available cluster resource, ochad, on each node in the cluster. Each Oracle Cluster Health Advisor daemon (ochad) monitors the operating system on the cluster node and optionally, each Oracle Real Application Clusters (Oracle RAC) database instance on the node.

 The ochad daemon receives operating system metric data from the Cluster Health Monitor and gets Oracle RAC database instance metrics from a memory-mapped file. The daemon does not require a connection to each database instance. This data, along with the selected model, is used in the Health Prognostics Engine of Oracle Cluster Health Advisor for both the node and each monitored database instance in order to analyze their health multiple times a minute.
 

It is sometimes found that  CHA process is down and we have to start it manually 

crsctl stat res -t  
crsctl status res ora.chad
crsctl stat res ora.chad -t 
srvctl start cha
chactl status 
srvctl status cha 




Starting Crsd  Manually : 
 

$GRID_HOME/bin/crsctl stat res -t -init
$GRID_HOME/bin/crsctl start res ora.crsd -init



If pid file does not exist, $GRID_HOME/log/$HOST/agent/ohasd/orarootagent_root/orarootagent_root.log will have similar like the following:
 
 
2010-02-14 17:40:57.927: [ora.crsd][1243486528] [check] PID FILE doesn't exist.
..
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Creating PID [30269] file for home /ocw/grid host racnode1 bin crs to /ocw/grid/crs/init/
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Error3 -2 writing PID [30269] to the file []
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Failed to record pid for CRSD
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Terminating process
2010-02-14 17:41:57.927: [ default][1092499776] CRSD exiting on stop request from clsdms_thdmai
 
The solution is to create a dummy pid file ($GRID_HOME/crs/init/$HOST.pid) manually as grid user with "touch" command and restart resource ora.crsd



Reference : 

Troubleshooting CRSD Start up Issue (Doc ID 1323698.1)



Tuesday, August 16, 2022

Awr in Oracle Multitenant/ Pluggable Database Pdb/Cdb - awr_pdb_autoflush_enabled ,AWR_SNAPSHOT_TIME_OFFSET , AWR_PDB_MAX_PARALLEL_SLAVES


Since we were exploring enabling awr snapshots at  pdb level ,  we came across   below 3 parameters  that control  awr   generation at pluggable database level .

AWR_PDB_AUTOFLUSH_ENABLED
Awr_snapshot_time_offset
AWR_PDB_MAX_PARALLEL_SLAVES




AWR_PDB_AUTOFLUSH_ENABLED

1. AWR Snapshots and reports can be created only at the container database (CDB) level in 
   Oracle 12c R1 (12.1.0.1.0 / 12.1.0.2.0)
2. AWR Snapshots and reports can be created at the container database (CDB) level as well as pluggable database (PDB) level 
   in Oracle 12c R2 (12.2.0.1.0)
3. By default, AWR Snapshots and reports can be generated only at the container database (CDB) level


The default value of AWR_PDB_AUTOFLUSH_ENABLED is false. Thus, by default, automatic AWR snapshots are disabled for all the PDBs in a CDB.

When you change the value of AWR_PDB_AUTOFLUSH_ENABLED in the CDB root, the new value takes effect in all the PDBs in the CDB.

You can also change the value of AWR_PDB_AUTOFLUSH_ENABLED in any of the individual PDBs in a CDB, and the value that is set for each individual PDB will be honored. This enables you to enable or disable automatic AWR snapshots for individual PDBs.



for specified PDB

alter session set container=PDBtest;
alter system set awr_pdb_autoflush_enabled=true;



for all pdbs and CDB

alter session set container=CDB$ROOT;
alter system set awr_pdb_autoflush_enabled=true;



set interval and retention period by PDB or CDB$ROOT level. for all PDBSs and CDB we have to run script separately. for 30 days retention and 60 minutes interval script can be below.
 
alter session set container=PDB1;  
execute dbms_workload_repository.modify_snapshot_settings(interval => 60, retention=>64800);  
 


SQL> set lines 100
SQL> select * from cdb_hist_wr_control;

 SQL> SELECT con_id, to_char(end_interval_time, 'HH24:MI:SS') AS snap_time
  2  FROM cdb_hist_snapshot
  3  WHERE end_interval_time > to_timestamp('2020-10-02 11:45','YYYY-MM-DD HH24:MI')
  4  ORDER BY snap_time, con_id;


select * from awr_pdb_snapshot ; 






Awr_snapshot_time_offset

You might have observed those spikes on the top of every hour when all the AWR snapshots are taken and to  avoid thiswe  want snapshots taken on hour bases but within 5 minutes difference for every database


The parameter is specified in seconds. Normally, you set it to a value less than 3600. If you set the special value 1000000 (1,000,000), you get an automatic mode, in which the offset is based on the database name.

The automatic mode is an effective way of getting a reasonable distribution of offset times when you have a very large number of instances running on the same node.



SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> SHOW CON_NAME

CON_NAME
------------------------------
CDB$ROOT



SQL> show parameter AWR_SNAPSHOT_TIME_OFFSET

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
awr_snapshot_time_offset             integer     0

SQL> alter system set AWR_SNAPSHOT_TIME_OFFSET=1000000 scope=both;





AWR_PDB_MAX_PARALLEL_SLAVES

From version 18c onward, with the dynamic initialization parameter AWR_PDB_MAX_PARALLEL_SLAVES, you can specify the maximum number of background processes that the database engine can concurrently use to take automatic PDB-level snapshots. Valid values go from 1 to 30; the default is 10. Even though this initialization parameter doesn’t affect the automatic snapshots in the root container, it can only be set in the root container. The following examples illustrate the behaviour with three PDBs and an interval of 15 minutes:

AWR_PDB_MAX_PARALLEL_SLAVES = 1: only one PDB-level snapshot is taken at the same time as the snapshot in the root container

AWR_PDB_MAX_PARALLEL_SLAVES = 10: 10 PDB-level snapshots are taken at the same time as the snapshot in the root container



Known issue : 

If   _cursor_stats_enabled  is set  , sql statistics wont be generated on PDB level and pdb awr  will have sql details missing in it . 



 

References : 

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/refrn/AWR_PDB_AUTOFLUSH_ENABLED.html#GUID-08FA21BC-8FB1-4C51-BEEA-139C734D17A7

https://it.inf.unideb.hu/oracle/refrn/AWR_SNAPSHOT_TIME_OFFSET.html#GUID-90CD8379-DCB2-4681-BB90-0A32C6029C4E

Sunday, August 7, 2022

Troubleshooting a datapatch issue in Oracle Pluggable database using -verbose -debug

 

We had situation  where  datapatch was not applied to PDB .  Oracle Suggested to apply datapatch using  verbose  mode and  issue  was captured in log 


./dataoatch  -verbose -debug 



We can apply  datapatch on selective PDB using below 

./datapatch -verbose -pdbs PDB1,PDB2



It is always advisable to  check PDB violations after applying datapatch in pluggable environment . 

select name,cause,message from pdb_plug_in_violations;
select name,cause,type,action from pdb_plug_in_violations where status <> 'RESOLVED';



Below are other options 

$ORACLE_HOME/OPatch/datapatch -rollback all -force --pdb pdb1 
$ORACLE_HOME/OPatch/datapatch  -apply 7777 -force - bundle_series DBRU   --pdb pdb1 



Sql Used to verify : 

SET LINESIZE 500
SET PAGESIZE 1000
SET SERVEROUT ON
SET LONG 2000000
COLUMN action_time FORMAT A20
COLUMN action FORMAT A10
COLUMN status FORMAT A10
COLUMN description FORMAT A40
COLUMN source_version FORMAT A10
COLUMN target_version FORMAT A10
alter session set "_exclude_seed_cdb_view"=FALSE;
 select CON_ID,
        TO_CHAR(action_time, 'YYYY-MM-DD') AS action_time,
        PATCH_ID,
        PATCH_TYPE,
        ACTION,
        DESCRIPTION,
        SOURCE_VERSION,
        TARGET_VERSION
   from CDB_REGISTRY_SQLPATCH
  order by CON_ID, action_time, patch_id;


select patch_id, patch_uid, target_version, status, description, action_time
from dba_registry_sqlpatch
where action = 'APPLY';  


select * from OPATCH_XML_INV;




Reference: 

After running datapatch, PDB plugin or cloned db returns violations shown in PDB_PLUG_IN_VIOLATION (Doc ID 1635482.1)

Datapatch User Guide (Doc ID 2680521.1)


Friday, August 5, 2022

Oracle Cloud Exacc Basics Key-Words / Terms of OCI

 

Before we start  to Support  Oci  we need to Understand  Basic Terminology   Of OCI .  


bare metal host

Oracle Cloud Infrastructure provides you control of the physical host (“bare metal”) machine. Bare metal compute instances run directly on bare metal servers without a hypervisor. When you provision a bare metal compute instance, you maintain sole control of the physical CPU, memory, and network interface card (NIC). You can configure and utilize the full capabilities of each physical machine as if it were hardware running in your own data center. You do not share the physical machine with any other tenants.


regions and availability domains

Oracle Cloud Infrastructure is physically hosted in regions and availability domains. A region is a localized geographic area, and an availability domain is one or more data centers located within a region. A region is composed of one or more availability domains. Oracle Cloud Infrastructure resources are either region-specific, such as a virtual cloud network, or availability domain-specific, such as a compute instance.

Availability domains are isolated from each other, fault tolerant, and very unlikely to fail simultaneously or be impacted by the failure of another availability domain. When you configure your cloud services, use multiple availability domains to ensure high availability and to protect against resource failure. Be aware that some resources must be created within the same availability domain, such as an instance and the storage volume attached to it.
For more details see Regions and Availability Domains.


realm

A realm is a logical collection of regions. Realms are isolated from each other and do not share any data. Your tenancy exists in a single realm and has access to the regions that belong to that realm. Oracle Cloud Infrastructure currently offers a realm for commercial regions and two realms for government cloud regions: FedRAMP authorized and IL5 authorized.


Console

The simple and intuitive web-based user interface you can use to access and manage Oracle Cloud Infrastructure.



tenancy

When you sign up for Oracle Cloud Infrastructure, Oracle creates a tenancy for your company, which is a secure and isolated partition within Oracle Cloud Infrastructure where you can create, organize, and administer your cloud resources.


compartments

Compartments allow you to organize and control access to your cloud resources. A compartment is a collection of related resources (such as instances, virtual cloud networks, block volumes) that can be accessed only by certain groups that have been given permission by an administrator. A compartment should be thought of as a logical group and not a physical container. When you begin working with resources in the Console, the compartment acts as a filter for what you are viewing.

When you sign up for Oracle Cloud Infrastructure, Oracle creates your tenancy, which is the root compartment that holds all your cloud resources. You then create additional compartments within the tenancy (root compartment) and corresponding policies to control access to the resources in each compartment. When you create a cloud resource such as an instance, block volume, or cloud network, you must specify to which compartment you want the resource to belong.

Ultimately, the goal is to ensure that each person has access to only the resources they need.



security zones

A security zone is associated with a compartment. When you create and update cloud resources in a security zone, Oracle Cloud Infrastructure validates these operations against security zone policies. If any policy is violated, then the operation is denied. Security zones let you be confident that your resources comply with Oracle security principles.
virtual cloud network (VCN)

A virtual cloud network is a virtual version of a traditional network—including subnets, route tables, and gateways—on which your instances run. A cloud network resides within a single region but includes all the region's availability domains. Each subnet you define in the cloud network can either be in a single availability domain or span all the availability domains in the region (recommended). You need to set up at least one cloud network before you can launch instances. You can configure the cloud network with an optional internet gateway to handle public traffic, and an optional IPSec connection or FastConnect to securely extend your on-premises network.


instance

An instance is a compute host running in the cloud. An Oracle Cloud Infrastructure compute instance allows you to utilize hosted physical hardware, as opposed to the traditional software-based virtual machines, ensuring a high level of security and performance.


image

The image is a template of a virtual hard drive that defines the operating system and other software for an instance, for example, Oracle Linux. When you launch an instance, you can define its characteristics by choosing its image. Oracle provides a set of platform images you can use. You can also save an image from an instance that you have already configured to use as a template to launch more instances with the same software and customizations.


shape

In Compute, the shape specifies the number of CPUs and amount of memory allocated to the instance. Oracle Cloud Infrastructure offers shapes to fit various computing requirements. See the list of compute shapes.

In Load Balancing, the shape determines the load balancer's total pre-provisioned maximum capacity (bandwidth) for ingress plus egress traffic. Available shapes include 100 Mbps, 400 Mbps, and 8000 Mbps.


key pair

A key pair is an authentication mechanism used by Oracle Cloud Infrastructure. A key pair consists of a private key file and a public key file. You upload your public key to Oracle Cloud Infrastructure. You keep the private key securely on your computer. The private key is private to you, like a password.
Key pairs can be generated according to different specifications. Oracle Cloud Infrastructure uses two types of key pairs for specific purposes:

Instance SSH Key pair: This key pair is used to establish secure shell (SSH) connection to an instance. When you provision an instance, you provide the public key, which is saved to the instance's authorized key file. To log on to the instance, you provide your private key, which is verified with the public key.
API signing key pair: This key pair is in PEM format and is used to authenticate you when submitting API requests. Only users who will be accessing Oracle Cloud Infrastructure via the API need this key 
pair.

 



block volume

A block volume is a virtual disk that provides persistent block storage space for Oracle Cloud Infrastructure instances. Use a block volume just as you would a physical hard drive on your computer, for example, to store data and applications. You can detach a volume from one instance and attach it to another instance without loss of data.


Object Storage

Object Storage is a storage architecture that allow you to store and manage data as objects. Data files can be of any type and up to 10 TiB in size. Once you upload data to Object Storage it can be accessed from anywhere. Use Object Storage when you want to store a very large amount of data that does not change very frequently. Some typical use cases for Object Storage include data backup, file sharing, and storing unstructured data like logs and sensor-generated data.


bucket

A bucket is a logical container used by Object Storage for storing your data and files. A bucket can contain an unlimited number of objects.
Oracle Cloud Identifier (OCID)
Every Oracle Cloud Infrastructure resource has an Oracle-assigned unique ID called an Oracle Cloud Identifier (OCID). This ID is included as part of the resource's information in both the Console and API.




References:

https://docs.oracle.com/en-us/iaas/Content/GSG/Concepts/concepts.htm#:~:text=you%20are%20viewing.-,When%20you%20sign%20up%20for%20Oracle%20Cloud%20Infrastructure%2C%20Oracle%20creates,the%20resources%20in%20each%20compartment



Monday, July 25, 2022

Oracle Dataguard Snapshot Standby Testing

 
This is something every dba will know but still documenting for handy steps .



Automated 
*********

To snapshot standby -

alter database recover managed standby database cancel;
alter database convert to snapshot standby;


SQL> select CURRENT_SCN, SWITCHOVER_STATUS, DATABASE_ROLE, open_mode from v$database;

CURRENT_SCN SWITCHOVER_STATUS DATABASE_ROLE   OPEN_MODE
----------- -------------------- ---------------- --------------------
  0 NOT ALLOWED SNAPSHOT STANDBY MOUNTED



To physical standby -

alter database close;

select CURRENT_SCN, SWITCHOVER_STATUS, DATABASE_ROLE, open_mode from v$database;

CURRENT_SCN SWITCHOVER_STATUS DATABASE_ROLE   OPEN_MODE
----------- -------------------- ---------------- --------------------
  0 NOT ALLOWED SNAPSHOT STANDBY MOUNTED

alter database convert to physical standby;

shut immediate
startup mount
alter database recover managed standby database using current logfile disconnect from session;

SQL> select CURRENT_SCN, SWITCHOVER_STATUS, DATABASE_ROLE, open_mode from v$database;

CURRENT_SCN SWITCHOVER_STATUS DATABASE_ROLE   OPEN_MODE
----------- -------------------- ---------------- --------------------
     378629 NOT ALLOWED PHYSICAL STANDBY MOUNTED






MANUAL
******

PRIMARY
=======
-- Archive the current log and defer the log_archive_dest_2
alter system archive log current;
alter system set log_archive_dest_state_2=DEFER;


STANDBY
=======
-- Activating the standby

-- Stop managed recovery, create a guaranteed restore point and activate the standby. Ensure db_recovery_file_dest is set.
alter database recover managed standby database cancel;
alter system set log_archive_dest_state_2=DEFER;
create restore point before_testing guarantee flashback database;
alter database activate physical standby database;
alter database open;
select CURRENT_SCN, SWITCHOVER_STATUS, DATABASE_ROLE, open_mode from v$database;
select CONTROLFILE_TYPE from v$database; 

-- Converting back to standby
startup mount force
flashback database to restore point before_testing;
alter database convert to physical standby;
startup mount force
drop restore point before_testing;
alter database recover managed standby database using current logfile disconnect from session;




Reference -
How To Open Physical Standby For Read Write Testing and Flashback (Doc ID 805438.1)

 




Monday, July 18, 2022

Oracle database final_blocking_session when we have multiple blocking sessions


When  there are multiple blockings in database  its confusing to gind  actual blocking .  Oracle has simplified it by   final_blocking_session .


final_blocking_instance:  This column of gv$session is the instance identifier of the final blocking session. This column is valid only if final_blocking_session_status=valid.

final_blocking_session:  This column of gv$session is the session identifier of the final blocking session. This column is valid only if final_blocking_session_status=valid.



exec dbms_output.put_line('========== Session Details ==========');
set feedback on
set lines 300
set pages 1000
column TERMINAL format a30
column USERNAME format a15
column OSUSER format a20
column MACHINE format a20
column PROGRAM format a30
select inst_id,sid,FINAL_BLOCKING_SESSION ,final_blocking_instance ,serial#,username,osuser,machine,terminal,program,status,to_char(logon_time, 'dd-mon-yy hh24:mi:ss') , event, blocking_session , sql_id 
from gv$session 
where FINAL_BLOCKING_SESSION_STATUS='VALID'   or blocking_session is not NULL
order by LOGON_TIME,machine,program;


set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column fblocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra
SELECT * 
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE, 
 'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'<none>',blocker_osid)|| 
 ' from Instance '||blocker_instance BLOCKER_PROC,
 'Number of waiters: '||num_waiters waiters,
 'Final Blocking Process: '||decode(p.spid,null,'<none>',
 p.spid)||' from Instance '||s.final_blocking_instance FBLOCKER_PROC, 
 'Program: '||p.program image,
 'Wait Event: ' ||wait_event_text wait_event, 'P1: '||wc.p1 p1, 'P2: '||wc.p2 p2, 'P3: '||wc.p3 p3,
 'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
 'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
 '<none>',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
 gv$session s,
 gv$session bs,
 gv$instance i,
 gv$process p
WHERE wc.instance = i.instance_number (+)
 AND (wc.instance = s.inst_id (+) and wc.sid = s.sid (+)
 and wc.sess_serial# = s.serial# (+))
 AND (s.inst_id = bs.inst_id (+) and s.final_blocking_session = bs.sid (+))
 AND (bs.inst_id = p.inst_id (+) and bs.paddr = p.addr (+))
 AND ( num_waiters > 0
 OR ( blocker_osid IS NOT NULL
 AND in_wait_secs > 10 ) )
ORDER BY chain_id,
 num_waiters DESC)
WHERE ROWNUM < 101;

Wednesday, July 6, 2022

Oracle Database - Query opatch inventory using SQL interface

 
Listing alternate way to check opatch  details , using   sqlplus 




==> Sql to check output similar to  opatch lsinventory 

with a as (select dbms_qopatch.get_opatch_lsinventory patch_output from dual)
  select x.*
    from a,
         xmltable('InventoryInstance/patches/*'
            passing a.patch_output
            columns
               patch_id number path 'patchID',
               patch_uid number path 'uniquePatchID',
               description varchar2(80) path 'patchDescription',
               applied_date varchar2(30) path 'appliedDate',
               sql_patch varchar2(8) path 'sqlPatch',
               rollbackable varchar2(8) path 'rollbackable'
         ) x;



 select bugno, value, optimizer_feature_enable, is_default from v$system_fix_control  ;




==> Sql  to check detailed  output . similar to opatch lsinventory detail

with a as (select dbms_qopatch.get_opatch_bugs patch_output from dual)
  select x.*
    from a,
         xmltable('bugInfo/bugs/*'
            passing a.patch_output
            columns
               bug_id number path '@id',
               description varchar2(160) path 'description'
         ) x;



==> Checking Precise output 

select patch_id, patch_uid, target_version, status, description, action_time
from dba_registry_sqlpatch
where action = 'APPLY';  

Saturday, July 2, 2022

Oracle Checking Hang sessions in Rac Database -- Rac Hang Manager

 
For  Rac  ,   checking  hung session is simplified using  Rac Hung Manager . For  Non  Rac  i personally use v$sess_io  or  try enabling session tracing  


 In 12.1.0.1, hang manager can detect hang between database and asm. 2.Deadlock or Closed Chain

Deadlock or close the chain. The only way to break the deadlock chain is to let some of these sessions complete their work or be terminated. 3.Hang or Open Chain


In the Oracle database, suspend (hang) refers to the waiting state entered by a process due to the inability to obtain the requested resources, which can be lifted only after the requested resources have been obtained, and the HM implements the management of hangs, including the monitoring, analysis, recording and resolution of hang.

The wait chain is made up of blocking processes and waiting processes, while one or more root blocking processes exist in the blocking process, which blocks all other processes, and if the root blocking process is busy with some operations, then perhaps the presence of such a wait chain is normal, if the blocking process is idle, Then perhaps the emergence of this wait chain is not normal, and the way to break the wait chain is to terminate the root blocking process. HM can proactively discover the existence of the waiting chain in the database, and from the perspective of the analysis of them, if found to really affect the performance of the data block hang, depending on the specific circumstances to determine whether to solve the problem, and even if not directly resolved, the corresponding diagnostic information will be recorded and continuous monitoring.



V$hang_info: This view contains details of the hang that was found by HM.
V$hang_session_info: This view contains the session information related to hang.
V$hang_statistics: This view contains statistics related to hang.



The work of HM is composed of seven stages

Phase 1 (Collection Phase): At this stage, the DIA0 process for each instance collects hang analyze information on a regular basis.

Phase 2 (Discovery phase): At this stage, the DIA0 process for each instance analyzes the collected hang Alalyze information, locates the session where hang is present, and sends the DIA0 process to the master node.

Phase 3 (Drawing phase): At this stage, the dia0 process of the master node draws the message from each instance of the DIA0 process, drawing the wait chain.

Phase 4 (Analysis Phase): At this stage, the master node dia0 the process according to the drawn wait chain and analyzes whether hang is indeed present.

Phase 5 (Validation phase): At this stage, the master node dia0 process executes phase 1-4 again, then compares the analysis results of phase 4 with this one, and verifies that hang is really happening.

Phase 6 (Positioning phase): At this stage, the results of the master node dia0 process More validation phase are positioned to the root blocking process of the wait chain.

Phase 7 (resolution Phase): At this stage, the master node dia0 process determines whether hang can be resolved based on the value of the parameter _hang_resoluton_scope.



Trace log files for the DIA0 process

Main trace file (<SID>_DIA0_<PID>.TRC): This log file records the details of the DIA0 process, including the process of discovering, analyzing, and handling the hang.

History Tracker File (<sid>_dia0_<pid>_ N.TRC): Because the trace log file of the DIA0 process constantly generates information as the database runs, it can make the log file very large, and the DIA0 process periodically writes log information to its history log file, where n is a positive integer and increases over time.

Incident Log file: If HM resolves the hang by terminating the process, the ORA-32701 error is first recorded in the Alert.log, and because of the existence of the ADR, the DIA0 process also produces a incident log file that records the details of the problem.






Parameters of HM

_hang_detection_enabled: This parameter determines whether the HM attribute is enabled in the database, and the default value is true.

_hang_detection_interval: This parameter specifies the time interval for which HM collects hang analyze information, and the default value is 32s.

_hang_verification_interval: This parameter specifies the time interval for the HM Validation hang, and the default value is 46s.

_hang_resolution_scope: This parameter specifies the range that HM can operate when the hang is resolved, the default value is process, and the allowable values are as follows:
OFF: The HM will only continue to monitor hang, and will not do anything to fix hang.
Process: Indicates that HM can resolve hang by terminating the root blocking process, but the root blocking process here cannot be an important background process for the database because it causes the instance to crash.
Instance: Indicates that HM can resolve the hang by terminating the instance


We can get complete  list from DBA_HANG_MANAGER_PARAMETERS


Related parameters:

NAME                                               VALUE                          ISDEFAULT ISMOD      ISADJ
-------------------------------------------------- ------------------------------ --------- ---------- -----
_hang_analysis_num_call_stacks                     3                              TRUE      FALSE      FALSE
_hang_base_file_count                              5                              TRUE      FALSE      FALSE
_hang_base_file_space_limit                        10000000                       TRUE      FALSE      FALSE
_hang_bool_spare1                                  TRUE                           TRUE      FALSE      FALSE
_hang_delay_resolution_for_libcache                TRUE                           TRUE      FALSE      FALSE
_hang_detection_enabled                            TRUE                           TRUE      FALSE      FALSE
_hang_detection_interval                           32                             TRUE      FALSE      FALSE
_hang_hang_analyze_output_hang_chains              TRUE                           TRUE      FALSE      FALSE
_hang_hiload_promoted_ignored_hang_count           2                              TRUE      FALSE      FALSE
_hang_hiprior_session_attribute_list                                              TRUE      FALSE      FALSE
_hang_ignored_hang_count                           1                              TRUE      FALSE      FALSE
_hang_ignored_hangs_interval                       300                            TRUE      FALSE      FALSE
_hang_int_spare2                                   FALSE                          TRUE      FALSE      FALSE
_hang_log_verified_hangs_to_alert                  FALSE                          TRUE      FALSE      FALSE
_hang_long_wait_time_threshold                     0                              TRUE      FALSE      FALSE
_hang_lws_file_count                               5                              TRUE      FALSE      FALSE
_hang_lws_file_space_limit                         10000000                       TRUE      FALSE      FALSE
_hang_monitor_archiving_related_hang_interval      300                            TRUE      FALSE      FALSE
_hang_msg_checksum_enabled                         TRUE                           TRUE      FALSE      FALSE
_hang_resolution_allow_archiving_issue_termination TRUE                           TRUE      FALSE      FALSE
_hang_resolution_confidence_promotion              FALSE                          TRUE      FALSE      FALSE
_hang_resolution_global_hang_confidence_promotion  FALSE                          TRUE      FALSE      FALSE
_hang_resolution_policy                            HIGH                           TRUE      FALSE      FALSE
_hang_resolution_promote_process_termination       FALSE                          TRUE      FALSE      FALSE
_hang_resolution_scope                             PROCESS                        TRUE      FALSE      FALSE
_hang_short_stacks_output_enabled                  TRUE                           TRUE      FALSE      FALSE
_hang_signature_list_match_output_frequency        10                             TRUE      FALSE      FALSE
_hang_statistics_collection_interval               15                             TRUE      FALSE      FALSE
_hang_statistics_collection_ma_alpha               30                             TRUE      FALSE      FALSE
_hang_statistics_high_io_percentage_threshold      15                             TRUE      FALSE      FALSE
_hang_verification_interval                        46                             TRUE      FALSE      FALSE


Saturday, June 18, 2022

Oracle Database Parameters influencing Optimizer -- v$sys_optimizer_env , v$sql_optimizer_env and v$ses_optimizer_env

 

We  always  have question  what database  parameters influence optimizer behavior .  
Oracle  has  views  v$sys_optimizer_env ,  v$sql_optimizer_env and v$sess_optimizer_env  that reflects  database parameters  that influence database optimizer behavior 

Ideally  its useful  when comparing 2 database performance and comparing pre and post migration performance . 



Optionally we can  set  Optimizer trace 10053

How to Obtain Tracing of Optimizer Computations (EVENT 10053) (Doc ID 225598.1)



SQL> select name, isdefault from v$ses_optimizer_env
  2  where sid = 265 
  3  order by isdefault, name;

NAME                                     ISD
---------------------------------------- ---
_pga_max_size                            NO
active_instance_count                    YES
bitmap_merge_area_size                   YES
cpu_count                                YES
cursor_sharing                           YES
hash_area_size                           YES
is_recur_flags                           YES
optimizer_capture_sql_plan_baselines     YES
optimizer_dynamic_sampling               YES
optimizer_features_enable                YES
optimizer_index_caching                  YES
optimizer_index_cost_adj                 YES
optimizer_mode                           YES
optimizer_secure_view_merging            YES
optimizer_use_invisible_indexes          YES
optimizer_use_pending_statistics         YES
optimizer_use_sql_plan_baselines         YES
parallel_ddl_mode                        YES
parallel_degree                          YES
parallel_dml_mode                        YES
parallel_execution_enabled               YES
parallel_query_default_dop               YES
parallel_query_mode                      YES
parallel_threads_per_cpu                 YES
pga_aggregate_target                     YES
query_rewrite_enabled                    YES
query_rewrite_integrity                  YES
result_cache_mode                        YES
skip_unusable_indexes                    YES
sort_area_retained_size                  YES
sort_area_size                           YES
star_transformation_enabled              YES
statistics_level                         YES
transaction_isolation_level              YES
workarea_size_policy                     YES






select
        child_number, name, value
from    v$sql_optimizer_env
where
    sql_id = 'b6pkmrqrgxh2d'
order by
        child_number,
        name
;   

                                        CHILD_NUMBER NAME                                     VALUE
                                ------------ ----------------------------------------             -------------------------
             _db_file_optimizer_read_count            16
             active_instance_count                    1
             bitmap_merge_area_size                   1048576
             cpu_count                                2
             cursor_sharing                           exact
             hash_area_size                           131072
             optimizer_dynamic_sampling               2
             optimizer_features_enable                10.2.0.3
             optimizer_index_caching                  0
             optimizer_index_cost_adj                 100
             optimizer_mode                           first_rows_1
             optimizer_secure_view_merging            true
             parallel_ddl_mode                        enabled
             parallel_dml_mode                        disabled
             parallel_execution_enabled               true
             parallel_query_mode                      enabled
             parallel_threads_per_cpu                 2
             pga_aggregate_target                     204800 KB
             query_rewrite_enabled                    true
             query_rewrite_integrity                  enforced
             skip_unusable_indexes                    true
             sort_area_retained_size                  0
             sort_area_size                           65536
             sqlstat_enabled                          true
             star_transformation_enabled              false
             statistics_level                         typical
             workarea_size_policy                     auto 
             _db_file_optimizer_read_count            16
             _hash_join_enabled                       false
             active_instance_count                    1
             bitmap_merge_area_size                   1048576
             cpu_count                                2
             cursor_sharing                           exact
             hash_area_size                           131072
             optimizer_dynamic_sampling               2
             optimizer_features_enable                10.2.0.3
             optimizer_index_caching                  0
             optimizer_index_cost_adj                 100
             optimizer_mode                           first_rows_1
             optimizer_secure_view_merging            true
             parallel_ddl_mode                        enabled
             parallel_dml_mode                        disabled
             parallel_execution_enabled               true
             parallel_query_mode                      enabled
             parallel_threads_per_cpu                 2
             pga_aggregate_target                     204800 KB
             query_rewrite_enabled                    true
             query_rewrite_integrity                  enforced
             skip_unusable_indexes                    true
             sort_area_retained_size                  0
             sort_area_size                           65536
             sqlstat_enabled                          true
             star_transformation_enabled              false
             statistics_level                         typical
             workarea_size_policy                     auto 

Tuesday, June 14, 2022

Checking Ongoing Oracle rman backup and restore progress

 

During Restore we normally  have  to see progress for  big database  .   Posting  script below used  by me  to check progress . 




SELECT sid, serial#, context, sofar, totalwork,
 round(sofar/totalwork*100,2) "% Complete"
 FROM v$session_longops
 WHERE opname LIKE 'RMAN%'
 AND opname NOT LIKE '%aggregate%'
 AND totalwork != 0
 AND sofar != totalwork;



select device_type "Device", type, filename, to_char(open_time, 'mm/dd/yyyy hh24:mi:ss') open,
 to_char(close_time,'mm/dd/yyyy hh24:mi:ss') close,elapsed_time ET, effective_bytes_per_second EPS
 from v$backup_async_io;



TTITLE LEFT '% Completed. Aggregate is the overall progress:'
SET LINE 132
SELECT opname, round(sofar/totalwork*100) "% Complete"
  FROM gv$session_longops
 WHERE opname LIKE 'RMAN%'
   AND totalwork != 0
   AND sofar <> totalwork
 ORDER BY 1;



TTITLE LEFT 'Channels waiting:'
COL client_info FORMAT A15 TRUNC
COL event FORMAT A20 TRUNC
COL state FORMAT A7
COL wait FORMAT 999.90 HEAD "Min waiting"
SELECT s.sid, p.spid, s.client_info, status, event, state, seconds_in_wait/60 wait
  FROM gv$process p, gv$session s
 WHERE p.addr = s.paddr
   AND client_info LIKE 'rman%';
TTITLE LEFT 'Files currently being written to:'
COL filename FORMAT a50
SELECT filename, bytes, io_count
  FROM v$backup_async_io
 WHERE status='IN PROGRESS'
/
TTITLE OFF
SET HEAD OFF


SELECT 'Throughput: '||
       ROUND(SUM(v.value/1024/1024),1) || ' Meg so far @ ' ||
       ROUND(SUM(v.value     /1024/1024)/NVL((SELECT MIN(elapsed_seconds)
            FROM v$session_longops
            WHERE opname          LIKE 'RMAN: aggregate input'
              AND sofar           != TOTALWORK
              AND elapsed_seconds IS NOT NULL
       ),SUM(v.value     /1024/1024)),2) || ' Meg/sec'
 FROM gv$sesstat v, v$statname n, gv$session s
WHERE v.statistic# = n.statistic#
  AND n.name       = 'physical write total bytes'
  AND v.sid        = s.sid
  AND v.inst_id    = s.inst_id
  AND s.program LIKE 'rman@%'
GROUP BY n.name
/
SET HEAD ON





##### To check Restore Speed 

TTITLE OFF
SET HEAD OFF
SELECT 'Throughput: '||
       ROUND(SUM(v.value/1024/1024/1024),1) || ' Gig so far @ ' ||
       ROUND(SUM(v.value     /1024/1024)/NVL((SELECT MIN(elapsed_seconds)
            FROM v$session_longops
            WHERE opname          LIKE 'RMAN: aggregate input'
              AND sofar           != TOTALWORK
              AND elapsed_seconds IS NOT NULL
       ),SUM(v.value     /1024/1024)),2) || ' Meg/sec'
 FROM gv$sesstat v, v$statname n, gv$session s
WHERE v.statistic# = n.statistic#
  AND n.name       = 'physical write total bytes'
  AND v.sid        = s.sid
  AND v.inst_id    = s.inst_id
  AND s.program LIKE 'rman@%'
GROUP BY n.name
/



Tracing Rman :

For analyzing any issues we can  debug rman 

$ rman target / catalog rman/rman debug trace trace.log







Speeding Up Rman :


RMAN Multiplexing

RMAN uses two different types of buffers for I/O: disk and tape.RMAN multiplexing determineshow RMAN allocates disk buffers.

RMAN multiplexing is the number of files in a backup readsimultaneously and then written to the same backup piece. The degree of multiplexing depends onthe

FILESPERSET parameter of the BACKUP command as well as the MAXOPENFILES  parameterof the CONFIGURECHANNEL command or ALLOCATECHANNELcommand. 
Note: RMANmultiplexing is set at the channel level




Allocating Tape Buffers

•From SGA (large pool) with  BACKUP_TAPE_IO_SLAVES is  TRUE
.•From PGA with BACKUP_TAPE_IO_SLAVES is  FALSE


RMAN allocates the tape buffers in the System Global Area (SGA) or the Program Global Area(PGA), depending on whether I/O slaves are used. 

If the  BACKUP_TAPE_IO_SLAVES initialization parameter is set to TRUE , RMAN allocates tape buffers from the shared pool or the large pool if the LARGE_POOL_SIZE
initialization parameter is set.

If you set the parameter to FALSE  , RMAN allocates the buffers from the PGA.

If you use I/O slaves, set the LARGE_POOL_SIZE initialization parameter to set aside SGA memory that is dedicated to holding these large memory allocations. By doing this, the RMAN I/O buffers do not compete with thelibrary cache for shared pool memory.


Oracle recommends that you set the  BACKUP_TAPE_IO_SLAVES initialization parameter to  TRUE

In most circumstances, this will provide the best performance of backups to tape. Also, thissetting is required in order to perform duplexed backups. Duplexed backups are covered in the lessontitled “Using RMAN to Create Backups.”




Comparing Synchronous and Asynchronous I/O

When RMAN reads or writes data, the I/O is either synchronous orasynchronous. When the I/O is synchronous, a server process can perform only one task at a time. When it is asynchronous, a server process can begin an I/O and then perform other tasks while waiting for the I/O to complete. It can also begin multiple I/O operations before waiting for the first to complete.You can set initialization parameters that determine the type of I/O.

If you set BACKUP_TAPE_IO_SLAVES  to  TRUE  , the tape I/O is asynchronous. Otherwise, the I/O is synchronous




Tuning the  BACKUP  Command

The  MAXPIECESIZE  parameter specifies the maximum size of each backup piece created on thechannel.

The  FILESPERSET  parameter specifies the maximum number of files to place in a backup set. If you allocate only one channel, then you can use this parameter to make RMAN create multiple backup sets. For example, if you have 50 input data files and two channels, you can set FILESPERSET=5 to create 10 backup sets. This strategy can prevent you from splitting a backupset among multiple tapes.

The MAXOPENFILES   parameter setting depends on your disk subsystem characteristics. If you useASM, then set it to 1 or 2. Otherwise, if your data is not striped, then you may want to set this higher.To gain performance, increase either the number of files per backup set, or this parameter. If you arenot using ASM or striping of any kind, then try increasing MAXOPENFILES
.



Channel Tuning 

If you configure multiple channels for an SBT device, then you can specifically spread data filesacross those channels. 

Here is an example:

RUN
{ ALLOCATE CHANNEL c1 DEVICE TYPE sbt; 
ALLOCATE CHANNEL c2 DEVICE TYPE sbt; 
ALLOCATE CHANNEL c3 DEVICE TYPE sbt;
BACKUP (DATAFILE 1,2,5 CHANNEL c1) (DATAFILE 4,6 CHANNEL c2) (DATAFILE 3,7,8 CHANNEL c3);BACKUP DATABASE NOT BACKED UP;}




Monitor rman restore throughput:

set linesize 126
column Pct_Complete format 99.99
column client_info format a25
column sid format 999
column MB_PER_S format 999.99
select s.client_info,
l.sid,
l.serial#,
l.sofar,
l.totalwork,
round (l.sofar / l.totalwork*100,2) "Pct_Complete",
aio.MB_PER_S,
aio.LONG_WAIT_PCT
from v$session_longops l,
v$session s,
(select sid,
serial,
100* sum (long_waits) / sum (io_count) as "LONG_WAIT_PCT",
sum (effective_bytes_per_second)/1024/1024 as "MB_PER_S"
from v$backup_async_io
group by sid, serial) aio
where aio.sid = s.sid
and aio.serial = s.serial#
and l.opname like 'RMAN%'
and l.opname not like '%aggregate%'
and l.totalwork != 0
and l.sofar <> l.totalwork
and s.sid = l.sid
and s.serial# = l.serial#
order by 1;




########## Shell  Script to Check   backup  or  Restore Status ############# 

After setting db  environment   run below shell 




#!/bin/bash
#Set the environment
_env(){
RESTORE_PROGRESS=/tmp/restore_progress.log
touch dfsize
export SIZELOG=dfsize
#RMAN restore progress monitor in percentage from database
_restore_pct(){
while sleep 0.5;do
date_is=$(date "+%F-%H-%M-%S")
#ela_s=$(date +%s)
#echo "============================================================"+
#echo "         ----->$ORACLE_SID<-----                                |"|tr 'a-z' 'A-Z';echo "    Restore progress ($date_is)                  |"
#echo "============================================================"+
$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" << EOF > dbsz.txt
set feedback off
set lines 200
set pages 1000
set termout off
col INPUT_BYTES/1024/1024 format 9999999
col OUTPUT_BYTES/1024/1024 format 9999999
col OBJECT_TYPE format a10
set serveroutput off
spool dbsz.out
variable s_num number;
BEGIN
  select sum((datafile_blocks)*8/1024) into :s_num from v\$BACKUP_DATAFILE;
  dbms_output.put_line(:s_num);
END;
/
set feedback on
select INPUT_BYTES/1024/1024 as inp_byte,OUTPUT_BYTES/1024/1024 as out_byte,OBJECT_TYPE,100*(MBYTES_PROCESSED/:s_num) as pctdone from v\$rman_status where status like '%RUNNING%';
spool off
EOF
 
#Realtime monitoring of RMAN restore which shows date and percentage of completion
pct="$(cat dbsz.txt|grep -v 'row'|tail -3|grep -v '^$'|awk '{print $5}')"
clear;
echo "$date_is|Current restore progress for $ORACLE_SID:[$pct]"
#cat $SIZELOG|grep -v 'PL'
#cat /dev/null > $SIZELOG
done
}
#ela_e=$(date +%s)
#echo "elapsed_time: $($ela_e - $ela_s)
_env
_restore_pct