Saturday, July 10, 2021

Oracle HangAnalyze and SystemState Dump

We often run into situations where we need to gather a systemstate dump and a hanganalyze dump, either for our own performance analysis or to upload to Oracle Support.


Let's look at how this works. Below is how I personally capture hanganalyze and systemstate dumps. Each component is explained in more detail further down in this post.


Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088

For 11g:
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 267
oradebug tracefile_name
…….. Wait at least 1 min
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 267
oradebug tracefile_name
exit


Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088

sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
…….. Wait at least 1 min
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
exit


For 10g, run oradebug setmypid instead of oradebug setorapname reco:
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
…….. Wait at least 1 min
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
exit

In a RAC environment, the dump for each RAC instance is written to that instance's DIAG trace file.
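
The three command sets above differ only in the attach command and the systemstate level. As a rough illustration, the Python sketch below assembles the matching command list. The function name collection_commands and its parameters are made up for this example and are not part of any Oracle tool. It only builds text, and it leaves out oradebug tracefile_name for brevity.

```python
def collection_commands(version="11g", bug_fixes_applied=True):
    """Assemble the RAC collection commands from this post as a list of strings.

    Hypothetical helper for illustration: it never connects to a database.
    """
    # 10g uses setmypid; 11g attaches to the RECO process by name.
    attach = "oradebug setmypid" if version == "10g" else "oradebug setorapname reco"
    # Level 267 is used only on 11g with the fixes for bugs 11800959 and 11827088.
    level = 267 if (version == "11g" and bug_fixes_applied) else 258
    one_pass = [
        "oradebug -g all hanganalyze 3",
        f"oradebug -g all dump systemstate {level}",
    ]
    return (
        ["sqlplus '/ as sysdba'", attach, "oradebug unlimit"]
        + one_pass
        + ["-- wait at least 1 minute"]
        + one_pass
        + ["exit"]
    )


if __name__ == "__main__":
    # Print the sequence for an 11g RAC system without the two bug fixes.
    print("\n".join(collection_commands("11g", bug_fixes_applied=False)))
```

Taking two passes a minute apart is what makes the dumps comparable, so the sketch always repeats the hanganalyze and systemstate pair.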



##############################################
##############################################

Oracle Preliminary connection
________________________

When the database is hung, at times we cannot log in to it. In such situations we need to use a preliminary connection to connect to the database. Either of the two methods below opens one.

sqlplus -prelim / as sysdba

sqlplus /nolog
set _prelim on
connect / as sysdba




##############################################
##############################################

System state Dump 

A system state dump contains the process state for every process.

Every state object for every process is dumped.

A state object represents the state of a database resource including:

processes
sessions
enqueues (locks)
buffers

State objects are held in the SGA.

A system state dump does not represent a snapshot of the instance because the database is not frozen for the duration of the dump. The start point of the dump will be earlier than the end point.

Oracle recommends the use of system state dumps to diagnose:
hanging databases
slow databases
database errors
waiting processes
blocking processes
resource contention


Log on to SQL*Plus as SYSDBA:
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
…….. Wait at least 1 min
SQL> oradebug dump systemstate 266
…….. Wait at least 1 min
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name




On 11g and later with RAC, you should attach to the DIAG process and raise its trace file size limit:
sqlplus "/ as sysdba"
select SPID from v$process where program like '%DIAG%';
oradebug setospid <OS_PID> -- pid of the diag process
oradebug unlimit
oradebug dump systemstate 266
oradebug tracefile_name
exit



Collection commands for Hanganalyze and Systemstate: RAC
Two bugs affect RAC and, unless the relevant patches are applied on your system, make levels 266 and 267 very costly. Without these fixes in place, using these levels is strongly discouraged.
For information on these patches see:
Document 11800959.8: Bug 11800959 - A systemstate dump with level >= 10 in RAC dumps huge busy global cache elements - can hang/crash instances
Document 11827088.8: Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance



Systemstate dump has multiple levels:
2: dump (excluding lock element)
10: dump
11: dump + global cache of RAC
256: short stack (function stack)
258: 256 + 2 --> short stack + dump (excluding lock element)
266: 256 + 10 --> short stack + dump
267: 256 + 11 --> short stack + dump + global cache of RAC
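
The levels in the table above are simply a base level (2, 10 or 11) plus 256 when short stacks are wanted. A minimal Python sketch of that arithmetic follows; the names SHORT_STACK, BASE_LEVELS and describe_level are invented for this example, and the descriptions are copied from the table.

```python
# Short-stack flag and base levels, as listed in the table above.
SHORT_STACK = 256
BASE_LEVELS = {
    2: "dump (excluding lock element)",
    10: "dump",
    11: "dump + global cache of RAC",
}


def describe_level(level):
    """Split a systemstate level into its short-stack flag and base level."""
    has_stack = level >= SHORT_STACK
    base = level - SHORT_STACK if has_stack else level
    if has_stack and base == 0:
        return "short stack (function stack)"
    if base not in BASE_LEVELS:
        # Anything outside the table is rejected rather than guessed at.
        raise ValueError(f"level {level} is not in the table above")
    prefix = "short stack + " if has_stack else ""
    return prefix + BASE_LEVELS[base]


if __name__ == "__main__":
    for lvl in (2, 10, 11, 256, 258, 266, 267):
        print(lvl, "->", describe_level(lvl))
```

For example, describe_level(258) gives "short stack + dump (excluding lock element)", which matches the 256 + 2 row.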


Levels 11 and 267 dump the global cache and generate a large trace file, so they are generally not recommended. If the number of processes is not too large, level 266 is the recommended choice, because it dumps the function stack of each process and so shows what operation each process is performing. Generating short stacks is slow, however: with many processes, for example 2000, it can take more than 30 minutes. In that case use level 10 or level 258 instead. Level 258 collects short stacks, which level 10 does not, but it collects less lock element data than level 10.
Collecting process information with a systemstate dump is the easy part. Interpreting that information effectively and using it to diagnose the problem is the real challenge!

Reading and Understanding Systemstate Dumps (Doc ID 423153.1) 
How to Collect Systemstate Dumps When you Cannot Connect to Oracle (Doc ID 121779.1)



##############################################
##############################################

Hanganalyze 

Oracle notes that HANGANALYZE run at levels higher than 3 may generate a huge number of trace files on large systems. Do not use levels higher than 3 without discussing their effects with Oracle Technical Support.

A hung database is a headache, and finding the cause of the hang is a problem every DBA has to face. When a database hangs, most DBAs analyze it through the V$SESSION_WAIT view. In fact, Oracle has a very effective tool for this: hanganalyze. HANGANALYZE reports the hang information very clearly, which makes further analysis easier.

From 19c onwards, Oracle has come up with an emergency monitoring report to replace hanganalyze.


Below are the levels at which hanganalyze can be run:

     10     Dump all processes (IGN state)
     5      Level 4 + Dump all processes involved in wait chains (NLEAF state)
     4      Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF, LEAF_NW, IGN_DMP state)
     3      Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
     1-2    Only HANGANALYZE output, no process dump at all


Single Instance:
_________________________
Log on to SQL*Plus as SYSDBA:
oradebug setmypid;
oradebug unlimit;
oradebug hanganalyze 3;
oradebug tracefile_name


RAC:
_________________________
Log on to SQL*Plus as SYSDBA:
oradebug setmypid;
oradebug unlimit;
oradebug setinst all
oradebug -g def hanganalyze 3
oradebug tracefile_name



##############################################
##############################################


Tracing for a Specific ORA-nnnn Error


alter system set events '4021 errorstack(3) systemstate_global(258) hanganalyze_global(3)';
alter system set events '4021 trace name all off';




##############################################
##############################################

v$wait_chains



From 11g onwards, Oracle provides a dynamic performance view called v$wait_chains. It contains the same information that we gather by running the hanganalyze command, so instead of running hanganalyze you can query this view to find wait chains.
Behind the view, the DIA0 background process collects hanganalyze information and stores it in memory in the "hang analysis cache". It does this every 3 seconds for local hanganalyze information and every 10 seconds for global (RAC) hanganalyze information.


There is no gv$ equivalent, because v$wait_chains itself reports on all instances in a multi-instance (RAC) environment.
Some queries for this view:
SQL> SELECT chain_id, num_waiters, in_wait_secs, osid, blocker_osid, substr(wait_event_text,1,30)
 FROM v$wait_chains; 



SELECT decode( a.blocker_sid , NULL , '<chain id#' ||a.chain_id||'>' ) chain_id,
RPAD( '+' , LEVEL , '-' ) ||a.sid sid,
RPAD( ' ' , LEVEL , ' ' ) ||a.wait_event_text wait_event
FROM V$WAIT_CHAINS a
CONNECT BY PRIOR a.sid=a.blocker_sid
AND PRIOR a.sess_serial#=a.blocker_sess_serial#
AND PRIOR a.instance = a.blocker_instance
START WITH a.blocker_is_valid='FALSE'
ORDER BY a.chain_id ,
LEVEL
/
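
To make the output of the hierarchical query easier to picture, here is a small Python sketch that builds the same kind of indented tree from a few made-up rows. The function name render_chains and the sample data are assumptions for illustration only. It also keys sessions by SID alone, whereas the query above matches on serial# and instance as well.

```python
def render_chains(rows):
    """Return one indented line per session, final blockers first.

    rows: list of (sid, blocker_sid, wait_event) tuples; blocker_sid is None
    for a session with no valid blocker (the START WITH condition).
    """
    waiters = {}
    for sid, blocker_sid, event in rows:
        waiters.setdefault(blocker_sid, []).append((sid, event))

    lines = []

    def walk(parent, level):
        for sid, event in waiters.get(parent, []):
            # Mirrors RPAD('+', LEVEL, '-') || sid from the query.
            prefix = "+" + "-" * (level - 1)
            lines.append(f"{prefix}{sid} {event}")
            walk(sid, level + 1)

    walk(None, 1)
    return lines


if __name__ == "__main__":
    # Sample data only: session 10 blocks 20 and 40, and 20 blocks 30.
    sample = [
        (10, None, "SQL*Net message from client"),
        (20, 10, "enq: TX - row lock contention"),
        (30, 20, "enq: TX - row lock contention"),
        (40, 10, "enq: TX - row lock contention"),
    ]
    print("\n".join(render_chains(sample)))
```

Each extra dash means one more hop away from the session at the head of the chain, which is how the sid column of the query should be read.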



Query for Top 100 wait chain processes
set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column fblocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra
SELECT * 
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE, 
 'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)|| 
 ' from Instance '||blocker_instance BLOCKER_PROC,
 'Number of waiters: '||num_waiters waiters,
 'Final Blocking Process: '||decode(p.spid,null,'',
 p.spid)||' from Instance '||s.final_blocking_instance FBLOCKER_PROC, 
 'Program: '||p.program image,
 'Wait Event: ' ||wait_event_text wait_event, 'P1: '||wc.p1 p1, 'P2: '||wc.p2 p2, 'P3: '||wc.p3 p3,
 'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
 'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
 '',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
 gv$session s,
 gv$session bs,
 gv$instance i,
 gv$process p
WHERE wc.instance = i.instance_number (+)
 AND (wc.instance = s.inst_id (+) and wc.sid = s.sid (+)
 and wc.sess_serial# = s.serial# (+))
 AND (s.final_blocking_instance = bs.inst_id (+) and s.final_blocking_session = bs.sid (+))
 AND (bs.inst_id = p.inst_id (+) and bs.paddr = p.addr (+))
 AND ( num_waiters > 0
 OR ( blocker_osid IS NOT NULL
 AND in_wait_secs > 10 ) )
ORDER BY chain_id,
 num_waiters DESC)
WHERE ROWNUM < 101;



##############################################
##############################################

Reference :

Reading and Understanding Systemstate Dumps (Doc ID 423153.1)
How to Collect Systemstate Dumps When you Cannot Connect to Oracle (Doc ID 121779.1)
How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
