Saturday, July 2, 2022

Oracle: Checking Hung Sessions in a RAC Database -- RAC Hang Manager

 
For RAC, checking for hung sessions is simplified by the RAC Hang Manager (HM). For non-RAC databases, I personally use v$sess_io or try enabling session tracing.


In 12.1.0.1, Hang Manager can detect hangs between the database and ASM.

2. Deadlock or Closed Chain

A deadlock is a closed chain: the sessions in it wait on one another in a cycle. The only way to break a deadlock chain is to let some of these sessions complete their work or to terminate them.

3. Hang or Open Chain


In an Oracle database, a hang is the waiting state a process enters when it cannot obtain a resource it has requested. The wait ends only when the requested resource is obtained. HM manages hangs: it monitors, analyzes, records, and resolves them.

A wait chain is made up of blocking processes and waiting processes. Among the blocking processes there are one or more root blocking processes, which block all the others. If the root blocking process is busy doing work, the wait chain may well be normal. If it is idle, the chain is probably not normal, and the way to break it is to terminate the root blocking process.

HM proactively discovers wait chains in the database and analyzes them. If it finds a hang that really affects database performance, it decides, depending on the circumstances, whether to resolve it. Even when it does not resolve the hang directly, it records the corresponding diagnostic information and keeps monitoring.
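The difference between an open chain (hang) and a closed chain (deadlock) can be illustrated with a small Python sketch. This is only a toy model, not how HM is implemented: the `waits_on` mapping, the session names, and both function names are made up for the example, and it assumes each session waits on at most one blocker.

```python
from typing import Dict, List, Optional, Tuple


def follow_chain(waits_on: Dict[str, Optional[str]], start: str) -> List[str]:
    """Walk blocker links from `start` until the chain ends or repeats."""
    chain = [start]
    seen = {start}
    current = start
    while waits_on.get(current) is not None:
        current = waits_on[current]
        chain.append(current)
        if current in seen:  # we came back to a session already visited
            break
        seen.add(current)
    return chain


def classify(waits_on: Dict[str, Optional[str]], start: str) -> Tuple[str, Optional[str]]:
    """Return ("closed chain", None) for a cycle, else ("open chain", root blocker)."""
    chain = follow_chain(waits_on, start)
    last = chain[-1]
    if len(chain) > 1 and last in chain[:-1]:
        return ("closed chain", None)  # deadlock: there is no single root blocker
    return ("open chain", last)  # hang: the last session waits on nobody


# Hypothetical data: S1 waits on S2, S2 waits on S3, S3 waits on nobody.
open_chain = {"S1": "S2", "S2": "S3", "S3": None}
# Hypothetical data: A and B wait on each other.
deadlock = {"A": "B", "B": "A"}

print(classify(open_chain, "S1"))
print(classify(deadlock, "A"))
```

In the open chain, terminating the root blocker (S3 here) frees everything behind it. In the closed chain there is no root blocker, so some session in the cycle has to finish or be terminated.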



V$HANG_INFO: This view contains details of the hangs found by HM.
V$HANG_SESSION_INFO: This view contains information about the sessions involved in a hang.
V$HANG_STATISTICS: This view contains statistics related to hangs.



HM works in seven phases:

Phase 1 (Collection phase): The DIA0 process of each instance collects hang analyze information at regular intervals.

Phase 2 (Discovery phase): The DIA0 process of each instance analyzes the collected hang analyze information, locates the sessions involved in a hang, and sends this information to the DIA0 process on the master node.

Phase 3 (Drawing phase): The DIA0 process on the master node builds the wait chain from the information sent by the DIA0 process of each instance.

Phase 4 (Analysis phase): The DIA0 process on the master node analyzes the wait chain it has built to determine whether a hang is indeed present.

Phase 5 (Validation phase): The DIA0 process on the master node executes phases 1-4 again, compares the new result with the earlier phase 4 result, and verifies that the hang is really happening.

Phase 6 (Positioning phase): Based on the results of the validation phase, the DIA0 process on the master node locates the root blocking process of the wait chain.

Phase 7 (Resolution phase): The DIA0 process on the master node determines whether the hang can be resolved, based on the value of the parameter _hang_resolution_scope.
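The point of the validation phase is to filter out waits that clear up by themselves between two analysis passes. A rough Python sketch of that idea follows. How DIA0 actually compares the two results is not described here, so the sketch simply assumes a chain counts as verified when the same root blocker with the same waiters shows up in both passes. The `Chain` type and `verified_hangs` function are hypothetical names.

```python
from typing import FrozenSet, Set, Tuple

# A chain is modelled as (root blocker, set of waiting sessions). Hypothetical shape.
Chain = Tuple[str, FrozenSet[str]]


def verified_hangs(first_pass: Set[Chain], second_pass: Set[Chain]) -> Set[Chain]:
    """Keep only the chains that appear unchanged in both analysis passes."""
    return first_pass & second_pass


# Hypothetical passes: the S9 chain cleared before the second pass.
first = {("S3", frozenset({"S1", "S2"})), ("S9", frozenset({"S8"}))}
second = {("S3", frozenset({"S1", "S2"}))}

for root, waiters in verified_hangs(first, second):
    print(root, sorted(waiters))
```

Only the S3 chain survives, so only S3 would move on to the positioning and resolution phases in this model.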



Trace log files for the DIA0 process

Main trace file (<sid>_dia0_<pid>.trc): This file records the details of the DIA0 process's work, including how it discovers, analyzes, and handles hangs.

History trace file (<sid>_dia0_<pid>_<n>.trc): The DIA0 process writes trace information continuously while the database runs, which can make the main trace file very large. The DIA0 process therefore periodically writes trace information to history trace files, where n is a positive integer that increases over time.

Incident log file: If HM resolves a hang by terminating a process, the ORA-32701 error is first recorded in the alert log. Because of ADR, the DIA0 process also produces an incident log file that records the details of the problem.
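To tell the main DIA0 trace file apart from the history files in a trace directory, a small Python helper based on the naming patterns above can be enough. This is a sketch only: the file names in the example are invented, and the regular expression assumes the names follow exactly the <sid>_dia0_<pid>.trc and <sid>_dia0_<pid>_<n>.trc patterns.

```python
import re
from typing import Optional

# <sid>_dia0_<pid>.trc is the main file; an extra _<n> suffix marks a history file.
DIA0_TRACE = re.compile(
    r"^(?P<sid>.+)_dia0_(?P<pid>\d+)(?:_(?P<n>\d+))?\.trc$",
    re.IGNORECASE,
)


def classify_trace(filename: str) -> Optional[str]:
    """Return "main", "history", or None if this is not a DIA0 trace file."""
    match = DIA0_TRACE.match(filename)
    if match is None:
        return None
    return "history" if match.group("n") else "main"


# Hypothetical file names for illustration.
for name in ["orcl1_dia0_12345.trc", "orcl1_dia0_12345_3.trc", "orcl1_lgwr_222.trc"]:
    print(name, classify_trace(name))
```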






Parameters of HM

_hang_detection_enabled: This parameter determines whether HM is enabled in the database. The default value is TRUE.

_hang_detection_interval: This parameter specifies the interval at which HM collects hang analyze information. The default value is 32 seconds.

_hang_verification_interval: This parameter specifies the interval at which HM verifies a hang. The default value is 46 seconds.

_hang_resolution_scope: This parameter specifies how far HM may go when resolving a hang. The default value is PROCESS, and the allowable values are as follows:
OFF: HM only continues to monitor the hang and does nothing to resolve it.
PROCESS: HM can resolve the hang by terminating the root blocking process, but only if that process is not an important database background process, because terminating such a process would crash the instance.
INSTANCE: HM can resolve the hang by terminating the instance.
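The three scope values can be read as a small decision table. The Python sketch below is one possible reading, not HM's actual logic. In particular, the assumption that INSTANCE scope still prefers terminating an ordinary root blocker, and terminates the instance only when the root blocker is a critical background process, is mine and is not stated above. The function name and the returned strings are hypothetical.

```python
def resolution_action(scope: str, root_is_critical_background: bool) -> str:
    """Map a _hang_resolution_scope value to the action this toy model would take."""
    scope = scope.upper()
    if scope == "OFF":
        return "monitor only"
    if scope == "PROCESS":
        # Killing a critical background process would crash the instance,
        # so under PROCESS scope the hang is only monitored in that case.
        if root_is_critical_background:
            return "monitor only"
        return "terminate root blocker"
    if scope == "INSTANCE":
        # Assumption: the instance is terminated only when the root blocker
        # is a critical background process.
        if root_is_critical_background:
            return "terminate instance"
        return "terminate root blocker"
    raise ValueError(f"unknown scope: {scope}")


print(resolution_action("PROCESS", root_is_critical_background=False))
print(resolution_action("PROCESS", root_is_critical_background=True))
print(resolution_action("INSTANCE", root_is_critical_background=True))
```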





Related parameters:

NAME                                               VALUE                          ISDEFAULT ISMOD      ISADJ
-------------------------------------------------- ------------------------------ --------- ---------- -----
_hang_analysis_num_call_stacks                     3                              TRUE      FALSE      FALSE
_hang_base_file_count                              5                              TRUE      FALSE      FALSE
_hang_base_file_space_limit                        10000000                       TRUE      FALSE      FALSE
_hang_bool_spare1                                  TRUE                           TRUE      FALSE      FALSE
_hang_delay_resolution_for_libcache                TRUE                           TRUE      FALSE      FALSE
_hang_detection_enabled                            TRUE                           TRUE      FALSE      FALSE
_hang_detection_interval                           32                             TRUE      FALSE      FALSE
_hang_hang_analyze_output_hang_chains              TRUE                           TRUE      FALSE      FALSE
_hang_hiload_promoted_ignored_hang_count           2                              TRUE      FALSE      FALSE
_hang_hiprior_session_attribute_list                                              TRUE      FALSE      FALSE
_hang_ignored_hang_count                           1                              TRUE      FALSE      FALSE
_hang_ignored_hangs_interval                       300                            TRUE      FALSE      FALSE
_hang_int_spare2                                   FALSE                          TRUE      FALSE      FALSE
_hang_log_verified_hangs_to_alert                  FALSE                          TRUE      FALSE      FALSE
_hang_long_wait_time_threshold                     0                              TRUE      FALSE      FALSE
_hang_lws_file_count                               5                              TRUE      FALSE      FALSE
_hang_lws_file_space_limit                         10000000                       TRUE      FALSE      FALSE
_hang_monitor_archiving_related_hang_interval      300                            TRUE      FALSE      FALSE
_hang_msg_checksum_enabled                         TRUE                           TRUE      FALSE      FALSE
_hang_resolution_allow_archiving_issue_termination TRUE                           TRUE      FALSE      FALSE
_hang_resolution_confidence_promotion              FALSE                          TRUE      FALSE      FALSE
_hang_resolution_global_hang_confidence_promotion  FALSE                          TRUE      FALSE      FALSE
_hang_resolution_policy                            HIGH                           TRUE      FALSE      FALSE
_hang_resolution_promote_process_termination       FALSE                          TRUE      FALSE      FALSE
_hang_resolution_scope                             PROCESS                        TRUE      FALSE      FALSE
_hang_short_stacks_output_enabled                  TRUE                           TRUE      FALSE      FALSE
_hang_signature_list_match_output_frequency        10                             TRUE      FALSE      FALSE
_hang_statistics_collection_interval               15                             TRUE      FALSE      FALSE
_hang_statistics_collection_ma_alpha               30                             TRUE      FALSE      FALSE
_hang_statistics_high_io_percentage_threshold      15                             TRUE      FALSE      FALSE
_hang_verification_interval                        46                             TRUE      FALSE      FALSE

