Saturday, August 12, 2023

Oracle Rac Node failover Testing


     As part of new build we need to  do  node failover testing hence documenting some steps  known to me 

Objective

Action

Expectations

To test accidental change of
 IP addresses of the cluster

Modify etc/hosts and change
theVIP addresses - Try starting thecluster

The cluster must not be able to
bootup with error messages related to network Ips.

To test the failure of ASM instance
 (ASM Instancecrash)

On Node 1, kill the processes
 related to ASM

The cluster must not be affected
by ASM crash on node 1

To test the failover of the network
 card of the public IPaddress.

Plug out one of the cableconnecting
 the node to thepublic network

The ping from the client
 should beable to go thru

To test the stability of the db by killing  a ora server process

kill off a process related to the
oracle daemons from a terminal.This should crash the database processes.

The sql session should not be
affected and failover to the next node without any disruption

To test thecomplete failover of the session in event of a node(e.g. sudden shutdown

1. Connect from the client to the database via sqlplus.

2. power off the unix box inwhich the sqlplus connection isat.

The sql session should not be
 affected and failover to the nextnode without any disruption

To test the failover session of a
connected oracleuser in the event of an instance failureTo test the failoversession of aconnected oracleuser in the event of an instance failure

1. Connect from a client to node1 in the RAC Environment usingsqlplus.

2. Connect to the database onnode 1 and do a shutdown.

sqlplus / as sysdba

The sql connection should be able to
swing over to other instance and stillcontinue the select query

# – sqlplus connection from client todb is still available

To test Crash of Grid  Processes

Kill Lmon process on rac node

They must be restarted
 automatically

To  Test for automatically shifting of
 scan listeners

Reboot Node 1 using  the OS
reboot command

1. When noe 1 goes down all scan
listeners must shift to remaining
node

2. When node 1 comes up  scan listeners must again shift back
to original node

To test storage  crash

Forcibly make scan disk unvisible to
both nodes

The cluster must go down and
should not start since rac does not
safeguard to storage failures

TAF failoer test

to test sql and session   failover 

 

 



Tuesday, August 8, 2023

Oracle Database Performance Hub report using DBMS_PERF



  When we  have performance issues normally we have to login to  oem  to get  performance details . 

  We can get quick snap of  performance using DBMS_PERF.REPORT_PERFHUB    as mentioned below .   Its old tool  introduced in 12c but  not frequently used . 


Command : 

SET  LONG   10000000 ;
 SET LONGC 10000000 ;
 SET LINESIZE 32767 ;
 SET PAGESIZE 0 ;
 SET TRIMSPOOL ON ;
SPOOL perfhub.html
SELECT DBMS_PERF.REPORT_PERFHUB FROM DUAL;
SPOOL OFF;





.Other option used :


SET HEADING OFF LINESIZE 32767 PAGESIZE 0 TRIMSPOOL ON TRIMOUT ON LONG 9999999 VERIFY OFF LONGCHUNKSIZE 100000 FEEDBACK OFF
SET TERMOUT OFF

spool perfhub.html

SELECT DBMS_PERF.REPORT_PERFHUB (
    is_realtime         => 0           -- (  0 = dba_hist, 1 = v$ash  )
  , outer_start_time    => sysdate-1
  , outer_end_time      => sysdate
  , selected_start_time => TIMESTAMP'2023-05-15 20:15:00'
  , selected_end_time   => TIMESTAMP'2023-05-15 20:45:00'
  , type=>'ACTIVE'
)
FROM dual ;


spool off
SET TERMOUT ON HEADING ON PAGESIZE 5000 LINESIZE 999 FEEDBACK ON 



 

Historical SQL monitoring report can be extracted for particular SQL statement using SQL Details report.

Example: To generate SQL details report to get the SQL monitoring information for particular SQL statement in historical mode.

SQL> set pages 0 linesize 32767 trimspool on trim on long 1000000 longchunksize 10000000
SQL> spool sql_details_history.html
SQL> select dbms_perf.report_sql(sql_id=>'9vkyyg1xj6fgc',is_realtime=>0,type=>'active',selected_start_time=>to_date('10-SEP-18 04:00:00','dd-MON-YY hh24:mi:ss'),selected_end_time=>to_date('10-SEP-18 05:00:00','dd-MON-YY hh24:mi:ss')) from dual;
SQL> spool off

Note in 19c or higher it is better to add "outer_start_time" and  "outer_end_time" to get the desired results:

SQL> set pages 0 linesize 32767 trimspool on trim on long 1000000 longchunksize 10000000
SQL> spool sql_details_history.html
SQL> select dbms_perf.report_perfhub( is_realtime=>0, outer_start_time=>to_date('06-SEP-2022 12:00:00','dd-MON-YYYY hh24:mi:ss'), outer_end_time=>to_date('06-SEP-2022 13:00:00','dd-MON-YYYY hh24:mi:ss'), selected_start_time=>to_date('06-SEP-2022 12:00:00','dd-MON-YYYY hh24:mi:ss'), selected_end_time=>to_date('06-SEP-2022 13:00:00','dd-MON-YYYY hh24:mi:ss')) from dual;
SQL> spool off




Generating Database Performance Hub Report :

@?/rdbms/admin/perfhubrpt.sql



Syntax

DBMS_PERF.REPORT_PERFHUB ( is_realtime IN NUMBER DEFAULT NULL, outer_start_time IN DATE DEFAULT NULL, outer_end_time IN DATE DEFAULT NULL, selected_start_time IN DATE DEFAULT NULL, selected_end_time IN DATE DEFAULT NULL, inst_id IN NUMBER DEFAULT NULL, dbid IN NUMBER DEFAULT NULL, monitor_list_detail IN NUMBER DEFAULT NULL, workload_sql_detail IN NUMBER DEFAULT NULL, addm_task_detail IN NUMBER DEFAULT NULL, report_reference IN VARCHAR2 DEFAULT NULL, report_level IN VARCHAR2 DEFAULT NULL, type IN VARCHAR2 DEFAULT 'ACTIVE', base_path IN VARCHAR2 DEFAULT NULL); RETURN CLOB;



 REPORT_PERFHUB Function Parameters

ParameterDescription

is_realtime

If 1, then real-time. If NULL (default) or 0, then historical mode.

outer_start_time

Start time of outer period shown in the time selector. If NULL (default):

  • If is_realtime=0 (historical), then 24 hours before outer_end_time.

  • If is_realtime=1 (realtime mode), then 1 hour before outer_end_time.

outer_end_time

End time of outer period shown in the time selector. If NULL (default), then latest AWR snapshot.

  • If is_realtime=0 (historical), then the latest AWR snapshot

  • If is_realtime=1 (realtime mode), this is the current time (and any input is ignored)

selected_start_time

Start time period of selection. If NULL (default)

  • If is_realtime=0, then 1 hour before selected_end_time

  • If is_realtime=1, then 5 minutes before selected_end_time

selected_end_time

End time period of selection. If NULL (default)

  • If is_realtime=0, then latest AWR snapshot

  • If is_realtime=1, then current time

inst_id

Instance ID to for which to retrieve data

  • If -1, then current instance

  • If number is specified, then for that instance

  • If NULL (default), then all instances

dbid

DBID to query.

  • If NULL, then current DBID.

  • If is_realtime=1, then DBID must be the local DBID.

monitor_list_detail

Top N in SQL monitor list for which to retrieve SQL monitor details.

  • If NULL (default), then retrieves top 10

  • If 0, then retrieves no monitor list details

workload_sql_detail

Top N in Workload Top SQL list to retrieve monitor details,

  • If NULL (default), then retrieves top 10

  • If 0, then retrieves no monitor list details

addm_task_detail

Maximum N latest ADDM tasks to retrieve

  • If NULL (default), retrieves available data but no more than N

  • If 0, then retrieves no ADDM task details

report_reference

Must be NULL when used from SQL*Plus.

report_level

'typical' will get all tabs in performance hub

type

Report type:

  • 'ACTIVE' (default)

  • 'xml' returns XML

base_path

URL path for HTML resources since flex HTML requires access to external files. This is only valid for type='ACTIVE' and is typically not used. Default value will retrieve the required files from OTN.

 


Reference : 

Monitoring Database Performance Using Performance Hub Report (Doc ID 2436566.1)

https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_PERF.html#GUID-85CD8AB9-B6E0-444F-91A5-762EB92A74E9 

How To Get Historical SQL Monitor Report For SQL Statements (Doc ID 2555350.1)

Saturday, July 15, 2023

Force cleanup of temporary segments through DROP_SEGMENT event in Oracle database


We recently had scenario where  temp segments were not getting released  even though  no  active sessions were there  and  "alter tablespace temp  coalesce;"   was not helping  .   

This needs to be done only after oracle support  recommendation .and this  feature  surprisingly exists from Oracle version  8 . 

This routine does what SMON does in the background, i.e. drops temporary segments.

Before force dropping temp segment try to gracefully clean . Sometimes temp can be resized after restart of database 



 select sum(blocks*8/1024) size_mb from v$tempseg_usage where tablespace='TEMP';


Shrink TEMP Tablespace using alter tablespace command
SQL> ALTER TABLESPACE temp SHRINK SPACE KEEP 50M;.


Shrink TEMPFILE using alter tablespace command
SQL> ALTER TABLESPACE temp SHRINK TEMPFILE '/u01/app/oracle/oradata/TEST11G/temp01.dbf' KEEP 40M;


Shrink TEMP Tablespace to the smallest possible size:
SQL> ALTER TABLESPACE temp SHRINK SPACE;



DESCRIPTION : 
Finds all the temporary segments in a tablespace which are not currently locked and drops them.


PARAMETER : Level parameter used in command 

level - tablespace number+1. If the value is 2147483647 then
temp segments in ALL tablespaces are dropped, otherwise, only
segments in a tablespace whose number is equal to the LEVEL
specification are dropped.


Manually release the temporary segment, the utilization rate will not decrease, but the content of the temporary tablespace will be released


alter session set events 'immediate trace name DROP_SEGMENTS level 4'

Description: TS# of temp tablespace is 3, So TS#+1=4

View temp content details

select se.inst_id,se.username,se.sid,su.extents,su.blocks*to_number(rtrim(p.value)) as Space, tablespace,segtype,sql_text from gv$sort_usage su,gv$parameter p,gv $session se,gv$sql s
where p.name='db_block_size' and su.session_addr=se.saddr and s.hash_value=su.sqlhash and s.address=su.sqladdr order by
se.username,se.sid;


If the query finds that most of the temp temporary table space is lob temporary fields, you can set the 60025 event

alter system set event="60025 trace name context forever" scope=spfile;

Restart to take effect




References : 

EVENT: DROP_SEGMENTS - Forcing cleanup of TEMPORARY segments (Doc ID 47400.1)

Note 177334.1 Overview of Temporary Segments 
Note 35513.1 Removing 'Stray' TEMPORARY Segments
Note 61997.1 SMON - Temporary Segment Cleanup and Free Space Coalescing
NOTE:160426.1 - TEMPORARY Tablespaces : Tempfiles or Datafiles ?
Note 102339.1 Temporary Segments: What Happens When a Sort Occurs 
Note 1039341.6 Temporary Segments Are Not Being De-Allocated After a Sort 
Note 68836.1 How To Efficiently Drop (or Truncate) A Table With Many Extents
Note 132913.1 How To Free Temporary Segment in Temporary Tablespace Dynamically
Resize Of Temporary Tablespace with KEEP keyword not working (Doc ID 1663229.1)

Saturday, June 17, 2023

Debugging any Oracle errors and events using ERRORSTACK

 
DBA has seen ORA-00600 trace file generated every time ORA-00600 or ORA-07445 occurs which has each and every detail about this error. 
Generating trace files for ORA-00600 or ORA-07445 is the default behaviour of Oracle, but the Same doesn’t happen in case of other errors.

We are using ERRORSTACK to  capture  details  for all other  oracle errors and events .  By enabling error stack we are force capturing  error  details in alert log and with its trace file details .




ERRORSTACK Parameters:

0 – Error stack only
1 – Error stack and function call stack (if implemented)
2 – As 1 plus the ProcessState
3 – As 2 plus the context area (all cursors and current cursor highlighted)


Enable

ALTER SYSTEM SET EVENT ='<ORAError> trace name errorstack level <LevelNumber>' SCOPE=spfile;

e.g To set errorstack for ORA – 1652 ,Syntax will be
 
alter system set events ‘1652 trace name errorstack level 3’;  -- system level 
alter session set events ‘1652 trace name errorstack level 3’; -- sesion level 


Disable

ALTER SYSTEM SET EVENT='<ORAError> trace name errorstack off';
ALTER SYSTEM SET EVENT='1652 trace name errorstack off';





Display Events

-- SQLPlus> show parameter event
-- oradebug eventdump session;   OR   oradebug eventdump system  
-- DBA_ENABLED_TRACES or WRI$_TRACING_ENABLED can be used to see status also:
SELECT TRACE_TYPE, PRIMARY_ID, QUALIFIER_ID1, WAITS, BINDS FROM DBA_ENABLED_TRACES;




You can also use oradebug to take errorstack for particular process (say ospid 1234) as below

sqlplus “/ as sysdba”

oradebug setospid 1234
oradebug unlimit
oradebug dump errorstack 3





References :

How to Collect Errorstacks for use in Diagnosing Performance Issues. (Doc ID 1364257.1)

Friday, June 9, 2023

Oracle picks more parallel slaves then what is defined in PARALLEL HINT

  

We  had strange scenario where  application team  defined  parallel  10  in   parallel  hint   eg    " SELECT /*+ parallel(10)*/ "   ,   however   we  were seeing  huge parallel slaves sessions  around 100+  . 


This  kind  of  behavior is normal  when parallel_degree_policy is set to true  ,   but we had  parallel_degree_policy=manual  and   still we  were seeing huge parallel slaves  far above what is defined in hint  

parallel_degree_policy=manual
parallel_degree_limit = CPU


On checking it as found that  When we dont add table name in parallel hint , it will enable parallel auto  dop   . We can see  this in  sql plan . 

  Note
-----
- cpu costing is off (consider enabling it)
- automatic DOP: Computed Degree of Parallelism is 124 because of degree limit


If we  dont   want  to enable auto dop when parallel_degree_policy -is set to  manual  , we need to  add parallel hint  on specific tables   eg .  /*+ PARALLEL(employee_table, 35) */





Parallel Statement Queuing  .

Parallel Statement Queuing has been introduced from 12c and is  key feature .  Parallel Statement Queuing is enabled by default  when we enable Parallel  auto dop  by setting parallel_degree_policy to auto .

In case when parallel_degree_policy is set to manual  , we can still enable Parallel Statement Queuing  by 2 ways .  i,e  

1) using parameter  "_parallel_statement_queuing"=true  
     
select a.ksppinm "Parameter",b.ksppstvl "Session Value",c.ksppstvl "Instance Value" from sys.x$ksppi a, sys.x$ksppcv b, sys.x$ksppsv c where a.indx = b.indx and a.indx = c.indx and a.ksppinm in ('_fix_control','_parallel_statement_queuing') order by 1;


2) using  hint  - select /*+PARALLEL(4) STATEMENT_QUEUING*/ .

STATEMENT_QUEUING Hint can be deployed through patch 

begin
sys.dbms_sqldiag_internal.i_create_patch(sql_id=>'<SQL_ID>', hint_text=>'OPT_PARAM(''_parallel_statement_queuing'' ''true'')', name=> 'SQL_Patch_parallel_statement_queuing_sql_id'); >>>>>>>>>>>>>>>REPLACE <SQL_ID> with your sql_id
end;
/




References :

Automatic Degree of Parallelism (AutoDOP) is enabled when "PARALLEL_DEGREE_POLICY=MANUAL and PARALLEL HINT" is Used at SQL Statement (Doc ID 1542556.1)

How to Achieve Parallel Statement Queuing for an SQL When PARALLEL_DEGREE_POLICY=MANUAL (Doc ID 1902069.1)

How To Enable Parallel Query For A Function? ( Doc ID 1093773.1 )

https://blogs.oracle.com/datawarehousing/post/px-server-sets-parallelizers-dfo-trees-parallel-groups-lets-talk-about-terminology

SRDC - Data Collection for Query executing in Parallel returning error (Doc ID 2076943.1)





Other Known Issues related to parallel hint in 19c  :

9c: Duplicate PQ workers created at instance level (Doc ID 2719466.1)

BUG:32342582 - WRONG RESULTS WITH PARALLEL AFTER UPGRADE TO 19.9

Bug 19849585 : PARALLEL HINT IGNORED WHEN USING GROUP BY

Issue with parallel hint on 19c CDB database where the hint is not honored and going for serial scans. Apparently seems to be Oracle bug Bug 30618478 

Number of Preallocated Parallel Workers Are More Than Parallel_min_servers (Doc ID 2654231.1)

 Bug 31195441 : 18C: LONG PARSING TIME WHEN QUERY EXECUTING IN PARALLEL



Views :


col username for a12
col "QC SID" for A6
col "SID" for A6
col "QC/Slave" for A8
col "Req. DOP" for 9999
col "Actual DOP" for 9999
col "Slaveset" for A8
col "Slave INST" for A9
col "QC INST" for A6
set pages 300 lines 300
col wait_event format a30

select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(pp.SERVER_NAME,
length(pp.SERVER_NAME)-4,4) ) )"Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "SlaveSet",
to_char(s.sid) "SID",
to_char(px.inst_id) "Slave INST",
decode(sw.state,'WAITING', 'WAIT', 'NOT WAIT' ) as STATE,
case sw.state WHEN 'WAITING' THEN substr(sw.event,1,30) ELSE NULL end as wait_event ,
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
to_char(px.qcinst_id) "QC INST",
px.req_degree "Req. DOP",
px.degree "Actual DOP"
from gv$px_session px,
gv$session s ,
gv$px_process pp,
gv$session_wait sw
where px.sid=s.sid (+)
and px.serial#=s.serial#(+)
and px.inst_id = s.inst_id(+)
and px.sid = pp.sid (+)
and px.serial#=pp.serial#(+)
and sw.sid = s.sid
and sw.inst_id = s.inst_id
order by
decode(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID),
px.QCSID,
decode(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP),
px.SERVER_SET,
px.INST_ID ;

set pages 300 lines 300
col wait_event format a30
select
sw.SID as RCVSID,
decode(pp.server_name,
NULL, 'A QC',
pp.server_name) as RCVR,
sw.inst_id as RCVRINST,
case sw.state WHEN 'WAITING' THEN substr(sw.event,1,30) ELSE NULL end as wait_event ,
decode(bitand(p1, 65535),
65535, 'QC',
'P'||to_char(bitand(p1, 65535),'fm000')) as SNDR,
bitand(p1, 16711680) - 65535 as SNDRINST,
decode(bitand(p1, 65535),
65535, ps.qcsid,
(select
sid
from
gv$px_process
where
server_name = 'P'||to_char(bitand(sw.p1, 65535),'fm000') and
inst_id = bitand(sw.p1, 16711680) - 65535)
) as SNDRSID,
decode(sw.state,'WAITING', 'WAIT', 'NOT WAIT' ) as STATE
from
gv$session_wait sw,
gv$px_process pp,
gv$px_session ps
where
sw.sid = pp.sid (+) and
sw.inst_id = pp.inst_id (+) and
sw.sid = ps.sid (+) and
sw.inst_id = ps.inst_id (+) and
p1text = 'sleeptime/senderid' and
bitand(p1, 268435456) = 268435456
order by
decode(ps.QCINST_ID, NULL, ps.INST_ID, ps.QCINST_ID),
ps.QCSID,
decode(ps.SERVER_GROUP, NULL, 0, ps.SERVER_GROUP),
ps.SERVER_SET,
ps.INST_ID ;



set pages 300 lines 300
col "Username" for a12
col "QC/Slave" for A8
col "Slaveset" for A8
col "Slave INST" for A9
col "QC SID" for A6
col "QC INST" for A6
col "operation_name" for A30
col "target" for A30

select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(pp.SERVER_NAME,
length(pp.SERVER_NAME)-4,4) ) )"Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "SlaveSet",
to_char(px.inst_id) "Slave INST",
substr(opname,1,30) operation_name,
substr(target,1,30) target,
sofar,
totalwork,
units,
start_time,
timestamp,
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
to_char(px.qcinst_id) "QC INST"
from gv$px_session px,
gv$px_process pp,
gv$session_longops s
where px.sid=s.sid
and px.serial#=s.serial#
and px.inst_id = s.inst_id
and px.sid = pp.sid (+)
and px.serial#=pp.serial#(+)
order by
decode(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID),
px.QCSID,
decode(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP),
px.SERVER_SET,
px.INST_ID ;

 




select inst_id
, status
, count(1) px_servers#
from gv$px_process
group by inst_id, status
order by inst_id, status ;

SELECT DECODE(px.qcinst_id, NULL, username, ' – ' || lower(SUBSTR(pp.SERVER_NAME, LENGTH(pp.SERVER_NAME)-4,4) ) ) "Username",
 DECODE(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave",
 TO_CHAR( px.server_set) "SlaveSet",
 TO_CHAR(s.sid) "SID",
 TO_CHAR(px.inst_id) "Slave INST",
 DECODE(sw.state,'WAITING', 'WAIT', 'NOT WAIT' ) AS STATE,
  CASE sw.state
    WHEN 'WAITING'
    THEN SUBSTR(sw.event,1,30)
    ELSE NULL
  END AS wait_event,
  DECODE(px.qcinst_id, NULL ,TO_CHAR(s.sid) ,px.qcsid) "QC SID",
  TO_CHAR(px.qcinst_id) "QC INST",
  px.req_degree "Req. DOP",
  px.degree "Actual DOP"
FROM gv$px_session px,
  gv$session s ,
  gv$px_process pp,
  gv$session_wait sw
WHERE px.sid   =s.sid (+)
AND px.serial# =s.serial#(+)
AND px.inst_id = s.inst_id(+)
AND px.sid     = pp.sid (+)
AND px.serial# =pp.serial#(+)
AND sw.sid     = s.sid
AND sw.inst_id = s.inst_id
ORDER BY DECODE(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID),
  px.QCSID,
  DECODE(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP),
  px.SERVER_SET,
  px.INST_ID,
  pp.SERVER_NAME  ;


column name format a50
column value format 999,999
SELECT NAME, VALUE
FROM GV$SYSSTAT
WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%'
  OR UPPER (NAME) LIKE '%PARALLELIZED%'
  OR UPPER (NAME) LIKE '%PX%'
;



Select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(pp.SERVER_NAME,
length(pp.SERVER_NAME)-4,4) ) )"Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "SlaveSet",
to_char(px.inst_id) "Slave INST",
substr(opname,1,30) operation_name,
substr(target,1,30) target,
sofar,
totalwork,
units,
start_time,
timestamp,
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
to_char(px.qcinst_id) "QC INST"
from gv$px_session px,
gv$px_process pp,
gv$session_longops s
where px.sid=s.sid
and px.serial#=s.serial#
and px.inst_id = s.inst_id
and px.sid = pp.sid (+)
and px.serial#=pp.serial#(+)
order by decode(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID),px.QCSID,
decode(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP), px.SERVER_SET,px.INST_ID;


select
   ps.qcsid,
   ps.sid,
   p.spid,
   ps.inst_id,
   ps.degree,
   ps.req_degree
from
   gv$px_session ps
   join
   gv$session s
      on ps.sid=s.sid
         and
         ps.inst_id=s.inst_id
   join
   gv$process p
      on p.addr=s.paddr
         and
         p.inst_id=s.inst_id 
order by
   qcsid,
   server_group desc,
   inst_id,
   sid;



Monday, May 15, 2023

Oracle High Undo UNEXPIRED utilization due to Autotune retention causing ORA-01555



This is  very old issue where undo extends puck  up high retention ( more then defined undo_retention) and  undo block remains in unexpired state for long time .  This is because of   _undo_autotune which is  by default set to true . 

 _undo_autotune   will try to override the undo_retention parameter. When _undo_autotune is set to True (default value), based on the size of the undo tablespace Oracle will try to keep the undo segments to higher time than defined in undo_retention parameter.


SQL> select TABLESPACE_NAME,STATUS, SUM(BYTES), COUNT(*) FROM DBA_UNDO_EXTENTS GROUP BY tablespace_name,STATUS order by tablespace_name;

TABLESPACE_NAME STATUS SUM(BYTES) COUNT(*)
------------------------------ --------- ----------------- ----------

UNDOTBS1 EXPIRED 65536 1
UNDOTBS1 UNEXPIRED 10285809664 3271
UNDOTBS2 EXPIRED 142802944 6
UNDOTBS2 UNEXPIRED 4242735104 642




Suggested Solutions :


One of below solution   can  be applied . 

1)    Setting  “_undo_autotune” = false;


2) Setting   _smu_debug_mode=33554432

With this setting, V$UNDOSTAT.TUNED_UNDORETENTION is not calculated based on a percentage of the fixed size undo tablespace. Instead it is set to the maximum of (MAXQUERYLEN secs + 300) and UNDO_RETENTION.

Set   at CDB level and monitor the database



3)  Setting “_HIGHTHRESHOLD_UNDORETENTION”=900

Setting    this parameter will limit high value of undo retention.

alter system set “_HIGHTHRESHOLD_UNDORETENTION”=900 SCOPE=spfile;




Views : 

SELECT property_name, property_value FROM database_properties WHERE property_name = 'LOCAL_UNDO_ENABLED';

select TABLESPACE_NAME,retention from dba_tablespaces where TABLESPACE_NAME like '%UNDO%';

select TABLESPACE_NAME,STATUS, SUM(BYTES), COUNT(*) FROM DBA_UNDO_EXTENTS GROUP BY tablespace_name,STATUS order by tablespace_name;

select max(maxquerylen),max(tuned_undoretention) from DBA_HIST_UNDOSTAT;


SELECT
a.ksppinm "Parameter",
decode(p.isses_modifiable,'FALSE',NULL,NULL,NULL,b.ksppstvl) "Session",
c.ksppstvl "Instance",
decode(p.isses_modifiable,'FALSE','F','TRUE','T') "S",
decode(p.issys_modifiable,'FALSE','F','TRUE','T','IMMEDIATE','I','DEFERRED','D') "I",
decode(p.isdefault,'FALSE','F','TRUE','T') "D",
a.ksppdesc "Description"
FROM x$ksppi a, x$ksppcv b, x$ksppsv c, v$parameter p
WHERE a.indx = b.indx AND a.indx = c.indx
AND p.name(+) = a.ksppinm
AND lower(a.ksppinm) in ('_smu_debug_mode')
ORDER BY a.ksppinm;



  select segment_name, nvl(sum(act),0) "ACT BYTES",
  nvl(sum(unexp),0) "UNEXP BYTES",
  nvl(sum(exp),0) "EXP BYTES"
  from (select segment_name, nvl(sum(bytes),0) act,00 unexp, 00 exp
  from dba_undo_extents where status='ACTIVE' group by segment_name
  union
  select segment_name, 00 act, nvl(sum(bytes),0) unexp, 00 exp
  from dba_undo_extents where status='UNEXPIRED' group by segment_name
  union
  select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp
  from dba_undo_extents where status='EXPIRED' group by segment_name)
  group by segment_name
  order by 1
 /




select segment_name, nvl(sum(act),0) "ACT BYTES",
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (select segment_name, nvl(sum(bytes),0) act,00 unexp, 00 exp
from dba_undo_extents where status='ACTIVE' group by segment_name
union
select segment_name, 00 act, nvl(sum(bytes),0) unexp, 00 exp
from dba_undo_extents where status='UNEXPIRED' group by segment_name
union
select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp
from dba_undo_extents where status='EXPIRED' group by segment_name)
group by segment_name
order by 1
/

select tablespace_name,round(sum(case when status = 'UNEXPIRED' then bytes else 0 end) / 1048675,2) unexp_MB ,
round(sum(case when status = 'EXPIRED' then bytes else 0 end) / 1048576,2) exp_MB ,
round(sum(case when status = 'ACTIVE' then bytes else 0 end) / 1048576,2) act_MB
from dba_undo_extents group by tablespace_name
/

SELECT s.sid, s.serial#, s.username, s.program, t.used_ublk, t.used_urec FROM v$session s, v$transaction t WHERE s.taddr = t.addr ORDER BY 5 desc, 6 desc, 1, 2, 3, 4;




References :

ORA-01555 for long running queries if "_undo_autotune"=true (Doc ID 2784427.1)

High Tuned_undoretention though Undo Datafiles are Autoextend on (Doc ID 2582183.1)

Customer Recommended High Undo Utilization On 19c PDB Database due to Automatic TUNED_UNDORETENTION (Doc ID 2710337.1)




Related Issues :

Bug 31113682 - ORA-1555 FOR LONG RUNNING QUERIES IF "_UNDO_AUTOTUNE"=TRUE

Active Undo Segment Keep Growing And Not Releasing/Reusing Space (Doc ID 2343397.1)



Dump Undo Block :

for our  analysis we  can  try to  dump  one of undo block to get more information . 

select SEGMENT_NAME, sum(bytes)/1024/1024/1024 from DBA_UNDO_EXTENTS where status='UNEXPIRED' and segment_name in (
SELECT a.name from
v$rollname a,
v$rollstat b,
dba_rollback_segs c,
v$transaction d,
v$session e
WHERE
a.usn=b.usn AND
a.name=c.segment_name AND
a.usn=d.xidusn AND
d.addr=e.taddr ) group by SEGMENT_NAME order by 2;


alter session set tracefile_identifier='XID';
alter session set max_dump_file_size = unlimited;
alter system dump undo header '_SYSSMU31_4199861047$';

 





Tuesday, May 9, 2023

Analyzing ORA-4031 In Oracle Database 19c

 

Possible Reason :
1) Too many Child Cursors 
2)  In memory  enabled 
3)  Undersized Sga 
4)  Pinned objects  in shared pool 
5)  Memory Fragmentation 
6) Too many hard parsing ,  

 

Possible Solution : 
1) Use  bind variables 
2)   Set enough  Sga 




References : 
1) https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/improving-rwp-cursor-sharing.html#GUID-971F4652-3950-4662-82DE-713DDEED317C 

2) Troubleshooting: High Version Count Issues (Doc ID 296377.1)




#####################################
Logs to Collect 
#####################################

1) Tfa 
2) Using customized  sql 
2)  Using  srdc_db_ora4031.sql  retrieved from 2232371.1



Use Tfa 

$TFA_HOME/bin/tfactl diagcollect -srdc ora4031




Use your  customized  sql 

spool /tmp/memory_status.txt <<<<<< Call out the database name

set linesize 200
col VALUE for a50
set pagesize 100
set pagesize 100
col Parameter for a50
col "Session Value" for a50
col "Instance Value" for a50
column component format a25
column Initial format 99,999,999,999
column Final format 99,999,999,999
column Started format A25

select name,value from v$system_parameter where name in ( 'memory_max_target', 'memory_target', 'sga_max_size', 'sga_target', 'shared_pool_size', 'db_cache_size', 'large_pool_size', 'java_pool_size', 'pga_aggregate_target', 'workarea_size_policy', 'streams_pool_size' ) ;

select a.ksppinm "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value"
from sys.x$ksppi a, sys.x$ksppcv b, sys.x$ksppsv c
where a.indx = b.indx and a.indx = c.indx and a.ksppinm in
('__shared_pool_size','__db_cache_size','__large_pool_size','__java_pool_size','__streams_pool_size','__pga_aggregate_target','__sga_target','memory_target');


select * from v$sgastat where pool like '%shared%' order by bytes;

select NAMESPACE,GETHITRATIO,PINHITRATIO,RELOADS,INVALIDATIONS from v$librarycache;

select HS.BEGIN_INTERVAL_TIME,ss.*
from DBA_HIST_SGASTAT ss ,dba_hist_snapshot hs
where pool=’shared pool’ and name=’free memory’
and SS.SNAP_ID=HS.SNAP_ID
and SS.INSTANCE_NUMBER=HS.INSTANCE_NUMBER
and ss.instance_number=1
–and HS.BEGIN_INTERVAL_TIME between to_date(’17-09-2019 13:00:00′,’dd-mm-yyyy hh24:mi:ss’) and to_date(’17-09-2019 15:30:00′,’dd-mm-yyyy hh24:mi:ss’)
order by ss.snap_id desc;


SELECT COMPONENT ,OPER_TYPE,INITIAL_SIZE "Initial",FINAL_SIZE "Final",to_char(start_time,'dd-mon hh24:mi:ss') Started FROM V$SGA_RESIZE_OPS;

SELECT COMPONENT ,OPER_TYPE,INITIAL_SIZE "Initial",FINAL_SIZE "Final",to_char(start_time,'dd-mon hh24:mi:ss') Started FROM V$MEMORY_RESIZE_OPS;

select * from v$pga_target_advice;

spool off;