Monday, May 20, 2024

Oracle Handling Distributed, Pending , In-Doubt and Remote Transaction

 

A distributed transaction modifies data related to two or more databases, it contains DML statements than span many nodes. For a distributed transaction to be succesful all or none of the database nodes involved in the transaction need to commit or rollback the whole transaction.

Note the difference between a distributed and a remote transaction; a remote transaction contains one or more DML statements that are executed on the SAME remote node,


Manual Resolution of In-Doubt Transactions
–The in-doubt transaction has locks on critical data or undo segments.
–The cause of the system, network, or software failure cannot be repaired quickly.




Commit or Rollback Pending Transaction -- UNSTUCK TRANSACTION : 


SQL> select local_tran_id,global_tran_id, state,mixed, commit# from dba_2pc_pending;

If the state of the transaction is “prepared” and there is no inconsistency, the transaction can be forced to rollback, or maybe if the underlying problem which caused the in-doubt transaction is resolved the transaction can be forced to commit as follows:

SQL> ROLLBACK FORCE  '97.33.166765' /* ->replace with ur own trx_id */
or
SQL> COMMIT FORCE  '97.33.166765' /* ->replace with ur own trx_id */


If the state of the transaction is “collecting” and you execute the above command, you may see an error like:

ERROR at line 1:
ORA-02058: no prepared transaction found with ID 97.33.166765




 DBA_2PC_PENDING view have entries about our transaction but there is no transaction in reality

If the state of the transaction (in DBA_2PC_PENDING) is committed, rollback forced or commit forced then it can be cleaned by:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('97.33.166765'); /* ->replace with ur own trx_id */





HANDLING STUCK TRANSACTIONS

Our ultimate goal is not seeing the transaction in X$KTUXE table; and ensuring that the dictionary tables like PENDING_TRANS$ to be consistent with this information.

Stuck transactions can be examined under the below conditions:



2.1. Condition 1: DBA_2PC_PENDING view have entries about our transaction but there is no transaction in reality

The condition is that; when we issue select to the dictionary views like the DBA_2PC_PENDING, PENDING_TRANS$, etc. we see the transaction, but the transaction does not exist in X$KTUXE view.

The actual transaction entry view is X$KTUXE ([K]ernel [T]ransaction [U]ndo Transa[X]tion [E]ntry) where the columns correspond to the following sections of the transaction id:

KTUXEUSN.KTUXESLT.KTUXESQN = 96.22.163456   (The concat of KTUXEUSN, KTUXESLT and KTUXESQN gives us the transacion number)
KTUXEUSN=96
KTUXESLT=22
KTUXESQN=163456

Therefore, the condition1 holds when DBA_2PC_PENDING has the entry but X$KTUXE does not.

SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, COMMIT# FROM DBA_2PC_PENDING;

    -- Returns: 96.22.163456 TESTDB.723a8559.96.22.163456    prepared    no    56759427464

SQL> SELECT * FROM X$KTUXE WHERE KTUXEUSN=96 AND KTUXESLT=22 AND KTUXESQN =163456;
  
  -- Returns: No Rows

Solution 1 to Condition 1: If the state of the transaction (in DBA_2PC_PENDING) is committed, rollback forced or commit forced then it can be cleaned by:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456'); 
Solution 2 to Condition 1: If the state of the transaction is prepared, we have to clean manually as follows:

SQL> DELETE FROM SYS.PENDING_TRANS$ WHERE LOCAL_TRAN_ID = '96.22.163456'; 
SQL> DELETE FROM SYS.PENDING_SESSIONS$ WHERE LOCAL_TRAN_ID ='96.22.163456' ; 
SQL> DELETE FROM SYS.PENDING_SUB_SESSIONS$ WHERE LOCAL_TRAN_ID = '96.22.163456';
SQL> COMMIT;




2.2. Condition 2: DBA_2PC_PENDING view does NOT have entries about our transaction but there IS A transaction.

This is something like a orphan transaction that the dictionary is not aware of.
Trying to force commit or rollback this transaction may result in error like below, since the dictionary is not aware:

SQL> ROLLBACK FORCE '96.22.163456'

-- ORA-02058: no prepared transaction found with ID 96.22.163456
Solution to Condition 2: What we need to do at this point is; recovering our transaction from being an orphan by inserting some dummy records into dictionay tables (so the views…) and then force a rollback or commit: You do not have to change the parameters in the insert command other than the transaction id.

SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;
    
SQL> INSERT INTO PENDING_TRANS$ (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME,RECO_TIME)
    VALUES
    (
         '96.22.163456', 
         299354,
         'XXXXXXX.12345.1.2.3',
         'prepared','P',
         hextoraw( '00000001' ),
         hextoraw( '00000000' ),
         0, sysdate, sysdate
    );
    
SQL> INSERT INTO PENDING_SESSIONS$
    VALUES
    (
     '96.22.163456', 
     1, hextoraw('05004F003A1500000104'),
     'C', 0, 30258592, '',
     146
    );
SQL>  COMMIT;

Now, we should be able to rollback or commit.


SQL> ROLLBACK FORCE '96.22.163456' 
or
SQL> COMMIT FORCE '96.22.163456'

Lastly, we remove the dummy entry from the dictionary:

SQL> ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
    
SQL> ALTER SYSTEM SET "_smu_debug_mode" = 4;
    
SQL> COMMIT;
    
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');
    
SQL> ALTER SYSTEM SET "_smu_debug_mode" = 0;
    
SQL> COMMIT;
Check to see whether the transaction has gone:

SQL> SELECT * FROM X$KTUXE WHERE KTUXEUSN=96 AND KTUXESLT=22 AND KTUXESQN =163456;
    -- Returns: No Rows    



2.3. Condition 3: DBA_2PC_PENDING has entry and there is a transaction but COMMIT or ROLLBACK HANGS!

In the situation, where COMMIT FORCE or ROLLBACK FORCE hangs,
Trying to purge the transaction will give an error like:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');
    -- ERROR at line 1:
    -- ORA-06510: PL/SQL: unhandled user-defined exception
    -- ORA-06512: at "SYS.DBMS_TRANSACTION", line 94
    -- ORA-06512: at line 1


Solution to Condition 3: The solution is the combination of Cond1 and Cond2:

First, delete the dictionary entries:

SQL> DELETE FROM SYS.PENDING_TRANS$ WHERE LOCAL_TRAN_ID = '96.22.163456'; 
    
SQL> DELETE FROM SYS.PENDING_SESSIONS$ WHERE LOCAL_TRAN_ID ='96.22.163456' ; 
    
SQL> DELETE FROM SYS.PENDING_SUB_SESSIONS$ WHERE LOCAL_TRAN_ID = '96.22.163456';
    
SQL> COMMIT;
Then, insert dummy record, force commit and finally purge the transaction:

SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;
    
SQL> INSERT INTO PENDING_TRANS$ (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID,
STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME,RECO_TIME)
    VALUES
    (
         '96.22.163456',
         306206,
         'XXXXXXX.12345.1.2.3',
         'prepared','P',
         hextoraw( '00000001' ),
         hextoraw( '00000000' ),
         0, sysdate, sysdate
    );
     
SQL> INSERT INTO PENDING_SESSIONS$
    VALUES
    (
         '96.22.163456',
         1, hextoraw('05004F003A1500000104'),
         'C', 0, 30258592, '',
         146
    );
SQL> COMMIT;
    
SQL> COMMIT FORCE '96.22.163456';
    
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');







Two-Phase Commit (2PC)

The two phase-commit mechanism is used to ensure data integrity in a distributed transaction. It is automatically used during all distributed transactions and coordinates either the commit or roll back of all the changes in the transaction as a single, self-contained unit.

In short, there are three phases for the 2PC:
PREPARE: The initiating node ask each of its referenced nodes to promise to perform a commit or rollback when told to do so. The preparing node will flush the redo log buffer to the online redo log. It converts locks to in-doubt transaction locks on the data blocks and passes its highest SCN value to the initiating node.
COMMIT: The initiating node commits and writes to its redo log the committed SCN. The Data Block locks are released.
FORGET:  Pending transactions tables are related database views are cleared (dba_2pc_pending/dba_2pc_neighbors)
All the above phases take place quickly and transparently to the application where the transaction originated.

NOTE:
A crash during the PREPARE Phase results in a ROLLBACK
A crash during the COMMIT Phase results in either COMMIT or ROLLBACK





Recoverer Process (RECO)

RECO is a mandatory background process that, in a distributed database, automatically resolves failures in distributed transactions. The RECO process of a node automatically connects to other databases involved in an in-doubt distributed transaction; it can use an existing connection or establish a new connection to other nodes involved in the failed transaction.. When RECO reestablishes a connection between the databases, it automatically resolves all in-doubt transactions, removing from each database's pending transaction table any rows that correspond to the resolved transactions.

At exponentially growing time intervals, the RECO background process of a node attempts to recover the local portion of an in-doubt distributed transaction.

init.ora parameter distributed_transactions > 0  (for RECO to start)

If init.ora open_links is set to 0, then no distributed transactions are allowed.

The RECO process is present only if the instance permits distributed transactions.

You can enable and disable RECO using the ALTER SYSTEM statement with the ENABLE/DISABLE DISTRIBUTED RECOVERY options.

You can disable distributed recovery if the database is mounted but not open.

select dbid,name,log_mode,open_mode from v$database;
select instance_number,instance_name,status from v$instance;


Disabling and Enabling RECO

You can enable and disable RECO using the ALTER SYSTEM statement with the ENABLE/DISABLE DISTRIBUTED RECOVERY options. For example, you can temporarily disable RECO to force the failure of a two-phase commit and manually resolve the in-doubt transaction.

-To disable RECO:
ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY

-To enable RECO and let it automatically resolve indoubt transactions
ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;




Database Parameters related to distributed Transactions :

DISTRIBUTED_LOCK_TIMEOUT  : 
DISTRIBUTED_LOCK_TIMEOUT is used to specify maximum time in  seconds   that instance waits for locked resources . Default is 60 seconds 



COMMIT_POINT_STRENGTH :
In a two phase commit process, one site or node is the designated “Commit Point Site”  . This  is determined by the values of the
COMMIT_POINT_STRENGTH parameter in all the database instances involved in the transaction which are adjacent to and include the Global Coordinator
 
The values range from 0 to 255 and if not set then the is determined by the software

The COMMIT_POINT_STRENGTH parameter in nonadjacent nodes is only used to resolve who will be the recursive“Commit Point Site” when a Local
Coordinator is the main “Commit Point Site"





Known Issues :

ORA-02053: transaction committed, some remote DBs may be in-doubt
The transaction has been locally committed, however we have lost communication with one or more local coordinators.

ORA-02054: transaction in-doubt
The transaction is neither committed or rolled back locally, and we have lost communication with the global coordinator.

ORA-02050: transaction rolled back, some remote DBs may be in-doubt
Indicates that a communication error ocurred during the two-phase commit

ORA-01591: lock held by in-doubt distributed transaction
Encountering the above error and users/applications unable to proceed with their work. In this case, Oracle automatically rolls back the user attempted transaction and the DBA has now to manually commit or rollback the in-doubt transaction.






Views  : 

DBA_2PC_PENDING  and DBA_2PC_NEIGHBORS   are  2 main views used for checking details 



DBA_2PC_PENDING 

view to determine the global commit number for a particular transaction ID. Lists all in-doubt distributed transactions. 
The view is empty until populated by an in-doubt transaction. After the transaction is resolved, the view is purged


Note: When LOCAL_TRAN_ID and GLOBAL_TRAN_ID value is same then node is the global coordinator of the transaction.

1) LOCAL_TRAN_ID : Local transaction identifier
2) GLOBAL_TRAN_ID : Global database identifier
3) STATE : –Collecting: node is currently collecting information from other database servers before it can decide whether it can prepare.
–Prepared: Node has prepared and holding locks on resources. May or not acknowledge to local coordinator
–Committed: Node has committed the transaction but other node may or may not done.
–Forced Commit: Administrator manually forced to commit the pending transaction at local node.
–Forced rollback: A pending transaction forced to rollback at local node.

4) MIXED: YES means part of transaction committed on one node and rollback at other node.
5) TRAN_COMMENT: Transaction comment
6) HOST: Hostname
7) COMMIT# : Global commit number for committed transaction



COL LOCAL_TRAN_ID FORMAT A13
COL GLOBAL_TRAN_ID FORMAT A30
COL STATE FORMAT A8
COL MIXED FORMAT A3
COL HOST FORMAT A10
COL COMMIT# FORMAT A10
SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, HOST, COMMIT# FROM DBA_2PC_PENDING;




DBA_2PC_NEIGHBORS :

Lists all incoming and outgoing in-doubt distributed transactions. It also indicates whether the local node is the commit point site in the transaction.


1) LOCAL_TRAN_ID: Local transaction identifier
2) IN_OUT: IN for incoming transactions; OUT for outgoing transactions
3) DATABASE: For incoming transactions, the name of the client database for outgoing transactions, the name of the database link used to access information on a remote server.
4) DBUSER_OWNER: For incoming transactions, the local account used to connect, for outgoing transactions: the owner of the database link.
5) INTERFACE: C is a commit message; N is either indicating a prepared state or read-only commit.

COL LOCAL_TRAN_ID FORMAT A13
COL IN_OUT FORMAT A6
COL DATABASE FORMAT A25
COL DBUSER_OWNER FORMAT A15
COL INTERFACE FORMAT A3
SELECT LOCAL_TRAN_ID, IN_OUT, DATABASE, DBUSER_OWNER, INTERFACE FROM DBA_2PC_NEIGHBORS





Other views : 

SQL> SELECT * FROM GLOBAL_NAME;
SQL> select * from DBA_2PC_PENDING;
SQL> select * from DBA_2PC_NEIGHBORS;
SQL> select * from sys.pending_trans$;
SQL> select * from SYS.PENDING_SESSIONS$;
SQL> select * from SYS.PENDING_SUB_SESSIONS$;
SQL> select * from SYS.V$GLOBAL_TRANSACTION  ;

SELECT
    ktuxeusn            usn#     -- 65535 = no-undo transaction
  , ktuxeslt            slot#    -- 65535 = invalid slot#
  , ktuxesqn            seq#
  , ktuxesta            status
  , ktuxecfl            flags
  , ktuxesiz            undo_blks
  , ktuxerdbf           curfile 
  , ktuxerdbb           curblock
  , ktuxescnw * power(2, 32) + ktuxescnb cscn -- commit/prepare commit SCN
  , ktuxeuel            
  -- distributed xacts
  --, ktuxeddbf           r_rfile
  --, ktuxeddbb           r_rblock
  --, ktuxepusn           r_usn#
  --, ktuxepslt           r_slot#
  --, ktuxepsqn           r_seq#
FROM
    x$ktuxe
WHERE ktuxesta != 'INACTIVE'
ORDER BY
    ktuxeusn
  , ktuxeslt
/


SELECT
    "AREA NAME"                               -- VARCHAR2(32)
  , "NUMAPG"                                  -- NUMBER
  , "PAGESIZE" / 1024 mempage_kb                 -- NUMBER
  , ROUND("SEGMENT SIZE" / 1048576) segsize_mb -- NUMBER
  , ROUND("SIZE" / 1048576) area_size_mb            -- NUMBER
  , ROUND("REMAINING ALLOC SIZE" / 1048576) remain_mb                   -- NUMBER
  , "SHMID"                                   -- NUMBER
  , "SEGMENT DISTRIBUTED"                     -- VARCHAR2(20)
  , "AREA FLAGS"                              -- NUMBER
  , "SEGMENT DEFERRED"                        -- VARCHAR2(20)
  , "SEG_START ADDR"                          -- RAW(8)
  , "START ADDR"                              -- RAW(8)
FROM
    x$ksmssinfo
/





Before  Rollback or committing  please capture below details  as If these transaction were XA transactions, the responsibility of cleaning them up would be the external transaction manager.



***********************
.
SET MARKUP HTML ON SPOOL ON HEAD "<TITLE>2PC - INFO </title> -
<STYLE TYPE='TEXT/CSS'><!--BODY {background: ffffc6} --></STYLE>"
SET ECHO OFF
REM Spooling to html file
REM Insert the queries to be executed between the SPOOL commands.
.
define COLLNAME='SUP_'
define SRDCNAME='2PC'
column SRDCSPOOLNAME new_val SRDCSPOOLNAME
--select 'SRDC_'||upper('&&SRDCNAME')||'_'||upper(instance_name)||'_'||to_char(sysdate,'YYYYMMDD_HH24MISS')||'.htm' SRDCSPOOLNAME from v$instance;
select '&&COLLNAME'||upper('&&SRDCNAME')||'_'||upper(instance_name)||'_CDB_'||upper(cdb)||'_'||to_char(sysdate,'YYYYMMDD_HH24MISS')||'.htm' SRDCSPOOLNAME from v$instance, v$database;
spool &SRDCSPOOLNAME

ALTER SESSION set NLS_DATE_FORMAT='MM/DD/YY HH24:Mi:SS';
set pagesize 1000
select sysdate from dual;

SELECT SYS_CONTEXT ('USERENV', 'CON_NAME') "Current Container" FROM DUAL;
select con_id, name, cdb, current_scn from gv$database order by inst_id;
select * from gv$containers order by inst_id,con_id;
select * from gv$pdbs order by inst_id,con_id;
select * from gv$version;


select * from global_name;
select * from dba_2pc_pending;
select * from dba_2pc_neighbors;
select * from sys.pending_trans$;
select * from sys.pending_sessions$;
select * from sys.pending_sub_sessions$;

select * from v$session_wait order by SECONDS_IN_WAIT;

SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
KTUXESTA Status,
KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE';

prompt +++ In memory transaction +++
select /*+ ORDERED */
'----------------------------------------'||'
Curent Time : '|| substr(to_char(sysdate,' HH24.MI.SS'),1,22) ||'
'||'TX start_time: '||t.KTCXBSTM||'
'||'FORMATID: '||g.K2GTIFMT ||'
'||'GTXID: '||g.K2GTITID_EXT ||'
'||'Branch: '||g.K2GTIBID ||'
Local_Tran_Id ='||substr(t.KXIDUSN||'.'||t.kXIDSLT||'.'||t.kXIDSQN,1,15)||'
'||'KTUXESTA='|| x.KTUXESTA ||'
'||'KTUXEDFL='|| x.KTUXECFL ||'
Lock_Info: ID1: ' || ((t.kXIDUSN*64*1024)+ t.kXIDSLT)
||' ID2: '|| t.kXIDSQN
XA_transaction_INFO
from x$k2gte g, x$ktcxb t, x$ktuxe x
where g.K2GTDXCB =t.ktcxbxba and
x.KTUXEUSN = t.KXIDUSN(+) and
x.KTUXESLT = t.kXIDSLT(+) and
x.KTUXESQN =t.kXIDSQN(+);

prompt +++ Timed out, prepared XA transactions +++

select global_tran_fmt, global_foreign_id, branch_id,state,
tran.local_tran_id
from sys.pending_trans$ tran, sys.pending_sessions$ sess
where tran.local_tran_id = sess.local_tran_id
and tran.state = 'prepared'
and dbms_utility.is_bit_set(tran.session_vector, sess.session_id)=1;

select sid, type, id1, id2, lmode, request from GV$lock
where type = 'TM' or type = 'TX';

select LOCAL_TRAN_ID, GLOBAL_TRAN_ID,to_char(FAIL_TIME,'dd-mm-yyyy HH24:MI:SS') FAIL_TIME,STATE, MIXED from DBA_2PC_PENDING;

select NAME, VALUE from V$PARAMETER where upper(NAME) = 'COMPATIBLE';
select NAME, VALUE from V$PARAMETER where upper(NAME) = 'DISTRIBUTED_LOCK_TIMEOUT';

select OWNER, OBJECT_NAME, OBJECT_TYPE, STATUS from DBA_OBJECTS where OBJECT_NAME like 'DBMS_XA%' order by OWNER, OBJECT_NAME, OBJECT_TYPE;

select blocking_session, sid, serial#,event, wait_class, seconds_in_wait
from v$session
where blocking_session is not NULL
order by blocking_session;

spool off
.
SET MARKUP HTML OFF
SET ECHO ON
.
***********************








References : 

Distributed Database, Transactions and Two Phase Commit (Doc ID 13229.1)

https://dbatrain.wordpress.com/wp-content/uploads/2009/01/database-links-masterclass_2.pdf

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/admin/managing-distributed-transactions.html#GUID-AB19431E-BF9F-4181-BC57-699F8A06149D

Primary Note for Troubleshooting Oracle Managed Distributed Transactions (Doc ID 100664.1)

https://www.linkedin.com/pulse/resolve-in-doubt-transactions-handling-unstuck-mohsen-taheri/

https://oraclefiles.com/2019/03/04/resolving-in-doubt-transactions/

No comments:

Post a Comment