Saturday, August 12, 2023

Oracle RAC Node Failover Testing


As part of a new build we need to do node failover testing, so I am documenting the steps I know.

Objective

Action

Expectations

To test an accidental change of the cluster IP addresses

Modify /etc/hosts and change the VIP addresses, then try starting the cluster.

The cluster must fail to start, reporting error messages related to the network IPs.
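Here is a minimal sketch of this action. It assumes Linux with bash and awk, and that the VIP names are resolved through /etc/hosts. The helper names (`hosts_ip_for`, `swap_ip`, `restore_hosts`), the backup file suffix and the example addresses are all hypothetical. The functions keep a backup so that the original file can be put back after the test.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Linux, bash and awk; run as root when pointed at the real /etc/hosts.
# Helper names and the backup suffix are hypothetical.

# Print the IP address that a hosts-format file maps to the given hostname.
hosts_ip_for() {
  local name=$1 file=${2:-/etc/hosts}
  awk -v n="$name" '
    $1 ~ /^#/ { next }                                    # skip comment lines
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
  ' "$file"
}

# Replace the address OLD_IP with NEW_IP on every line that starts with it, keeping a backup.
swap_ip() {
  local old=$1 new=$2 file=${3:-/etc/hosts}
  cp -p "$file" "$file.failover_test.bak"
  awk -v o="$old" -v n="$new" '$1 == o { $1 = n } { print }' "$file" > "$file.tmp"
  cat "$file.tmp" > "$file" && rm -f "$file.tmp"         # rewrite in place, keep owner/permissions
}

# Put the original file back after the test.
restore_hosts() {
  local file=${1:-/etc/hosts}
  cat "$file.failover_test.bak" > "$file"
}
```

A typical run, as root on node 1, with placeholder names and addresses:

1. Record the current address with `hosts_ip_for rac1-vip`.
2. Change it with `swap_ip 192.168.10.21 192.168.10.99`.
3. Restart the stack with `crsctl stop crs` and `crsctl start crs`.
4. Check `crsctl stat res -t` for errors on the VIP and network resources.
5. Finish with `restore_hosts`.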

To test the failure of the ASM instance (ASM instance crash)

On node 1, kill the processes related to ASM.

The cluster must not be affected by the ASM crash on node 1.
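Before you kill anything, confirm the exact process names. The sketch below assumes Linux `ps` and the usual naming of ASM background processes, such as `asm_pmon_+ASM1`, where `+ASM1` stands for node 1's ASM instance. `pids_named` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only (assumes Linux procps "ps"): reads "ps -eo pid,args" output on stdin
# and prints the PID of every process whose name (first word of args) is exactly NAME.
pids_named() {
  local name=$1
  awk -v n="$name" 'NR > 1 && $2 == n { print $1 }'    # NR > 1 skips the ps header
}

# Example on node 1 (placeholder SID):
#   ps -eo pid,args | pids_named 'asm_pmon_+ASM1'
#   kill -9 <pid>            # simulate the ASM instance crash
#   srvctl status asm        # check whether Clusterware brings ASM back
#   crsctl stat res -t       # the other node's resources should stay ONLINE
```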

To test the failover of the network card carrying the public IP address

Unplug one of the cables connecting the node to the public network.

Pings from the client should still go through.
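To make "the ping should go through" measurable, one option is to run a continuous ping from the client while the cable is pulled, then read the packet-loss summary. The sketch below assumes the Linux iputils `ping` summary format (`N packets transmitted, M received, X% packet loss`); other platforms word it differently. `ping_loss_pct` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: read ping output on stdin and print the packet-loss percentage from the
# Linux iputils summary line, e.g. "60 packets transmitted, 58 received, 3.33333% packet loss".
ping_loss_pct() {
  grep -Eo '[0-9.]+% packet loss' | head -n 1 | cut -d'%' -f1
}
```

Example from the client, with a placeholder host name: run `ping -c 60 rac1-vip.example.com | tee ping.log | ping_loss_pct` and pull the cable during the 60 seconds. A non-zero result shows how many probes were lost while the interface switched over. The full log stays in ping.log.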

To test the stability of the database by killing an Oracle server process

Kill a process related to the Oracle daemons from a terminal. This should crash the database processes.

The SQL session should not be affected and should fail over to the next node without any disruption.

To test the complete failover of the session in the event of a node failure (e.g. a sudden shutdown)

1. Connect from the client to the database via sqlplus.

2. Power off the Unix server that the sqlplus session is connected to.

The SQL session should not be affected and should fail over to the next node without any disruption.

To test session failover of a connected Oracle user in the event of an instance failure

1. Connect from a client to node 1 in the RAC environment using sqlplus.

2. Connect to the database on node 1 as SYSDBA and shut down the instance:

sqlplus / as sysdba

The SQL connection should swing over to the other instance and still continue the SELECT query.

# The sqlplus connection from the client to the DB is still available.
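One caveat on step 2: a plain SHUTDOWN (NORMAL) waits for all connected users to disconnect, including the test session. SHUTDOWN ABORT is therefore the usual way to simulate the failure. To confirm that the client session really failed over, check the FAILOVER_TYPE, FAILOVER_METHOD and FAILED_OVER columns of GV$SESSION from a separate connection. This is a minimal sketch: the `taf_status` helper, the connect string and the user name are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: CONNECT is a placeholder such as "<admin_user>/<password>@RACDB";
# USERNAME is the (hypothetical) account used by the test session.
taf_status() {
  local connect=$1 username=$2
  sqlplus -s "$connect" <<EOF
SET LINESIZE 200 PAGESIZE 100
COLUMN username FORMAT A15
SELECT inst_id, sid, username, failover_type, failover_method, failed_over
FROM   gv\$session
WHERE  username = UPPER('$username');
EXIT
EOF
}
```

Run it before and after the shutdown. For the test session, INST_ID should move to the surviving instance and FAILED_OVER should change to YES. If FAILOVER_TYPE shows NONE, TAF is not active for that connection.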

To test a crash of Grid processes

Kill the LMON process on a RAC node.

The killed processes must be restarted automatically.
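Killing LMON normally takes the whole database instance down. One way to confirm the automatic restart is to compare process snapshots taken before the kill and after Clusterware has had time to react. The sketch below assumes Linux `ps` and the `ora_<name>_<SID>` naming of database background processes. `compare_snapshots` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compare two "ps -eo pid,args" snapshots and report, for every process
# whose name starts with PREFIX, whether it got a new PID (RESTARTED), kept its PID
# (UNCHANGED) or is gone (MISSING).
compare_snapshots() {
  local before=$1 after=$2 prefix=$3
  awk -v pre="$prefix" '
    FNR == 1 { next }                     # skip the ps header line of each file
    index($2, pre) != 1 { next }          # keep only processes with the prefix
    NR == FNR { old[$2] = $1; next }      # first file: remember the old PIDs
    { new[$2] = $1 }                      # second file: remember the new PIDs
    END {
      for (p in old) {
        if (!(p in new))            print p, "MISSING"
        else if (new[p] != old[p])  print p, "RESTARTED", old[p], "->", new[p]
        else                        print p, "UNCHANGED"
      }
    }' "$before" "$after" | sort
}
```

Example on the RAC node:

1. Take a snapshot with `ps -eo pid,args > before.txt`.
2. Kill the `ora_lmon_<SID>` PID with `kill -9`.
3. Wait for Clusterware to react.
4. Take a second snapshot with `ps -eo pid,args > after.txt`.
5. Run `compare_snapshots before.txt after.txt ora_`.

The core background processes should show RESTARTED. Transient slaves such as `ora_j000_<SID>` can legitimately show as MISSING.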

To test the automatic relocation of SCAN listeners

Reboot node 1 using the OS reboot command.

1. When node 1 goes down, all SCAN listeners must shift to the remaining node.

2. When node 1 comes back up, the SCAN listeners must shift back to their original node.
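This sketch checks where the SCAN listeners run at each stage: before the reboot, while node 1 is down, and after it rejoins. It assumes the 11gR2-style wording of `srvctl status scan_listener` (`SCAN listener LISTENER_SCAN1 is running on node rac1`). The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: read "srvctl status scan_listener" output on stdin and print
# "<node> <count>" for each node that currently runs SCAN listeners.
scan_listeners_per_node() {
  awk '/is running on node/ { count[$NF]++ }
       END { for (n in count) print n, count[n] }' | sort
}
```

Run `srvctl status scan_listener | scan_listeners_per_node` at each stage. If the listeners stay on the surviving node after node 1 is back, `srvctl relocate scan_listener` can move one manually. Check `srvctl relocate scan_listener -h` for the options in your version.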

To test a storage crash

Forcibly make the SAN disks invisible to both nodes.

The cluster must go down and should not start, since RAC does not safeguard against storage failures.

TAF failover test

To test SQL and session failover
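The session-failover expectations in the tests above depend on Transparent Application Failover (or an equivalent) being configured for the client connection. Below is a minimal sketch of a client-side TAF entry for tnsnames.ora, generated by a hypothetical `taf_tns_entry` helper. The alias, SCAN host and service names are placeholders. Server-side TAF defined on the service is an alternative that is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: print a tnsnames.ora entry with client-side TAF enabled.
# ALIAS, SCAN_HOST and SERVICE are placeholders; PORT defaults to 1521.
# TYPE = SELECT lets an open query keep fetching on the surviving instance;
# METHOD = BASIC opens the backup connection only when the failover happens.
taf_tns_entry() {
  local alias=$1 scan_host=$2 service=$3 port=${4:-1521}
  cat <<EOF
${alias} =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ${scan_host})(PORT = ${port}))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ${service})
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 30)
        (DELAY = 5)
      )
    )
  )
EOF
}
```

For example, `taf_tns_entry RACDB_TAF rac-scan.example.com racdb_svc >> tnsnames.ora` appends the entry. The client then connects with `sqlplus user@RACDB_TAF` before starting the failover tests.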