Abdul hafeez kalsekar -- OCS/OCE/OCP: Oracle Rac Node failover Testing

Saturday, August 12, 2023

Oracle RAC Node Failover Testing

As part of a new build we need to do node failover testing, so I am documenting the steps known to me.

Objective	Action	Expectations
To test an accidental change of the cluster's IP addresses	Modify /etc/hosts and change the VIP addresses, then try starting the cluster	The cluster must fail to start and report error messages related to the network IPs.
To test the failure of an ASM instance (ASM instance crash)	On node 1, kill the processes related to ASM	The cluster must not be affected by the ASM crash on node 1.
To test the failover of the network card for the public IP address	Unplug one of the cables connecting the node to the public network	The ping from the client should still go through.
To test the stability of the database by killing an Oracle server process	From a terminal, kill a process related to the Oracle daemons. This should crash the database processes.	The SQL session should not be affected and should fail over to the next node without any disruption.
To test the complete failover of the session in the event of a node failure (e.g. sudden shutdown)	1. Connect from the client to the database via sqlplus. 2. Power off the Unix box that the sqlplus session is connected to.	The SQL session should not be affected and should fail over to the next node without any disruption.
To test the session failover of a connected Oracle user in the event of an instance failure	1. Connect from a client to node 1 in the RAC environment using sqlplus. 2. Connect to the database on node 1 (sqlplus / as sysdba) and do a shutdown.	The SQL connection should swing over to the other instance and continue the select query; the sqlplus connection from the client to the database remains available.
To test a crash of Grid processes	Kill the LMON process on a RAC node	The killed processes must be restarted automatically.
To test the automatic relocation of SCAN listeners	Reboot node 1 using the OS reboot command	1. When node 1 goes down, all SCAN listeners must shift to the remaining node. 2. When node 1 comes back up, the SCAN listeners must shift back to the original node.
To test a storage crash	Forcibly make the scan disk invisible to both nodes	The cluster must go down and should not start, since RAC does not protect against storage failures.
TAF failover test	Test SQL and session failover	
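
Several of the expectations above (the client ping going through, the sqlplus session surviving a node loss) come down to how long a client sees the service as unavailable while the failure is injected. Below is a minimal Python sketch of that measurement. It is not part of the original test plan: `collect_samples`, `longest_outage` and the `probe` callable are hypothetical names chosen for illustration, and the probe itself (a ping, a simple query through whatever database driver you use, and so on) is assumed to be supplied by you.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

# One observation: (timestamp in seconds, did the probe succeed?)
Sample = Tuple[float, bool]


def collect_samples(
    probe: Callable[[], bool],
    count: int,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Call the (hypothetical) probe `count` times, `interval` seconds apart.

    A probe that raises is recorded as a failure, so a dropped
    connection does not abort the measurement.
    """
    samples: List[Sample] = []
    for _ in range(count):
        started = clock()
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        samples.append((started, ok))
        sleep(interval)
    return samples


def longest_outage(samples: Iterable[Sample]) -> float:
    """Return the longest stretch, in seconds, between the first failed
    probe of a run of failures and the next successful probe.

    If the samples end while probes are still failing, the outage is
    measured up to the last sample.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # a new outage begins here
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # service is back
        last_ts = ts
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest
```

For example, `longest_outage([(0.0, True), (1.0, False), (2.0, False), (3.0, True)])` returns `2.0`. In a real run you would start `collect_samples` with your own probe before injecting the failure, then compare the result against whatever outage tolerance has been agreed for the build.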
