Monday, June 04, 2007

What To Do and Not To Do When 'shutdown immediate' Hangs

Subject: What To Do and Not To Do When 'shutdown immediate' Hangs
Doc ID: Note:375935.1 Type: HOWTO
Last Revision Date: 02-JUN-2007 Status: MODERATED

In this Document
Goal
Solution
References



--------------------------------------------------------------------------------


This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) Rapid Visibility (RaV) process, and therefore has not been subject to an independent technical review.



Applies to:
Oracle Server - Enterprise Edition - Version: 8.1.7 to 10.2
Information in this document applies to any platform.

Goal
What to do when shutdown immediate appears to hang.
Sometimes, the message 'Waiting for smon to disable tx recovery' is posted in the alert log.
This note only addresses situations when the apparent hang occurs when the database is going from OPEN to MOUNT, which is actually the most common situation.
If the apparent hang occurs at a different step, then this note does not apply.

Solution
The big problem in these situations is that it is noticed only after the shutdown immediate has been issued.
This kind of situation is mostly caused by 2 things:
1. a large query running at the shutdown moment.
2. a large transaction running at the shutdown moment.
Both have to complete in order for the database to be brought down when shutdown immediate is issued.
Actually, the files cannot be closed consistently because of one of the 2 possibilities above and, as such, the transition from OPEN to MOUNT is postponed until the files are closed, which means that either the large query completes or the large transaction is rolled back. This is not a hang, it is the expected behavior.

So, before issuing the shutdown immediate, it would be recommended to check the following views, especially when the database needs to be brought down for a very short period of time:
1. for large queries:

select count(*) from v$session_longops where time_remaining>0;
2. for large transactions:

select sum(used_ublk) from v$transaction;
A result greater than 0 for the first query and a large value returned for the second one would mean a relatively long time to wait until the shutdown immediate completes.
For the second situation, please also check step 9 in Note 117316.1 to "guestimate" the time to rollback the transactions.

1. For the large queries situation, when the shutdown immediate is hanging, you can just bring down the database using: shutdown abort, as the database could be easily brought to a consistent state by:
startup restrict followed by shutdown immediate.
One should take the backup and/or do whatever else need to be done after the shutdown immediate.
2. For the second situation, the workaround cannot be applied, especially when it's needed to take a cold backup. The database must be closed in a consistent state in order to do this and the consistent state cannot be achieved until all the transactions have completed one way or another (commit/rollback).

As such, it's up to the local personnel to decide what to do, depending on the local needs.
It is very important to realize that: BY SHUTTING DOWN A DATABASE YOU DO NOT SOLVE A PERFORMANCE PROBLEM CAUSED BY A LARGE TRANSACTION. You are only making things worse.
There are situations when the database is brought down even when a large transaction/large recovery is taking place. Then it's brought up again and a new shutodwn is tried. Again, the shutdown immediate is hanging, for a very simple reason - the large recovery is still going on.
At this moment, the v$transaction view is not displaying anything.
Hoever, it is still possible to check the recovery operation by checking the:

select * from v$fast_start_transactions;and/orselect * from v$fast_start_servers;
views. They are the ones that display the recovery status.
As such, when a large transaction is taking place, do not try successive shutdown aborts, startups and shutdown immediate. The hang will reoccur. The database must be consistent when the database is dismounted - performing successive shutdowns/startups is not helping at all, it's only making the recovery even more lengthy.

You should prevent these situations by notifying the users a shutdown will be done and no large operations should be started.
If a large operation has already started at the moment when you want to shutdown immedate, assess what would be faster - rollback the current situation or allow it to complete.

No comments: