Oracle Backup and Recovery on Windows, Part II - Database Recovery using RMAN:
3. Recovery from missing or corrupted datafile(s):
Case 1: Recovery from corrupted or missing datafile
This scenario deals with a situation where a datafile has gone missing, or
is corrupted beyond repair. For concreteness, we look at a case where a datafile is
missing. Below is a transcript of an SQL Plus session that attempts to open
a database with a missing datafile (typed commands in bold, lines in italics are my
comments, all other lines
are feedback from SQL Plus):
--open SQL Plus from the command line without
--logging on to database
C:\>sqlplus /nolog
SQL*Plus: Release 9.2.0.4.0 - Production on Tue Jan 25 14:52:41 2005
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
--Connect to the idle Oracle process as a privileged user and start up instance
SQL> connect / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 131555128 bytes
Fixed Size 454456 bytes
Variable Size 88080384 bytes
Database Buffers 41943040 bytes
Redo Buffers 1077248 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF'
SQL>
The error message tells us that file# 4 is missing. Note that although the
startup command has failed, the database is in the mount state. Thus, the
database control file, which is also the RMAN repository can be accessed by
the instance and by RMAN. We now recover the missing
file using RMAN. The transcript of the recovery session is reproduced
below (bold lines are typed commands, comments in italics,
the rest is feedback from RMAN):
--logon to RMAN
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--restore missing datafile
RMAN> restore datafile 4;
Starting restore at 26/JAN/05
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=14 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=15 devtype=DISK
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00004 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF
channel ORA_DISK_1: restored backup piece 1
piece handle=D:\BACKUP\0QGB0UEC_1_1.BAK tag=TAG20050124T152708 params=NULL
channel ORA_DISK_1: restore complete
Finished restore at 26/JAN/05
--recover restored datafile - RMAN applies all logs automatically
RMAN> recover datafile 4;
Starting recover at 26/JAN/05
using channel ORA_DISK_1
using channel ORA_DISK_2
starting media recovery
archive log thread 1 sequence 4 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_4.ARC
archive log thread 1 sequence 5 is already on disk as file C:\ORACLE_ARCHIVE\ORCL\1_5.ARC
archive log thread 1 sequence 6 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_6.ARC
archive log thread 1 sequence 7 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_7.ARC
archive log filename=E:\ORACLE_ARCHIVE\ORCL\1_4.ARC thread=1 sequence=4
archive log filename=C:\ORACLE_ARCHIVE\ORCL\1_5.ARC thread=1 sequence=5
media recovery complete
Finished recover at 26/JAN/05
--open database for general use
RMAN> alter database open;
database opened
RMAN>
In the above scenario, the database is already in the mount state before the RMAN session
is initiated. If the database is not mounted, you should issue a "startup mount" command before
attempting to restore the missing datafile. The database must be mounted before any datafile
recovery can be done.
If the database is already open when datafile corruption is detected, you can recover the
datafile without shutting down the database. The only additional step is to take the
relevant tablespace offline before starting recovery. In this case you would perform
recovery at the tablespace level. The commands are:
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--offline affected tablespace
RMAN> sql 'alter tablespace USERS offline immediate';
using target database controlfile instead of recovery catalog
sql statement: alter tablespace USERS offline immediate
--recover offlined tablespace
RMAN> recover tablespace USERS;
Starting recover at 26/JAN/05
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=14 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=12 devtype=DISK
starting media recovery
media recovery complete
Finished recover at 26/JAN/05
--online recovered tablespace
RMAN> sql 'alter tablespace USERS online';
sql statement: alter tablespace USERS online
RMAN>
Here we have used the SQL command, which allows us to execute arbitrary
SQL from within RMAN.
Case 2: Recovery from block corruption
It is possible to recover corrupted blocks using RMAN backups. This is a
somewhat exotic scenario, but it can be useful in certain circumstances,
as illustrated by the following example. Here's the situation: a user connected
to SQLPlus gets a data block corruption error when she queries a table.
Here's a part of the session transcript:
SQL> connect testuser/testpassword
Connected.
SQL> select count(*) from test_table;
select count(*) from test_table
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 2015)
ORA-01110: data file 4: 'D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF'
Since we know the file and block number, we can perform block level recovery
using RMAN. This is best illustrated by example:
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--restore AND recover specific block
RMAN> blockrecover datafile 4 block 2015;
Starting blockrecover at 26/JAN/05
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=19 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=20 devtype=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00004
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=E:\BACKUP\0QGB0UEC_1_1.BAK tag=TAG20050124T152708 params=NULL
channel ORA_DISK_1: block restore complete
starting media recovery
media recovery complete
Finished blockrecover at 26/JAN/05
RMAN>
Now our user should be able to query the table from her SQLPlus session. Here's
her session transcript after block recovery.
SQL> select count(*) from test_table;
COUNT(*)
----------
217001
SQL>
A couple of important points regarding block recovery:
-
Block recovery can only be done using RMAN.
-
The entire database can be open while performing block recovery.
-
Check all database files for corruption. This is important - there
could be other corrupted blocks. Verification of database files can
be done using RMAN or the dbverify utility. To verify using RMAN
simply do a complete database backup with default settings. If RMAN
detects block corruption, it will exit with an error message pointing
out the guilty file/block.
4. Recovery from missing or corrupted redo log group:
Case 1: A multiplexed copy of the missing log is available.
If a redo log is missing, it should be restored from a multiplexed copy, if
possible.
This is the only way to recover without any losses. Here's
an example,
where I attempt to startup from SQLPlus when a redo log is missing:
SQL> startup
ORACLE instance started.
Total System Global Area 131555128 bytes
Fixed Size 454456 bytes
Variable Size 88080384 bytes
Database Buffers 41943040 bytes
Redo Buffers 1077248 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: 'D:\ORACLE_DATA\LOGS\ORCL\REDO03A.LOG'
SQL>
To fix this we simply copy REDO03A.LOG from its multiplexed location on E: to
the above location on E:. After copying the file, we issue an "alter database
open" from the above SQLPlus session:
SQL> alter database open;
Database altered.
SQL>
That's it - the database is open for use.
Case 2: All members of a log group lost.
In this case an incomplete recovery is the best we can do. We will lose
all transactions from the missing log and all subsequent logs. We illustrate
using the same example as above. The error message indicates that members of
log group 3 are missing. We don't have a copy of this file, so we know that
an incomplete recovery is required.
The first step is to determine
how much can be recovered. In order to do this, we query the V$LOG
view (when in the mount state) to find the system change number (SCN) that we
can recover to (Reminder: the SCN is a monotonically increasing number that is incremented
whenever a commit is issued):
--The database should be in the mount state for v$log access
SQL> select first_change# from v$log where group#=3 ;
FIRST_CHANGE#
-------------
370255
SQL>
The FIRST_CHANGE# is the first SCN stamped in the missing log. This implies that
the last SCN stamped in the previous log is 370254 (FIRST_CHANGE#-1). This is the
highest SCN that we can recover to. In order to do the recovery we must first
restore ALL datafiles to this SCN, followed by recovery (also up to this SCN). This
is an incomplete recovery, so we must open the database resetlogs after we're done.
Here's a transcript of the recovery session (typed commands in bold, comments in
italics, all other lines
are RMAN feedback):
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--Restore ENTIRE database to determined SCN
RMAN> restore database until scn 370254;
Starting restore at 26/JAN/05
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to D:\ORACLE_DATA\DATAFILES\ORCL\SYSTEM01.DBF
restoring datafile 00004 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF
channel ORA_DISK_2: starting datafile backupset restore
channel ORA_DISK_2: specifying datafile(s) to restore from backup set
restoring datafile 00002 to D:\ORACLE_DATA\DATAFILES\ORCL\UNDOTBS01.DBF
restoring datafile 00003 to D:\ORACLE_DATA\DATAFILES\ORCL\TOOLS01.DBF
channel ORA_DISK_2: restored backup piece 1
piece handle=E:\BACKUP\13GB14IB_1_1.BAK tag=TAG20050124T171139 params=NUL
channel ORA_DISK_2: restore complete
channel ORA_DISK_1: restored backup piece 1
piece handle=E:\BACKUP\14GB14IB_1_1.BAK tag=TAG20050124T171139 params=NUL
channel ORA_DISK_1: restore complete
Finished restore at 26/JAN/05
--Recover database
RMAN> recover database until scn 370254;
Starting recover at 26/JAN/05
using channel ORA_DISK_1
using channel ORA_DISK_2
starting media recovery
archive log thread 1 sequence 9 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_9.ARC
archive log thread 1 sequence 10 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_10.ARC
archive log thread 1 sequence 11 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_11.ARC
archive log thread 1 sequence 12 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_12.ARC
archive log filename=E:\ORACLE_ARCHIVE\ORCL\1_9.ARC thread=1 sequence=9
archive log filename=E:\ORACLE_ARCHIVE\ORCL\1_10.ARC thread=1 sequence=10
media recovery complete
Finished recover at 26/JAN/05
--open database with RESETLOGS (see comments below)
RMAN> alter database open resetlogs;
database opened
RMAN>
The following points should be noted:
-
The entire database must be restored to the SCN that has been determined by querying
v$log.
-
All changes beyond that SCN are lost. This method
of recovery should be used only if you are sure that you cannot do better. Be sure to
multiplex your redo logs, and (space permitting) your archived logs!
-
The database must be opened with RESETLOGS, as a required log has not been applied.
This resets the log sequence to zero, thereby
rendering all prior backups worthless. Therefore, the first step after opening a database
RESETLOGS is to take a fresh backup. Note that the RESETLOGS option must be used for any
incomplete recovery.
5. Recovery from missing or corrupted control file:
Case 1: A multiplexed copy of the control file is available.
On startup Oracle must read the control file in order to find out where
the datafiles and online logs are located. Oracle expects to find control
files at locations
specified in the CONTROL_FILE initialisation parameter. The instance will fail to mount
the database if any one of the control files are missing or corrupt. A brief error message
will be displayed, with further details recorded in the alert log. The exact error
message will vary depending on what has gone wrong. Here's an example:
SQL> startup
ORACLE instance started.
Total System Global Area 135338868 bytes
Fixed Size 453492 bytes
Variable Size 109051904 bytes
Database Buffers 25165824 bytes
Redo Buffers 667648 bytes
ORA-00205: error in identifying controlfile, check alert log for more info
SQL>
On checking the alert log, as suggested, we find the following:
ORA-00202: controlfile: 'e:\oracle_dup_dest\controlfile\ORCL\control02.ctl'
ORA-27046: file size is not a multiple of logical block size
OSD-04012: file size mismatch (OS 5447783)
The above corruption was introduced by manually editing the control file when the database was
closed.
The solution is simple, provided you have at least one uncorrupted control file - replace the
corrupted control file with a copy using operating system commands. Remember to rename the
copied file. The database should now start up without any problems.
Case 2: All control files lost
What if you lose all your control files? In that case you have no
option but to use a backup control file. The recovery needs to be performed from within RMAN,
and requires that all logs (archived and current online logs) since the last backup are
available. The logs are required because all datafiles must also be restored from backup.
The database will then have to be recovered up to the time the control files went missing. This
can only be done if all intervening logs are available. Here's an annotated transcript of
a recovery session (as usual, lines in bold are commands to be typed, lines in italics
are explanatory comments, other lines are RMAN feedback):
-- Connect to RMAN
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (not mounted)
-- set DBID - get this from the name of the controlfile autobackup.
-- For example, if autobackup name is
-- CTL_SP_BAK_C-1507972899-20050124-00 the the DBID is
-- 1507972899. This step will not be required if the instance is
-- started up from RMAN
RMAN> set dbid 1507972899
executing command: SET DBID
--restore controlfile from autobackup. The backup is not at the default
--location so the path must be specified
RMAN> restore controlfile from 'e:\backup\CTL_SP_BAK_C-1507972899-20050124-00';
Starting restore at 26/JAN/05
using channel ORA_DISK_1
channel ORA_DISK_1: restoring controlfile
channel ORA_DISK_1: restore complete
replicating controlfile
input filename=D:\ORACLE_DATA\CONTROLFILE\ORCL\CONTROL01.CTL
output filename=E:\ORACLE_DUP_DEST\CONTROLFILE\ORCL\CONTROL02.CTL
output filename=C:\ORACLE_DUP_DEST\CONTROLFILE\ORCL\CONTROL03.CTL
Finished restore at 26/JAN/05
-- Now that control files have been restored, the instance can mount the
-- database.
RMAN> mount database;
database mounted
-- All datafiles must be restored, since the controlfile is older than the current
-- datafiles. Datafile restore must be followed by recovery up to the current log.
RMAN> restore database;
Starting restore at 26/JAN/05
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to D:\ORACLE_DATA\DATAFILES\ORCL\SYSTEM01.DBF
restoring datafile 00004 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF
channel ORA_DISK_1: restored backup piece 1
piece handle=E:\BACKUP\0DGB0I79_1_1.BAK tag=TAG20050124T115832 params=NULL
channel ORA_DISK_1: restore complete
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00002 to D:\ORACLE_DATA\DATAFILES\ORCL\UNDOTBS01.DBF
restoring datafile 00003 to D:\ORACLE_DATA\DATAFILES\ORCL\TOOLS01.DBF
channel ORA_DISK_1: restored backup piece 1
piece handle=E:\BACKUP\0CGB0I78_1_1.BAK tag=TAG20050124T115832 params=NULL
channel ORA_DISK_1: restore complete
Finished restore at 26/JAN/05
--Database must be recovered because all datafiles have been restored from
-- backup
RMAN> recover database;
Starting recover at 26/JAN/05
using channel ORA_DISK_1
starting media recovery
archive log thread 1 sequence 2 is already on disk as file E:\ORACLE_ARCHIVE\ORCL\1_2.ARC
archive log thread 1 sequence 4 is already on disk as file D:\ORACLE_DATA\LOGS\ORCL\REDO02A.LOG
archive log thread 1 sequence 5 is already on disk as file D:\ORACLE_DATA\LOGS\ORCL\REDO01A.LOG
archive log thread 1 sequence 6 is already on disk as file D:\ORACLE_DATA\LOGS\ORCL\REDO03A.LOG
archive log filename=E:\ORACLE_ARCHIVE\ORCL\1_2.ARC thread=1 sequence=2
archive log filename=E:\ORACLE_ARCHIVE\ORCL\1_3.ARC thread=1 sequence=3
archive log filename=E:\ORACLE_DATA\LOGS\ORCL\REDO02A.LOG thread=1 sequence=4
archive log filename=E:\ORACLE_DATA\LOGS\ORCL\REDO01A.LOG thread=1 sequence=5
archive log filename=E:\ORACLE_DATA\LOGS\ORCL\REDO03A.LOG thread=1 sequence=6
media recovery complete
Finished recover at 26/JAN/05
-- Recovery completed. The database must be opened with RESETLOGS
-- because a backup control file was used. Can also use
-- "alter database open resetlogs" instead.
RMAN> open resetlogs database;
database opened
Several points are worth emphasising.
-
Recovery using a backup controlfile should be done only if a current control file is
unavailable.
-
All datafiles must be restored from backup. This means the database will need to be
recovered using archived and online redo logs. These MUST be available for recovery
until the time of failure.
-
As with any database recovery involving RESETLOGS, take a fresh backup immediately.
-
Technically the above is an example of complete recovery - since all committed transactions were
recovered. However, some references consider this to be incomplete recovery
because the database log sequence had to be reset.
After recovery using a backup controlfile, all temporary files associated with locally-managed
tablespaces are no longer available. You can check that this is so by querying the view V$TEMPFILE -
no rows will be returned. Therefore tempfiles must be added (or recreated) before the database is made
available for general use. In the case at hand, the tempfile already exists so we merely add it
to the temporary tablespace. This can be done using SQLPlus or any tool of your choice:
SQL> alter tablespace temp add tempfile
'D:\oracle_data\datafiles\ORCL\TEMP01.DBF';
Tablespace altered.
SQL>
Check that the file is available by querying v$TEMPFILE.
6. Wrap up:
In an article this size it is impossible to cover all possible recovery scenarios
that one might
encounter in real life. The above examples will, I hope, provide you with
some concrete situations to try out on your test box.
The best preparation for real-life recovery is practice. Simulate as many
variations of the above situations, and others, as you can think up. Then try recovering from
them. The exercise will improve your recovery skills, clarify conceptual issues
and highlight deficiencies in your backup strategy.