Data Management
Project Management
Archived stuff
Cryptic crossword.
A bit about me.
Read my blog
Recent additions
Home
Oracle Articles

Part I    Part II

Oracle Backup and Recovery on Windows, Part III - Disaster Recovery using RMAN:


1. Introduction:

This is the final article of a three part series, introducing Oracle's RMAN (Recovery MANager) backup and recovery tool to novice DBAs. The first article dealt with taking RMAN backups and the second one covered some recovery scenarios. This article discusses disaster recovery - i.e. a situation in which your database server has been destroyed and has taken all your database files (control files, logs and data files) with it. Obviously, recovery from a disaster of this nature is dependent on what you have in terms of backups and hardware resources. We assume you have the following available after the disaster:
  • A server with the same disk layout as the original.
  • The last full hot backup on tape.
With the above items at hand, it is possible to recover all data up to the last full backup. One can do better if subsequent archive logs (after the last backup) are available. In our case these aren't available, since our only archive destination was on the destroyed server (see Part I ). Oracle provides methods to achieve better data protection. We will discuss some of these towards the end of the article.

Now on with the task at hand. The high-level steps involved in disaster recovery are:
  • Build replacement server.
  • Restore backup from tape.
  • Install database software.
  • Create Oracle service.
  • Restore and recover database.
It sounds quite straightforward and it is reasonably so. However, there are some details that are often missed in books and documentation. The devil, as always, is in these details. Here we hope to provide you with enough detail to plan and practice disaster recovery in your test environment.

2. Build the server

You need a server to host the database, so the first step is to acquire or build the new machine. This is not strictly a DBA task, so we won't delve into details here. The main point to keep in mind is that the replacement server should, as far as possible, be identical to the old one. In particular, pay attention to the following areas:
  1. Disk layout and capacity: Ideally the server should have the same number of disks as the original. This avoids messy renaming of files during recovery. Obviously, the new disks should also have enough space to hold all software and data that was on the original server.
  2. Operating system, service pack and patches: The operating system environment should be the same as the original, right up to service pack and patch level.
  3. Memory: The new server must have enough memory to cater to Oracle and operating system / other software requirements. Oracle memory structures (Shared pool, db buffer caches etc) will be sized identically to the original database instance. Use of the backup server parameter file will ensure this.
Although you probably won't build and configure the machine yourself. It is important to work with your systems people so that the above items are built to the recovery server specs.

3. Restore backup from tape

The next step is to get your backup from tape on to disk. In our example from Part I, the directory to be restored is e:\backup. The details of this depend on the backup product used, so we can't go into it any further. This task would normally be performed your local sysadmin.

4. Install Oracle Software

Now we get to the meat of the database recovery process. The next step is to install Oracle software on the machine. The following points should be kept in mind when installing the software:
  1. Install the same version of Oracle as was on the destroyed server. The version number should match right down to the patch level, so this may be a multi-step process involving installation followed by the application of one or more patchsets and patches.
  2. Do not create a new database at this stage.
  3. Create a listener using the Network Configuration Assistant. Ensure that it has the same name and listening ports as the original listener. Relevant listener configuration information can be found in the backed up listener.ora file.
4. Create directory structure for database files

After software installation is completed, create all directories required for datafiles, (online and archived) logs, control files and backups. All directory paths should match those on the original server. This, though not mandatory, saves additional steps associated with renaming files during recovery.

Don't worry if you do not know where the database files should be located. You can obtain the required information from the backup spfile and control file at a later stage. Continue reading - we'll come back to this later.

5. Create Oracle service

As described in section 2 of Part II, an Oracle service must be exist before a database is created. The service is created using the oradim utility, which must be run from the command line. The following commands show how to create and modify a service (comments in italics, typed commands in bold):

--create a new service with manual startup

C:\>oradim -new -sid ORCL -startmode m

--modify service to startup automatically

C:\>oradim -edit -sid ORCL -startmode a

Unfortunately oradim does not give any feedback, but you can check that the service exists via the Services administrative panel. The service has been configured to start automatically when the computer is powered up. Note that oradim offers options to delete, startup and shutdown a service. See the documentation for details.

6. Restore and recover database

Now it is time to get down to the nuts and bolts of database recovery. There are several steps, so we'll list them in order:
  1. Copy password and tnsnames file from backup: The backed up password file and tnsnames.ora files should be copied from the backup directory (e:\backup, in our example) to the proper locations. Default location for password and tnsnames files are ORACLE_HOME\database ORACLE_HOME\network\admin respectively.

  2. Set ORACLE_SID environment variable: ORACLE_SID should be set to the proper SID name (ORCL in our case). This can be set either in the registry (registry key: HKLM\Software\Oracle\HOME<X>\ORACLE_SID) or from the system applet in the control panel.

  3. Invoke RMAN and set the DBID: We invoke rman and connect to the target database as usual. No login credentials are required since we connect from an OS account belonging to ORA_DBA. Note that RMAN accepts a connection to the database although the database is yet to be recovered. RMAN doesn't as yet "know" which database we intend to connect to. We therefore need to identify the (to be restored) database to RMAN. This is done through the database identifier (DBID). The DBID can be figured out from the name of the controlfile backup. Example: if you use the controlfile backup format suggested in Part I, your controlfile backup name will be something like "CTL_SP_BAK_C-1507972899-20050228-00". In this case the DBID is 1507972899. Here's a transcript illustrating the process of setting the DBID:

    C:\>rman

    Recovery Manager: Release 9.2.0.4.0 - Production

    Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

    RMAN> connect target /

    connected to target database (not started)

    RMAN> set dbid 1507972899

    executing command: SET DBID

    RMAN>

  4. Restore spfile from backup: To restore the spfile, you first need to startup the database in the nomount state. This starts up the database using a dummy parameter file. After that you can restore the spfile from the backup (which has been restored from tape in Section 3). Finally you restart the database in nomount state. The restart is required in in order to start the instance using the restored parameter file. Here is an example RMAN transcript for the foregoing procedure. Note the difference in SGA size and components between the two startups:

    RMAN> startup nomount

    startup failed: ORA-01078: failure in processing system parameters
    LRM-00109: could not open parameter file 'C:\ORACLE\ORA92\DATABASE\INITORCL.ORA'

    trying to start the Oracle instance without parameter files ...
    Oracle instance started

    Total System Global Area 97590928 bytes

    Fixed Size 454288 bytes
    Variable Size 46137344 bytes
    Database Buffers 50331648 bytes
    Redo Buffers 667648 bytes

    RMAN> restore spfile from 'e:\backup\CTL_SP_BAK_C-1507972899-20050228-00';

    Starting restore at 01/MAR/05

    using target database controlfile instead of recovery catalog
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: sid=9 devtype=DISK
    channel ORA_DISK_1: autobackup found: e:\backup\CTL_SP_BAK_C-1507972899-20050228-00
    channel ORA_DISK_1: SPFILE restore from autobackup complete
    Finished restore at 01/MAR/05

    RMAN> startup force nomount

    Oracle instance started

    Total System Global Area 1520937712 bytes

    Fixed Size 457456 bytes
    Variable Size 763363328 bytes
    Database Buffers 754974720 bytes
    Redo Buffers 2142208 bytes

    RMAN>

    The instance is now started up with the correct initialisation parameters.

    We are now in a position to determine the locations of control file and archive destination, as this information sits in the spfile. This is done via SQL Plus as follows:

    C:\>sqlplus /nolog

    ....output not shown

    SQL>connect / as sysdba
    Connected.
    SQL> show parameter control_file

    ....output not shown

    SQL> show parameter log_archive_dest
    ....output not shown

    The directories listed in the CONTROL_FILES and LOG_ARCHIVE_DEST_N parameters should be created at this stage if they haven't been created earlier.

  5. Restore control file from backup: The instance now "knows" where the control files should be restored, as this is listed in the CONTROL_FILES initialisation parameter. Therefore, the next step is to restore these files from backup. Once the control files are restored, the instance should be restarted in mount mode. A restart is required because the instance must read the initialisation parameter file in order to determine the control file locations. At the end of this step RMAN also has its proper configuration parameters, as these are stored in the control file.

    Here is a RMAN session transcript showing the steps detailed here:

    RMAN> restore controlfile from 'e:\backup\CTL_SP_BAK_C-1507972899-20050228-00';

    Starting restore at 01/MAR/05

    allocated channel: ORA_DISK_1
    hannel ORA_DISK_1: sid=13 devtype=DISK
    channel ORA_DISK_1: restoring controlfile
    channel ORA_DISK_1: restore complete
    replicating controlfile
    input filename=D:\ORACLE_DATA\CONTROLFILE\ORCL\CONTROL01.CTL
    output filename=E:\ORACLE_DATA\CONTROLFILE\ORCL\CONTROL02.CTL
    output filename=C:\ORACLE_DUP_DEST\CONTROLFILE\ORCL\CONTROL03.CTL
    Finished restore at 01/MAR/05

    RMAN> shutdown

    Oracle instance shut down

    RMAN> exit

    Recovery Manager complete.

    C:\>rman target /

    Recovery Manager: Release 9.2.0.4.0 - Production

    Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

    connected to target database (not started)

    RMAN> startup mount;

    Oracle instance started
    database mounted

    Total System Global Area 1520937712 bytes

    Fixed Size 457456 bytes
    Variable Size 763363328 bytes
    Database Buffers 754974720 bytes
    Redo Buffers 2142208 bytes

    RMAN> show all;

    using target database controlfile instead of recovery catalog
    RMAN configuration parameters are:
    CONFIGURE RETENTION POLICY TO REDUNDANCY 2;
    CONFIGURE BACKUP OPTIMIZATION OFF; # default
    CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
    CONFIGURE CONTROLFILE AUTOBACKUP ON;
    CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO 'e:\backup\ctl_sp_bak_%F';
    CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
    CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
    CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
    CONFIGURE CHANNEL 1 DEVICE TYPE DISK FORMAT 'e:\backup\%U.bak' MAXPIECESIZE 4G;
    CONFIGURE CHANNEL 2 DEVICE TYPE DISK FORMAT 'e:\backup\%U.bak' MAXPIECESIZE 4G;
    CONFIGURE MAXSETSIZE TO UNLIMITED; # default
    CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'C:\ORACLE\ORA92\DATABASE\SNCFORCL.ORA'; # default

    RMAN>

    At this stage we can determine the locations of data files and redo logs if we don't know where they should go. This is done from SQL Plus as follows:

    C:\>sqlplus /nolog

    ...output not shown

    SQL> connect / as sysdba
    Connected.
    SQL> select name from v$datafile;

    ...output not shown

    SQL> select member from v$logfile;

    ...output not shown

    SQL>

    The directories shown in the output should be created manually if this hasn't been done earlier.

  6. Restore all datafiles: This is easy. Simply issue a "restore database" command from RMAN, and it will do all the rest for you:

    RMAN> restore database;

    Starting restore at 01/MAR/05

    using target database controlfile instead of recovery catalog
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: sid=11 devtype=DISK
    allocated channel: ORA_DISK_2
    channel ORA_DISK_2: sid=8 devtype=DISK
    channel ORA_DISK_1: starting datafile backupset restore
    channel ORA_DISK_1: specifying datafile(s) to restore from backup set
    restoring datafile 00001 to D:\ORACLE_DATA\DATAFILES\ORCL\SYSTEM01.DBF
    restoring datafile 00003 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF
    restoring datafile 00004 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS02.DBF
    channel ORA_DISK_2: starting datafile backupset restore
    channel ORA_DISK_2: specifying datafile(s) to restore from backup set
    restoring datafile 00002 to D:\ORACLE_DATA\DATAFILES\ORCL\UNDOTBS01.DBF
    restoring datafile 00005 to D:\ORACLE_DATA\DATAFILES\ORCL\TOOLS01.DBF
    restoring datafile 00006 to D:\ORACLE_DATA\DATAFILES\ORCL\TOOLS02.DBF
    channel ORA_DISK_2: restored backup piece 1
    piece handle=E:\BACKUP\80G6E1TT_1_1.BAK tag=TAG20041130T222501 params=NULL
    channel ORA_DISK_1: restored backup piece 1
    piece handle=E:\BACKUP\81G6E1TU_1_1.BAK tag=TAG20041130T222501 params=NULL
    channel ORA_DISK_2: restored backup piece 2
    piece handle=E:\BACKUP\80G6E1TT_2_1.BAK tag=TAG20041130T222501 params=NULL
    channel ORA_DISK_1: restored backup piece 2
    piece handle=E:\BACKUP\81G6E1TU_2_1.BAK tag=TAG20041130T222501 params=NULL
    channel ORA_DISK_1: restored backup piece 3
    piece handle=E:\BACKUP\81G6E1TU_3_1.BAK tag=TAG20041130T222501 params=NULL
    channel ORA_DISK_1: restore complete
    channel ORA_DISK_2: restored backup piece 3
    piece handle=E:\BACKUP\80G6E1TT_3_1.BAK tag=TAG20041130T222501 params=NULL
    channel ORA_DISK_2: restore complete
    Finished restore at 01/MAR/05

    RMAN>

  7. Recover database: The final step is to recover the database. Obviously recovery is dependent on the available archived (and online) redo logs. Since we have lost our database server and have no remote archive destination, we can recover only up to the time of the backup. Further, since this is an incomplete recovery, we will have to open the database with resetlogs. Here's a sample RMAN session illustrating this:

    RMAN> recover database;

    Starting recover at 01/MAR/05
    using channel ORA_DISK_1
    using channel ORA_DISK_2

    starting media recovery

    unable to find archive log archive log thread=1 sequence=1388

    RMAN-00571: ==============================
    RMAN-00569: =ERROR MESSAGE STACK FOLLOWS =
    RMAN-00571: ===============================
    RMAN-03002: failure of recover command at 04/01/2005 14:14:43
    RMAN-06054: media recovery requesting unknown log: thread 1 scn 32230460

    RMAN> alter database open resetlogs;

    database opened

    RMAN>

    Note that RMAN automatically applies all available archive logs. It first applies the backed up log and then searches for subsequent logs in the archive destination. This opens the door for further recovery if the necessary logs are available. In our case, however, we have no more redo so we open the database with resetlogs. The error message above simply indicates that RMAN has searched, unsuccessfully, for subsequent logs.
That's it. The database has been recovered, from scratch, to the last available backup. Now having done this, it is worth spending some time in discussing how one can do better - i.e. recover up to a point beyond the backup. We do this in the next section.

7. Options for better recovery

The above recovery leaves one with a sense of dissatisfaction: one could have done much better had the necessary logs been available. Clearly, one would have to copy the logs to a remote machine in order to guarantee access in a disaster situation. A couple of ways to do this include:
  1. Copy archive logs to a remote destination using OS scripts: This is achieved simply by a script scheduled to run every hour or so. The script copies, to a remote network computer, all the archive logs generated since the script last ran. This achieves better recoverability than before. However, it does not achieve up to the minute recovery. Further, one has to ensure that the a log switch is performed before the logs are copied, so as to ensure that redo associated with recent transactions (within the last hour) is copied to the remote destination.The log switch can be performed using the "alter system archive log current" command.
  2. Configure the Oracle ARC process to copy logs to the remote destination: This is done by defining a secondary archive destination via one of the LOG_ARCHIVE_DEST_N initialisation parameters. This method is not recommended because it is somewhat fragile - see this discussion on Tom Kyte's site, for example. Be sure to use a mapped network drive as the archive destination if you choose to go down this path. Oracle 9i will not recognise archive log destinations specified using UNC (Universal Naming Convention).

Finally, any article on disaster recovery should mention Oracle Data Guard - which is Oracle Corporation's recommended disaster recovery solution for mission critical systems. This is essentially a standby database that is kept synchronised with the primary through the continuous application of redo. There are different levels of synchronisation depending on required availability, performance and (most important!) the acceptable data loss. There are two types of standby databases depending on how redo is applied: 1) Physical standby - which is an identical, block for block, copy of the primary, and 2) Logical standby - which is kept synchronised by applying SQL mined from redo logs. Interested readers are referred to the Oracle documentation on Data Guard for further details, as it is a vast subject, appropriate for a book rather than an article this size. A good starting point is Oracle Data Guard Concepts and Administration guide.

8. Concluding Remarks

This brings me to the end of the three part series on RMAN. I hope the material covered helps you plan your backup and recovery strategy. Remember this: your backup strategy should be dictated by business requirements rather than the latest and greatest technology. It is technically not hard to implement up-to-the-minute recovery, given the appropriate hardware and network bandwidth. However, unless you work for a bank or similar business, your end-users will likely accept some data loss once they hear the costs involved in up-to-the-minute recovery. If your requirements do allow for some data loss, it should be possible to implement a robust recovery strategy using the concepts and procedures discussed in this series of articles.


Back to the top    Part I    Part II