Saturday 20 October 2012

Rebuilding New Standby Oracle 11g DataGuard

Enable FSFO/TestCases 


Current Configuration


Server Name                        Database Name        DB_unique_Name
dg1.localdomain.com            DG1                           DG1_A  --Primary   
dg2.localdomain.com            DG1                           DG1_B  --Standby
dg3.localdomain.com            DG1                           DG1_C  --To be Configured   

For the DG setup, check the following links -

ORACLE11g-dataguard-unfolded
How-to-recreate-standby-redologs

What are we upto?

We are rebuilding the new standby database from the current Primary Standby which is FSFO enabled. This can be used in case of HOST migration.

Process

Let quickly check that we have a proper DG setup already-

select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME

 SEQUENCE# STA COMPLETION_TIME
---------- --- --------------------
        41 YES 11-OCT-2012 23:10:33
        41 NO  11-OCT-2012 23:10:33
        42 YES 11-OCT-2012 23:51:13
        42 NO  11-OCT-2012 23:51:13

select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME       
       
        40 NO  11-OCT-2012 23:10:31
        41 NO  11-OCT-2012 23:10:33
        42 NO  11-OCT-2012 23:51:13


Switch the log file on primary and verify the redo transport -

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME

 SEQUENCE# STA COMPLETION_TIME
---------- --- --------------------
        43 YES 12-OCT-2012 00:25:39
        43 NO  12-OCT-2012 00:25:40
        44 YES 12-OCT-2012 00:25:41
        44 NO  12-OCT-2012 00:25:41

From Standby        

select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME

 SEQUENCE# STA COMPLETION_TIME
---------- --- --------------------
        43 NO  12-OCT-2012 00:25:39
        44 NO  12-OCT-2012 00:25:41

 
Looks perfect so lets get into creating a new standby Database.
I am doing the password less ssh configuration between servers which is not mandatory for the setup but always beneficial for obvious reasons. So do that (easy process)

Passwordless Ssh configuration-


Generate the RSA keys in the new machine which is dg3.localdomain.com
Use

/usr/bin/ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/app/oracle/.ssh/id_rsa):
Created directory '/opt/app/oracle/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/app/oracle/.ssh/id_rsa.
Your public key has been saved in /opt/app/oracle/.ssh/id_rsa.pub.
The key fingerprint is:
36:14:60:fc:c0:f2:f3:d3:9f:d7:e4:8b:27:b3:f5:7e

The key entry would be there in directory you mentioned above /opt/app/oracle/.ssh/id_rsa

cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAmzYYJU+qnANZQh/5Uh2CVs9SbpP+NulfbHK5z/IlQbPIJa4GOKWXB8f7z6GQIyGe5ivibHK3bTPCNEKJIJS7HWb1X+HzeVwaqYVUjDMH3bohNvjpIx8wqezOysyXOmGNi6uRy8XDiiQguRgdF/cKxGCrBen2pQecLQQujqVt5fFCiga0DuZaQik+m6sVsVDmqArTeLETBB5g/pOqcAt8CrVzT8z8T6gC4Be7IrT/gI+AQnsR1teBhbaWS+j+PZv9Ced/mH/neM3nL7ZknXtKaUpjYhIuvMKHta97unxG6JH9DXV+OM4QnNfU7KG3+Ql8peF2KkXOXJMSxOARESZUHw== oracle@****.desktop.localdomain.com

Add this entry in the other server's authorized_keys file (you have to create the file first time)
Good to go ..check you are able to access the servers without the password. Same procedure vice versa.
Note: DO this using 'oracle' login or with the account where you want the ssh to be passwordless

Check the File locations from Primary

SQL> select db_unique_name, protection_mode from v$database;

DB_UNIQUE_NAME                 PROTECTION_MODE
------------------------------ --------------------
dg1_b                          MAXIMUM AVAILABILITY

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/system01.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/sysaux01.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/undotbs01.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/AUBX_DATA_01.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/AUBX_IDX_01.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/users01.dbf


SQL> select name from v$controlfile;

NAME
--------------------------------------------------------------------------------
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_control01.ctl
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_control02.ctl
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_control03.ctl


SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_log1a.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_log2a.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_log3a.dbf
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo01.log
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo02.log
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo03.log
/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo04.log


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /opt/app/oracle/11.2.0.2/A13db
                                                 /dbs/spfiledg1.ora
       
Create the directories in the dg3 server-
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/dbs/
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/network/admin/dg1/diag/rdbms/dg1_b/dg1/trace
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/network/admin/dg1/diag/rdbms/dg1_b/dg1/cdump
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/FRA
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/network/admin/dg1
dg3.localdomain$ mkdir -p /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/arch


*.db_recovery_file_dest='/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/FRA'
*.standby_archive_dest='/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/arch'
*.diagnostic_dest='/opt/app/oracle/11.2.0.2/A13db/network/admin/dg1'

Shutdown the standby database -

SQL> shut immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

Start the File transfer, following files are needed to be copied and transferred after the standby shutdown-
DATAFILE,
REDOLOG FILE (can be recreated also), CONTROLFILE, SPFILE, DATAGUARD BROKER FILE, PASSWORD FILE, LISTENER , TNSNAMES,

dg2.localdomain$ cd /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/
dg2.localdomain$

scp * oracle@172.27.154.66:/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/
scp spfiledg1.ora dr2dg1_b.dat dr1dg1_b.dat orapwdg1 oracle@172.27.154.66:/opt/app/oracle/11.2.0.2/A13db/dbs/
scp tnsnames.ora listener.ora oracle@172.27.154.66:/opt/app/oracle/11.2.0.2/A13db/network/admin/

dg2.localdomain$ pwd
/opt/app/oracle/11.2.0.2/A13db/dbs
dg2.localdomain$
dg2.localdomain$
dg2.localdomain$ scp spfiledg1.ora dr2dg1_b.dat dr1dg1_b.dat orapwdg1 oracle@172.27.154.66:/opt/app/oracle/11.2.0.2/A13db/dbs/

Now we will do some transactions in Primary database to make it out of sync with the standby.

SQL> create user test identified by test;

User created.

SQL> grant dba to test;

Grant succeeded.

SQL> create table aa as select * from all_objects;

Table created.

SQL> insert into aa  select * from aa;

12679 rows created.
....keep on doing the same for large data entry

Check the current log sequence
SQL> conn / as sysdba
Connected.
SQL>
OR
select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME

244 NO  12-OCT-2012 04:47:38
245 NO  12-OCT-2012 04:47:53

--As the standby is down you can see there are no duplicates for the sequence#

33710 rows selected.

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     245
Next log sequence to archive   247
Current log sequence           247



dg3.localdomain$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Fri Oct 12 03:59:40 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL>
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORA-00119: invalid specification for system parameter LOCAL_LISTENER
ORA-00130: invalid listener address '(ADDRESS=(PROTOCOL=TCP)(HOST=dg3.localdomain.com)(PORT=1521))'
SQL> exit


To resolve the following error add the hostname entry in the /etc/hosts
from the new standby server dg3 start the oracle and check the last log sequence
SQL> startup mount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area  271437824 bytes
Fixed Size                  2225584 bytes
Variable Size             213912144 bytes
Database Buffers           50331648 bytes
Redo Buffers                4968448 bytes
Database mounted.

select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME
 SEQUENCE# STA COMPLETION_TIME
---------- --- --------------------
        49 NO  12-OCT-2012 03:07:22
        50 NO  12-OCT-2012 03:07:25


Check the DGMGRL configuration from the primary

DGMGRL> connect sys/system
Connected.
DGMGRL>
DGMGRL>
DGMGRL> show configuration

Configuration - dg1_conf

  Protection Mode: MaxAvailability
  Databases:
    dg1_a - Primary database
      Error: ORA-16825: multiple errors or warnings, including fast-start failover-related errors or warnings, detected for the database

    dg1_b - (*) Physical standby database
      Error: ORA-01034: ORACLE not available

Fast-Start Failover: ENABLED

Configuration Status:
ERROR

Update tnsnames in primary to change the hostname to dg3

dg1_b =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = dg3.localdomain.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = dg1)
    )
  )

Make the changes in the tnsnames, listener for dg3 server (hostname needs to be modified )

DGMGRL> disable FAST_START FAILOVER;
Error: ORA-12545: Connect failed because target host or object does not exist

Failed.

DGMGRL> disable configuration;
Error: ORA-16654: fast-start failover is enabled

 
Oops I am not able to turn off fast start failover, we Should have done that before shutting down the standby.
I am going to start the old standby and do the changes and then stop it once again.

DGMGRL> disable FAST_START FAILOVER;
DGMGRL> disable configuration;
       

SQL> alter database recover managed standby database disconnect from session;

Database altered.

Logs are not applied to dg3, errors in the alert logs are as follows

Errors in file /opt/app/oracle/11.2.0.2/A13db/network/admin/dg1/diag/rdbms/dg1_b/dg1/trace/dg1_rvwr_8362.trc:
ORA-38701: Flashback database log 3 seq 210 thread 1: "/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/FRA/DG1_B/flashback/o1_mf_85fxk962_.flb"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory


So  as message states some of the archive-logs are missing, check whats the difference in SCN for the primary and standby database

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    4037233

You have to check this query from the Primary database as standby is in mount state, run the same in Primary with the standby SCN to note the time difference-
SQL>  select scn_to_timestamp(4037233) from dual;

SCN_TO_TIMESTAMP(4037233)
---------------------------------------------------------------------------
15-OCT-12 12.37.14.000000000 AM

From Standby Server

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    3758124

 SQL> select scn_to_timestamp(3758124) from dual;  


SCN_TO_TIMESTAMP(3758124)
---------------------------------------------------------------------------
12-OCT-12 03.18.18.000000000 AM

Have to resolve this GAP, we can do that by taking the incremental backups from Primary and applying the same to Standby. So login to Primary DB and run the following -       
dg1.localdomain$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Mon Oct 15 00:42:00 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DG1 (DBID=1774279666)
RMAN> run {
2> allocate channel c1 type disk format '/opt/app/oracle/%U.rmb';
3> backup incremental from scn 3758124 database;
4> }
.........

Create the Standby database control file from the Primary and transfer the control file and incremental backup from Primary to standby server-

alter database create standby controlfile as '/opt/app/oracle/Standby_control.ctl';

scp 0innpa51_1_1.rmb 0jnnpa8p_1_1.rmb Standby_control.ctl oracle@172.27.154.66:/opt/app/oracle/

dg3.localdomain$ cp Standby_control.ctl /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_control01.ctl
dg3.localdomain$ cp Standby_control.ctl /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_control02.ctl
dg3.localdomain$ cp Standby_control.ctl /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_control03.ctl
dg3.localdomain$
dg3.localdomain$
dg3.localdomain$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 15 01:25:41 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter database mount standby database;

Database altered.

exit

keep the backups in a seperate directory, catalog the same with the standby rman and recover the database.
dg3.localdomain$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Mon Oct 15 01:27:01 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DG1 (DBID=1774279666, not open)

RMAN> catalog start with '/opt/app/oracle/backup_from_prim';

RMAN> recover database
2> ;

Starting recover at 15-OCT-12
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/15/2012 01:29:47
RMAN-06094: datafile 7 must be restored

RMAN>

We Have added one datafile to the system tablespace after restoring the standby database , So need to restore the same first.

RMAN> restore datafile 7;

Starting restore at 15-OCT-12
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00007 to /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/system02.dbf
channel ORA_DISK_1: reading from backup piece /opt/app/oracle/backup_from_prim/0innpa51_1_1.rmb
channel ORA_DISK_1: piece handle=/opt/app/oracle/backup_from_prim/0innpa51_1_1.rmb tag=TAG20121015T011304
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:25
Finished restore at 15-OCT-12

RMAN>

RMAN> recover database
2> ;

Starting recover at 15-OCT-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/system01.dbf
destination for restore of datafile 00002: /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/sysaux01.dbf
destination for restore of datafile 00003: /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/undotbs01.dbf
destination for restore of datafile 00004: /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/AUBX_DATA_01.dbf
destination for restore of datafile 00005: /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/AUBX_IDX_01.dbf
destination for restore of datafile 00006: /opt/app/oracle/11.2.0.2/A13db/oradata/dg1/users01.dbf
channel ORA_DISK_1: reading from backup piece /opt/app/oracle/backup_from_prim/0innpa51_1_1.rmb
channel ORA_DISK_1: piece handle=/opt/app/oracle/backup_from_prim/0innpa51_1_1.rmb tag=TAG20121015T011304
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35

starting media recovery

unable to find archived log
archived log thread=1 sequence=339
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/15/2012 01:43:51
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 339 and starting SCN of 4039532

The above error is fine as we have not applied all the archive logs yet, Lets start the managed recovery and see if this catches up with Primary DB


SQL> alter database recover managed standby database disconnect from session;

Database altered.

SQL> select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME;

no rows selected

/
no rows selected


SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     337
Next log sequence to archive   339
Current log sequence           339

Doing log switch on primary is not transporting the redo properly as we cant see new sequence#. Last one was 339 which is been applied through the incremental backups.
Checking the primary DB alert log, Ahh ..here is the culprit-

TNS-12545: Connect failed because target host or object does not exist
    ns secondary err code: 12560
    nt main err code: 515

I have not made the entry of the dg3 in the /etc/hosts file of the primary server yet. Making the same entry, lets c now -

cat /etc/hosts

172.27.152.123          dg1.localdomain.com dg1
172.27.154.139          dg2.localdomain.com dg2
172.27.154.66           dg3.localdomain.com dg3  --> made this entry was missing


SQL> select  SEQUENCE# ,STANDBY_DEST,COMPLETION_TIME from v$archived_log order by COMPLETION_TIME;

 SEQUENCE# STA COMPLETION_TIME
---------- --- --------------------
       341 NO  15-OCT-2012 02:00:29
       340 NO  15-OCT-2012 02:00:29
       342 NO  15-OCT-2012 02:00:33
       339 NO  15-OCT-2012 02:00:33
       343 NO  15-OCT-2012 02:00:34

Bingo :) Standby is upto date with Primary now. let do the broker configuration once again

DGMGRL> connect sys/system
Connected.
DGMGRL>
DGMGRL>
DGMGRL>
DGMGRL> show configuration

Configuration - dg1_conf

  Protection Mode: MaxAvailability
  Databases:
    dg1_a - Primary database
    dg1_b - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
DISABLED

DGMGRL> enable configuration;
Enabled.
DGMGRL>
DGMGRL>
DGMGRL>
DGMGRL>  show configuration

Configuration - dg1_conf

  Protection Mode: MaxAvailability
  Databases:
    dg1_a - Primary database
    dg1_b - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS


DGMGRL> show database verbose dg1_b

Database - dg1_b

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    dg1

  Properties:
    DGConnectIdentifier             = 'dg1_b'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'SYNC'
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '10'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '30'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = 'dg1_b, dg1_a'
    LogFileNameConvert              = 'dg1_b, dg1_a'
    FastStartFailoverTarget         = 'dg1_a'
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    SidName                         = 'dg1'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dg3.localdomain.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dg1_b_DGMGRL.localdomain.com)(INSTANCE_NAME=dg1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.arc'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

All looks good, lets try the switchover and see if this is smooth-
check the switchover using DGMGRL

DGMGRL> switchover to dg1_b
Performing switchover NOW, please wait...
New primary database "dg1_b" is opening...
Operation requires shutdown of instance "dg1" on database "dg1_a"
Shutting down instance "dg1"...
ORACLE instance shut down.
Operation requires startup of instance "dg1" on database "dg1_a"
Starting instance "dg1"...
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "dg1_b"

So far so good :) lets switch it back and try to enable the fast start fail over

Lets enable the FSFO now -

DGMGRL> ENABLE FAST_START FAILOVER;
Error: ORA-16651: requirements not met for enabling fast-start failover

:-) One more error (if u are wondering why a happy smile so a good DBA "minimize errors on Prod..welcome them on test" )       

Things to check before doing FSFO
  • The protection mode should be either in max availability or max protection mode
  • logxptmode ( log transport mode) for both primary and standby should be 'SYNC'
  • Both primary and standby should be in flash back enabled mode.
  • FastStartFailoverTarget if you have more than one standby database, not valid for our case.

In our case the issue is with flashback, its not enabled for the standby database. Enabling the same

SQL> select flashback_on from  v$database;

FLASHBACK_ON
------------------
NO

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL>  select flashback_on from  v$database;

FLASHBACK_ON
------------------
YES

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

Database altered.


DGMGRL> enable fast_start failover;
Enabled.
DGMGRL>

start the observer-

DGMGRL> start observer
Observer started

DGMGRL>
DGMGRL>  show configuration;

Configuration - dg1_conf

  Protection Mode: MaxAvailability
  Databases:
    dg1_a - Primary database
    dg1_b - (*) Physical standby database
      Warning: ORA-16826: apply service state is inconsistent with the DelayMins property

Fast-Start Failover: ENABLED

Configuration Status:
WARNING

One more error and a very common one -

DGMGRL> show database verbose dg1_b

Database - dg1_b

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       2 minutes 52 seconds
  Real Time Query: OFF
  Instance(s):
    dg1

  Database Warning(s):
    ORA-16826: apply service state is inconsistent with the DelayMins property

  Properties:
    DGConnectIdentifier             = 'dg1_b'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'SYNC'
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'

Why this Error??

Redo Apply is running without USING CURRENT LOGFILE option while DelayMins=0
That means broker property DelayMins is set to 0, but you didn't start recover managed standby database as real apply service.
So DelayMins=0 cannot be achieved. we have to start the apply from the current redo log files

SQL> alter database recover managed standby database cancel;

Database altered.

SQL>

SQL>
SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

Database altered.

Now check the status of the verbose

DGMGRL> show database verbose dg1_b

Database - dg1_b

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    dg1

  Properties:
    DGConnectIdentifier             = 'dg1_b'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'SYNC'
    DelayMins                       = '0'

Now Its Fine. see the
Apply Lag:       0 seconds

DGMGRL> show configuration

Configuration - dg1_conf

  Protection Mode: MaxAvailability
  Databases:
    dg1_a - Primary database
    dg1_b - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS

SQL>  select DB_UNIQUE_NAME,DATABASE_ROLE,CURRENT_SCN,OPEN_MODE,FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET FSFO_CURR_TARGET from v$database;
DB_UNIQUE_NAME                 DATABASE_ROLE    CURRENT_SCN OPEN_MODE            FS_FAILOVER_STATUS     FSFO_CURR_TARGET
------------------------------ ---------------- ----------- -------------------- ---------------------- ------------------------------
DG1_A                          PRIMARY              4089498 READ WRITE           SYNCHRONIZED           dg1_b


Standby-

SQL>  select DB_UNIQUE_NAME,DATABASE_ROLE,CURRENT_SCN,OPEN_MODE,FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET FSFO_CURR_TARGET from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    CURRENT_SCN OPEN_MODE            FS_FAILOVER_STATUS     FSFO_CURR_TARGET
------------------------------ ---------------- ----------- -------------------- ---------------------- ------------------------------
dg1_b                          PHYSICAL STANDBY     4088545 MOUNTED              SYNCHRONIZED           dg1_b

Test the Failover process

Shutdown the primary database and lets check if standby takes it place

SQL> select db_unique_name from v$database;

DB_UNIQUE_NAME
------------------------------
DG1_A

SQL>
SQL>
SQL> shut abort
ORACLE instance shut down.


Verifying the standby database alert logs -

RFS[3]: Possible network disconnect with primary database
Mon Oct 15 03:34:01 2012
Attempting Fast-Start Failover because the threshold of 30 seconds has elapsed.
Mon Oct 15 03:34:01 2012
Data Guard Broker: Beginning failover
Mon Oct 15 03:34:01 2012
..
.....
.......
Failover succeeded. Primary database is now dg1_b.


Login to Sql plus and check the same -

SQL> select DB_UNIQUE_NAME,DATABASE_ROLE,CURRENT_SCN,OPEN_MODE,FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET FSFO_CURR_TARGET from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    CURRENT_SCN OPEN_MODE            FS_FAILOVER_STATUS     FSFO_CURR_TARGET
------------------------------ ---------------- ----------- -------------------- ---------------------- ------------------------------
dg1_b                          PRIMARY              4090200 READ WRITE           REINSTATE REQUIRED     dg1_a

As you can see dg1_b is the new PRIMARY and FSFO target is dg1_a.
But "REINSTATE REQUIRED" what needs to be done fot this, nothing just start the old primary (new standby) in the mount state

SQL> startup mount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area  835104768 bytes
Fixed Size                  2231088 bytes
Variable Size             536872144 bytes
Database Buffers          289406976 bytes
Redo Buffers                6594560 bytes
Database mounted.

Check the query once again

SQL>  select DB_UNIQUE_NAME,DATABASE_ROLE,CURRENT_SCN,OPEN_MODE,FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET FSFO_CURR_TARGET from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    CURRENT_SCN OPEN_MODE            FS_FAILOVER_STATUS     FSFO_CURR_TARGET
------------------------------ ---------------- ----------- -------------------- ---------------------- ------------------------------
dg1_b                          PRIMARY              4090321 READ WRITE           SYNCHRONIZED           dg1_a


So thats all guys..if I make it any more somebody is gonna find me and beat me to death so lets wrap up.Will take some other things in my next blog...till thn bbye!!!



Wednesday 3 October 2012

How to ReCreate the standby Redologs in DataGuard



Q. Why someone want to change the size of standby redo log?

Incase you wanted to change the size of the primary redo logs as a result you need to change the standby redolog size or you have created the standby redologs with different sizes than the primary redo logs (by mistake).

Q. Why you need standby redologs?

All the changes in the priamry redo is propagated to the standby site and written to standby redologs so you have minimal dataloss in case of failover. From 10g onwards you can have real time apply which means you can apply the changes to standby database using the standby redologs (need not to wait for the archived log to be created and then apply).
 
Q. What you must have for using the standby redologs?

You must create the standby redologs with size same as the primary redologs. RFS process doesn't use the standby redologs if you dont have them sized same as the primary redo.
No Standby redolog -- RFS make use of Archived logs (only fully written not open or currently writing) ---Apply this to the Standby done by MRP process--Incase of failure a certain data loss as it applies only the fully archived logs to the standby
Standby redolog -- RFS will write to standby redolog--on log switch these will be archived while MRP apply these changes to the Standby Database (REAL TIME APPLY). So all info is available with Standby redo logs minimizing the dataloss.
 
Q. Will switchover, data shipping happen properly when RFS not using the standby redologs?

Yes, most of the things works without standby redologs, the only benefit would be that you can minimize dataloss using the standby redolog. See my following blog, this setup is with the different sizes for primary and standby logs yet switchover , data transfer happen to be as good as it should be. No issues what so ever with the DG setup, configuration and functionality other than the one I mentioned (which is a huge factor as we use DG setup for highly available system and I don't think we can afford to have dataloss so use of standby redo logs is  mandatory)
http://abhinavsarin.blogspot.com/2012/09/oracle-11g-dataguard-unfolded.html

Q. Whats not working when RFS is not using the standby redologs ?

FSFO - Fast start Fail Over
As there is a gap in applying the redologs from the archive logs to the standby database you can't enable the FSFO. The configuration will throw some error regarding the GAP, before using the standby redologs lets see the standby database verbose -


DGMGRL>  show database verbose dg1_b
Database - dg1_b
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   4 minutes 30 seconds
  Apply Lag:       4 minutes 30 seconds
  Real Time Query: OFF
  Instance(s):
    dg1

  Properties:
    DGConnectIdentifier             = 'dg1_b'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '30'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = 'dg1_b, dg1_a'
    LogFileNameConvert              = 'dg1_b, dg1_a'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    SidName                         = 'dg1'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dg2.amazon.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dg1_b_DGMGRL.amazon.com)(INSTANCE_NAME=dg1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.arc'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS



Test Case

I have a primary and standby with primary redo log sizes as 100mb and standby redolog sizes as 50mb.

SQL> select GROUP#, BYTES/1024/1024 , STATUS,MEMBERS from  v$log;
    GROUP# BYTES/1024/1024 STATUS              MEMBERS
---------- --------------- ---------------- ----------
         1             100 CURRENT                   1
         3             100 CLEARING                  1
         2             100 CLEARING                  1


SQL> select GROUP#,BYTES/1024/1024/1024 , STATUS, FIRST_TIME,NEXT_TIME from v$standby_log;
    GROUP# BYTES/1024/1024/1024 STATUS     FIRST_TIME           NEXT_TIME
---------- -------------------- ---------- -------------------- --------------------
         4           .048828125 UNASSIGNED
         5           .048828125 UNASSIGNED
         6           .048828125 UNASSIGNED
         7           .048828125 UNASSIGNED



As you can see there is no standby redolog which is active all unassigned this implies that RFS is not attaching the standby redo logs, can see the different in sizes as well.
Now let's change the size of the standby redologs and see if the RFS (Remote File Server) use the same -

Step By Step Re Creating the Standby Redologs

1. Stop the log transport from the primary.

SQL> alter system set log_archive_dest_state_2 = defer scope = memory;
System altered.

2. Stop the recover from the standby databas.

SQL> alter database recover managed standby database cancel;
Database altered.

3. Verify that you are not getting the data in standby by doing the logswitch in primry.

SQL> select thread#,max(sequence#) from v$archived_log group by thread#;

4. Drop the standby redologs in the standby and primary (for the switchover purpose).

SQL>  ALTER DATABASE DROP STANDBY LOGFILE GROUP 4;
Database altered.
SQL>  ALTER DATABASE DROP STANDBY LOGFILE GROUP 5;
Database altered.
SQL> ALTER DATABASE DROP STANDBY LOGFILE GROUP 6;
Database altered.
SQL> ALTER DATABASE DROP STANDBY LOGFILE GROUP 7;
Database altered.
same in primary.

5. Recreate the stnadby redologs.

SQL>  alter database add standby logfile THREAD 1 group 4 ('/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo01.log') size 100m;
Database altered.
SQL> alter database add standby logfile THREAD 1 group 5 ('/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo02.log') size 100m;
Database altered.
SQL>  alter database add standby logfile THREAD 1 group 6 ('/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo03.log') size 100m;
Database altered.
SQL> alter database add standby logfile THREAD 1 group 7
  2  ('/opt/app/oracle/11.2.0.2/A13db/oradata/dg1/dg1_standby_redo04.log') size 100m;


6. Enable the log transport from the primary database.

SQL> alter system set log_archive_dest_state_2 = enable scope = both;
System altered.
7. Start the managed recovery from the standby database.

SQL> alter database recover managed standby database using current logfile disconnect;
Database altered.

8. Check the status of the standby redologs

SQL>  select GROUP#,BYTES/1024/1024/1024 , STATUS, FIRST_TIME,NEXT_TIME from v$standby_log;
    GROUP# BYTES/1024/1024/1024 STATUS     FIRST_TIME           NEXT_TIME
---------- -------------------- ---------- -------------------- --------------------
         4            .09765625 ACTIVE     03-OCT-2012 02:45:11 03-OCT-2012 02:45:49
         5            .09765625 UNASSIGNED
         6            .09765625 UNASSIGNED
         7            .09765625 UNASSIGNED

SQL> /
    GROUP# BYTES/1024/1024/1024 STATUS     FIRST_TIME           NEXT_TIME
---------- -------------------- ---------- -------------------- --------------------
         4            .09765625 ACTIVE     03-OCT-2012 02:45:11 03-OCT-2012 02:49:09
         5            .09765625 UNASSIGNED
         6            .09765625 UNASSIGNED
         7            .09765625 UNASSIGNED


Its active , so RFS attached the standby redologs now. Let's verify the verbose once again to see if the gap has been resolved-

DGMGRL>  show database verbose dg1_b
Database - dg1_b
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds

  Real Time Query: OFF
  Instance(s):
    dg1

  Properties:
    DGConnectIdentifier             = 'dg1_b'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '10'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '30'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = 'dg1_b, dg1_a'
    LogFileNameConvert              = 'dg1_b, dg1_a'
    FastStartFailoverTarget         = 'dg1_a'
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    SidName                         = 'dg1'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dg2.amazon.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dg1_b_DGMGRL.amazon.com)(INSTANCE_NAME=dg1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.arc'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS


So all good, no GAP as its using the standby redologs to apply the changes to the standby database.

Will have another post for FSFO sometime soon.