Sunday, August 10, 2008

Fast-Start Failover - It is reliable

Last Friday there was one post on OTN forums that brought my attention. The OP was wondering “Is Data Guard Buggy” with attention to Fast-Start Failover (FSFO) feature that provides an ability of automatic failover to the standby database if the primary database is not available for certain time.

He had some concerns about FSFO being unreliable, very difficult to be implemented and actually doesn't work properly.

The OP got some immediate response from the OTN users. I couldn't agree more with Joseph Meeks's comment on the topic.

In my opinion, FSFO is very nice feature that plays big role in Oracle's Maximum Availability Architecture. There might be some valid reasons not to implemented it, but if the automatic failover is a requirement, FSFO is the way to go. Should one have any problems implementing it, the best way to go is to get Oracle Support involved.

In this post, I'd like to show that implementation of the FSFO should not be a difficult task once you have the Data Guard environment set up.

I configured data guard physical standby environment on my laptop. Due to hardware limitations, I'll have the observer running on the same machine with the standby databases. Keep in mind this is just an example. In practice, the primary database, the standby database and the observer should run on different hosts. This example also answers one of the questions OP asked: Will it be possible to set it up on one machine? The answer would be it is possible, as shown in the example :-), but it is not the right way to go.

I use DGMGRL utility in the example..

So, I configured a Data Guard environment where DB1.MYDOMAIN.COM is primary database and STDB.MYDOMAIN.COM is physical standby. Both databases are 10.2.x
By default, protection mode of the Data Guard configuration is set to MAXIMUM PERFORMANCE

DGMGRL> connect sys/***@db1
Connected.
DGMGRL> show configuration

Configuration
Name: DRTest
Enabled: YES
Protection Mode: MaxPerformance
Fast-Start Failover: DISABLED
Databases:
db1 - Primary database
stdb - Physical standby database

Current status for "DRTest":
SUCCESS


Enabling Fast-Start Failover requires the following pre-requisites to be met:
Flashback Database feature is enabled on both the primary and the standby database.
The protection mode of the configuration must be set to MAXIMUM AVAILABILITY
tnsnames.ora in the ORACLE_HOME where the observer runs must be set to see both databases, the primary and the standby.
DGMGRL must be available on the observer host.

Enable Flashback Database
I will enable flashback database feature on both databases. This assumes that the flash recovery area is configured.
The flashback database feature provides an ability for an easy reinstatement of the failed primary database to new standby database.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>

SQL> startup mount
ORACLE instance started.

Total System Global Area 285212672 bytes
Fixed Size 1261348 bytes
Variable Size 100663516 bytes
Database Buffers 176160768 bytes
Redo Buffers 7127040 bytes
Database mounted.
SQL> alter database flashback on;

Database altered.

SQL> alter database open;

Database altered.


Set the protection mode to MAXIMUM AVAILABILITY
The Fast-Start Failover can be enabled only if the protection mode is set to MAXIMUM AVAILABILITY.
LogXptMode has to be set to 'SYNC'

DGMGRL> edit database 'db1'
> set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'stdb'
> set property LogXptMode='SYNC';
Property "logxptmode" updated

DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration
Name: DRTest
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: DISABLED
Databases:
db1 - Primary database
stdb - Physical standby database

Current status for "DRTest":
SUCCESS

SQL> conn sys/***@db1 as sysdba
Connected.
SQL> select protection_mode, protection_level from v$database;

PROTECTION_MODE PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY

SQL> conn sys/***@stdb as sysdba
Connected.
SQL> select protection_mode, protection_level from v$database;

PROTECTION_MODE PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY

SQL>


I ensured that tnsnames.ora are set correctly as well DGMGRL is installed.

Now, since all the prerequisites are met, lets move forward and enable the FSFO.
Before enabling it, make sure each of the databases in the configuration has set a fast start failover target. This is achieved by setting the FastStartFailoverTarget parameter.


DGMGRL> edit database 'db1' set property faststartfailovertarget='stdb';
Property "faststartfailovertarget" updated
DGMGRL> edit database 'stdb' set property faststartfailovertarget='db1';
Property "faststartfailovertarget" updated


Another important parameter that has to be set is FastStartFailoverThreshold. This parameter specifies the amount of time (in seconds) the observers attempts to reconnect to the primary database before starting the fast-start failover to the standby database. The default value is set to 30 seconds. In the example I set this parameter to 120 seconds.


DGMGRL> edit configuration set property FastStartFailoverThreshold=120;
Property "faststartfailoverthreshold" updated


Now lets enable the fast start failover:


DGMGRL> ENABLE FAST_START FAILOVER;
Enabled.


If you check the configuration at this moment you may find the following warning: ORA-16608: one or more databases have warnings.
If you check the status of one of the databases, you can see this warning: ORA-16819: Fast-Start Failover observer not started


DGMGRL> show configuration

Configuration
Name: DRTest
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
db1 - Primary database
stdb - Physical standby database
- Fast-Start Failover target

Current status for "DRTest":
Warning: ORA-16608: one or more databases have warnings


DGMGRL> show database verbose 'db1'

Database
Name: db1
Role: PRIMARY
Enabled: YES
Intended State: ONLINE
Instance(s):
db1

Properties:
InitialConnectIdentifier = 'db1.mydomain.com'
LogXptMode = 'SYNC'
Dependency = ''
DelayMins = '0'
Binding = 'OPTIONAL'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '180'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'auto'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '2'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = 'stdb, db1'
LogFileNameConvert = 'stdb, db1'
FastStartFailoverTarget = 'stdb'
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'miki-laptop'
SidName = 'db1'
LocalListenerAddress = '(ADDRESS=(PROTOCOL=tcp)(HOST=miki-laptop)(PORT=1521))'
StandbyArchiveLocation = 'dgsby_db1'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
LatestLog = '(monitor)'
TopWaitEvents = '(monitor)'

Current status for "db1":
Warning: ORA-16819: Fast-Start Failover observer not started


So lets start the observer. I will repeat again, the observer should run on a different host, however for the sake of this example it will run on the same machine as the databases.
In order to start the observer one should start DGMGRL utility and login to the data guard configuration.
Once logged in, issue START OBSERVER command. This will start the observer.
Optionally you can set a log file destination while invoking DGMGRL utility and specify name for the observer configuration file (for more information check Data Guard Command-Line Interface Reference).
Once the observer is started, the control is not returned to the user until the observer is stopped.


DGMGRL for Linux: Version 10.2.0.3.0 - Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/***@db1
Connected.
DGMGRL> start observer;
Observer started


So lets check the configuration now.


$ dgmgrl
DGMGRL for Linux: Version 10.2.0.3.0 - Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/***@db1
Connected.
DGMGRL> show configuration verbose;

Configuration
Name: DRTest
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
db1 - Primary database
stdb - Physical standby database
- Fast-Start Failover target

Fast-Start Failover
Threshold: 120 seconds
Observer: miki-laptop

Current status for "DRTest":
SUCCESS

DGMGRL>


As of this moment my configuration has FSFO enabled.

Now lets test if the FSFO really works.
One should be aware of the conditions that must be satisfied for the observer to attempt FSFO.
The full list of conditions can be found in Data Guard Broker user guide under 5.5.2.1 What Happens When Fast-Start Failover and the Observer Are Running?

I will issue SHUTDOWN ABORT to the primary database (SHUTDOWN NORMAL/IMMEDIATE/TRANSACTIONAL would not trigger the failover).
Once I crash the database, the status of the configuration will return Error: ORA-16625: cannot reach the database. Be aware that since the primary database is down, the only way to check the configuration is to connect using the standby database credentials.


SQL> conn sys/***@db1 as sysdba
Connected.
SQL> shutdown abort
ORACLE instance shut down.
SQL>

DGMGRL> connect sys/***@stdb
Connected.
DGMGRL> show configuration

Configuration
Name: DRTest
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
db1 - Primary database
stdb - Physical standby database
- Fast-Start Failover target

Current status for "DRTest":
Error: ORA-16625: cannot reach the database


After waiting for two minutes (FSFO threshold was set to 120 seconds), I checked the observer log file and found out that it started the failover.


[W000 08/09 17:58:49.75] Observer started.

18:05:38.31 Saturday, August 09, 2008
Initiating fast-start failover to database "stdb"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "stdb"
18:06:16.82 Saturday, August 09, 2008



DGMGRL> show configuration verbose

Configuration

Name: DRTest
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
db1 - Physical standby database (disabled)
- Fast-Start Failover target
stdb - Primary database

Fast-Start Failover

Threshold: 120 seconds
Observer: miki-laptop

Current status for "DRTest":

Warning: ORA-16608: one or more databases have warnings

DGMGRL>


So the observer started the fast-start failover and has successfully converted the former standby to new primary database.

Once the failover is done the observer will attempt to automatically reinstate the former primary database to new physical standby. Certain requirements have to be met for the automatic reinstatement to take place. If not, the database can be manually reinstated. Once the reinstatement is successful you can see that the former primary database became new physical standby.

DGMGRL> show configuration

Configuration

Name: DRTest
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:

db1 - Physical standby database
- Fast-Start Failover target
stdb - Primary database

Current status for "DRTest":

SUCCESS

DGMGRL>


Finally, you can do switchover to perform role change and to place the configuration in its initial state.

The example above shows that enabling fast-start failover is pretty straight forward. There are many good documents with information how to enable FSFO feature. Some of them are listed below:

Oracle® Data Guard Broker 10g Release 2 (10.2) - 5.5 Fast-Start Failover
What Happens When Fast-Start Failover and the Observer Are Running?

7.6 Scenario 5: Enabling Fast-Start Failover and the Observer

Metalink Note#359555.1 IMPLEMENTING FAST-START FAILOVER IN 10GR2 DATAGUARD BROKER ENVIRONMENT

Oracle Maximum Availability Architecture – MAA

I hope I didn't miss something.

Cheers!

12 comments:

Anonymous said...

Hi Mihajlo. I covered this article in Log Buffer #110.

Cheers,
Dave.

Arnoud Roth said...

Hi Mihajlo,
Thanks for this great article. I am sure it will be a great help in setting up FSFO for many of us out there.
To add on this very nice article, Harald van Breederode has a very good presentation on the subject of client connectivity in a Dataguard environment. You can get it here:
http://ukoug.netxtra.net/calendar/show_presentation.jsp?id=8479
Very valuable stuff indeed.
Regards,
Arnoud Roth

Teko said...

Dave, I am really glad to see this article listed in Log Buffer #110. Thank You !

Arnoud, thanks for the nice words. I am glad you like the article. I am looking forward to read Harald's presentation.

Anonymous said...

This is a great write-up. We're in the middle of implementing fsfo and were wondering how to automatically redirect the clients when the database has failed over. I cannot access the white paper of Harald van breederode about client connectivity because I am not a member of UKoug...I am in the US. Would it be possible to email it to me. I will understand if it is not possible. Thanks! Veronica

Teko said...

Hi Veronica,

Thanks for reading the article. In order to download the paper, you have to be an UKOUG member. There is bronze membership option which is very convinient for those who are not in the UK (UKOUG membership details).
Regarding FSFO implementation and the client/application failover, it all depends on your application architecture. I will try to cover something in some of the next posts.

You may also want to read the following White Paper (if you haven't read it already):
Client Failover Best Practices for Highly Available Oracle Databases: Oracle Database 10g Release 2

Cheers,
Mihajlo

hoffmandirt said...

Great article. Do you have an example of the JDBC URL that you would use to connect to an Oracle database with Fast Start Failover enabled? Does the URL need to know about both hosts (primary and standby)?

Teko said...

Hi hoffmandirt,

Thanks for reading?

What kind of JDBC driver are you referring to? Thick(OCI) or Thin?

I assume thin.

If my assumption is correct, my choice would be to implement Fast Connection Failover.

The following URLs can bring you to the documentation about this failover method:
http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/fstconfo.htm#CIHJBFFC

http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#CHDIFGHE

Since I am talking about the thin driver and FCF, you should include the whole connect descriptor in the JDBC URL so it would look like:

jdbc:oracle:thin:@(DESCRIPTION=
(LOAD_BALANCE=yes)
(ADDRESS=(PROTOCOL=TCP)(HOST=server1.localdomain)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=server2.localdomain)(PORT=1521))
(CONNECT_DATA=(service_name=serviceJDBC)))


Be aware that you also need to enable FCF on the client side.

You can take a look to Metalink Note 433827.1 where you can find an example about how to setup FCF with JDBC Thin Client in a RAC environment.

If your application is making connections to Non-RAC Data Guard environment, the configuration should be more or less the same. For more details about the differences you can refer to this grat White Paper available on OTN:
Client Failover Best Practices for Highly Available Oracle Databases: Oracle Database 10g Release 2


Regarding your question if the URL needs to know about both of the databases, my answer would be YES. It is up to you whether you will define it once in an external file and then load it (preferred) or you will explicitly define it in the application.

I tried to provide you with a general answer. If this was not the answer you were looking for, let me know and provide more information about the challenges/issues you are facing so I can be more specific.

Cheers,
Mihajlo

vbarun said...

The presentation can be found at http://prutser.files.wordpress.com/2008/12/client_connectivity.pdf

Bins said...

Hi ya,

The protection_level is showing as resynchronization even though there are no gaps. I can see the logs are also being applied but the protection_level is resynchronization instead of Max Availability. Ny ideas y???

Anonymous said...

Fast-start failover observer is no longer observing this database

The observer is definately started and no firewalls are present? Should the be a special configuration in the /etc/hosts file?

Pc2Linux said...

Hi Guys:
I have a scenario like as you said, but I could not start observer. It worked some days ago, before failover was performed.

ORIGINAL SITUATION:
orcl3 was primary database
orcl4 was standby database

ACTUAL SITUATION:

DGMGRL> show configuration

Configuration
Name: dg_iwayprod
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
orcl3 - Physical standby database
- Fast-Start Failover target
orcl4 - Primary database

Current status for "dg_iwayprod":
Warning: ORA-16608: one or more databases have warnings


DGMGRL> show database orcl3

Database
Name: orcl3
Role: PHYSICAL STANDBY
Enabled: YES
Intended State: ONLINE
Instance(s):
orcl3

Current status for "orcl3":
Warning: ORA-16819: Fast-Start Failover observer not started

I tried to start observer as follow:
DGMGRL> start observer;
Error: ORA-16556: error message is in XML already

Failed.

I can't start it. However both databases seem to be syncronized as I see:

SQL> select name,open_mode,fs_failover_status,db_unique_name from v$database;

NAME OPEN_MODE FS_FAILOVER_STATUS DB_UNIQUE_NAME
--------- ---------- --------------------- ------------------------------
ORCL3 READ WRITE SYNCHRONIZED orcl4

SQL> select name,open_mode,fs_failover_status,db_unique_name from v$database;

NAME OPEN_MODE FS_FAILOVER_STATUS DB_UNIQUE_NAME
--------- ---------- --------------------- ------------------------------
ORCL3 MOUNTED SYNCHRONIZED orcl3

Pc2Linux said...

I've solved problem as it did not start cos there was a damaged observer config file.

$ ps -fea | grep obs
oracle 24798 1 0 12:58:10 ? 0:04 dgmgrl -silent sys/sys@orcl3 start observer FILE='/export/home/oracle/fsfo-orcl.dat'

by starting observer on another BROKER server, this process generated a new fsfo-orcl.dat file. I transfer this file to the original broker server, I tried to start it worked.
Thank you.