DR strategy and fault tolerant recommendations.


Article ID: KB0085895


Products Versions
TIBCO ActiveSpaces -
Not Applicable -

Description:
===========
Disaster recovery (DR) strategy and fault-tolerance recommendations for ActiveSpaces.



Environment:
===========
AS 2.1.*
All operating systems

Resolution:
===========
DR strategy and fault tolerant recommendations

1). Increasing the number of AS-Agents: ActiveSpaces provides a pre-built agent process called as-agent, which can run on any host. Its sole purpose is to join all distributed spaces in the specified metaspace as a seeder. Running additional agents adds seeders, which helps ensure that the degree of replication specified for a space can be achieved.

2). Performing Space Replication: To provide fault tolerance and prevent loss of tuples if one of the seeders of a space suddenly disappears from the metaspace, you can specify a degree of replication for the space.

3). Persistence: ActiveSpaces allows you to persist data to disk storage and recover data if data loss occurs or there is a problem with cluster startup.

ActiveSpaces provides two types of persistence:

- Shared-Nothing Persistence - Each node that joins a space as a seeder writes its data to its own disk storage and reads it back when needed for recovery and for cache misses.

- Shared-All Persistence - All nodes share a single persister or a set of persisters.
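
To illustrate how recommendations 1 and 2 interact, here is a minimal Python model. It is not ActiveSpaces code. It assumes that a replication degree of r means each tuple is held by its seeder plus r replicas on distinct other seeders. The round-robin placement and the names place_copies, survives, and max_tolerated_failures are hypothetical; the real distribution algorithm is internal to ActiveSpaces. Under that assumption, a space needs at least r + 1 seeders to reach the requested degree (which is where extra as-agent processes help), and it can lose up to r seeders at once without losing tuples.

```python
from itertools import combinations


def place_copies(keys, seeders, replication_degree):
    """Assign each key to one seeder plus `replication_degree` replicas.

    Hypothetical round-robin placement, used only for illustration.
    """
    copies_wanted = replication_degree + 1
    if len(seeders) < copies_wanted:
        raise ValueError(
            f"need at least {copies_wanted} seeders, have {len(seeders)}"
        )
    placement = {}
    for i, key in enumerate(keys):
        # Consecutive offsets modulo len(seeders) give distinct seeders.
        placement[key] = {
            seeders[(i + offset) % len(seeders)]
            for offset in range(copies_wanted)
        }
    return placement


def survives(placement, failed):
    """Return True if every key still has a copy on a non-failed seeder."""
    failed = set(failed)
    return all(holders - failed for holders in placement.values())


def max_tolerated_failures(placement, seeders):
    """Largest n such that losing ANY n seeders at once loses no key."""
    tolerated = 0
    for n in range(1, len(seeders) + 1):
        if all(survives(placement, combo) for combo in combinations(seeders, n)):
            tolerated = n
        else:
            break
    return tolerated


if __name__ == "__main__":
    # One application seeder plus two as-agent seeders (names are made up).
    seeders = ["app-1", "agent-1", "agent-2"]
    placement = place_copies(range(12), seeders, replication_degree=1)
    print(max_tolerated_failures(placement, seeders))
```

In this model, asking for a replication degree of 1 with a single seeder raises an error, which mirrors the point that the requested degree cannot be achieved until enough seeders have joined.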

Together, these measures let you ride out system and network failures within a single metaspace. If a seeder goes down, the remaining seeders stay available for space operations. A suitable replication degree ensures that no tuples are lost from the space, and persistence handles data storage and recovery.

In some cases, however, a complete network or power failure takes down the entire metaspace. To cover this, you can maintain two identical metaspaces in two different locations and use the AS APIs (space events, listeners, and a remote client) to apply every space operation (add/update/delete) performed on a space at one site to the corresponding space at the other.
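
The two-metaspace approach can be sketched as follows. This is a plain-Python model rather than the ActiveSpaces API: Space, listen, and mirror_to are hypothetical stand-ins for a space, its event-listener registration, and the remote-client write to the second site.

```python
class Space:
    """Toy key/value space that notifies listeners of every change."""

    def __init__(self, name):
        self.name = name
        self.tuples = {}
        self._listeners = []

    def listen(self, callback):
        # callback(event_type, key, value) is invoked after each change.
        self._listeners.append(callback)

    def put(self, key, value):
        self.tuples[key] = value  # covers both add and update
        self._notify("put", key, value)

    def take(self, key):
        if key in self.tuples:
            value = self.tuples.pop(key)
            self._notify("take", key, value)
            return value
        return None

    def _notify(self, event_type, key, value):
        for callback in self._listeners:
            callback(event_type, key, value)


def mirror_to(target):
    """Build a listener that replays events from one space on `target`."""

    def on_event(event_type, key, value):
        if event_type == "put":
            target.tuples[key] = value
        elif event_type == "take":
            target.tuples.pop(key, None)

    return on_event


if __name__ == "__main__":
    primary = Space("orders@site-a")
    standby = Space("orders@site-b")
    primary.listen(mirror_to(standby))

    primary.put("o-1", {"qty": 1})  # add
    primary.put("o-1", {"qty": 2})  # update
    primary.put("o-2", {"qty": 5})
    primary.take("o-2")             # delete
    print(standby.tuples == primary.tuples)
```

The listener writes to the standby without going through put, so the sketch stays one-directional. Mirroring in both directions would need a way to tell local changes from replayed ones.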

For a single-site disaster, AS provides disk persistence: either shared-all (to a database, for example) or the built-in shared-nothing distributed persistence to local files on the seeders.

For multiple sites, there are a couple of options:

If your data is relatively static and an active-passive setup is acceptable, you can use shared-nothing persistence and have the seeders persist to a file system mounted from a SAN or NAS that is replicated to the other site. When you start the second site, the seeder machines there should be able to see those files on their local SAN/NAS and reload from them.

Otherwise, we recommend deploying a cluster (metaspace) at each site and implementing a form of routing that replicates changes made at one site to the other. For this you can use the routing interface introduced in AS 2.1.2: you create a class that implements the interface and forwards each data change to the other metaspace (using, for example, EMS as a reliable store-and-forward transport).
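
The store-and-forward idea behind this recommendation can be sketched in plain Python. The sketch does not use the AS routing interface or EMS. RemoteSite and StoreAndForwardRouter are hypothetical names, and an in-memory deque stands in for a persistent EMS queue. The point it shows is that changes are queued first, forwarded in order, and removed from the queue only after the other site has applied them, so an outage between sites delays replication instead of dropping changes.

```python
from collections import deque


class RemoteSite:
    """Stand-in for the second metaspace; `available` simulates the link."""

    def __init__(self):
        self.tuples = {}
        self.available = True

    def apply(self, op, key, value=None):
        if not self.available:
            raise ConnectionError("remote site unreachable")
        if op == "put":
            self.tuples[key] = value
        elif op == "take":
            self.tuples.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


class StoreAndForwardRouter:
    """Queues every change, then forwards in order when the remote is up."""

    def __init__(self, remote):
        self.remote = remote
        self.pending = deque()  # stands in for a persistent EMS queue

    def on_change(self, op, key, value=None):
        self.pending.append((op, key, value))
        self.flush()

    def flush(self):
        """Forward queued changes; stop at the first failure, keep the rest."""
        sent = 0
        while self.pending:
            op, key, value = self.pending[0]
            try:
                self.remote.apply(op, key, value)
            except ConnectionError:
                break
            self.pending.popleft()  # remove only after a successful apply
            sent += 1
        return sent


if __name__ == "__main__":
    remote = RemoteSite()
    router = StoreAndForwardRouter(remote)

    router.on_change("put", "k1", "v1")
    remote.available = False  # simulate a site-to-site outage
    router.on_change("put", "k2", "v2")
    router.on_change("take", "k1")
    print(len(router.pending), remote.tuples)

    remote.available = True  # link restored; drain the backlog
    router.flush()
    print(len(router.pending), remote.tuples)
```

A real deployment would also have to decide how to handle changes made at both sites to the same tuple, which this one-way sketch does not address.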
