BC 5.x Fault Tolerance and Load Balancing Architecture

Products	Versions
TIBCO BusinessConnect	-
Not Applicable	-

Description

Resolution:
In the BC 5.x architecture, Load Balancing and Fault Tolerance are now two separate implementations. How Load Balancing and Fault Tolerance can be achieved is dependent on the deployment configuration.

  - Single Server Mode: Only Fault Tolerance.
  - Dual Firewall DMZ Mode: Both Load Balancing and Fault Tolerance.

In each Fault Tolerant group, only one BC engine acts as the primary server. All other engines within the same group act as hot backups and remain idle. The engine that starts first will become the primary server within the FT group. The user must start the engines in the proper order to ensure that the desired server becomes the primary server for BC.

A/ For Fault Tolerance, refer to the example below:

Consider a fault tolerant group M1 with two engines, BC1 and BC1#. BC engines within the same Fault Tolerant group work in Fault Tolerant mode. These servers perform these roles:

  - BC1       Primary Engine in the FT group M1
  - BC1#    Hot Backup Engine in the FT group M1

Engine BC1 will process all transactions, and Engine BC1# will remain idle. The two engines exchange heartbeat messages, and Engine BC1# will detect the failure of BC1 when that exchange stops.

Upon BC1 failure, BC1# will then become the primary Engine and will start processing:

  - All new Request/Transactions
  - All incomplete jobs of BC1

If all engines in an FT group fail, the jobs being processed by this FT group will remain as they are, and will be processed only when either BC1 or BC1# restarts.
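The failover behavior in section A can be sketched as a small Python model. This is only an illustration under stated assumptions: the FTGroup class and its methods are hypothetical names, not part of any BusinessConnect API, and a heartbeat failure is modeled simply as removing the engine from the group.

```python
from dataclasses import dataclass, field


@dataclass
class FTGroup:
    """Toy model of one fault tolerant group (hypothetical, not a BC API)."""
    name: str
    running: list = field(default_factory=list)       # engines in start order
    pending_jobs: list = field(default_factory=list)  # incomplete jobs of the group

    def start(self, engine):
        # The first engine to start is primary; later ones are hot backups.
        if engine not in self.running:
            self.running.append(engine)

    def fail(self, engine):
        # Stands in for a missed heartbeat: the engine leaves the group.
        if engine in self.running:
            self.running.remove(engine)

    @property
    def primary(self):
        return self.running[0] if self.running else None

    def submit(self, job):
        self.pending_jobs.append(job)

    def process(self):
        # Only the primary processes jobs; with no engine up, jobs stay as they are.
        if self.primary is None:
            return []
        done = [(self.primary, job) for job in self.pending_jobs]
        self.pending_jobs.clear()
        return done


m1 = FTGroup("M1")
m1.start("BC1")            # starts first, so it becomes primary
m1.start("BC1#")           # hot backup, idle
m1.submit("job-1")
m1.fail("BC1")             # BC1# takes over new and incomplete work
taken_over = m1.process()
m1.fail("BC1#")            # every engine in the group is down
m1.submit("job-2")
stalled = m1.process()     # nothing runs until BC1 or BC1# restarts
```

In this sketch, job-1 is completed by BC1# after the failover, and job-2 stays pending until one of the two engines is started again.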

B/ Load Balancing

Load Balancing with BC 5.x can be achieved only using the DMZ Server in “Dual Firewall DMZ Mode”. When the BC engines in each group start, they run an election process to select a scheduler. All other engines will become workers.

For example, consider a configuration with three fault tolerant groups, M1, M2, and M3, and three engines BC1, BC2 and BC3.  BC1 belongs to M1, BC2 belongs to M2, and BC3 belongs to M3.

On startup, engines BC1, BC2, and BC3 from groups M1, M2, and M3 respectively will run an election process and decide the scheduler and workers via RVCMQ.  For this example, engine BC1 is elected scheduler.

The following sequence of events will occur when an inbound transaction is received:

  1. When the DMZ Server receives an inbound transaction from a trading partner, it sends an RV message to the internal network.
  2. The scheduler (BC1) receives the message and performs one of the following two actions:
       2a. It assigns the request to one of the workers, or
       2b. If all workers are busy, it processes the job itself

If any of the BC engines fails, the scheduler detects the missing server and the message is assigned to another worker. If the scheduler itself fails, another scheduler election occurs, and one of the worker engines will promote itself to be the new scheduler. In either case, all jobs being processed by the failed BC engine will remain “stopped” until this engine is restarted.

Example:
If engine BC2 dies, the scheduler detects the change and will have all the messages received from the DMZ server processed by BC3 (and itself).  Jobs being processed by BC2 will remain idle until the BC2 engine starts up again.
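The dispatch rules in section B can be illustrated with a short Python model. It is a sketch under assumptions, not the product's implementation: the EnginePool class is hypothetical, the election is reduced to "first live engine wins" (the real election runs over RVCMQ), and a worker is treated as busy as soon as it holds one job.

```python
class EnginePool:
    """Toy model of scheduler/worker dispatch (hypothetical, not a BC API)."""

    def __init__(self, engines):
        self.alive = list(engines)
        self.jobs = {engine: [] for engine in engines}  # jobs held per engine
        self.scheduler = self._elect()

    def _elect(self):
        # Assumption: the first live engine wins the election.
        return self.alive[0] if self.alive else None

    def dispatch(self, job):
        if self.scheduler is None:
            raise RuntimeError("no engine is running")
        # A worker counts as busy once it holds a job (a simplification).
        idle = [e for e in self.alive
                if e != self.scheduler and not self.jobs[e]]
        # Step 2a: hand the job to an idle worker; step 2b: otherwise keep it.
        target = idle[0] if idle else self.scheduler
        self.jobs[target].append(job)
        return target

    def fail(self, engine):
        # Jobs held by the failed engine stay stopped until it restarts.
        self.alive.remove(engine)
        if engine == self.scheduler:
            self.scheduler = self._elect()


pool = EnginePool(["BC1", "BC2", "BC3"])
first = pool.dispatch("t1")    # assigned to an idle worker
second = pool.dispatch("t2")   # assigned to the other idle worker
third = pool.dispatch("t3")    # all workers busy, the scheduler takes it
pool.fail("BC2")               # BC2's job stays with BC2
fourth = pool.dispatch("t4")   # only BC3 and the scheduler remain
```

In this run the transaction held by BC2 is not moved to another engine, which mirrors the statement that jobs of a failed engine wait for that engine to restart.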

Combining Load Balancing and Fault Tolerance with BC 5.x

C/ Fault Tolerance and Load Balancing can be combined in BC 5.x as shown in the following example:

Consider three fault tolerant groups: M1, M2, M3 with the following BC Engines:

  - BC1 and BC1# belong to same FT Group M1
  - BC2 and BC2# belong to same FT Group M2
  - BC3 and BC3# belong to same FT Group M3

In each Fault Tolerant group, there is only one BC engine acting as the primary server. All other engines within the same group act as Hot Backups, remaining idle. The engine that starts first will become the primary server within the same FT group.  For our example, the following servers are acting as primary servers: BC1, BC2, BC3 in the groups M1, M2, M3 respectively. BC1#, BC2#, BC3# are Hot Backup servers in their respective groups M1, M2, and M3.

Engines BC1, BC2, and BC3 process all the transactions in Load Balanced mode, and engines BC1#, BC2#, and BC3# will remain idle (*).

(*) They will start processing only in the event of failure of the primary server belonging to their respective FT Group (for example, if BC1 fails, BC1# will become the primary server).

If all engines fail in an FT group, all jobs being processed in that group will be suspended until one of the engines in the group starts up. These jobs will not be transferred to another fault tolerant group.
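The combined setup can be pictured as load balancing across the current primary of each FT group. The following Python sketch uses hypothetical helper names (primaries, fail) and represents each group as a list of running engines in start order; it is an illustration, not a BusinessConnect interface.

```python
def primaries(groups):
    """Map each FT group to its primary, skipping groups with no running engine."""
    return {name: running[0] for name, running in groups.items() if running}


def fail(groups, engine):
    """Return a copy of the groups with the failed engine removed."""
    return {name: [e for e in running if e != engine]
            for name, running in groups.items()}


# Engines are listed in start order, so the first entry is the primary.
groups = {
    "M1": ["BC1", "BC1#"],
    "M2": ["BC2", "BC2#"],
    "M3": ["BC3", "BC3#"],
}

after_bc1 = fail(groups, "BC1")          # BC1# becomes primary of M1
after_m1_down = fail(after_bc1, "BC1#")  # M1 has no engine left
```

Once M1 has no running engine it drops out of the load balanced set, and its suspended jobs stay with M1 rather than moving to M2 or M3.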

D/ External Load Balancer + DMZ + Interior BC servers.

An external HTTP Load Balancer can be used in the above-mentioned architecture as follows:

An external load balancer can be used in front of multiple DMZ Servers, and configured to split the load among multiple BC DMZ instances.  An external Load Balancer cannot be configured to communicate directly with the Interior BC servers because of the necessity of using the RVCMQ protocol to load balance jobs between the interior servers.
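As a rough illustration of splitting load among DMZ instances, the Python sketch below rotates through a list of servers. Round-robin is an assumption made for this example (the article does not say which policy the load balancer uses), and the names DMZ1 and DMZ2 are hypothetical.

```python
from itertools import cycle


def make_round_robin(dmz_servers):
    """Return a function that yields the next DMZ server on each call."""
    pool = cycle(dmz_servers)
    return lambda: next(pool)


# Hypothetical DMZ instance names; interior BC servers are never targets here.
pick = make_round_robin(["DMZ1", "DMZ2"])
targets = [pick() for _ in range(4)]
```

Only DMZ instances appear in the rotation, since the interior servers are reached through RVCMQ rather than through the load balancer.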

E/ HTTP Reverse Proxy Load Balancer + DMZ + Interior BC Servers.

An HTTP Reverse Proxy is used when incoming requests need to be split between different servers depending on system resources.  For example, with two servers running BC DMZ components, in which SSL-related operations are to be performed by Server 1 and normal HTTP-related operations are to be performed by Server 2, the Reverse Proxy Server can be placed in front of the DMZ Servers and configured to split the requests between the two DMZ Servers depending on the public transport used (http vs. https).  An HTTP Reverse Proxy Load Balancer cannot be configured to communicate directly with the Interior BC servers because of the necessity of using the RVCMQ protocol to load balance jobs between the interior servers.
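The transport-based split can be sketched in Python as a lookup on the URL scheme. The mapping and the function name pick_dmz_server are hypothetical and only mirror the Server 1 / Server 2 example; a real reverse proxy would express this rule in its own configuration.

```python
from urllib.parse import urlsplit

# Assumed mapping from the example: SSL traffic to Server 1, plain HTTP to Server 2.
DMZ_BY_SCHEME = {"https": "Server 1", "http": "Server 2"}


def pick_dmz_server(url):
    """Choose the DMZ server for a request based on its public transport."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in DMZ_BY_SCHEME:
        raise ValueError(f"unsupported transport: {scheme!r}")
    return DMZ_BY_SCHEME[scheme]


secure_target = pick_dmz_server("https://partner.example.com/inbound")
plain_target = pick_dmz_server("http://partner.example.com/inbound")
```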

Issue/Introduction
