Article ID: KB0090998
Updated On:
Description
Resolution:
In a properly designed process there is no chance of message loss, but there is a slight possibility of re-processing, which may cause duplicate messages.
Consider a case where engine1 and engine2 are deployed in fault-tolerant mode. If you use file-based checkpoints and the checkpoint files are stored locally on two different machines, engine1 cannot pick up the checkpoints from engine2 and vice versa. It may look as if the messages are lost, but they are not; the checkpointed jobs are simply not visible to the other engine. To get the benefit of fault tolerance during failover (that is, to pick up the jobs checkpointed by the other engine), you need centralized checkpointing, either database-based or NAS-based.
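The difference between local and centralized checkpoint storage can be illustrated with a toy model. This is a minimal sketch, not BusinessWorks code: `CheckpointStore`, `Engine`, `failover_recovers`, and the job ID `job-1` are hypothetical names, used only to show why the secondary engine must read the same store the primary wrote to.

```python
class CheckpointStore:
    """Toy stand-in for a checkpoint location (local disk, database, or NAS)."""

    def __init__(self, name):
        self.name = name
        self._jobs = {}

    def save(self, job_id, state):
        self._jobs[job_id] = state

    def pending_jobs(self):
        return dict(self._jobs)


class Engine:
    """Toy engine that checkpoints to, and recovers from, a single store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def checkpoint(self, job_id, state):
        self.store.save(job_id, state)

    def recover(self):
        # On failover, an engine can only resume jobs visible in its own store.
        return self.store.pending_jobs()


def failover_recovers(store_for_engine1, store_for_engine2):
    """Checkpoint a job on engine1, then return what engine2 can recover."""
    engine1 = Engine("engine1", store_for_engine1)
    engine2 = Engine("engine2", store_for_engine2)
    engine1.checkpoint("job-1", {"stage": "after-send"})
    # engine1 fails at this point and engine2 takes over.
    return engine2.recover()


# File-based checkpoints on two separate machines: engine2 finds nothing.
local_case = failover_recovers(CheckpointStore("disk-A"), CheckpointStore("disk-B"))

# Centralized checkpoints (database or NAS): both engines share one store.
shared = CheckpointStore("shared")
shared_case = failover_recovers(shared, shared)

print(local_case)
print(shared_case)
```

In the local case the job still exists in engine1's store, which is why the messages only appear to be lost.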
To avoid duplicate processing, use Checkpoint activities at every significant stage of the process. A significant stage is any point after which you do not want re-processing to happen; put a checkpoint immediately after that activity. For example, if a process has a message sender (EMS Queue Sender, Topic Publisher, RV Publisher, etc.) or a JDBC activity that you do not want repeated, put a checkpoint immediately after it.
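The effect of placing a checkpoint immediately after a side-effecting activity can be sketched as follows. This is an illustrative model only: `run_process`, the step names, and the dictionary standing in for persisted checkpoint data are all assumptions, not part of the product.

```python
def run_process(message, checkpoint, sent_log, crash_after=None):
    """Run a two-step process, resuming after the last checkpointed step."""
    steps = [
        ("send", lambda: sent_log.append(message)),  # stands in for a queue sender
        ("audit", lambda: None),                     # stands in for a later activity
    ]
    start = checkpoint.get("completed", 0)
    for index in range(start, len(steps)):
        name, action = steps[index]
        action()
        checkpoint["completed"] = index + 1  # checkpoint right after the activity
        if crash_after == name:
            raise RuntimeError("engine failed after " + name)


# With a checkpoint that survives the failure, the send is not repeated.
persisted = {}
sent_with_checkpoint = []
try:
    run_process("m1", persisted, sent_with_checkpoint, crash_after="send")
except RuntimeError:
    pass
run_process("m1", persisted, sent_with_checkpoint)

# Without checkpoint data (an empty dict on restart), the send runs again.
sent_without_checkpoint = []
try:
    run_process("m1", {}, sent_without_checkpoint, crash_after="send")
except RuntimeError:
    pass
run_process("m1", {}, sent_without_checkpoint)

print(sent_with_checkpoint)
print(sent_without_checkpoint)
```

The first run sends the message once; the second sends it twice, because the restarted process has no record that the send already completed.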
Remember that this only minimizes the possibility of re-processing; it does not eliminate it completely. As a result, you are guaranteed at-least-once delivery of each message, but in rare cases you may see a message processed twice. This happens when the fault-tolerant engine picks up the job from a checkpoint and re-processes the message.
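The article does not prescribe a remedy for the remaining duplicates. One common way to tolerate at-least-once delivery, assumed here and not taken from the article, is to make the receiving side idempotent by remembering which message IDs it has already handled. In this sketch `IdempotentConsumer` and the message IDs are hypothetical, and the in-memory set would need to be replaced by durable storage in a real deployment.

```python
class IdempotentConsumer:
    """Processes each message ID at most once, ignoring redeliveries."""

    def __init__(self):
        self._seen = set()   # assumption: IDs are unique per logical message
        self.processed = []

    def handle(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.processed.append(payload)
        return True


consumer = IdempotentConsumer()
first = consumer.handle("id-1", "order created")
# The same message arrives again after a failover re-processes it.
second = consumer.handle("id-1", "order created")
third = consumer.handle("id-2", "order shipped")

print(first, second, third)
print(consumer.processed)
```

With this design the duplicate is detected and dropped, so occasional re-processing upstream does not change the outcome downstream.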
Issue/Introduction
Can message loss or duplication occur during failover of bwengines deployed in fault-tolerant mode?