Can you please provide me with more information on the "protocol message-looping" defect that was partially fixed in RV 7.0 and completely fixed in RV 7.2?

Article ID: KB0091426

Products Versions
TIBCO Rendezvous -
Not Applicable -

Description

Resolution:
The defect excerpted below, which relates to the looping of routing protocol messages and which we believed was completely addressed in TIBCO Rendezvous 7.0, still partially exists in TIBCO Rendezvous 7.1.

-   Fixed defect which could cause protocol message looping when a
     single router is repeatedly shut down and restarted.

RVRD 7.0 fixed two code paths. We have since found a third code path that needed changing to fix the defect completely.

This last fix is included in TIBCO Rendezvous 7.2. The symptoms and causes are the same as before and are detailed below.

We referred to "RFC 1583 (OSPF Version 2): 14 Aging The Link State Database" to determine how best to address the "protocol message looping" defect in RVRD.
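As background only, the aging rule in that RFC section can be sketched as follows. This is a minimal illustration of the RFC concept and not a description of RVRD's implementation, which this article does not cover. The function names and the dictionary layout are assumptions made for the example; the one-hour MaxAge value comes from RFC 1583.

```python
MAX_AGE = 3600  # seconds; the MaxAge constant in RFC 1583 (1 hour)


def age_entries(ls_ages, elapsed_seconds):
    """Advance every entry's LS age, never incrementing it past MAX_AGE.

    `ls_ages` maps a hypothetical advertisement key to its age in seconds.
    """
    return {key: min(age + elapsed_seconds, MAX_AGE) for key, age in ls_ages.items()}


def usable_entries(ls_ages):
    """Entries that have reached MAX_AGE are excluded from route calculation."""
    return {key: age for key, age in ls_ages.items() if age < MAX_AGE}


# Hypothetical database: one fresh entry and one that is already 50 minutes old.
database = {"router-1": 0, "router-2": 3000}
aged = age_entries(database, 900)  # 15 minutes pass
still_usable = usable_entries(aged)
```

Stale state that is never aged out is one way old routing information can keep circulating, which is why aging is relevant to a looping defect.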

Because this defect is the consequence of a race condition, it is very difficult to reproduce.

The following steps may help to reproduce the problem:
- Set up a network of routing daemons that offers the possibility of a loop.
- Repeatedly shut down and restart one of the daemons participating in that loop, twice within each 20-minute interval. After a number of such iterations, the problem may occur.


Symptoms:

The RVRDs use high amounts of CPU and memory because the same _RV.RVRD.P messages loop continuously in the routing network. While this is happening, you may also see several "Connected/Disconnected" and "Lost/Found" messages in the RVRD log file.


Causes:

This defect can manifest itself if a router instance is shut down and restarted at least twice in less than 30 minutes while it is processing data from its neighbours. It can also be caused by any event that segments a live RVRD-based routing system while events relevant to routing are being processed.

NOTE: Customers often encounter this defect when they try to recover from RVRDs that have started up in their environment with configuration errors (such as duplicate router names). Once they notice the error, they often shut down and restart the RVRD two or more times within a short time, which, as mentioned above, triggers the defect.
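The restart condition above can be checked against your own operational records. The following is a minimal sketch, assuming you already have the restart timestamps of a given router from your own tooling; the function name and parameters are hypothetical and are not part of any TIBCO API.

```python
from datetime import datetime, timedelta

RISK_WINDOW = timedelta(minutes=30)  # window stated in the Causes section


def restarted_too_often(restart_times, window=RISK_WINDOW, threshold=2):
    """Return True if at least `threshold` restarts fall within less than `window`.

    `restart_times` is any iterable of datetime objects for one router.
    """
    times = sorted(restart_times)
    # Compare each restart with the one (threshold - 1) positions later.
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] < window:
            return True
    return False


# Hypothetical history: two restarts 10 minutes apart, then one much later.
history = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 14, 0),
]
at_risk = restarted_too_often(history)
```

A True result does not mean the defect has occurred. It only indicates that the router was cycled in the pattern described above.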


Recovery:

Once a routing architecture is "infected", you need to shut down every RVRD in the routing environment and then restart them all to recover from the resulting OSPF instability. This is the only way to get rid of the looping protocol messages. Recycling the RVRDs one after the other is not sufficient.
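The ordering matters more than the mechanism: every daemon must be down before any daemon is started again. The following is a minimal sketch of that ordering. The `stop`, `start`, and `is_stopped` callbacks are hypothetical placeholders for whatever service-control tooling your site uses; they are not TIBCO commands, and the in-memory stand-ins exist only so the sketch runs on its own.

```python
def recover_routing_network(routers, stop, start, is_stopped):
    """Stop every router, confirm all are down, and only then start them."""
    routers = list(routers)
    for name in routers:
        stop(name)
    still_running = [name for name in routers if not is_stopped(name)]
    if still_running:
        # Starting anything now would amount to a rolling restart.
        raise RuntimeError(f"still running, nothing restarted: {still_running}")
    for name in routers:
        start(name)


# In-memory stand-ins (hypothetical router names, no real daemons involved).
running = {"rvrd-a": True, "rvrd-b": True, "rvrd-c": True}
events = []


def fake_stop(name):
    running[name] = False
    events.append(("stop", name))


def fake_start(name):
    running[name] = True
    events.append(("start", name))


recover_routing_network(running, fake_stop, fake_start, lambda name: not running[name])
```

The check between the two loops is the design point: if any daemon is still up, the sketch refuses to start the others rather than fall back to a one-by-one recycle.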


Recommendation:

We recommend that customers upgrade to version 7.2 or later of RVRD to avoid running into this potentially serious defect.

Environment

Product: TIBCO Rendezvous
Version: 6.x.x, 7.x.x
OS: All
