Resolution: The topology explorer feature removes what are referred to as "phantom" RTservers from the cloud. A phantom RTserver is an RTserver 'P' that has already exited but to which another RTserver 'S' still sees an indirect route. This scenario is generally reproduced under extremely stressful conditions where RTservers are repeatedly brought down and then brought back up very quickly.
Phantom RTservers can be removed automatically from an RTserver cloud by introducing periodic "Topology Explorer Messages" that verify the topology as seen by any one RTserver in the cloud. Each RTserver in the cloud sends a topology explorer message every X seconds, where X is specified by the undocumented option '_topology_explorer_interval'. The default value is 0.0, which turns the topology explorer messages OFF. Note: each RTserver in the cloud must have this option set in order to remove phantom RTservers from the cloud. A recommended starting point for this option is 5 seconds multiplied by the number of RTservers in the cloud. For example, with 8 RTservers in the cloud, _topology_explorer_interval would be calculated as 8*5.0, or 40.0 seconds. This setting would be placed in each rtserver.cm file as:
setopt _topology_explorer_interval 40
A topology explorer message 'M' sent by RTserver 'S' visits every other RTserver in the cloud by performing a depth-first traversal of directly connected RTservers. The topology explorer message 'M' eventually returns to RTserver 'S'. RTserver 'S' then compares the graph explored by message 'M' with its stored internal graph topology. Any phantom RTservers seen by RTserver 'S' (i.e. those that were not traversed by the topology explorer message 'M') are removed from its internal graph topology.
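The comparison described above can be illustrated with a minimal sketch. This is not the RTserver implementation; the function name find_phantoms, the dictionary of live neighbors, and the server names are all hypothetical and exist only to show the idea of a depth-first walk followed by a set difference.

```python
def find_phantoms(stored_edges, live_neighbors, origin):
    """Return servers in the stored topology that a depth-first walk never reaches.

    stored_edges   -- (a, b) undirected edges in the origin's stored topology
    live_neighbors -- hypothetical map: each running server -> its direct neighbors
    origin         -- the server that sends the explorer message
    """
    visited = []

    def visit(server):
        visited.append(server)
        for neighbor in live_neighbors.get(server, ()):
            # Only servers that are still running can be traversed.
            if neighbor in live_neighbors and neighbor not in visited:
                visit(neighbor)

    visit(origin)
    known = {server for edge in stored_edges for server in edge}
    # Anything known to the origin but never traversed is treated as a phantom.
    return sorted(known - set(visited))


# 'S' still stores an edge to 'P', but 'P' has exited and is no longer running.
stored = [("S", "A"), ("A", "B"), ("B", "P")]
live = {"S": ["A"], "A": ["S", "B"], "B": ["A"]}
phantoms = find_phantoms(stored, live, "S")
print(phantoms)
```

In this example only 'P' is reported, because 'S', 'A', and 'B' are all reached by the walk.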
The amount of overhead (in terms of messages per second) that the topology explorer introduces into a cloud is given by:
2N*(N-1)/(_topology_explorer_interval) messages per second
where N is the number of RTservers in the cloud. This calculation assumes that each RTserver in the cloud has the same value (>0) set for the _topology_explorer_interval option.
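As a worked example of the overhead formula, the sketch below evaluates it for the 8-RTserver cloud with a 40.0 second interval. The function name explorer_overhead is hypothetical, and returning 0.0 for a non-positive interval is an assumption based on 0.0 turning the messages off.

```python
def explorer_overhead(n_servers, interval_seconds):
    """Messages per second added by topology exploration: 2N*(N-1)/interval."""
    if interval_seconds <= 0:
        # Assumption: a non-positive interval means exploration is off.
        return 0.0
    return 2 * n_servers * (n_servers - 1) / interval_seconds


# 8 RTservers with the recommended starting interval of 8 * 5.0 = 40.0 seconds.
overhead = explorer_overhead(8, 40.0)
print(overhead)  # 2 * 8 * 7 / 40.0 = 2.8
```

With the recommended starting point of 5 seconds per RTserver, the interval is 5N, so the formula reduces to 0.4*(N-1) messages per second and the overhead grows linearly with the size of the cloud.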
Topology explorer messages have the following grammar:
str_ary // server 1 names
str_ary // server 2 names
str_ary // visited list
str_ary // parent list
The first two arrays are the undirected edges of the graph seen by the RTserver that sends the message. The size of each of these arrays will be the number of connections in the cloud. The last two arrays grow as the message travels to each RTserver in the cloud, so the maximum size of each of the last two arrays is the number of RTservers in the cloud. Each string entry in all of the string arrays contains the unique subject of an RTserver.
The _topology_explorer_interval option was introduced in 5.5r5.