Not able to start more than 12 BusinessEvents agents on a Unix AIX box with Well-Known Addresses (WKA) enabled.

Article ID: KB0094624

Products Versions
TIBCO BusinessEvents Enterprise Edition -
Not Applicable -

Description:
===========
Not able to start more than 12 BusinessEvents agents on a Unix AIX box with Well-Known Addresses (WKA) enabled.

Environment:
===========
BusinessEvents 4.0.1
AIX

Symptoms:
=========
When the 13th agent starts, it does not join the cluster, and the following warning is reported in the Coherence log:

Oracle Coherence EE 3.6.0.0 <Warning> (thread=PacketPublisher, member=1): A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after 225 seconds, although other packets were acknowledged by the same cluster member (Member(Id=12, Timestamp=<time>, Address=<IP:port>, MachineId=<id>, Location=site:<host>,process:<pid>, Role=TibcoCepContainerBEMain)) to this member (Member(Id=1, Timestamp=<time>, Address=<IP:port>, MachineId=<id>, Location=site:<host>,process:<pid>, Role=TibcoCepContainerBEMain)) as recently as 0 seconds ago. It is possible that the packet size greater than 1453 is responsible; for example, some network equipment cannot handle packets larger than 1472 bytes (IPv4) or 1468 bytes (IPv6). Use the 'ping' command with the <size> option to verify successful delivery of specifically sized packets. Other possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times.

Cause:
======
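Coherence issue (OS limitation): with the default socket provider, the JVM sockets appear not to get enough resources as the number of cluster members grows.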

Resolution:
========
Change the default Coherence socket provider to TCP in coherence-override-prod.xml:

<socket-provider system-property="tangosol.coherence.socketprovider">tcp</socket-provider>
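
For orientation, a minimal sketch of where this element might sit in the override file. The nesting under <cluster-config> and <unicast-listener> is an assumption based on the Coherence operational descriptor layout, not something stated in this article; compare it with the attached sample coherence-override-prod.xml before use.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element placement is assumed, verify against the attached sample file -->
<coherence>
  <cluster-config>
    <unicast-listener>
      <!-- Switch from the default (system) socket provider to the TCP-based provider -->
      <socket-provider system-property="tangosol.coherence.socketprovider">tcp</socket-provider>
    </unicast-listener>
  </cluster-config>
</coherence>
```

Because the element carries a system-property attribute, the same value can presumably also be supplied at JVM startup as -Dtangosol.coherence.socketprovider=tcp, which overrides the value in the file.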

See: http://docs.oracle.com/cd/E15357_01/coh.360/e15723/appendix_operational.htm#BAGFJBCB

Information from Oracle support:
=========================
SystemSocketProvider (the default) produces instances of the JVM's default sockets, whereas TCPSocketProvider produces TCP-based sockets. It appears that the JVM socket implementations were not getting enough resources to handle the load as the number of cluster members increased.

The default packet buffer sizes are 32 packets for publishing, 64 for multicast listening, and 1428 for unicast listening. However, the operating system treats the requested value only as a hint and is not required to allocate that amount. This is true even if you override the requested value in the Coherence configuration. Refer to the following link for tuning the buffer size at the OS level:

http://docs.oracle.com/cd/E15357_01/coh.360/e15723/tune_perftune.htm#CACIEGCC
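
As an illustration of the Coherence-side override mentioned above, the following sketch shows how the unicast listener's requested buffer might be set in the override file. The <packet-buffer> and <maximum-packets> element names and their placement are assumptions taken from the Coherence operational descriptor, and the value shown is simply the default quoted above, not a recommendation.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the OS may still allocate less than the requested amount -->
<coherence>
  <cluster-config>
    <unicast-listener>
      <packet-buffer>
        <!-- 1428 packets is the unicast listening default quoted above; illustrative value -->
        <maximum-packets>1428</maximum-packets>
      </packet-buffer>
    </unicast-listener>
  </cluster-config>
</coherence>
```

Since the request is only a hint, raising this value has no effect unless the OS-level socket buffer limits are raised as well.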

In a production environment, running the TCP-based provider with a WKA list is recommended if you want to move away from UDP-based communication.

Attachments:
==========
Sample coherence-override-prod.xml
