Article ID: KB0094624
Description:
===========
Unable to start more than 12 BusinessEvents agents on a Unix AIX box with Well-Known-Addresses (WKA) enabled.
Environment:
===========
BusinessEvents 4.0.1
AIX
Symptoms:
=========
When the 13th agent starts, it fails to join the cluster and the following warning is reported in the Coherence log:
Oracle Coherence EE 3.6.0.0 <Warning> (thread=PacketPublisher, member=1): A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after 225 seconds, although other packets were acknowledged by the same cluster member (Member(Id=12, Timestamp=<time>, Address=<IP:port>, MachineId=<id>, Location=site:<host>,process:<pid>, Role=TibcoCepContainerBEMain)) to this member (Member(Id=1, Timestamp==<time>, Address=<IP:port>, MachineId=<id>, Location=site:<host>,process:<pid>, Role=TibcoCepContainerBEMain)) as recently as 0 seconds ago. It is possible that the packet size greater than 1453 is responsible; for example, some network equipment cannot handle packets larger than 1472 bytes (IPv4) or 1468 bytes (IPv6). Use the 'ping' command with the <size> option to verify successful delivery of specifically sized packets. Other possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times.
Cause:
======
Coherence issue (OS limitation)
Resolution:
========
Change the default Coherence socket provider to TCP in coherence-override-prod.xml:
<socket-provider system-property="tangosol.coherence.socketprovider">tcp</socket-provider>
See: http://docs.oracle.com/cd/E15357_01/coh.360/e15723/appendix_operational.htm#BAGFJBCB
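For orientation, the fragment below sketches where this element might sit in a complete override file. The nesting under <cluster-config> and <unicast-listener> is an assumption based on the Coherence 3.6 operational configuration layout, and any other settings already in your override file (such as the WKA list) are omitted here. The attached sample coherence-override-prod.xml remains the authoritative version.

```xml
<?xml version="1.0"?>
<!-- Minimal sketch of an operational override file. -->
<!-- Assumption: socket-provider is placed under unicast-listener, -->
<!-- as in the Coherence 3.6 operational descriptor. -->
<coherence>
  <cluster-config>
    <unicast-listener>
      <!-- "tcp" selects the TCP-based provider instead of the default "system" provider -->
      <socket-provider system-property="tangosol.coherence.socketprovider">tcp</socket-provider>
    </unicast-listener>
  </cluster-config>
</coherence>
```

Because the element carries a system-property attribute, the same value can typically also be supplied on the JVM command line as -Dtangosol.coherence.socketprovider=tcp, without editing the file.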
Information from Oracle support:
=========================
SystemSocketProvider (the default) produces instances of the JVM's default sockets, whereas TCPSocketProvider produces TCP-based sockets. It appears that the JVM's default socket implementations were not getting enough resources to handle the load as the number of cluster members increased.
The default packet buffer sizes are 32 packets for publishing, 64 packets for multicast listening, and 1428 packets for unicast listening. However, the operating system treats the requested value only as a hint and is not required to allocate the specified amount. This is true even if you override the requested value in the Coherence configuration. Refer to the following link for tuning the buffer size at the OS level:
http://docs.oracle.com/cd/E15357_01/coh.360/e15723/tune_perftune.htm#CACIEGCC
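As an illustration of the Coherence-side override mentioned above, the fragment below sketches how the unicast listener's requested buffer size might be stated explicitly. The element names are taken from the Coherence 3.6 operational configuration reference as best understood, and the value shown is simply the documented default, not a tuning recommendation. As noted, the OS may still allocate less than requested.

```xml
<?xml version="1.0"?>
<!-- Sketch only: requests a unicast receive buffer of 1428 packets (the default). -->
<!-- The operating system treats this as a hint and may cap the real allocation. -->
<coherence>
  <cluster-config>
    <unicast-listener>
      <packet-buffer>
        <maximum-packets>1428</maximum-packets>
      </packet-buffer>
    </unicast-listener>
  </cluster-config>
</coherence>
```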
In a production environment, running the TCP-based provider with a WKA list is recommended if you want to move away from UDP-based communication.
Attachments:
==========
Sample coherence-override-prod.xml