Products | Versions |
---|---|
TIBCO DataSynapse GridServer | - |
Not Applicable | - |
Resolution:
Tasks on an engine might finish and then die before the engine can notify the broker and mark the tasks as finished. An engine instance can also shut down unexpectedly, even on a small grid or when there are no apparent network problems. Both events log the following message:
INFO: [Engine] Shutdown called, Problem during message send: java.net.SocketException: Software caused connection abort: recv failed 1004
These events happen when the engine and broker do not communicate for longer than the keepalive timeout value in Resin (120 seconds). After the timeout, the broker resets the connection, but the engine still believes the connection is active, so it cannot notify the broker and mark the tasks as finished.
Another manifestation of the problem is when the engines log this message continuously:
04/17/09 13:11:39.715 WARNING: [JakartaHttpSupport$MethodRetryHandler]
org.apache.commons.httpclient.NoHttpResponseException: The server xxxxx failed to respond, will retry in 3000 msecs.
04/17/09 13:11:42.715 INFO: [org.apache.commons.httpclient.HttpMethodDirector] I/O exception caught when processing request: The server xxxx failed to respond
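As a rough way to confirm that an engine log shows these symptoms, the following sketch counts lines that match the messages quoted above. The function name, the dictionary keys, and the idea of passing the log in as an iterable of lines are assumptions made for illustration; only the matched substrings come from the log excerpts in this article.

```python
import re
from typing import Dict, Iterable

# Patterns built from the log excerpts quoted in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "engine_shutdown": re.compile(
        r"Shutdown called, Problem during message send: java\.net\.SocketException"
    ),
    "no_http_response": re.compile(r"NoHttpResponseException|failed to respond"),
}


def count_symptoms(lines: Iterable[str]) -> Dict[str, int]:
    """Count log lines that match each symptom pattern (at most once per line)."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open engine log file to `count_symptoms` gives a quick per-symptom tally. A nonzero count only suggests that the keepalive timeout may be involved and does not rule out other causes.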
To avoid this issue, edit the two lines in the broker resin.conf file that set the keepalive parameters. The resin.conf file is located in the GS_INSTALLATION_ROOT/conf directory. Set both keepalive-max and keepalive-timeout to 0, as shown below:
<keepalive-max>0</keepalive-max>
<keepalive-timeout>0s</keepalive-timeout>
Make this change on every broker in your grid. Restart each broker for the change to take effect.
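Because the change has to be repeated on every broker, a small check can help confirm that each resin.conf carries the intended values. The sketch below is a hypothetical helper, not a TIBCO or Resin tool. It assumes the two settings appear as child elements (as in the lines shown above) and not as attributes, and it ignores XML namespaces when matching element names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Expected values taken from the resolution in this article.
EXPECTED = {"keepalive-max": "0", "keepalive-timeout": "0s"}


def local_name(tag: str) -> str:
    """Strip a namespace prefix of the form '{uri}name' from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_keepalive(xml_text: str) -> Dict[str, List[str]]:
    """Return the keepalive settings that deviate from EXPECTED.

    Each key maps to the list of unexpected values found, or to
    ['<missing>'] if the element does not appear at all.
    An empty dict means every occurrence matches.
    """
    root = ET.fromstring(xml_text)
    found: Dict[str, List[str]] = {name: [] for name in EXPECTED}
    for element in root.iter():
        name = local_name(element.tag)
        if name in EXPECTED:
            found[name].append((element.text or "").strip())

    problems: Dict[str, List[str]] = {}
    for name, expected in EXPECTED.items():
        if not found[name]:
            problems[name] = ["<missing>"]
        else:
            wrong = [value for value in found[name] if value != expected]
            if wrong:
                problems[name] = wrong
    return problems
```

Calling `check_keepalive` with the contents of each broker's resin.conf returns an empty dictionary when both settings match. The check reads the file only; it does not confirm that the broker has been restarted since the edit.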