Products | Versions |
---|---|
TIBCO BusinessEvents Enterprise Edition | - |
Not Applicable | - |
Description:
===========
A TIBCO BusinessEvents agent hangs, and the cache servers report the error "java.lang.IllegalStateException: Service is not running".
Environment:
===========
TIBCO BusinessEvents 4.x
All Operating Systems
Symptoms:
=========
The inference agent hangs, and the cache server log includes the following errors and warnings:
AgentMgr.Thread.1] - [root] java.lang.IllegalStateException: Service is not running: ReplicatedCache{Name=ReplicatedCache, State=(SERVICE_STOPPED), Id=3, Version=3.0}
Warning [ReplicatedCache:EventDispatcher] - [runtime.util] Thread pool [CacheCluster] is overloaded and the job submissions are facing repeated failures. Job [com.tibco.cep.runtime.service.om.coherence.cluster.CacheCluster$MemberService@4e04e8] has already been submitted [75] times and is still being rejected.
2012 Sep 22 06:01:05:275 GMT +8 PH-CS-02-1-CacheServer Error [AgentMgr.Thread.1] - [root] java.lang.IllegalStateException: Timeout waiting for catalog update
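To check whether a cache server is hitting this issue, its log can be searched for the three messages above. The following Java sketch is a hypothetical helper, not part of TIBCO BusinessEvents or Coherence. It assumes the log is a plain-text file readable as ISO-8859-1 and only counts the lines that contain each message substring.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts occurrences of the symptom messages from this article. */
public class LogSymptomScanner {

    // Substrings taken from the log messages quoted in the Symptoms section.
    static final List<String> SIGNATURES = List.of(
            "java.lang.IllegalStateException: Service is not running",
            "is overloaded and the job submissions are facing repeated failures",
            "java.lang.IllegalStateException: Timeout waiting for catalog update");

    /** Returns a map from each signature to the number of lines that contain it. */
    static Map<String, Integer> scan(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String signature : SIGNATURES) {
            counts.put(signature, 0);
        }
        for (String line : lines) {
            for (String signature : SIGNATURES) {
                if (line.contains(signature)) {
                    counts.merge(signature, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 accepts any byte sequence, so unexpected characters in the log do not abort the read.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        scan(lines).forEach((signature, n) -> System.out.println(n + "  " + signature));
    }
}
```

Usage: `java LogSymptomScanner /path/to/cacheserver.log`. Non-zero counts for all three messages around the time of the hang match the pattern described in this article.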
Cause:
======
Coherence misconfiguration of the service guardian timeout.
Resolution:
========
Add the property tangosol.coherence.guard.timeout at the cluster level of the CDD and set it to a high value (for example, 360000 ms).
According to the Oracle Coherence documentation (http://coherence.oracle.com/display/COH35UG/service-guardian), the service-guardian/timeout-milliseconds value should be equal to or greater than the packet-delivery/timeout-milliseconds value (default 60000 ms).
See also: http://docs.oracle.com/cd/E14447_01/coh.330/coh33ug/packetdelivery.htm
To increase the packet delivery timeout, add the following CDD property:
be.coherence.packet-publisher.packet-delivery.timeout=360000
Note: In Coherence version 3.6.0.0, disabling guarding by setting tangosol.coherence.guard.timeout=0 does not work. Use a large timeout value instead to avoid the error.
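The timeout guidance in this resolution can be checked before deployment. The following Java sketch is hypothetical (GuardTimeoutCheck and its check method are not TIBCO or Oracle APIs). It assumes the CDD cluster properties have already been loaded into a java.util.Properties object, treats a missing packet delivery property as the 60000 ms default, and assumes that a guard timeout of 0 is meant to disable guarding, which per the note above does not work on Coherence 3.6.0.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-deployment check applying this article's timeout guidance. */
public class GuardTimeoutCheck {

    static final String GUARD = "tangosol.coherence.guard.timeout";
    static final String PACKET = "be.coherence.packet-publisher.packet-delivery.timeout";
    // Default packet-delivery/timeout-milliseconds, as cited from the Coherence documentation.
    static final long DEFAULT_PACKET_DELIVERY_MS = 60000L;

    /** Returns a list of problems; an empty list means the guidance is satisfied. */
    static List<String> check(Properties props, String coherenceVersion) {
        List<String> problems = new ArrayList<>();
        String guardValue = props.getProperty(GUARD);
        if (guardValue == null) {
            problems.add(GUARD + " is not set at the cluster level");
            return problems;
        }
        long guard = Long.parseLong(guardValue.trim());
        long packet = Long.parseLong(
                props.getProperty(PACKET, String.valueOf(DEFAULT_PACKET_DELIVERY_MS)).trim());

        if (guard == 0) {
            // Assumption: 0 is intended to disable guarding; this fails on 3.6.0.0.
            if ("3.6.0.0".equals(coherenceVersion)) {
                problems.add(GUARD + "=0 does not disable guarding on Coherence 3.6.0.0; use a large value");
            }
            // With guarding treated as disabled, the timeout comparison below does not apply.
            return problems;
        }
        // Guardian timeout must be equal to or greater than the packet delivery timeout.
        if (guard < packet) {
            problems.add(GUARD + " (" + guard + ") is lower than the packet delivery timeout (" + packet + ")");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(GUARD, "360000");
        props.setProperty(PACKET, "360000");
        System.out.println(check(props, "3.5.3")); // prints []
    }
}
```

Setting both properties to the same value, as in the main method, is the simplest way to satisfy the "equal to or greater than" rule while raising the packet delivery timeout.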