Article ID: KB0085830
Description:
============
The TIBCO BusinessEvents (BE) 3.0.1 engine does not start. It throws the exception com.tangosol.net.RequestTimeoutException: Timeout during service start.
Environment:
===========
TIBCO BusinessEvents 3.0.1
Symptoms:
=========
The TIBCO BusinessEvents (BE) engine throws the following error messages:
2010 Apr 20 17:09:02:599 GMT +2 CEP Error [main] - com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=3, Name=DistributedCache, Type=DistributedCache
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:27)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
at com.tangosol.coherence.component.util.SafeService.startService(SafeService.CDB:28)
at com.tangosol.coherence.component.util.safeService.SafeCacheService.startService(SafeCacheService.CDB:5)
at com.tangosol.coherence.component.util.SafeService.ensureRunningService(SafeService.CDB:27)
at com.tangosol.coherence.component.util.SafeService.start(SafeService.CDB:14)
Check the Coherence logs for the following error:
2010-04-20 20:37:37.744/108.637 Oracle Coherence EE 3.4.2/411p1 <Error> (thread=DistributedCache, member=2): validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
{
PollId=1, active
InitTimeMillis=1271788597215
Service=DistributedCache (3)
RespondedMemberSet=[]
LeftMemberSet=[]
RemainingMemberSet=[1]
}
Cause:
=====
When a node joins a clustered service, it performs a handshake with each clustered node running the service. A missing handshake response prevents this node from joining the service. Most commonly, it is caused by an unresponsive (e.g., deadlocked) service thread.
Resolution:
=========
This exception can have several causes. Apply whichever of the following fits the problem context.
1). The exception can be caused by multicast not functioning correctly or by an unresponsive service thread. If the node cannot join a cluster, verify that multicast is functioning correctly by running the Multicast test, or verify that WKA (Well Known Addresses) is configured correctly.
2). When a Coherence node joins a cluster, it must receive confirmation from every node in the cluster for every service that it is configured to run. If a new node does not receive confirmation for a service, it cannot join that service. Identify the node(s) with the unresponsive service thread and remove them from the cluster by shutting down their JVMs. In the Coherence log shown under Symptoms, RemainingMemberSet lists the members that have not responded to the handshake.
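To identify the unresponsive members, it may help to extract RemainingMemberSet from the validatePolls entries in the Coherence log. The following Python script is a minimal sketch of that step, not a TIBCO or Oracle tool. The function name unresponsive_members is hypothetical, and the script assumes the poll block has exactly the layout shown in the log excerpt under Symptoms (one Name=Value pair per line between braces, with member IDs listed as plain integers). Logs from other Coherence versions may format the member set differently.

```python
import re

# Matches the brace-delimited poll block that follows a "validatePolls" error.
# Assumption: the layout is the one shown in this article's log excerpt.
POLL_BLOCK = re.compile(r"validatePolls:.*?\{(.*?)\}", re.DOTALL)
# Matches one "Name=Value" line inside the poll block.
FIELD = re.compile(r"^\s*(\w+)=(.*)$", re.MULTILINE)


def unresponsive_members(log_text):
    """Return (service, member_ids) for each timed-out poll in log_text."""
    results = []
    for block in POLL_BLOCK.finditer(log_text):
        fields = dict(FIELD.findall(block.group(1)))
        remaining = fields.get("RemainingMemberSet", "[]")
        # Assumption: member IDs appear as plain integers, e.g. [1] or [1, 4].
        ids = [int(m) for m in re.findall(r"\d+", remaining)]
        results.append((fields.get("Service", "").strip(), ids))
    return results


# Sample text copied from the log excerpt in this article.
sample = """2010-04-20 20:37:37.744/108.637 Oracle Coherence EE 3.4.2/411p1 <Error> (thread=DistributedCache, member=2): validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
{
PollId=1, active
InitTimeMillis=1271788597215
Service=DistributedCache (3)
RespondedMemberSet=[]
LeftMemberSet=[]
RemainingMemberSet=[1]
}"""

if __name__ == "__main__":
    # Prints: [('DistributedCache (3)', [1])]
    print(unresponsive_members(sample))
```

For the excerpt in this article the script reports member 1 for the DistributedCache service, which is the member whose JVM would be a candidate for shutdown.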