GridServer Windows Engine instances crash and restart in a loop when there are more than 100 Engine instances per Daemon



Article ID: KB0073197


Products Versions
TIBCO DataSynapse GridServer -

Description

Some customers have reported that some of their engine instances restart in a loop when they try to use all the CPUs available on their Engine host machines.

The engine daemon logs show some engine instances restarting continuously (every second or less):

---
Info: started instance 108: invoke.exe(16528)
Info: started instance 109: invoke.exe(12816)
Info: started instance 110: invoke.exe(5192)
Info: EngineExited, Exit code = -1073741502
Info: EngineExited, Restarting in 0 seconds
Info: EngineExited, Exit code = -1073741502
Info: EngineExited, Restarting in 0 seconds
Info: EngineExited, Exit code = -1073741502
Info: EngineExited, Restarting in 0 seconds
---
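To make the daemon log excerpt above easier to interpret, the following is a minimal Python sketch that tallies the exit codes reported in `EngineExited` lines and shows each one as an unsigned 32-bit hexadecimal value. The function names and the regular expression are illustrative assumptions based only on the log format shown above; they are not part of GridServer. The exit code -1073741502 is 0xC0000142, which Windows defines as STATUS_DLL_INIT_FAILED, a status that is often seen when the desktop heap is exhausted.

```python
import re
from collections import Counter

# Assumed log format, taken from the daemon log excerpt above.
EXIT_RE = re.compile(r"EngineExited, Exit code = (-?\d+)")


def to_ntstatus_hex(exit_code: int) -> str:
    """Return a signed 32-bit exit code as unsigned hexadecimal."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def count_exit_codes(log_lines):
    """Count how often each exit code appears in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = EXIT_RE.search(line)
        if match:
            counts[to_ntstatus_hex(int(match.group(1)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Info: started instance 110: invoke.exe(5192)",
        "Info: EngineExited, Exit code = -1073741502",
        "Info: EngineExited, Restarting in 0 seconds",
        "Info: EngineExited, Exit code = -1073741502",
    ]
    for code, count in count_exit_codes(sample).items():
        print(code, count)  # prints 0xC0000142 2
```

A high count of the same code within a few seconds of log time is consistent with the restart loop described here.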

The engine instance logs show:

---
INFO: [Engine] Shutdown called, Engine Daemon requesting Engine instance shutdown: Engine Daemon disabled by Director, exit code: 0
INFO: [EngineLoginPlugin] Initiating logoff, type=1, message: Engine Daemon requesting Engine instance shutdown: Engine Daemon disabled by Director
INFO: [EngineLoginPlugin] Sending logoff message
INFO: [EngineLoginPlugin] Logged off
INFO: [Engine] Shutting down Message Server.
INFO: [Engine] Shutdown reason: Engine Daemon requesting Engine instance shutdown: Engine Daemon disabled by Director
---

This happens at random: affected instances may even have accepted work from the Broker, but they do not run properly. The Process Scheduling Policy in the Engine Configuration makes no difference: native, numa, and balanced all behave the same way, with some engine instances crashing and restarting in a loop.

Environment

Windows

Resolution

If the symptoms above are present, it has helped in previous cases to ask the customer to review the following Windows registry entry:
- HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

By default, the third SharedSection value (the noninteractive desktop heap size, in KB) is set to 768 and needs to be increased:

1. Increase the noninteractive desktop heap from 768 to 1024:
      SharedSection=1024,20480,768  to SharedSection=1024,20480,1024
2. Reboot the machine.
3. Run the Engine under the Local System account.

After these changes, all available engine instances should run normally.
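The SharedSection change in step 1 can be sketched as a small string transformation. This is a minimal sketch, assuming the registry data contains an entry of the form `SharedSection=a,b,c`; the function name `raise_noninteractive_heap` and the sample string are hypothetical and only for illustration. The sketch only computes the new string and does not touch the registry.

```python
import re

# Matches the three-field SharedSection entry; the third field is the
# noninteractive desktop heap size.
SHARED_SECTION_RE = re.compile(r"(SharedSection=\d+,\d+,)(\d+)")


def raise_noninteractive_heap(value: str, new_kb: int = 1024) -> str:
    """Return `value` with the third SharedSection field raised to new_kb.

    The string is returned unchanged if the current size is already at
    least new_kb. A ValueError is raised if no three-field SharedSection
    entry is present.
    """
    match = SHARED_SECTION_RE.search(value)
    if match is None:
        raise ValueError("no three-field SharedSection entry found")
    if int(match.group(2)) >= new_kb:
        return value
    return SHARED_SECTION_RE.sub(
        lambda m: f"{m.group(1)}{new_kb}", value, count=1
    )


if __name__ == "__main__":
    # Illustrative input only; read the real data from the registry.
    current = "SharedSection=1024,20480,768"
    print(raise_noninteractive_heap(current))
    # prints SharedSection=1024,20480,1024
```

Writing the result back under HKEY_LOCAL_MACHINE requires administrator rights, and the change takes effect only after the reboot in step 2.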

Issue/Introduction

Some customers have reported that some of their engine instances restart in a loop when they try to use all the CPUs available on their Engine host machines. The engine daemon logs show some engine instances restarting continuously (every second or less). Try increasing the noninteractive desktop heap, the third SharedSection value in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows, from 768 to 1024: change SharedSection=1024,20480,768 to SharedSection=1024,20480,1024. After this change, all available engine instances should run normally.