The following changes can improve performance on large Grids:
1). On the Brokers, the default value for Engine "Max Millis Per Heartbeat" is 2 minutes. Try to keep the aggregate mean heartbeats per second (number of Engines * heartbeat frequency per Engine) at 40 or less. To meet this goal on a large Grid, you may need to increase "Max Millis Per Heartbeat" beyond two minutes. To change this setting, click Manager, click Manager Configuration, then click Communication. The setting is under the Engine Heartbeat heading.
2). Increase the Assignment Timeout, on the Manager Configuration page, in the Services section, to 60000 ms. Increasing this value gives an Engine more time to connect and pick up an assigned task when the Broker is under heavy load. Increase it if you see "Task assignment expired:"… messages.
3). Separate the administrative port (web browser port) from the HTTP messaging port, to allow administrative access even under heavy messaging load. This can be done via the Admin | Manager Reconfigure tab on the GUI.
4). On the Manager Configuration page, in the Communication section, change the Messaging Retry Wait to at least 5000 ms. Change Maximum Resource Download Connections to 50. Change Driver/Engine/Daemon Socket Timeout to 120 seconds. Make sure All Sender Socket Timeout is set to 5000. Change Driver Heartbeat Timeout Factor to 15. Change Engine Heartbeat Timeout Factor to 4. In the Services section, change Task Reservation Timeout to 120, and Assignment Timeout(ms) to 60000.
5). Increase the heap size by changing the MAX_HEAP environment variable in the server.bat or server.sh file. A Manager restart is needed for the change to take effect. If GridServer is run as a Windows service, you need to recreate the service for the new heap size to take effect.
6). Add or change timeouts for Broker and Secondary Director connections. The Sender Socket Timeout properties are exposed on the Manager Configuration page, in the Communication section, under the Broker Connections and Secondary Director Connections headings. The default value is 5000 milliseconds.
7). Increase the SSL "Token Timeout" for both "Director Resources" and "Broker Resources". Despite its name, this timeout is in effect whether or not SSL is used. The settings are on the Manager Configuration page, in the Security section under Resource Deployment.
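
The heartbeat guideline in item 1 comes down to simple arithmetic, sketched below in Python. This is only an illustration, not part of the product: the function names are hypothetical, and it assumes every Engine sends exactly one heartbeat per "Max Millis Per Heartbeat" interval, so the real rate on a given Grid may differ.

```python
import math

# Ceiling on aggregate heartbeats per second suggested in item 1.
TARGET_HEARTBEATS_PER_SEC = 40


def aggregate_heartbeats_per_sec(num_engines, max_millis_per_heartbeat):
    """Estimated aggregate heartbeat rate seen by the Broker.

    Assumption: each Engine sends one heartbeat per interval.
    """
    return num_engines * 1000.0 / max_millis_per_heartbeat


def min_millis_per_heartbeat(num_engines, target=TARGET_HEARTBEATS_PER_SEC):
    """Smallest interval (ms) that keeps the aggregate rate at or below target."""
    return math.ceil(num_engines * 1000 / target)


if __name__ == "__main__":
    # With the 2-minute default, 4800 Engines sit exactly at the ceiling.
    print(aggregate_heartbeats_per_sec(4800, 120000))
    # A hypothetical Grid of 6000 Engines would need a longer interval.
    print(min_millis_per_heartbeat(6000))
```

Under these assumptions, the 2-minute default stays within the guideline up to roughly 4800 Engines; beyond that, the second function gives a starting value to enter for "Max Millis Per Heartbeat".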