If a failing schedule update has a reloadFrequency shorter than the total time the job needs to work through all the retries defined in the configuration, the schedule update does not recover from the failure. Even after the original issue is resolved, the schedule update runs successfully only for one of the first jobs left in the In_Progress state, and no new jobs are triggered after that.
- Disabling and re-enabling the schedule update does not help.
- Reloading the schedule update manually does not help.
- Deleting the schedule update rule and creating a new rule for the same file does not help either; the new rule still does not run.
The root cause of the missing reloads is that when a job fails and is retried while the short reloadFrequency keeps triggering new jobs, TIBCO Spotfire Server (TSS)/Web Player (WP) gets overwhelmed and fails to update the statuses of all the jobs. As a result, TSS considers a job to still be "In_Progress" when it has actually failed and all its retries have been exhausted. The following entries can be seen in the Web Player logs:
Spotfire.Dxp.Web.Library.ScheduledUpdates;"The update event for *** with JobDefinitionId=***, JobInstanceId=*** will not be performed due to an update is already in progress."
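To see which schedule updates are affected, it can help to count how often this message appears per JobDefinitionId. The script below is a minimal sketch, not a Spotfire tool: the regular expression is built only from the message quoted above and assumes the JobDefinitionId value ends at a comma and that only whitespace (possibly a line break) separates it from the JobInstanceId part. The script name and log path are hypothetical; adjust the pattern to the actual layout of your Web Player log files.

```python
import re
import sys
from collections import Counter

# Hypothetical pattern derived from the quoted log message. Assumes the
# JobDefinitionId value ends at a comma and that only whitespace (possibly a
# line break) separates it from "JobInstanceId=...".
SKIPPED_UPDATE = re.compile(
    r"JobDefinitionId=([^,\s]+),\s*JobInstanceId=(\S+)\s+"
    r"will not be performed due to an update is already in progress"
)


def count_skipped_updates(log_text: str) -> Counter:
    """Count 'update is already in progress' skip messages per JobDefinitionId."""
    return Counter(match.group(1) for match in SKIPPED_UPDATE.finditer(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_skips.py <web-player-log-file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for job_def, count in count_skipped_updates(log_file.read()).most_common():
            print(f"{job_def}: {count} skipped update event(s)")
```

A JobDefinitionId that keeps accumulating these messages while no new reload succeeds is a likely candidate for the stuck state described above.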
Example: the rule is set to reload every minute. TSS is configured to retry 3 times before marking the update as failed, and also to retry at every scheduled reload (the snippet below is from configuration.xml):
<stop-updates-after-repeated-fail>
  <enabled>true</enabled>
  <fails-before-stop>3</fails-before-stop>
  <stop-only-when-cached>false</stop-only-when-cached>
  <always-retry-when-scheduled>true</always-retry-when-scheduled>
</stop-updates-after-repeated-fail>
So when a job fails on its first load, it then retries 3 times, which takes more than a minute. In the meantime, the next scheduled reload adds a new job to the queue, and the cycle repeats. TSS/WP may then be unable to keep track of all these jobs.
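The timing condition can be checked with simple arithmetic before choosing a reloadFrequency. The sketch below is an illustration under stated assumptions, not how TSS schedules jobs internally: it follows the description above (the first load fails, then the job retries fails-before-stop more times), and it assumes every failed attempt takes a fixed, user-estimated duration (attempt_duration_s, a hypothetical parameter). The names ScheduleSettings, retry_cycle_s, reloads_during_retry_cycle and is_at_risk are likewise hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScheduleSettings:
    reload_interval_s: float   # the rule's reloadFrequency, in seconds
    fails_before_stop: int     # value of <fails-before-stop> in configuration.xml
    attempt_duration_s: float  # ASSUMED wall-clock time of one failed load attempt


def retry_cycle_s(s: ScheduleSettings) -> float:
    """Time one job spends on its first load plus all retries (assumed model)."""
    # First load fails, then the job retries fails_before_stop more times.
    return (1 + s.fails_before_stop) * s.attempt_duration_s


def reloads_during_retry_cycle(s: ScheduleSettings) -> int:
    """Scheduled reloads that fire while the first job is still retrying."""
    # Reloads fire at k * interval (k = 1, 2, ...); count those strictly
    # before the first job's retry cycle ends.
    return max(0, math.ceil(retry_cycle_s(s) / s.reload_interval_s) - 1)


def is_at_risk(s: ScheduleSettings) -> bool:
    """True if the reload interval is shorter than the full retry cycle."""
    return s.reload_interval_s < retry_cycle_s(s)


if __name__ == "__main__":
    # The example above: reload every minute, fails-before-stop = 3,
    # and an ASSUMED 25 s per failed attempt.
    example = ScheduleSettings(reload_interval_s=60, fails_before_stop=3, attempt_duration_s=25)
    print(retry_cycle_s(example))               # 100
    print(reloads_during_retry_cycle(example))  # 1
    print(is_at_risk(example))                  # True
```

With these assumed numbers, one job's retry cycle (100 s) outlasts the 60 s reload interval, so a new reload fires before the failing job has finished retrying. Choosing a reloadFrequency longer than the estimated retry cycle keeps is_at_risk false in this model.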