Existing Resource Instances remain in a failed state after host/node restart.

Article ID: KB0085462

Products Versions
TIBCO ActiveMatrix Service Grid -
Not Applicable -

Description

Resolution:
After the tibcohost was restarted, most of the Resource Instances failed during installation with the following exception.
*********************************
TIBCO-AMX-SR-JDBC-013102: Unable to get user credentials from java:<SharedResource-name>_subjectJndiName_Identity due to exception ……com.tibco.trinity.runtime.core.provider.identity.trust.TrustRuntimeException: javax.security.auth.login.FailedLoginException: Key cannot be retrieved:com.tibco.trinity.runtime.core.provider.credential.keystore.KeystoreRuntimeException: com.tibco.trinity.runtime.core.provider.credential.keystore.KeystoreRuntimeException: Unable to retrieve keystore from URL(s) 'http://<AdminHostName>:<internal-port>/amxadministrator/svn/trunk/artifacts/keystore/admin_default_keystore.jceks,'; errors were java.io.EOFException .
********************************
Analysis:


The Resource Instances in a failed state are those that have a password field, together with their dependents. A Resource Instance that has a password field needs the 'admin_default_keystore.jceks' file. The node downloads this jceks file through the AMX administrator's internal HTTP connector port, using the URL "http://<AdminHostName>:<internal-port>/amxadministrator/svn/trunk/artifacts/keystore/admin_default_keystore.jceks".

Confirm that the internal HTTP connector resource instance is running and in sync under SystemNode.

Confirm that the keystore file can be downloaded using the admin URL "http://<AdminHostName>:<internal-port>/amxadministrator/svn/trunk/artifacts/keystore/admin_default_keystore.jceks" and verify the keystore entries using the keytool command. In this particular case, all the key aliases were present in the admin keystore file.
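Before inspecting the entries with keytool, it can help to confirm that the URL returns a complete-looking file rather than an empty or truncated response, which would be consistent with the java.io.EOFException above. The following is a minimal Python sketch, not a TIBCO tool. It assumes the admin URL is reachable without authentication from the machine where it runs, and the function names are hypothetical. The header check relies on the 0xCECECECE magic number that the JDK writes at the start of a JCEKS file.

```python
import struct
import sys
import urllib.request

# Magic number the JDK writes at the start of a JCEKS keystore file.
JCEKS_MAGIC = 0xCECECECE


def inspect_keystore_bytes(data: bytes) -> str:
    """Classify downloaded bytes as 'empty', 'truncated', 'unexpected-header' or 'ok'."""
    if len(data) == 0:
        return "empty"
    if len(data) < 4:
        return "truncated"
    (magic,) = struct.unpack(">I", data[:4])  # big-endian unsigned 32-bit header
    if magic != JCEKS_MAGIC:
        return "unexpected-header"
    return "ok"


def fetch_keystore(url: str, timeout: float = 10.0) -> bytes:
    """Download the keystore from the admin URL (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the full admin keystore URL as the first argument.
    payload = fetch_keystore(sys.argv[1])
    print(len(payload), "bytes:", inspect_keystore_bytes(payload))
```

A result of "ok" only means the header looks like a JCEKS file. The key aliases still need to be checked with keytool as described above.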

Confirm that the artifact server application under System Environment is up and running, as this application is responsible for making the file "admin_default_keystore.jceks" available for download over HTTP/HTTPS.

Try restarting the node, as it downloads a fresh copy of this jceks file from admin upon every restart. In this particular case, restarting the node did not fix the issue. It appears that the node failed to download the latest copy from admin and was falling back to the cached copy. This usually happens when admin is down or one of the checks above fails. In this case, admin was up and all of the checks above passed, but the node still failed to download the file from admin. We have seen similar cases where the node failed to download the keystore from admin with the following exception when the tibcohost was restarted after a node crash (due to out-of-memory issues).
********************************************
Unable to retrieve keystore from URL(s) <ADMIN-URL-TO-DOWNLOAD-JCEKS>; errors were java.io.EOFException
******************************************
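To see which Resource Instances or nodes hit this particular download failure, the log can be searched for the message above. The following Python sketch is illustrative only: the function name is hypothetical, the log location is not specified here and must be supplied by the reader, and the pattern is built solely from the two message samples quoted in this article.

```python
import re

# Matches the "Unable to retrieve keystore" message in both forms quoted in this
# article (URL list with or without surrounding single quotes).
KEYSTORE_FAILURE = re.compile(
    r"Unable to retrieve keystore from URL\(s\)\s+'?([^';]*)'?;\s*errors were\s+(\S+)"
)


def find_keystore_failures(lines):
    """Return (urls, error) tuples for every matching log line."""
    hits = []
    for line in lines:
        match = KEYSTORE_FAILURE.search(line)
        if match:
            # The URL list is comma separated and may end with a trailing comma.
            urls = [u.strip() for u in match.group(1).split(",") if u.strip()]
            hits.append((urls, match.group(2)))
    return hits
```

Typical usage would be to open the node log file and pass the file object to find_keystore_failures, since iterating over a file yields its lines.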

Try creating a test JDBC RT so that admin_default_keystore.jceks is updated with a new key alias. Every time this file changes, the administrator asks the node to download a fresh copy of it. If your test JDBC Resource Instance installs successfully on the node, the node is able to retrieve the updated keystore file. Then try to install all the failed Resource Instances.
