Article ID: KB0075293
Description
Important tips for a successful environment upgrade and for avoiding issues in the production environments -
- Try to work on a copy/mirror of the production environment (with data). Otherwise, a certain amount of the time/effort may be spent on issues that exist only in the non-production, lower environments, but NOT in the actual production environments.
- When taking a full, reliable backup of the environment, make sure all of the following are backed up while the system is stopped (i.e., all TIBCO processes of the environment are stopped). If using Virtual Machines (VMs), it is highly recommended to take VM snapshots -
+ Filesystem - TIBCO_HOME - from all servers including AMX Administrator servers / remote host servers / EMS servers. Preserve timestamps, please.
+ Filesystem - CONFIG_HOME - from all servers including AMX Administrator servers / remote host servers / EMS servers. Preserve timestamps, please.
+ Filesystem - SHARED_MOUNT used by AMX / BPM / EMS (as applicable). Preserve timestamps, please.
+ Database schemas - AMX Admin DB / BPM DBs / EMS data store (as applicable)
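As a rough illustration of the file-system part of such a backup, here is a minimal shell sketch. It assumes a Unix-like server with `tar` available; the function name `backup_dir` and all paths are hypothetical and not part of any TIBCO tooling. `tar` stores each file's modification time in the archive and restores it on extraction by default, which is one way to meet the "preserve timestamps" requirement.

```shell
#!/bin/sh
# Minimal sketch (not a TIBCO-provided tool): archive one directory tree
# so that file timestamps and permissions can be restored later.
# backup_dir and both of its arguments are hypothetical names.
# Run it only after all TIBCO processes of the environment are stopped.
backup_dir() {
    src_dir=$1        # e.g. the TIBCO_HOME or CONFIG_HOME directory
    archive=$2        # e.g. /backup/tibco_home.tar.gz
    [ -d "$src_dir" ] || { echo "not a directory: $src_dir" >&2; return 1; }
    # -C switches to the parent directory so the archive holds a relative
    # path; tar records each file's modification time in the archive.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Restore example (-p keeps permissions; mtimes are restored by default):
#   tar -xpzf /backup/tibco_home.tar.gz -C /restore/target
```

The same function would be called once per directory (TIBCO_HOME, CONFIG_HOME, shared mount) on each server. Database schemas need the database vendor's own backup tooling and are not covered by this sketch.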
The "Resolution" section of this article covers the steps to perform before, during, and after the upgrade.
Issue/Introduction
This article summarizes the precautions and steps that are needed to avoid some common issues during the ActiveMatrix (AMX) Service Grid 3.4.0 upgrade process.
Environment
- All Supported OS Platforms
- Mainly about AMX Service Grid 3.4.0 environment upgrade, with necessary mention of AMX-BPM 4.3 environment upgrade
Resolution
The upgrade steps below supplement the product documentation and should not be treated as a replacement for it.
==============
A. Prior to the Upgrade - (can be done a few days or weeks before the scheduled upgrade date)
==============
0. Take a reliable, off-line, full backup of the environment.
1. Review the product documentation -
- ActiveMatrix Service Grid 3.4.0 readme ( https://docs.tibco.com/pub/amsg/3.4.0/TIB_amsg_3.4.0_readme.txt?id=0 )
- ActiveMatrix Service Grid 3.4.0 Installation Requirements ( https://docs.tibco.com/pub/amsg/3.4.0/doc/html/GUID-960917E8-F513-4521-8888-ABD848FA7BAD.html )
- ActiveMatrix Service Grid 3.4.0 Release Notes ( https://docs.tibco.com/pub/amsg/3.4.0/TIB_amsg_3.4.0_relnotes.pdf?id=1 )
- Upgrade documentation -
+ Installation and Configuration -> Upgrade and Downgrade
+ Installation and Configuration -> Troubleshooting
- Review the readme of the latest hotfix for AMX v3.4.0 (as of May 2020 - AMX 3.4.0 HF02). It may contain fixes/enhancements that ease the upgrade from the AMX 3.3.0 HFxx to the AMX 3.4.0 HFxx platform and resolve some issues that may be relevant to your business use cases.
2. Health check of the environment before upgrade -
If the AMX 3.3.0 Hotfix level is at or higher than HF13, then run "Enterprise Deployment Health Check" to see the overall health condition of the environment, such as
- time taken for Admin server to ping the Admin Database,
- time taken to ping the Qin Notification Server,
- time taken for Admin server to start up, etc.
If the health check results indicate connectivity or slowness issues, please address those issues before going ahead with the environment upgrade.
3. Make sure the system requirements are still met right before the upgrade. The system configuration may have changed since it was last verified -
- Hardware, e.g., available disk space. How much additional disk space will be needed in TIBCO_HOME and CONFIG_HOME? In TIBCO_HOME you will need an additional 900 MB for Runtime profile installation; it would take up 1.5 GB if you are installing Runtime and Design Time profile. In CONFIG_HOME, you can expect the disk space to go up by 200 MB per TIBCO Host.
- OS resources, e.g., Unix user limits - open file / max user processes, etc.
- Third-party software, e.g., Oracle - open cursors (especially for a BPM environment, raise OPEN_CURSORS to a high number, e.g., 5000+)
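The disk-space figures in step 3 can be turned into a quick check. The following is a minimal sketch, assuming a Unix-like host with POSIX `df` and `awk`; all function names are hypothetical, and the MB values are taken from the figures quoted above (1.5 GB is counted as 1536 MB).

```shell
#!/bin/sh
# Minimal sketch: compare free disk space with the figures quoted in
# step 3. All function names here are hypothetical helpers.

# Free space, in MB, on the file system that holds the given path.
free_mb() {
    # df -P prints POSIX-format output; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Extra MB needed in TIBCO_HOME: 900 for the Runtime profile only,
# 1536 (1.5 GB) when both Runtime and Design Time are installed.
tibco_home_extra_mb() {
    case $1 in
        runtime) echo 900 ;;
        both)    echo 1536 ;;
        *)       echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Extra MB needed in CONFIG_HOME: about 200 per TIBCO Host.
config_home_extra_mb() {
    echo $(( $1 * 200 ))
}

# Succeeds when the path has at least the required number of free MB.
has_free_mb() {
    [ "$(free_mb "$1")" -ge "$2" ]
}
```

For example, `has_free_mb /opt/tibco "$(tibco_home_extra_mb runtime)"` would succeed only if the file system holding the (hypothetical) path `/opt/tibco` has at least 900 MB free. User limits can be reviewed separately with `ulimit -a`.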
4. Check that all applications are in the correct status (i.e., Running and In Sync). If there are any applications that are no longer needed by your business, please remove them before performing the upgrade.
5. Under SystemEnvironment, check if any of the following applications exist and are being used -
com.tibco.amx.commonlogging.logservice.app
com.tibco.amx.commonlogging.payloadservice.app
com.tibco.amx.mcr.aggregator
If these applications are not used or no longer needed, then you can remove them.
Removing "logservice.app" and "payloadservice.app" applications is quite straightforward - just stop, undeploy, delete the applications, clean up their EMS queues and resource instances/templates.
To remove the "mcr.aggregator" application, follow the steps 5.1 through 5.7:
5.1 You can follow the below KB article to disable MCR.
https://support.tibco.com/s/article/Tibco-KnowledgeArticle-Article-41154
Note: Disabling MCR is not mandatory. But we strongly recommend doing it now to avoid any possible troubles with MCR in the future (i.e., in the upgraded AMX 3.4.0 environment).
Examples:
<tibco_home>/tibcojre64/1.7.0/bin/java -Xmx1024m -XX:MaxPermSize=512m -Xms256m -classpath <tibco_home>/tools/lib/antpackage.jar org.apache.tools.ant.Main -f disable_mcr_logger_build.xml -Denv.name=SystemEnvironment -Dnode.name=SystemNode
<tibco_home>/tibcojre64/1.7.0/bin/java -Xmx1024m -XX:MaxPermSize=512m -Xms256m -classpath <tibco_home>/tools/lib/antpackage.jar org.apache.tools.ant.Main -f disable_mcr_logger_build.xml -Denv.name=DevEnvironment -Dnode.name=DevNode1
<tibco_home>/tibcojre64/1.7.0/bin/java -Xmx1024m -XX:MaxPermSize=512m -Xms256m -classpath <tibco_home>/tools/lib/antpackage.jar org.apache.tools.ant.Main -f disable_mcr_logger_build.xml -Denv.name=DevEnvironment -Dnode.name=DevNode2
5.2 Now in AMX Admin GUI, try to undeploy the "com.tibco.amx.mcr.aggregator" application from SystemEnvironment. The undeployment will fail at this time, and the application will become “Not Deployed”. The feature com.tibco.amx.mcr.aggregator.product.feature will become "Marked for uninstall".
Note: We need to try to undeploy the application "com.tibco.amx.mcr.aggregator" at this time (i.e., before editing the SystemNode 'node.xml' file - step 5.4). If we skip this step and jump to the next step, then we will have trouble deleting this “com.tibco.amx.mcr.aggregator” application later.
5.3 Stop SystemHost and SystemNode.
./tibcohost stop -wait true
5.4 In SystemNode configuration folder, remove the MCR custom feature entry com.tibco.governance.mcr.aggregator.product.feature:2.3.0.000 from the <enabled-rus> list of file "node.xml".
<config_home>\tibcohost\Admin-<enterprise>-<admin>\data_3.2.x\nodes\SystemNode\configuration\node.xml
5.5 Restart SystemHost and SystemNode with -clearCache option
nohup ./tibcohost start -clearCache &
5.6 Now we can remove the 'com.tibco.governance.mcr.aggregator.product.feature' feature from AMX Admin GUI. (The feature will be in "Marked for uninstall" status. Just click the "Apply" button to delete this feature.)
5.7 Delete the "mcr.aggregator" application.
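Before restarting in step 5.5, it can help to confirm that the edit in step 5.4 was saved. The following is a minimal sketch using `grep`; the function name `feature_listed` is hypothetical, and the check is a plain text search that does not parse the XML.

```shell
#!/bin/sh
# Minimal sketch: report whether a feature entry is still present in a
# node.xml file. feature_listed is a hypothetical helper; pass it the
# real path to node.xml and the feature entry text from step 5.4.
feature_listed() {
    node_xml=$1
    feature=$2
    # -F treats the feature text as a fixed string (dots stay literal);
    # -q suppresses output, so the exit status carries the answer.
    grep -F -q -- "$feature" "$node_xml"
}

# Usage example:
#   if feature_listed node.xml \
#        'com.tibco.governance.mcr.aggregator.product.feature'; then
#       echo "feature entry is still listed in node.xml"
#   fi
```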
6. If there is any AMX 3.3.0 engineering build (EB) applied in the environment, please uninstall the EB and restore the original plugins jar(s). In particular, if AMX 3.3.0 EB61 has been installed in the environment (in that case we can see a utility application "com.tibco.amf.admin.tools.clean.pendingtask" deployed and running in SystemEnvironment), please contact TIBCO Support on how to remove that EB61.
7. Check each of the AMX Nodes and make sure that, under the Infrastructure -> Software Management -> Features tab, no feature is in "marked for install" or "marked for uninstall" status. If any are, please correct those features and make sure all features are in "installed" status.
8. Check the EMS Server (Notification Server and Messaging Bus Servers) of this environment to see if there are any pending messages. If yes, then what EMS destinations (e.g., queues) do they belong to? In particular if there are any pending messages in the Deployment Server queues of any host instances or nodes, please check if they can be cleaned up without impacting the business.
com.tibco.amf.admin.deploymentServerQueue.HostInstance01
com.tibco.amf.admin.deploymentServerQueue.HostInstance02:Node002A
==============
B. Pre-upgrade - (prior to running TIBCO Configuration Tool [TCT] to upgrade)
==============
0. Take a reliable, off-line, full backup of the environment. If "Part A. Prior to the Upgrade" is done the same day as this Part B, then the environment backup here can be an optional step.
1. Install AMX Service Grid v3.4.0 software along with the latest hotfix (as of May 2020 - 3.4.0 HF02) for AMX v3.4.0. For a BPM environment, also install the latest hotfix (as of May 2020 - 4.3 HF01) for BPM 4.3.
Note: Upgrading to the AMX 3.4.0 GA platform and BPM 4.3 GA versions alone is supported, but without the necessary AMX 3.4.0 and BPM 4.3 hotfixes a number of workarounds will have to be followed, which adds to the upgrade time.
2. Before launching TCT, please check if the SystemHost or any remote host instance folder (which are under "<config_home>/tibcohost/") has any non-host instance sibling folders such as 'tibamx_patchmgr' folder. If yes, please temporarily move those non-host folder(s) outside the config_home. Otherwise you will run into TCT diagnostic testing failure "NOT ELIGIBLE" for upgrade (see KB article 000039761).
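One way to spot candidate non-host folders is sketched below. It relies on a heuristic that is an assumption, not documented behavior: a host instance folder is taken to be one that contains a `data_3.2.x` child, as in the path shown in step 5.4 of Part A. The function name is hypothetical, and the output should be reviewed by hand before anything is moved.

```shell
#!/bin/sh
# Minimal sketch: list sub-folders of <config_home>/tibcohost that do
# not look like host instance folders.
# ASSUMPTION (heuristic only): a host instance folder contains a
# data_3.2.x child directory. Review the output before moving anything.
list_non_host_folders() {
    tibcohost_dir=$1
    for entry in "$tibcohost_dir"/*/; do
        # Skips the unexpanded pattern when there are no sub-folders.
        [ -d "$entry" ] || continue
        if [ ! -d "${entry}data_3.2.x" ]; then
            basename "$entry"
        fi
    done
}

# Usage example (path is hypothetical):
#   list_non_host_folders /opt/tibco/config/tibcohost
```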
3. Increase TCT (in 'TIBCOConfigurationTool.ini' file) max heap to 4 GB and MaxPermSize to 2 GB (if feasible)
4. Increase max heap sizes (-Xmx) for SystemNode and runtime nodes. This can speed up the admin task generation by AMX Admin server and the task processing by runtime nodes during the environment upgrade.
Note: Check the free or available memory on the node machine to decide how much max heap size you can set aside for the runtime nodes. Do not set -Xmx too high; otherwise there might be insufficient free memory for other processes or tasks on the node machine.
e.g.
edit tibamx_SystemNode.tra to set -Xmx8192m
edit tibamx_DevNode1.tra to set -Xmx8192m
edit tibamx_DevNode2.tra to set -Xmx8192m
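The .tra edits above can also be scripted. The following is a minimal sketch using POSIX `sed`; the helper name `set_xmx` is hypothetical, and it assumes the file already contains a `-Xmx<number><unit>` token. It keeps a `.bak` copy of the original file.

```shell
#!/bin/sh
# Minimal sketch: rewrite the -Xmx value in a .tra file, keeping a copy
# of the original. set_xmx is a hypothetical helper; it assumes the
# file already contains a -Xmx<number><unit> token.
set_xmx() {
    tra_file=$1
    new_value=$2      # e.g. 8192m
    grep -q -- '-Xmx[0-9]' "$tra_file" || {
        echo "no -Xmx setting found in $tra_file" >&2
        return 1
    }
    # Keep the original (with its timestamp) next to the edited file.
    cp -p "$tra_file" "$tra_file.bak" || return 1
    # Replace every -Xmx<digits><optional unit> token with the new value.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}/-Xmx$new_value/g" \
        "$tra_file.bak" > "$tra_file"
}

# Usage example:
#   set_xmx tibamx_SystemNode.tra 8192m
```

Choose the value only after checking the free memory on the node machine, as the note above advises.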
==============
C. AMX 3.4.0 Platform Upgrade -
==============
0. Shut down all TIBCO ActiveMatrix processes on the machines, and then proceed with the upgrade.
Note: TCT Upgrade wizard does have the ability to shut down all TIBCO Hosts and Nodes. However, if you have any Nodes that are *decoupled* from their managing TIBCO Host’s life cycle, then you need to shut them down *manually*, as TCT stopping the TIBCO Host will not stop those decoupled nodes. That is why we recommend stopping all AMX processes before starting the 3.4.0 upgrade.
1. Perform AMX 3.4.0 upgrade using TCT -> Upgrade or Downgrade TIBCO ActiveMatrix wizard, from each server:
- Upgrade the SystemHost first
- Upgrade the SystemHostReplica (if applicable)
- Upgrade remaining host instances. Please note Admin server (SystemHost and SystemNode) should remain running while upgrading the remote host instances.
If TCT AMX 3.4.0 platform upgrade for a host (SystemHost or runtime host) fails, please contact TIBCO Support.
Notes:
A) When upgrading to AMX Platform 3.4.0, the Nodes are upgraded to version 3.4.0 along with the TIBCO Hosts that manage them. There are no separate steps/commands for upgrading the Nodes.
B) Upgrade is idempotent: that is, in the unlikely event that it fails, the user can retry (after taking remediating steps) and the upgrade process should resume and do only what is required.
C) The last step of AMX 3.4.0 platform upgrade is for TCT to restart the host instance which in turn starts the nodes that it manages. If you have Nodes that are decoupled from the managing TIBCO Host’s lifecycle, please start them up manually, as TCT starting up TIBCO Host will not start those decoupled nodes.
D) When upgrading a BPM Host instance to the AMX 3.4.0 platform, the TCT wizard may fail with the following error after waiting about 30 seconds for the BPMHost to start. This happens because the host instance takes too long to start, so TCT times out while waiting for the host instance to report that it has started (AMX-17772).
"TIBCO-AMX-TOOLS-PATCHMGR-000601: One or more Host instances have not started or they are still starting. Make sure all Host instances have started. Start a Host instance manually if it has not started."
Workaround: Since starting the host is the last step of the AMX 3.4.0 platform upgrade of a host instance, simply uncheck the "start host" option on the TCT wizard screen when using TCT to upgrade the host instance. Once the TCT upgrade completes successfully, start the host instance manually.
E) Do not kill the TCT Wizard or the Ant process running the upgrade, unless explicitly instructed by TIBCO Support. While the upgrade process is extremely robust and idempotent, and will resume upon restart in the event of abrupt process termination, it is best if you do not kill the upgrade process abruptly.
F) How long will the upgrade process take? Any recommendations to improve the upgrade process?
The upgrade process mainly comprises file copy (mainly copying files from TIBCO_HOME to CONFIG_HOME) and upgrading each Node’s amx.platform.app.
So, if your disk I/O is not optimized then you can expect it to take a lot more time.
Also, if you have TIBCO Hosts managing too many Nodes (each), that will contribute to the overall time taken as well.
In most cases though, it takes no more than 6 minutes to upgrade a TIBCO Host, provided TIBCO_HOME and CONFIG_HOME are on the same physical drive and the TIBCO Host manages ~3 Nodes.
If your TIBCO_HOME and CONFIG_HOME are on *different* physical drives then you will notice that the upgrade is slow. This slowness is directly proportional to the slowness of your machine I/O. The TCT Upgrade wizard does run diagnostics on file copy speed, and will warn you if it is too slow.
Please avoid doing any unrelated, intensive file I/O during the TCT upgrade process, as it will definitely affect the speed of the file copy process, and hence the speed of the upgrade.
If you happen to have many CONFIG_HOME folders on the same machine pointing back to the same TIBCO_HOME, it is recommended that you *upgrade one CONFIG_HOME at a time*, as it will be faster than upgrading all CONFIG_HOMEs simultaneously.
If your CONFIG_HOME has many TIBCOHost(s), they will be upgraded *sequentially*; the TCT Wizard will show you which TIBCOHost is being upgraded at any given time so you can monitor the progress.
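Whether TIBCO_HOME and CONFIG_HOME share a drive can be roughly checked ahead of time. The following is a minimal sketch, assuming POSIX `df` and `awk`; the function names are hypothetical. Comparing file systems only approximates "same physical drive", since two file systems can sit on one disk and one volume can span several disks.

```shell
#!/bin/sh
# Minimal sketch: check whether two paths are on the same file system.
# This only approximates "same physical drive": two file systems can
# share one disk, and one volume can span several disks.

# Device (first df column) of the file system that holds the path.
fs_device() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Succeeds when both paths report the same file system device.
same_filesystem() {
    [ "$(fs_device "$1")" = "$(fs_device "$2")" ]
}

# Usage example (paths are hypothetical):
#   same_filesystem /opt/tibco /opt/tibco/config || echo "different"
```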
2. Verify AMX 3.4.0 upgrade is successful:
- TCT output shows a "Build Successful"
- AMX Administrator is accessible
- After logging in to AMX Administrator: Infrastructure -> Hosts - Verify all hosts are Running and In Sync. The version column lists 3.4.0. If an AMX Platform 3.4.0 hotfix was installed in the pre-upgrade stage, the hotfix number will also be listed here, e.g. "3.4.0 HF02".
- Similarly, Infrastructure -> Nodes (toggle the environment drop-down) - all nodes are Running and In Sync. The version column lists 3.4.0. If an AMX Platform 3.4.0 hotfix was installed in the pre-upgrade stage, the hotfix number will also be listed here, e.g. "3.4.0 HF02".
- All Applications under SystemEnvironment including the amx platform app should be Running and In Sync.
In case any issue occurs during the environment upgrade, or the upgrade succeeds but some stage of it appears to be extremely slow, please upload the following logs for Support review:
- TCT logs (tct folder zipped up)
- complete SystemNode logs (including user action and task history logs)
- complete BPMHost and BPMNode logs
- viewaction.jsp page (download ALL user actions)
Notes:
A) If you do not see the newly added Administrator GUI menus such as "Infrastructure -> Health Check" then you might have to clear your browser cache and restart the browser.
B) If the version of a host instance is not displayed as 3.4.0 or 3.4.0 HFxx (rather it displays the pre-upgrade version such as "3.3.0 HF24") in AMX Admin GUI after the host has been upgraded to 3.4.0, this might be a cosmetic issue due to a combination of notification and TCT-initiated platform upgrade (AMX-17692).
Resolution: The issue has been documented in AMSG 3.4.0 Product Doc. We can follow the instructions to run CLI scripts to let AMX Admin display the correct and expected version.
Installation and Configuration -> Troubleshooting -> Upgrade or Downgrade -> Runtime is upgraded but Administrator is not updated with the changes in runtime
"If a Host or Node is upgraded or downgraded successfully in runtime, but ActiveMatrix Administrator is still showing the version before upgrade or downgrade, the task syncupWithRuntime in Host_build.xml and Node_build.xml, which are available in TIBCO_HOME/administrator/<version>/samples, can be used to update Host and Node in Administrator."
https://docs.tibco.com/pub/amsg/3.4.0/doc/html/GUID-A7A9CCAF-F94C-4A8E-A0D7-E07F4B37374E.html
C) If a node is in "Out of Sync" state, then please click the "Install or Sync" button for the node. This may bring it back in sync.
D) If a node shows "Not Healthy" in the "Node Health" column, then click on the "Not Healthy" link to check the reason.
3. This is the end of a successful AMX 3.4.0 platform upgrade of the environment. You can run business tests to verify that the upgraded environment and the applications all work as expected.
4. Take a reliable, off-line, full environment backup (either VM snapshots, or file systems + DBs backups) as explained earlier in this article.
==============
D. Post the AMX 3.4.0 Platform Upgrade -
==============
If the environment is a BPM environment, then after the AMX 3.4.0 platform upgrade is successfully completed, it is expected to see the amx.bpm.app, other BPM platform applications, and user BPM applications not in Running / In Sync state. They will become Running and In Sync after the BPM upgrade is completed successfully.
After TCT for BPM upgrade completes successfully, if the BPMNodes become "Out of Sync", click the "Install or Sync" button for each node to try to bring it back in sync. If the Nodes show "Not Healthy" in the "Node Health" column, click the "Not Healthy" link to check the reason. If the reason is that old-version components are in Start Failed state after the BPM upgrade (those components may have failed to start well before the AMX 3.4.0 / BPM 4.3 upgrade, or there may be no outstanding PIs/WIs on those old-version components), then the "Not Healthy" state of the Nodes can be ignored.