How to configure NFSv4 as shared storage for EMS Fault Tolerance: NFSv4 mount options on Linux

Article ID: KB0072324

Updated On:

Products Versions
TIBCO Enterprise Message Service -

Description

How to configure NFSv4 as shared storage for EMS Fault Tolerance?

Environment

Linux

Resolution

For an on-premises configuration, in a physical or virtual environment, many customers use NFSv4 as shared storage for EMS fault tolerance (F/T).

The following configuration has worked well for us with NFSv4 on Linux:

The NFS Server:
This is usually a NAS. The NAS must be configured to support NFS v4 or v4.1; the default is usually still NFSv3.
Also, ensure that /etc/exports is set up correctly.

The NFS client:

In /etc/fstab, for the mount, use:

<device/mount> /<local mount> nfs proto=tcp,rw,nfsvers=4,soft,timeo=300,retrans=2,actimeo=1,_netdev 0 0

This has been shown to provide the best reliability and write performance.
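As a quick sanity check, an existing fstab entry can be compared against the options listed above. The following is a minimal Python sketch, not a TIBCO tool: the function names are hypothetical, and the host and paths in the usage example (nas01:/export/ems, /mnt/ems) are placeholders for illustration only.

```python
# Recommended values, copied from the fstab line in this article.
RECOMMENDED = {
    "proto": "tcp",
    "nfsvers": "4",
    "timeo": "300",
    "retrans": "2",
    "actimeo": "1",
}
RECOMMENDED_FLAGS = {"rw", "soft", "_netdev"}


def parse_options(option_field):
    """Split an fstab option field into key=value pairs and bare flags."""
    values, flags = {}, set()
    for item in option_field.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            values[key] = value
        else:
            flags.add(item)
    return values, flags


def check_fstab_line(line):
    """Return a list of differences from the recommended mount options."""
    fields = line.split()
    if len(fields) < 4:
        return ["not a complete fstab entry"]
    problems = []
    if fields[2] != "nfs":
        problems.append(f"filesystem type is {fields[2]!r}, expected 'nfs'")
    values, flags = parse_options(fields[3])
    for key, expected in RECOMMENDED.items():
        actual = values.get(key)
        if actual != expected:
            problems.append(f"{key}={actual!r}, expected {expected!r}")
    for flag in sorted(RECOMMENDED_FLAGS - flags):
        problems.append(f"missing flag {flag}")
    return problems


if __name__ == "__main__":
    # Placeholder entry; substitute the real device and mount point.
    entry = ("nas01:/export/ems /mnt/ems nfs "
             "proto=tcp,rw,nfsvers=4,soft,timeo=300,retrans=2,actimeo=1,_netdev 0 0")
    print(check_fstab_line(entry) or "matches the recommended options")
```

An empty list means the entry matches the recommended options. Extra options on the line are ignored by this sketch.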

In EMS:

TIBEMSD.CONF - The client/server heartbeats and timeouts must be set so that clients can reconnect after a network or system failure. If they are not set, EMS clients cannot reconnect and continue on their own: the applications have to be restarted, and messages may be lost. The following values were used successfully:

server_heartbeat_client = 10
client_timeout_server_connection = 30
client_heartbeat_server = 10
server_timeout_client_connection = 30

If there are routes, then server_heartbeat_server and server_timeout_server_connection should also be defined.
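To confirm that a tibemsd.conf actually defines these four parameters, a small script can help. The sketch below is an illustration under stated assumptions: it treats the file as plain `name = value` lines with `#` comments, and the function names are hypothetical. It also flags a timeout that is not larger than the heartbeat it waits for, since such a timeout would expire before the next heartbeat can arrive.

```python
REQUIRED = (
    "server_heartbeat_client",
    "client_timeout_server_connection",
    "client_heartbeat_server",
    "server_timeout_client_connection",
)

# (heartbeat parameter, timeout parameter that waits for that heartbeat)
PAIRS = (
    ("server_heartbeat_client", "client_timeout_server_connection"),
    ("client_heartbeat_server", "server_timeout_client_connection"),
)


def parse_conf(text):
    """Collect name = value settings, ignoring '#' comments (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_heartbeats(settings):
    """Return a list of problems with the heartbeat/timeout settings."""
    problems = [f"{name} is not set" for name in REQUIRED if name not in settings]
    for heartbeat, timeout in PAIRS:
        if heartbeat in settings and timeout in settings:
            if int(settings[timeout]) <= int(settings[heartbeat]):
                problems.append(f"{timeout} should be larger than {heartbeat}")
    return problems


if __name__ == "__main__":
    sample = """
    server_heartbeat_client = 10
    client_timeout_server_connection=30
    client_heartbeat_server=10
    server_timeout_client_connection=30
    """
    print(check_heartbeats(parse_conf(sample)) or "all four parameters look consistent")
```

With the values from this article, the check returns an empty list.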

CLIENT Connections - If EMS Connection Factories are used, the client reconnect parameters must be defined in factories.conf; otherwise, define them directly in the client code or in the TIBCO product TRA file. With the following example values, the clients attempt to reconnect for 100 seconds (100 attempts, 1000 ms apart):

reconnect_attempt_count=100
reconnect_attempt_delay=1000
 
Failure to set these parameters will cause the connection to be purged after a fail-over, and the clients will have to be restarted.
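The 100-second figure is simply the attempt count multiplied by the delay. The sketch below shows that arithmetic and the reverse calculation for a different target window. It assumes the delay is in milliseconds (which is what the 100-second figure implies) and ignores the time each connection attempt itself takes; the function names are hypothetical.

```python
import math


def reconnect_window_seconds(attempt_count, attempt_delay_ms):
    """Approximate time a client keeps retrying, in seconds."""
    return attempt_count * attempt_delay_ms / 1000


def attempts_needed(target_seconds, attempt_delay_ms):
    """Smallest attempt count whose retry window covers target_seconds."""
    return math.ceil(target_seconds * 1000 / attempt_delay_ms)


if __name__ == "__main__":
    print(reconnect_window_seconds(100, 1000))  # 100.0
    print(attempts_needed(180, 1000))           # 180
```

The retry window should be longer than the time a fail-over is expected to take in the environment, so that clients are still retrying when the standby server becomes active.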

For the file-based store, you must set the following parameters to ensure proper data integrity is maintained during various disk write error situations:

1. Set the file_crc store property of each file-based store to true.  We recommend this be applied to all production EMS servers as soon as possible.  Enabling CRC checksums on the EMS message data will prevent corrupted records from being read and processed.  In a future release of EMS, this option will be ignored and CRCs will be enabled on all persisted message data. 

2. Set the always_exit_on_disk_error global system property to true.  We recommend this be applied to all production EMS servers as soon as possible.  There is a defect present in all versions of the EMS server wherein the server may not exit properly when encountering a non-retryable disk error upon accessing a file-based store.  Setting always_exit_on_disk_error to true will ensure that the server exits when a disk error is encountered.
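To find file-based stores that do not yet have file_crc set to true, a script along the following lines can be used. This is a rough sketch, not a TIBCO utility: it assumes stores.conf uses `[store_name]` sections with `name = value` properties that Python's configparser can read, and the store and file names in the sample are illustrative only. Verify the exact syntax against the documentation for your EMS version.

```python
import configparser


def stores_missing_crc(stores_conf_text):
    """Return names of file-based stores without file_crc = true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(stores_conf_text)
    missing = []
    for name in parser.sections():
        section = parser[name]
        # Only file-based stores are relevant to file_crc.
        if section.get("type", "").lower() != "file":
            continue
        if section.get("file_crc", "").lower() != "true":
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative stores.conf content (assumed layout).
    sample = """
[$sys.failsafe]
type = file
file = sync-msgs.db
file_crc = true

[$sys.nonfailsafe]
type = file
file = async-msgs.db
"""
    print(stores_missing_crc(sample))  # ['$sys.nonfailsafe']
```

The always_exit_on_disk_error property is a global setting rather than a per-store property, so this sketch does not check it.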