Can I store a CM ledger file on an NFS, SAN, or NAS system?

Products	Versions
TIBCO Rendezvous	-
Not Applicable	-

Resolution:
Accessing a ledger remotely is not strictly forbidden or unsupported; the real question is whether sufficient thought has been given to the impact on an RVCM application if the ledger becomes 'unavailable'. Ultimately, it is a business decision that must be based on requirements and risk tolerance.

There are three main issues here:

1. RV can ensure data integrity only to the extent reflected in the return codes passed to it by the OS library calls.

This means that the storage vendor must guarantee that when the RV API is told "the write completed without errors," the write really did occur. NFS can be configured to 'cheat' in what it reports to the file writer, and under those conditions we cannot be responsible for the integrity of the user data. On a locally mounted file system using synchronous writes, an unobserved write failure is unlikely.
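
The integrity argument above comes down to which return codes an application can trust. As an illustration only (RV's own ledger I/O is internal to the library and is not shown here), the hypothetical helper `durable_append` below treats a write as complete only after `os.fsync` returns without error, so any failure surfaces as an `OSError`. On a file system that 'cheats', for example a Linux NFS server exported with the `async` option, `fsync` can report success before the data reaches stable storage, which is exactly the case the vendor assurance is meant to rule out.

```python
import os


def durable_append(path: str, data: bytes) -> None:
    """Append data and return only after the OS reports it is on stable storage.

    Hypothetical helper for illustration. Any failure is raised as OSError,
    and the caller must treat that as a failed write.
    """
    # O_APPEND: every write goes to the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may accept fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush the file data to stable storage. On a network
        # mount this is only as trustworthy as the server's acknowledgement.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Usage: `durable_append("ledger.sketch", b"record-1\n")`; the file name is arbitrary.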

2. An NFS system can use UDP RPCs (point-to-point UDP) to communicate between the client and server NFS daemons.

This implies that operating system resources are shared between the networking software (RV, TCP connections, etc.) and the file system. If these resources become scarce, the integrity of the file system may be compromised, or the UDP pool space available to both the file system and the RVD may run out. By extension, this can cause problems with the network file system.

A SAN (Storage Area Network) uses a multi-port RAID device with multiple SCSI-3 or Fibre Channel connections, shared through specialized disk device drivers that handle rollback, checkpointing, etc. transparently to the application. These drivers also manage their own dedicated kernel buffers.

Directly attached local disks and SANs are therefore less exposed to this kind of resource contention.

3. RVCM is *not* a spooling system: the ledgers are a file-based buffer that the CM layer uses to persist messages intended for delivery. For the RVCM layer, failure of the file system is similar to failure of system memory. If the network is compromised, the messages are not persisted. Applications must have a fallback state or behavior in case the RVCM transports start returning send failures.

Some additional remarks:

We would not trust a 'host-based' NFS server to host ledger files. Such servers are general-purpose machines that were not designed to act solely as file servers.

Network Appliance has a robust design that can be relied upon to provide the level of integrity the customer requires. Network Appliance has assured TIBCO that once the API returns with a return code indicating that the file has been written, we can rely on the write having occurred. Network Appliance has stated that it would be responsible for any write failure occurring after that point.

Besides running a Network Appliance file server, you want the file server network to be isolated from all non-file-system traffic (no TIB, no extraneous TCP/IP, etc.) so that a poorly behaving application cannot compromise file system availability for the RVCM API. Customers who run an isolated file system network have far fewer problems than those who try to 'share' the network with other services.

The last condition is to ensure sufficient network bandwidth and local buffer space, both so that performance is good and so that an RVCM application is never 'starved' for disk buffers, which would cause resource waits and significantly reduced performance. You want a buffer pool large enough that the networking system is not starved, and enough bandwidth on the storage network that other types of traffic do not interfere with file traffic.
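
'Sufficient bandwidth' can be made concrete with a back-of-the-envelope estimate. The sketch below is a hypothetical sizing aid, not a TIBCO formula: it multiplies message rate by average message size, then applies an assumed overhead factor for ledger bookkeeping and protocol framing and a headroom factor for bursts. Both factors are placeholder values and should be replaced with measurements from your own system.

```python
def required_storage_mbps(msgs_per_sec: float, avg_msg_bytes: int,
                          overhead_factor: float = 1.5,
                          headroom: float = 2.0) -> float:
    """Rough storage-network bandwidth estimate for ledger traffic, in Mbit/s.

    overhead_factor: assumed multiplier for ledger bookkeeping and protocol
        overhead on top of raw payload bytes (hypothetical placeholder).
    headroom: safety margin so bursts do not starve the ledger of bandwidth
        (hypothetical placeholder).
    """
    payload_bits_per_sec = msgs_per_sec * avg_msg_bytes * 8
    return payload_bits_per_sec * overhead_factor * headroom / 1_000_000
```

For example, 5,000 messages per second at 1,000 bytes each is 40 Mbit/s of payload; with the default factors, `required_storage_mbps(5000, 1000)` returns 120.0 Mbit/s.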

Finally, applications should be designed to account for the possibility that ledger access may be denied to them. Whether the application should shut down, stop network sends and cache locally, or raise an alert to users is a design decision, but it is one that must be made.
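
A minimal sketch of making that design decision explicit in code, assuming sends are routed through a single wrapper. `GuardedSender`, `SendFailed`, `LedgerFailurePolicy`, and the injected `send` callable are hypothetical names, not part of the RV/RVCM API; in a real application `send` would wrap the RVCM transport's send call and convert its error status into `SendFailed`. The local cache is kept in memory here only for brevity; a real fallback cache would need durable local storage.

```python
import enum
import logging
from collections import deque
from typing import Callable, Deque, List


class SendFailed(Exception):
    """Hypothetical error raised by the wrapped send callable when a send fails."""


class LedgerFailurePolicy(enum.Enum):
    SHUTDOWN = "shutdown"            # stop the application
    CACHE_LOCALLY = "cache_locally"  # stop network sends, queue messages locally
    ALERT = "alert"                  # report the failure to users/operators


class GuardedSender:
    """Wraps a send callable and applies one explicit policy when sends fail."""

    def __init__(self, send: Callable[[bytes], None], policy: LedgerFailurePolicy):
        self._send = send
        self.policy = policy
        self.local_cache: Deque[bytes] = deque()  # in-memory for the sketch only
        self.alerts: List[str] = []
        self.stopped = False

    def publish(self, msg: bytes) -> bool:
        """Return True if the message was accepted by the transport."""
        if self.stopped:
            raise RuntimeError("sender stopped after a ledger failure")
        if self.policy is LedgerFailurePolicy.CACHE_LOCALLY and self.local_cache:
            # Once caching has started, keep caching so messages stay in order.
            self.local_cache.append(msg)
            return False
        try:
            self._send(msg)
            return True
        except SendFailed as exc:
            return self._on_failure(msg, exc)

    def _on_failure(self, msg: bytes, exc: SendFailed) -> bool:
        if self.policy is LedgerFailurePolicy.SHUTDOWN:
            self.stopped = True
            raise RuntimeError("ledger unavailable; shutting down") from exc
        if self.policy is LedgerFailurePolicy.CACHE_LOCALLY:
            self.local_cache.append(msg)
        else:
            # ALERT: record and log the failure; the message is not delivered.
            self.alerts.append(f"send failed: {exc}")
            logging.error("RVCM send failed: %s", exc)
        return False

    def drain(self) -> int:
        """Resend cached messages in order; stop at the first failure."""
        sent = 0
        while self.local_cache:
            try:
                self._send(self.local_cache[0])
            except SendFailed:
                break
            self.local_cache.popleft()
            sent += 1
        return sent
```

Injecting `send` as a callable keeps the policy logic testable without a live transport, and `drain` removes a cached message only after it has been sent successfully.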

If these architectural conditions are met, there should be no issue with a remotely mounted ledger system. Without this level of care and isolation built into the system design, we know there is a non-zero chance of losing critical business data in an NFS ledger system due to "write failure." This has happened at least once in the last 12 months for one customer, but that customer was using a host-based NFS system.

We have had no reported data loss due to file system write failures on SAN, directly attached, or Network Appliance file systems.