Why do I get the "ERROR: Unable to open metadata file '/usr/tibco/datastore/meta.db', file may be locked" error message?

Article ID: KB0092803

Products: TIBCO Enterprise Message Service
Versions: Not Applicable

Description

Resolution:
-- Possible reasons for this error message:

The primary server sends heartbeat messages to the backup server to indicate that it is still operating. When a network failure stops the servers from communicating with each other, the backup server detects the interruption in the steady stream of heartbeats.

To prevent the backup server from assuming the role of the primary server while the primary is still up, the primary server locks the shared state during normal operation.

When a backup server detects the failure of the primary server, then the backup server attempts to assume the role of primary server. By doing so, the backup server tries to open the store files. If the primary is still running, this operation should fail since the lock is maintained by only one process at a time. If the primary server failed, then the lock will have been released and therefore the backup will be able to open the store files.


You may notice this error message in the following situations:

1. There is a network problem.

2. The primary EMS server may be hung, or it may be performing poorly.

3. The file system or File-Lock Manager does not work as required. For more information, refer to the TIBCO EMS User's Guide => Chapter 13 "Fault Tolerance" => Implementing Shared State.

4. The user account that starts the EMS server does not have permission to write the .db files.


-- Troubleshooting:

Check the following first:

1. Can you see the primary server process?

    1.1 Yes (the primary server is running): run the tibemsadmin command "show server" to see whether you can get the server information.
        1.1.1 Yes: it is more likely a network issue.
        1.1.2 No: the primary EMS server may be hung or performing poorly, or there may be a network problem.

    1.2 No (the primary server is not running): it is more likely a file lock issue or a file write permission issue.
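For the write-permission case in 1.2, one quick check is whether the account that starts the EMS server can open the store file for reading and writing. The following is a minimal sketch, not a TIBCO tool: the function name can_open_rw and the CHECK_RW_MAIN macro are made up for illustration, and it assumes a POSIX system. It only opens and closes the file; it does not create, truncate, or lock it. Run it as the same user account that starts the EMS server.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the calling account can open `path`
 * for reading and writing, 0 otherwise. The file is never created,
 * truncated, or locked by this check. */
int can_open_rw(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        /* "Permission denied" here points to a file privilege problem. */
        fprintf(stderr, "cannot open %s read-write: %s\n",
                path, strerror(errno));
        return 0;
    }
    close(fd);
    return 1;
}

#ifdef CHECK_RW_MAIN
/* Build as a standalone tool with: cc -DCHECK_RW_MAIN check_rw.c -o check_rw */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <filename>\n", argv[0]);
        return 2;
    }
    if (can_open_rw(argv[1])) {
        printf("%s can be opened read-write\n", argv[1]);
        return 0;
    }
    return 1;
}
#endif
```

For example, point it at the file named in the error message, such as /usr/tibco/datastore/meta.db. A "Permission denied" result suggests reason 4 above rather than a lock problem.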

-- Information needed for investigating this issue:


1. The EMS main config file for both the primary and backup servers.

2. The EMS log file for both the primary and backup servers.

3. If you suspect a network problem:
-- From the primary server box, "ping" the backup server box; and from the backup server box, "ping" the primary server box.
-- Two "netstat -s" outputs from each box, taken 10 minutes apart.
-- A raw network packet capture from each box for about 10 minutes. To capture the data, you can use tcpdump, WinDump, Ethereal/Wireshark, snoop, ...

4. If you suspect a file lock issue:

Here is a little test program that may help in debugging this issue. To compile it, do something like:

    $ cc file_lock.c -o file_lock

It has the following usage: file_lock <filename> [seconds]

It opens the specified file and then attempts to lock it, just as the EMS server does. It then unlocks the file, closes it, and exits. If you specify [seconds], the program sleeps for that number of seconds after locking the file. This allows a second instance of the program to be run against the same file to verify that the file system did in fact lock the file.
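The source of file_lock.c is not shown in this article. As a rough illustration of the open, lock, sleep, unlock, and close sequence described above, here is a minimal sketch. It assumes a POSIX system and uses an advisory whole-file write lock taken with fcntl(F_SETLK); the exact locking call used by the EMS server is not stated here, so that choice is an assumption. The names lock_probe.c, probe_lock, and LOCK_PROBE_MAIN are hypothetical, and unlike the behavior described above, this sketch creates the file if it does not exist.

```c
/* lock_probe.c: illustrative sketch only, not the original file_lock.c. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if the lock was taken and released, 1 if another process
 * already holds a conflicting lock, -1 on any other error. */
int probe_lock(const char *path, unsigned int hold_seconds)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }
    printf("opened file %s\n", path);

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 = through end of file, i.e. the whole file */

    /* F_SETLK does not block: it fails at once if the lock is held elsewhere. */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        int rc = (errno == EACCES || errno == EAGAIN) ? 1 : -1;
        if (rc == 1)
            printf("file %s already locked by another process\n", path);
        else
            fprintf(stderr, "cannot lock %s: %s\n", path, strerror(errno));
        close(fd);
        return rc;
    }
    printf("locked file %s\n", path);

    if (hold_seconds > 0) {
        printf("sleeping for %u seconds\n", hold_seconds);
        fflush(stdout);
        sleep(hold_seconds);  /* hold the lock so a second instance can test it */
    }

    fl.l_type = F_UNLCK;      /* release the lock explicitly before closing */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}

#ifdef LOCK_PROBE_MAIN
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <filename> [seconds]\n", argv[0]);
        return 2;
    }
    unsigned int secs =
        (argc > 2) ? (unsigned int)strtoul(argv[2], NULL, 10) : 0;
    return probe_lock(argv[1], secs) == 0 ? 0 : 1;
}
#endif
```

To build the sketch as a standalone tool, compile it with something like "cc -DLOCK_PROBE_MAIN lock_probe.c -o lock_probe". On a shared file system, whether a lock taken on one box is seen on another depends on the file system and its lock manager, which is exactly what the test is meant to reveal.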

For example, in one console run file_lock against a file and sleep for 60 seconds:

    $ file_lock testfile.txt 60

You will see the output:

    opened file testfile.txt
    locked file testfile.txt
    sleeping for 60 seconds

To test that your system correctly releases the file lock under unexpected conditions, it is recommended that you also simulate a power-down situation. If this is a test setup and you are able to power down the box, do so within the 60 seconds during which the lock is held.

In a second console run file_lock against the same file:

    $ file_lock testfile.txt

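If you see the output:

    opened file testfile.txt
    file testfile.txt already locked by another process

while the first instance is still running, this is expected and confirms that the file system locked the file. If you see it after the first box was powered down, it means that the file lock was not released during power down.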

Please try this and see if the file locking works correctly.
