If the message in the datastore is corrupted, what is going to happen?

Article ID: KB0072361

Products Versions
TIBCO Enterprise Message Service -
Not Applicable -

Description

1. Symptom:

You may see one of the following errors in the EMS server log file:

SEVERE ERROR: Exception trying to read message from store
ERROR: Exception trying to create message from store: IO failed
SEVERE ERROR: Exception trying to create valid messages record, Invalid message.
SEVERE ERROR: Persisted message possibly corrupt.
SEVERE ERROR: Exception trying to create valid messages record, Invalid message

Alternatively, the client may report "javax.jms.JMSException: Corrupted incoming data message". Either symptom indicates a corrupted message problem.

In some cases, messages pile up on the server because the application cannot handle the corrupted message properly.
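
To check a log for these symptoms, a small scanner along the following lines may help. This is only a sketch: the class name, the method name, and the sample "Server is active." line are assumptions made for illustration, and the signature strings are simply substrings of the errors quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class EmsCorruptionLogScan {
    // Substrings of the error lines quoted in this article.
    // The last one is raised on the client side, so it would appear in
    // application logs rather than in the EMS server log.
    static final String[] SIGNATURES = {
        "Exception trying to read message from store",
        "Exception trying to create message from store",
        "Exception trying to create valid messages record",
        "Persisted message possibly corrupt",
        "Corrupted incoming data message"
    };

    /** Returns the lines that contain at least one known corruption signature. */
    static List<String> findCorruptionLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String sig : SIGNATURES) {
                if (line.contains(sig)) {
                    hits.add(line);
                    break; // report each line only once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; a real run would read the log file instead.
        List<String> sample = List.of(
            "Server is active.",
            "SEVERE ERROR: Persisted message possibly corrupt.",
            "ERROR: Exception trying to create message from store: IO failed");
        findCorruptionLines(sample).forEach(System.out::println);
    }
}
```

To scan an actual file, the sample list could be replaced with the result of java.nio.file.Files.readAllLines on the log path.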

2. CRC and datastore record:

Each record in the datastore has a record header part and a record body part. The body is the EMS message, including its EMS header. There are therefore two levels of CRC: the record header CRC and the record body (message) CRC.
Also, each record can be either a user data message record, or an internal record.
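
The two CRC levels can be illustrated with a minimal sketch. The StoreRecord class below is hypothetical and does not reflect the real EMS on-disk layout or checksum algorithm; it only shows how separate header and body checksums let a reader tell which part of a record was damaged. java.util.zip.CRC32 is used as a stand-in.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class TwoLevelCrcSketch {
    /** Hypothetical record layout, NOT the actual EMS store format. */
    static final class StoreRecord {
        final byte[] header;   // record header bytes
        final byte[] body;     // EMS message bytes, including the EMS header
        final long headerCrc;  // checksum stored for the record header
        final long bodyCrc;    // checksum stored for the record body

        StoreRecord(byte[] header, byte[] body, long headerCrc, long bodyCrc) {
            this.header = header;
            this.body = body;
            this.headerCrc = headerCrc;
            this.bodyCrc = bodyCrc;
        }
    }

    enum Check { OK, HEADER_CORRUPT, BODY_CORRUPT }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** Builds a record and stores one checksum per part. */
    static StoreRecord write(byte[] header, byte[] body) {
        return new StoreRecord(header.clone(), body.clone(), crc32(header), crc32(body));
    }

    /** Recomputes both checksums and compares them with the stored values. */
    static Check verify(StoreRecord r) {
        if (crc32(r.header) != r.headerCrc) return Check.HEADER_CORRUPT;
        if (crc32(r.body) != r.bodyCrc) return Check.BODY_CORRUPT;
        return Check.OK;
    }

    public static void main(String[] args) {
        StoreRecord r = write("hdr".getBytes(StandardCharsets.UTF_8),
                              "payload".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(r)); // prints OK
        r.body[0] ^= 0x01;             // simulate a single flipped bit on disk
        System.out.println(verify(r)); // prints BODY_CORRUPT
    }
}
```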

3. If a record is corrupted for any reason, what is the result?

The result will be different depending on which part of the record is corrupted and what kind of record is corrupted.

A. The record message part is corrupted:

-- If CRC is disabled (file_crc disabled) and an application data message record (a customer's EMS message) cannot be reconstructed from the store, the message never enters the system. Consumers are not affected by it; it is as if the message did not exist.

However, any record that cannot be reconstructed because of corruption causes the server to give up recovery of the current record type, although the server still becomes active. There is one recovery pass per record type (connection, session, producer, consumer, and so on) and one for application messages from each of the two stores, async and sync. This kind of corruption therefore has the same effect as with file_crc enabled: the bad record is ignored, recovery of the remaining records of the same type stops, and the server still becomes active.

If the corruption does not prevent the server from reconstructing the message, the server may start normally with CRC disabled, but the consumer may then have trouble receiving the message and constructing it from the wire format. The application may get an exception, crash, or receive wrong data.

-- If CRC is enabled (file_crc = enabled), the EMS server computes the CRC of each record after loading it at startup and compares it with the stored value. If the two differ (the message has been corrupted), an error is reported in the EMS server log; you will see one of the errors listed above. When the server encounters such an error, it stops message recovery, so the remaining valid messages are not recovered.
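
The per-type recovery behavior described in this section can be modeled roughly as follows. This is a toy simulation under stated assumptions, not EMS server code: the Rec class, the recover method, and the record identifiers are all hypothetical. It only illustrates that a corrupt record ends the pass for its own record type while the other passes, and the server, carry on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecoveryPassSketch {
    /** Hypothetical stored record: a type, an id, and a corruption flag. */
    static final class Rec {
        final String type;
        final String id;
        final boolean corrupt;

        Rec(String type, String id, boolean corrupt) {
            this.type = type;
            this.id = id;
            this.corrupt = corrupt;
        }
    }

    /** Runs one pass per record type; a pass stops at its first corrupt record. */
    static Map<String, List<String>> recover(List<String> typeOrder, List<Rec> store) {
        Map<String, List<String>> recovered = new LinkedHashMap<>();
        for (String type : typeOrder) {
            List<String> ok = new ArrayList<>();
            for (Rec r : store) {
                if (!r.type.equals(type)) continue;
                if (r.corrupt) break; // bad record ignored, rest of this type not recovered
                ok.add(r.id);
            }
            recovered.put(type, ok);
        }
        return recovered; // the modeled server would still become active with this state
    }

    public static void main(String[] args) {
        List<Rec> store = List.of(
            new Rec("consumer", "c1", false),
            new Rec("message", "m1", false),
            new Rec("message", "m2", true),   // corrupted record
            new Rec("message", "m3", false)); // valid, but never reached
        System.out.println(recover(List.of("consumer", "message"), store));
        // prints {consumer=[c1], message=[m1]}
    }
}
```

Note that m3 is valid yet is not recovered, which mirrors the statement that the remaining valid messages are lost once recovery stops.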


B. The record header part is corrupted:

-- If the record holds user application data, the application loses that data.
-- If the record is an internal record, the result depends on its content and can be very serious. For example, if the consumer record for a durable subscriber is lost, the server cannot recover messages for that durable.


4. Solution:

If customer messages are corrupted but the server still sends them to the client, and the client cannot handle them, you can get rid of the corrupted messages in one of the following ways:

A. If "track_message_ids" is enabled, run the tibemsadmin command: delete message <messageID>

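B. For queue messages, set the maxRedelivery=<value> and prefetch=none properties on the queue, then run DestroyQueueMsg.java. This program uses explicit acknowledgment and does not acknowledge the messages that need to be removed, so the server stops redelivering them once the maxRedelivery limit is reached.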

C. Purge the entire destination that contains the corrupt message.

EMS 5.0 added an error recovery policy to the server. By default, the EMS 5.0 server exits during startup when it detects an error caused by file corruption or another problem. This gives the user a chance to decide what to do. EMS 5.0 also introduced the '-forcestart' command line option, which changes this behavior so that the server strives to become active regardless of the error. When this flag is used, the server deletes bad records, so the flag should not be necessary on the next start. Furthermore, the server acts as if it did not find the record, to prevent corruption of its run-time state. For details, refer to the EMS User's Guide, Chapter 5, Running the EMS Server: Error Recovery Policy.


5. Possible causes of corrupted messages:

A. A hardware problem on the disk (media error).
B. A forced unmount of a physical disk while it is in use, or a hard disk unplugged during operation.
C. A locking problem: the record in the db file is modified by different applications, or by different threads of the same application, at the same time.
D. The EMS server receives a message that is already corrupt.
E. The machine was terminated abruptly (for example, a running system was unplugged, or a fire or an explosion occurred).
F. vMotion, snapshots, or VM shifts.

Issue/Introduction

If the message in the datastore is corrupted, what is going to happen?

Environment

All