How to determine if an fsck is running and how long it will take to complete on a TIBCO LogLogic LMI appliance

Products	Versions
TIBCO LogLogic Log Management Intelligence	all versions

Description

If you have rebooted or powered on your system and it has not reached the login prompt after the usual amount of time, an fsck may have been triggered. This occurs either because the number of mounts for a given filesystem exceeded its maximum mount count or because more than 6 months have elapsed since the last fsck of that filesystem.

The only way to verify that an fsck is running is to use the iDRAC virtual serial connection (only available on H4 and newer hardware) or a physical serial console connection (not VGA) to the appliance. Note: The iDRAC virtual console is not sufficient because it uses a VGA-level direct connection. For EVAs you can use the ESXi VM console. Over a VGA connection the last message you will see on the terminal screen is "Probing EDD". The system can appear frozen at that point, but in fact all output switched from VGA to serial when the Linux kernel began loading.
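The two triggers described above can be checked ahead of a planned reboot from the fields that `tune2fs -l` prints for an ext3/ext4 filesystem. The following is a minimal sketch, not an LMI tool: the sample text and its values are made up for illustration, the function names are hypothetical, and it assumes that reaching the maximum mount count is enough to trigger the check.

```python
from datetime import datetime, timedelta

# Illustrative excerpt of `tune2fs -l <device>` output (values are made up).
SAMPLE = """\
Mount count:              38
Maximum mount count:      37
Last checked:             Mon Jan  1 00:00:00 2024
Check interval:           15552000 (6 months)
"""

def parse_fields(text):
    """Split 'Key: value' lines into a dict (splits on the first colon only)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def fsck_due(text, now):
    """Return the reasons a boot-time fsck would be expected, if any."""
    f = parse_fields(text)
    reasons = []

    mount_count = int(f["Mount count"])
    max_mounts = int(f["Maximum mount count"])
    # Assumption: a maximum of 0 or -1 means the mount-count check is disabled.
    if max_mounts > 0 and mount_count >= max_mounts:
        reasons.append("mount count")

    # The interval is printed in seconds first; 0 means no time-based check.
    interval = int(f["Check interval"].split()[0])
    # Date parsing assumes an English (C) locale for day and month names.
    last = datetime.strptime(f["Last checked"], "%a %b %d %H:%M:%S %Y")
    if interval > 0 and now >= last + timedelta(seconds=interval):
        reasons.append("check interval")

    return reasons
```

Calling `fsck_due(output, datetime.now())` with the real output for a filesystem returns an empty list when neither trigger applies. This only helps before a reboot; once the system is booting, the serial console remains the only way to see what fsck is doing.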

If an fsck is indeed executing, do not interrupt it. Rebooting the system will only prolong the process, because the fsck will run again automatically on each subsequent boot until it is able to finish and reset the flags that trigger it.

The more inodes that are in use, the more metadata exists, and the more data fsck has to verify and potentially repair (see Additional Information for a link to a webpage that documents the various fsck phases). An appliance with 30% byte usage on /loglogic but 75% inode usage will therefore take longer to fsck than one with 70% byte usage and 30% inode usage, because fsck works on the metadata rather than the contents of the files. The time fsck takes is proportional to the amount of filesystem metadata, and the check involves many seeks to follow links and block pointers. The amount of metadata depends on a number of factors, but it usually correlates with the size of the filesystem and the amount of data. If your system holds many small files because it collects file-based data, it may not fall into the 'usual' category.
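To compare byte usage with inode usage for a mount point, a short script along these lines can be used. It is a sketch that relies only on Python's standard `os.statvfs` call; the function name `usage_percent` is hypothetical, and whether Python is available on a given appliance is an assumption.

```python
import os

def usage_percent(path):
    """Return (byte_pct, inode_pct) for the filesystem that contains path.

    inode_pct is None when the filesystem does not report an inode total.
    """
    st = os.statvfs(path)
    # Blocks in use as a share of all blocks on the filesystem.
    byte_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    # Inodes in use as a share of all inodes; some filesystems report 0 total.
    inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else None
    return byte_pct, inode_pct
```

For example, `usage_percent("/loglogic")` returns the two percentages discussed above. A high inode percentage alongside a low byte percentage suggests many small files, which is the case where fsck is likely to take longer than the size of the data alone would imply.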

Here is an fsck comparison between an ext3 filesystem and an ext4 filesystem, which gives a “general notion” of how long fsck might take to complete.

    32 GB fsck for all five fsck phases on an ext3 filesystem took 23.06 minutes.
    32 GB fsck for all five fsck phases on an ext4 filesystem took 13.78 minutes.

EXT3 filesystem Calculations:
    23.06 minutes / 32 GB = ~0.7 minutes per 1 GB

    1 GB = 0.7 minutes
    10 GB = 7 minutes
    100 GB = 70 minutes
    1000 GB = 700 minutes / 60 minutes per hour = ~11.67 hours per 1 TB

Therefore, it is not unreasonable for an fsck to take many hours. If there are many errors to repair, the fsck will take longer. The specifications of the hard drives make a difference too, as does the RAID level. So for EVAs, where customers provide their own hardware, fsck times may be faster or slower than on LogLogic LMI physical appliances.
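The per-GB extrapolation above can be repeated for both filesystem types with a few lines of Python. This is only a rough sketch: it assumes fsck time scales linearly with filesystem size, which the factors described above (inode usage, errors, disk and RAID performance) can easily upset, and the function name is hypothetical.

```python
# Benchmark figures quoted above: minutes to fsck a 32 GB filesystem.
BENCHMARK_GB = 32
BENCHMARK_MINUTES = {"ext3": 23.06, "ext4": 13.78}

def estimate_fsck_hours(size_gb, fs_type):
    """Linearly scale the 32 GB benchmark to size_gb and return hours."""
    minutes_per_gb = BENCHMARK_MINUTES[fs_type] / BENCHMARK_GB
    return size_gb * minutes_per_gb / 60

for fs in ("ext3", "ext4"):
    print(f"{fs}: 1000 GB ~ {estimate_fsck_hours(1000, fs):.1f} hours")
```

Without rounding the per-GB rate down to 0.7, the ext3 figure comes out at about 12.0 hours per 1000 GB, and the ext4 figure at about 7.2 hours. Treat both as orders of magnitude rather than predictions.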

In some cases the extent of the errors may be sufficiently severe that the fsck command will stop and force you to manually execute it. The Linux OS will then prompt for the root password in order to enter into a maintenance mode:

loglogic: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
fsck failed for at least one filesystem.
Please repair manually and reboot.

Give root password for maintenance
(or type Control-D to continue):

If you press Ctrl+D the system will reboot without performing an fsck, so you do not usually want that option. When the system is stopped at the 'Give root password for maintenance' prompt there will not be any obvious indication that it is waiting for user input. The only way to know is to monitor the system using the iDRAC virtual serial connection or the physical serial connection. To configure the iDRAC virtual serial connection, refer to article 000030625.

Note that starting with LMI 6.1.0, all new installations of LMI (EVA and non-EVA) use the ext4 filesystem for all local filesystems.

Issue/Introduction

This article discusses the factors that affect how long fsck can take to run on an LMI appliance; there is no exact way to calculate the duration. The duration is even less predictable for EVAs because TIBCO does not provide the hardware for them, so disk performance differs from one deployment to another.

Additional Information

Phases of an fsck: https://www.cs.duke.edu/csl/docs/sysadmin_course/sysadm-86.html
Article 000030625 - TIBCO LogLogic LMI - How to Redirect the Console Output To iDRAC SSH Session
