Products | Versions |
---|---|
TIBCO LogLogic Enterprise Virtual Appliance | all versions |
When collecting file-based data, a best practice is to monitor LMI's /var/log/sys.log for activity suggesting that the system is collecting more data than it can process in a timely manner. Alerts can be configured in the LMI GUI using the sample log below. Here, "timely" means within a 2-day window. The window is 2 days because a built-in cleanup task in LMI prunes collected files from the filesystem once they are older than 2 days (86400 seconds per day * 2 days = 172800 seconds, the threshold referenced in the event below) so that filesystem usage does not reach 100%. This is similar to, but not the same as, the emergency purging mechanism. Emergency purging operates only on the BFQ (/loglocal/data/vol1/) and is managed by engine_archive, whereas this 2-day cleanup runs on /loglogic/data/filecollector and is managed by engine_filecollector.
If you see messages like the following in /var/log/sys.log, investigate and, if necessary, adjust LMI's settings so that collected logs are not deleted before they can be processed from the staging directory into the BFQ. Contact TIBCO LogLogic Technical Support if you need assistance or have questions about this.
<14>May 20 21:30:25 logapp COLLECTOR_FC: %LOGLOGIC-6 module:engine_filecollector(21577); file:engine_filecollector.c(flush_space,7767); action:--file /loglogic/data/filecollector/archiver/38_20_192.168.179.89_29_1526689766_1.txt.xml deleted, reason: its mtime (1526689768) is beyond the threshold (172800). ;
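For scripted monitoring outside the LMI GUI, the event above can be matched with a regular expression. The following is a minimal sketch in Python, not an official TIBCO tool: the pattern is inferred only from the single sample event shown here, so other LMI releases may format the message differently, and the names `DELETION_RE` and `parse_deletion` are hypothetical.

```python
import re

# Pattern inferred from the one sample event in this article (an assumption,
# not a documented format). It captures the deleted file's path, its mtime,
# and the age threshold in seconds.
DELETION_RE = re.compile(
    r"module:engine_filecollector\(\d+\);.*?"
    r"action:--file (?P<path>\S+) deleted, "
    r"reason: its mtime \((?P<mtime>\d+)\) is beyond the threshold \((?P<threshold>\d+)\)"
)

def parse_deletion(line):
    """Return (path, mtime, threshold) for an age-based deletion event, else None."""
    match = DELETION_RE.search(line)
    if match is None:
        return None
    return match.group("path"), int(match.group("mtime")), int(match.group("threshold"))

# The sample event from this article, copied verbatim.
SAMPLE = (
    "<14>May 20 21:30:25 logapp COLLECTOR_FC: %LOGLOGIC-6 "
    "module:engine_filecollector(21577); "
    "file:engine_filecollector.c(flush_space,7767); "
    "action:--file /loglogic/data/filecollector/archiver/"
    "38_20_192.168.179.89_29_1526689766_1.txt.xml deleted, "
    "reason: its mtime (1526689768) is beyond the threshold (172800). ;"
)

if __name__ == "__main__":
    print(parse_deletion(SAMPLE))
```

Counting matches per hour or per day gives a rough indication of how far behind processing has fallen.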
Data is deleted from /loglogic/data/filecollector/ sub-directories based on 3 different conditions:
1. If all forwarding rules are removed then the remainder of files in /loglogic/data/filecollector/forwarder are deleted.
2. If a file is older than the time threshold of 2 days then it is removed. This is the focus of this article.
3. If the disk usage exceeds the disk usage threshold then the oldest file is removed from both archiver/ and forwarder/. The threshold differs for each directory, as described next.
For /loglogic/data/filecollector/archiver, the threshold is 20 GB * the flush_old_files_size value in the /loglogic/conf/capability.xml file. With a value of 60, that is 20 GB * 60 = 1.2 TB.
For /loglogic/data/filecollector/forwarder, the threshold is 20 GB * 2 * the same value in capability.xml, or 2.4 TB with a value of 60.
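The threshold arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation as described in this article: the function name `disk_thresholds_gb` is hypothetical, and the value is passed in directly because the layout of capability.xml is not covered here.

```python
BASE_GB = 20  # per-unit size stated in this article

def disk_thresholds_gb(flush_old_files_size):
    """Return (archiver_gb, forwarder_gb) for a given flush_old_files_size value."""
    archiver = BASE_GB * flush_old_files_size
    forwarder = BASE_GB * 2 * flush_old_files_size
    return archiver, forwarder

if __name__ == "__main__":
    # 60 is the value used in this article's example.
    archiver_gb, forwarder_gb = disk_thresholds_gb(60)
    print(archiver_gb, forwarder_gb)  # 1200 GB (1.2 TB) and 2400 GB (2.4 TB)
```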
The 1st and 3rd conditions usually aren't triggered but the 2nd condition can easily be triggered if the system is overloaded.
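To spot a backlog before the 2-day cleanup starts deleting files, the staging directories can be scanned for files that are approaching the age threshold. The sketch below is an illustration only, not a supported utility: it assumes file age is measured from mtime (as the sample event suggests), and the `warn_fraction` parameter and the function name `files_near_threshold` are hypothetical.

```python
import os
import time

THRESHOLD_SECONDS = 2 * 86400  # 172800, the threshold shown in the sample event

def files_near_threshold(root, warn_fraction=0.75, now=None):
    """Yield (path, age_seconds) for files older than warn_fraction of the threshold."""
    now = time.time() if now is None else now
    warn_age = THRESHOLD_SECONDS * warn_fraction
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                age = now - os.path.getmtime(path)
            except OSError:
                continue  # file was processed or removed during the scan
            if age >= warn_age:
                yield path, age

if __name__ == "__main__":
    for path, age in files_near_threshold("/loglogic/data/filecollector"):
        print(f"{age / 3600:.1f}h old: {path}")
```

A steadily growing list of old files suggests that processing is not keeping up with collection, which is the situation the next paragraph addresses.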
LogLogic LMI supports parallel file collection, parallel file processing, and parallel file forwarding, but none of these is enabled by default. If the appliance is logging errors about files being deleted from the filecollector sub-directories, the cause may simply be that parallel processing needs to be enabled so that downloaded files are processed more quickly. If parallel processing is already enabled, its settings may need to be increased. If that has already been attempted, the system may simply be unable to keep up with the volume of collected data, regardless of the custom settings applied to the configuration.