How is the value of the 'EST SIZE' statistic in the AS Node status output calculated?
Article ID: KB0071222
Description
The EST SIZE statistic is calculated by the storage engine. It is an estimate of the on-disk size (that is, after compression and so on) of the immutable sorted-string table (sst) files in the node's live data directory. Checkpoints are not included in the statistic. The estimate starts at the bottom level of the LSM tree and works upward, adding up the sizes of all sst files that do not overlap in key range. As a result, a data directory that is less compacted (more levels, more data overlapping between levels) has more sst files excluded from the estimate, and therefore shows a smaller estimated size, than a data directory that contains the same data in a more compacted form.
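The bottom-up walk described above can be illustrated with a minimal Python sketch. This is not the storage engine's actual code. The `SstFile` type, its fields, and the `estimate_size` function are hypothetical names introduced for illustration. The sketch also assumes that "don't overlap in key range" means a file is skipped when its key range overlaps any file already counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SstFile:
    """Hypothetical description of one sst file (not a real engine type)."""
    level: int        # 0 is the top level; larger numbers are deeper levels
    min_key: str      # smallest key stored in the file
    max_key: str      # largest key stored in the file
    size_bytes: int   # on-disk size of the file


def overlaps(a: SstFile, b: SstFile) -> bool:
    """Return True when the two files' key ranges intersect."""
    return a.min_key <= b.max_key and b.min_key <= a.max_key


def estimate_size(files: list[SstFile]) -> int:
    """Sum file sizes from the bottom level upward, skipping any file whose
    key range overlaps a file that has already been counted (assumption)."""
    counted: list[SstFile] = []
    total = 0
    for f in sorted(files, key=lambda f: f.level, reverse=True):
        if any(overlaps(f, c) for c in counted):
            continue  # overlapping file is excluded from the estimate
        counted.append(f)
        total += f.size_bytes
    return total


# Fully compacted: two non-overlapping files in the bottom level.
compacted = [
    SstFile(level=2, min_key="a", max_key="m", size_bytes=100),
    SstFile(level=2, min_key="n", max_key="z", size_bytes=120),
]

# Less compacted: the same 220 bytes on disk, spread over three levels.
less_compacted = [
    SstFile(level=2, min_key="a", max_key="m", size_bytes=80),
    SstFile(level=2, min_key="n", max_key="z", size_bytes=90),
    SstFile(level=1, min_key="c", max_key="p", size_bytes=40),
    SstFile(level=0, min_key="b", max_key="y", size_bytes=10),
]

print(estimate_size(compacted))       # 220
print(estimate_size(less_compacted))  # 170
```

In this made-up example both directories occupy 220 bytes on disk, but the upper-level files in the less compacted directory overlap the bottom level and are skipped, so its estimate is smaller.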
Note that the value of 'EST SIZE' may differ between primary and mirror grids, for a number of reasons. A primary grid live data dir may contain writes made after the last checkpoint. It may also contain keys that were written and then deleted before being replicated to a mirror grid. A mirror grid node's live data dir (in the case of incremental mirroring) contains a computed set of writes that is the delta between two checkpoints. Thus the shape of the mirror grid's live data LSM tree may differ from that of the primary grid.
Issue/Introduction
How is the value of the 'EST SIZE' statistic in the AS Node status output calculated?
Environment
All supported platforms
Additional Information
AS Node Status, Estimated Data Size, EST SIZE