The order of embedded data can affect Spotfire file size.

Article ID: KB0080776

Updated On:

Products            Versions
Spotfire Analyst    All Versions

Description

You may notice that .dxp files can vary greatly in size even though they contain essentially the same data (the same number of rows and columns, and the same values).

Resolution

Whether a data set is sorted can directly affect the size of a Spotfire .dxp file that has the data embedded. An unsorted data set can produce a .dxp file that is 10 or more times larger than one built from the same data set sorted. This is because the Spotfire .dxp file is a compressed archive, similar to a .zip file, and the sort order of the data directly affects how well it compresses: sorted data contains long runs of repeated values, which compress far better than the same values in random order.

You can see this behavior by compressing two text files, one containing an unsorted data set and one containing exactly the same data set sorted. The resulting .zip archives will differ in size, with the sorted file producing the smaller archive. Spotfire compresses its embedded data in much the same way, so the size of the Spotfire .dxp file differs for the same reason.
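The comparison can also be reproduced without the attached files. The following is a minimal Python sketch that assumes a generated single-column data set: the team list, the row count of 200,000, and the use of `zlib` (DEFLATE, the method commonly used in .zip archives) are illustrative choices, not Spotfire's actual storage code, so the exact sizes will not match a real .dxp file.

```python
import random
import zlib

# Hypothetical team abbreviations modeled on the sample files.
TEAMS = ["Atl.", "Chi.", "Det.", "K.C.", "Min.", "Mon.",
         "N.Y.", "Phi.", "Pit.", "Sea.", "Tor."]


def as_text(rows):
    """Render the rows as a one-column text file with a TEAM header."""
    return ("TEAM\n" + "\n".join(rows) + "\n").encode("utf-8")


def compressed_size(rows):
    """Return the DEFLATE-compressed size of the rendered text, in bytes."""
    return len(zlib.compress(as_text(rows), 9))


rng = random.Random(0)  # fixed seed so repeated runs give the same data
unsorted_rows = [rng.choice(TEAMS) for _ in range(200_000)]  # arbitrary row count
sorted_rows = sorted(unsorted_rows)  # same values, different order

raw_size = len(as_text(unsorted_rows))
sorted_size = compressed_size(sorted_rows)
unsorted_size = compressed_size(unsorted_rows)

print(f"raw size (either order): {raw_size:>9,} bytes")
print(f"compressed, sorted:      {sorted_size:>9,} bytes")
print(f"compressed, unsorted:    {unsorted_size:>9,} bytes")
print(f"unsorted / sorted:       {unsorted_size / sorted_size:.1f}x")
```

Both inputs have the same raw size because they hold the same values; only the compressed sizes differ.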

Attached are two data sets which contain the same data, only one of which is sorted:
  • Data-Sorted.txt
    TEAM
    Atl.
    Atl.
    Atl.
    Atl.
    Atl.
    Atl.
    Atl.
    Atl.
    Atl.
    Atl.
    ...
  • Data-Unsorted.txt
    TEAM
    Det.
    Sea.
    Tor.
    N.Y.
    K.C.
    Mon.
    Pit.
    Chi.
    Phi.
    K.C.
    Min.
    ...

The table below shows the size of the files when this data is added to a .zip archive and when it is loaded into Spotfire. The Spotfire .dxp file is about 16.4 times larger when the data is unsorted (486,540 bytes compared with 29,681 bytes):
                                      Sorted Data    Unsorted Data
Raw data size in bytes                  3,893,782        3,893,782
Archive size in bytes (.zip)                7,893          405,810
Spotfire file size in bytes (.dxp)         29,681          486,540
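As a quick check of the arithmetic, the ratios can be computed directly from the byte counts reported above. This small Python sketch uses only those figures; the variable names are illustrative.

```python
# Byte counts taken from the table above.
zip_sorted, zip_unsorted = 7_893, 405_810
dxp_sorted, dxp_unsorted = 29_681, 486_540

zip_ratio = zip_unsorted / zip_sorted  # unsorted .zip relative to sorted .zip
dxp_ratio = dxp_unsorted / dxp_sorted  # unsorted .dxp relative to sorted .dxp

print(f".zip: unsorted is {zip_ratio:.1f}x the sorted size")  # 51.4x
print(f".dxp: unsorted is {dxp_ratio:.1f}x the sorted size")  # 16.4x
```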

This can explain why some Spotfire reports are much larger than others even though their data is essentially identical and looks the same once sorted within Spotfire.

Note: You normally do not need to consider sort order. Most data sets are more complex, with many columns, so sorting on a single column brings a diminished benefit because the other columns remain unsorted. The effect is larger for "tall and skinny" data sets that have many rows but only one or a few columns.
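The diminishing benefit for wider data can be illustrated with a rough Python sketch. It assumes a hypothetical second column of random integers, tab-separated text rows, and `zlib` compression. None of this reflects how Spotfire lays out data internally, so treat the numbers as indicative only.

```python
import random
import zlib

# Hypothetical team abbreviations modeled on the sample files.
TEAMS = ["Atl.", "Chi.", "Det.", "K.C.", "Min.", "Mon.",
         "N.Y.", "Phi.", "Pit.", "Sea.", "Tor."]

rng = random.Random(1)  # fixed seed so repeated runs give the same data
# Each record is (team, value); the value column is random and stays
# unsorted even after the records are sorted by team.
records = [(rng.choice(TEAMS), rng.randrange(1_000_000)) for _ in range(100_000)]


def deflate_size(rows):
    """Compressed size in bytes of the rows written as tab-separated text."""
    text = "\n".join("\t".join(str(field) for field in row) for row in rows)
    return len(zlib.compress(text.encode("utf-8"), 9))


def unsorted_to_sorted_ratio(rows):
    """How many times larger the unsorted rows compress than rows sorted on column 0."""
    by_first_column = sorted(rows, key=lambda row: row[0])
    return deflate_size(rows) / deflate_size(by_first_column)


one_column = unsorted_to_sorted_ratio([(team,) for team, _ in records])
two_columns = unsorted_to_sorted_ratio(records)

print(f"one column:  unsorted is {one_column:.1f}x larger")
print(f"two columns: unsorted is {two_columns:.1f}x larger")
```

With the extra unsorted column the ratio falls close to 1, because most of the compressed size now comes from values that sorting on the team column does not reorder into runs.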
