How to unstack/stack data in Statistica?

How to unstack/stack data in Statistica?

book

Article ID: KB0076964

calendar_today

Updated On:

Products Versions
Spotfire Statistica All supported versions

Description

Statistica offers options to perform unstacking or “stacking” of data files to make them more suitable for statistical analysis. The general operation or restructuring of the data is similar to a cross-tabulation of values (given certain categorical variables).  These operations are especially useful when accessing data from databases, where the individual data values are identified by case and variable IDs, and each unique combination of these IDs are contained in a new row of the data file.
 

Issue/Introduction

How to unstack/stack data interactively in Statistica?

Resolution

Unstacking Data

The Unstack option can be used to create a new data spreadsheet so that each case in the new (unstacked) data file will contain a unique combination of values or codes found in the selected Case ID variables, and the new data file will contain variables created from the unique combination of values for the codes found in the Code (column) variables.  The measurements found within the new data file come from the defined Unstack (value) variable selection.

Example 1

1. Launch Statistica  and click on the Home tab.  Then click on the Open drop-down menu and select Open Examples.  Double-click on the Datasets folder and then open the EnginePerformances.sta data file.

User-added image

Notice that this file has a Serial Number variable where each value is a unique case identifier.  There is also a Training variable that has two categories – 0 and 1.  So the data file could be ‘unstacked’ on the values for any of the continuous variables for each unique combination of each serial number and each training value. We'll assume that Efficiency is the variable measure that needs to be unstacked.

2. Click on the Data tab.  In the Transformations group, click the Stack button.  On the Unstacking/Stacking dialog, select the Unstacking tab

    User-added image

3. Click the Variables button.  Select Training for the Code (column) variable,variables Efficiency and Fuel Economy(%)  for the Unstack (value) variable, and Serial Number as the Case ID (row) variable.

User-added image

4. Click OK twice to get the unstacked data file.  The resulting spreadsheet has the Serial Number variable listed as well as variables for each of training codes.  The measurements in the spreadsheet are the efficiency values with respect to that serial number’s training code.

User-added image

Unstacking Data - Example 2

Statistica also offers different selection options to aggregate the measurement values for each unique value of the case identifier when combinations of the case and row variables are not unique. To understand this with an example:

1. Open the Activities.sta example data file (located in the same example datasets folder as in the previous example)

User-added image

2. For this data set, lets assume we want to unstack the Shopping and Sleep hours for each Gender and take the mean per region. Click on the Data tab.  In the Transformations group, click the Stack button. 

3. On the Unstacking tab, click the Variables button.  Select GENDER for the Code (column) variable, SHOPPING and SLEEP (and optionally other metrics available in the file) for the Unstack (value) variable, and GEO.REGION as the Case ID (row) variable.  Click OK .

User-added image
4. Since the Case ID specified here is not unique, we will choose Take the mean value from the drop down options of parameter " When multiple rows with same key values are found"

User-added image


5. The resulting spreadsheet will have the mean of all the metrics selected for each variable in Unstack value for the variable chosen for Code (GENDER)

User-added image

This can offer quick insights such as Males Shop more in the West than in the East and Females sleep more in the West than in the east.

Note that the case id variable is optional. This option may be useful when we want to unstack values based on codes but we want to consider every case as a separate record (even though there isn't a specific variable that may identify for CaseID) .



Stacking Data

The Stacking option can be used to create a new data spreadsheet so that for each case in the input and for each selected variable, this operation writes a new case to the resulting spreadsheet.  The value(s) of the selected variable(s) will be placed in the Destination Variable for this new case, and the name of the selected variable(s) will be placed in the Code Variable for the new case.  All other unselected variables’ values are replicated.  So, ignoring missing data, the stacked data spreadsheet will have a number of cases equal to the number of cases in the input spreadsheet times the number of variable selected.

Example 3

1. Click on the Open drop-down menu and select Open Examples.  Double-click on the Datasets folder and then open the Accident.sta data file.This data can be ‘stacked’ such that there is a variable for the two years with the number of accidents in a separate variable

User-added image

2. Click on the Data tab.  In the Transformations group, click the Stack button. 

3. On the Unstacking/Stacking dialog and select the Stacking tab.  Click the Variables button.  Select both Y_1983 and Y_1985.  For the Destination variable name enter Accidents; for the Code variable name enter Year.

User-added image

4. Click OK to get the ‘stacked’ spreadsheet.

User-added image

This stacked output can be used in an analysis where Year is expected to be used as a variable.

The stacking and unstacking operation is also available to be used in a workspace. The stacking and Unstacking nodes have the same options described above and the resulting spreadsheet is available to subsequent nodes in downstream analysis.

User-added image

 

Additional Information

See  also :  How to use Crosstab in Enterprise?
See also :  How to stack variables in a workspace ?