How to verify and "clean" data?

Article ID: KB0081798

Products: Spotfire Statistica
Versions: 12.7 and above

Description

This article explains how to verify and "clean" data.

Issue/Introduction

How to verify and "clean" data?

Resolution

Select Verify Data from the Data - Verify Data submenu to open an interactive data-verification and cleaning facility. Use the options in the Verify Data dialog to enter the conditions that the data must meet.

Verification conditions follow the same syntax conventions that Statistica uses in all procedures that select cases based on their values. You can also save the current verification conditions to a text file, or open a file containing previously saved conditions.

The verification can be as simple as checking whether values in a variable are "legal" (e.g., only 1 and 2 might be allowed for Gender) or whether they fall within allowed ranges of values (e.g., Age must be more than 0 and less than 100). It can also be as complex as checking multiple logical conditions that some values must meet in relation to other values.
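The two simple checks described above (legal codes and allowed ranges) can be illustrated outside Statistica. The following is a minimal Python sketch, not Statistica's implementation: the case data, the `LEGAL_GENDER` set, and the `is_valid` helper are hypothetical, and each case is assumed to be a dictionary mapping variable names to values.

```python
# Hypothetical data: each case maps variable names to values.
cases = [
    {"Gender": 1, "Age": 34},   # valid
    {"Gender": 3, "Age": 27},   # illegal Gender code
    {"Gender": 2, "Age": 0},    # Age out of range (must be > 0)
    {"Gender": 2, "Age": 101},  # Age out of range (must be < 100)
]

LEGAL_GENDER = {1, 2}  # only 1 and 2 are "legal" codes for Gender


def is_valid(case):
    """Return True if the case passes both the legal-value and the range check."""
    legal = case["Gender"] in LEGAL_GENDER
    in_range = 0 < case["Age"] < 100
    return legal and in_range


# Case numbers start at 1, as they do in a spreadsheet.
invalid_cases = [n for n, c in enumerate(cases, start=1) if not is_valid(c)]
print(invalid_cases)  # [2, 3, 4]
```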

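Consider this example of conditional verification: if a person is male or younger than 10 years old, then that person's number of pregnancies cannot be greater than zero. To apply this rule, specify the condition that identifies an invalid case, for example (here v1 refers to the first variable, which holds gender, and AGE and PREGN are variable names):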

Invalid if: (v1='MALE' or AGE<10) and PREGN>0
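To make the logic of this condition concrete, here is a rough Python equivalent. It is only an illustration, not Statistica syntax: the `is_invalid` function and the sample cases are hypothetical, and the key "Gender" is assumed to stand in for v1.

```python
def is_invalid(case):
    # Mirrors: Invalid if: (v1='MALE' or AGE<10) and PREGN>0
    # "Gender" stands in for v1 (an assumption for this sketch).
    return (case["Gender"] == "MALE" or case["AGE"] < 10) and case["PREGN"] > 0


cases = [
    {"Gender": "FEMALE", "AGE": 30, "PREGN": 2},  # valid
    {"Gender": "MALE", "AGE": 40, "PREGN": 1},    # invalid: male with pregnancies
    {"Gender": "FEMALE", "AGE": 8, "PREGN": 1},   # invalid: younger than 10
    {"Gender": "MALE", "AGE": 25, "PREGN": 0},    # valid: no pregnancies
]

flags = [is_invalid(c) for c in cases]
print(flags)  # [False, True, True, False]
```

The parentheses matter: in Python, `and` binds more tightly than `or`, so without them the expression would flag every male case regardless of the value of PREGN.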

Once the verification conditions have been entered, you can locate invalid cases in two ways. Click Find First to select the first invalid case in the data file; after that case has been selected, select Find Next Invalid Case from the Data - Verify Data submenu to move to the next one. Alternatively, click Mark All to mark every invalid case in the data file according to the Marked Cells spreadsheet layout.
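The difference between stepping through invalid cases one at a time and marking them all at once can be sketched in Python. The names `find_first`, `find_next`, and `mark_all` are hypothetical analogues of the Statistica buttons and menu command, not a Statistica API; "marking" here simply returns a list of 1-based case numbers instead of applying the Marked Cells layout.

```python
from typing import Callable, Iterator, List, Optional

Case = dict
Rule = Callable[[Case], bool]


def iter_invalid(cases: List[Case], is_invalid: Rule) -> Iterator[int]:
    """Yield the 1-based case numbers of invalid cases, in file order."""
    for number, case in enumerate(cases, start=1):
        if is_invalid(case):
            yield number


def find_first(cases: List[Case], is_invalid: Rule) -> Optional[int]:
    """Hypothetical analogue of Find First: first invalid case number, or None."""
    return next(iter_invalid(cases, is_invalid), None)


def find_next(cases: List[Case], is_invalid: Rule, after: int) -> Optional[int]:
    """Hypothetical analogue of Find Next Invalid Case: first invalid case after `after`."""
    return next((n for n in iter_invalid(cases, is_invalid) if n > after), None)


def mark_all(cases: List[Case], is_invalid: Rule) -> List[int]:
    """Hypothetical analogue of Mark All: every invalid case number."""
    return list(iter_invalid(cases, is_invalid))


def pregnancy_rule(case: Case) -> bool:
    # Same rule as the example condition: (v1='MALE' or AGE<10) and PREGN>0
    return (case["Gender"] == "MALE" or case["AGE"] < 10) and case["PREGN"] > 0


cases = [
    {"Gender": "FEMALE", "AGE": 30, "PREGN": 2},
    {"Gender": "MALE", "AGE": 40, "PREGN": 1},
    {"Gender": "FEMALE", "AGE": 8, "PREGN": 1},
    {"Gender": "MALE", "AGE": 25, "PREGN": 0},
]

first = find_first(cases, pregnancy_rule)
print(first)                                    # 2
print(find_next(cases, pregnancy_rule, first))  # 3
print(mark_all(cases, pregnancy_rule))          # [2, 3]
```

Stepping with `find_next` suits reviewing and correcting cases one by one, while `mark_all` gives the complete set in a single pass, which is closer in spirit to Mark All.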