Products | Versions |
---|---|
Spotfire Statistica | 12.7 |
Binning of continuous variables reduces the predictive power of the variable in Feature Selection
Cause
For categorical dependent variables, the Feature Selection module uses a chi-square test to assess predictor performance. A chi-square test requires two categorical variables, so a continuous predictor must be binned before the test statistic is computed. The bins are generated from the range of the data. If there are outliers in the lower or upper tail, the range is stretched and the leftmost or rightmost bin can end up containing almost all of the observations. This effectively reduces the number of bins, the degrees of freedom of the test, and the apparent predictive power of the variable.
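The effect described above can be illustrated with a minimal Python sketch. It assumes equal-width bins spanning the minimum to the maximum of the data, which matches the description of range-based binning but is not a reproduction of Statistica's internal algorithm. The data, the bin count of 10, and the helper name `equal_width_counts` are hypothetical and chosen only for illustration.

```python
import numpy as np

def equal_width_counts(values, n_bins=10):
    """Count observations per equal-width bin spanning min..max of the data."""
    counts, _edges = np.histogram(values, bins=n_bins)
    return counts

# Hypothetical predictor: 1000 values spread evenly between 0 and 1.
clean = np.linspace(0.0, 1.0, 1000)
# Same predictor with a single extreme outlier in the upper tail.
with_outlier = np.append(clean, 1000.0)

clean_counts = equal_width_counts(clean)
outlier_counts = equal_width_counts(with_outlier)

print("populated bins without outlier:", np.count_nonzero(clean_counts))
print("populated bins with outlier:   ", np.count_nonzero(outlier_counts))
print("share of data in the first bin:", outlier_counts[0] / outlier_counts.sum())
```

In this sketch a single outlier stretches the range so far that every original observation falls into the first bin and only two bins are populated, which leaves the chi-square test with far fewer informative categories.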
To resolve this issue, please follow these steps: