Why is the number of cases actually used for training smaller than the training input in a Neural Network model?

Article ID: KB0074068

Products	Versions
Spotfire Statistica	13.0 and higher

Description

After using the Automated Neural Networks (SANN) module to build a model, you find that the number of cases actually used for training is smaller than your input training sample. This article explains two possible causes:

1. By default, the SANN module uses the "Random Sampling method", which trains the model on 70% of the input data.

2. If the variables selected for the model contain missing data (MD), SANN uses Casewise deletion by default: any case with a missing value is omitted when generating results.
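
The combined effect of the two causes can be illustrated outside Statistica. The following is a minimal Python sketch, not SANN's actual implementation: the data set, the `casewise_delete` and `random_split` helper names, and the order of operations (deletion first, then sampling) are all assumptions made for illustration.

```python
import random

def casewise_delete(rows):
    """Drop every case (row) that has a missing value (None) in any variable."""
    return [row for row in rows if all(v is not None for v in row)]

def random_split(rows, train_fraction=0.7, seed=0):
    """Randomly assign about train_fraction of the cases to the training set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical data: 100 cases, 2 variables; every 10th case has a missing value.
rows = [(float(i), None if i % 10 == 0 else float(i) * 2) for i in range(100)]

complete = casewise_delete(rows)         # 90 complete cases remain
train, holdout = random_split(complete)  # 63 cases for training, 27 held out

print(len(rows), len(complete), len(train))  # prints: 100 90 63
```

In this made-up example, only 63 of the 100 input cases end up in training: 10 are lost to casewise deletion, and 30% of the remainder is set aside by the random split.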

Environment

Windows

Resolution

For situations related to cause 1:

Under the Sampling (CNN and ANS) tab in the SANN analysis dialog, you can adjust the ratios used by the random sampling method, or use a sampling variable to identify your training, testing, and validation sets. Click "?" at the top right to learn more about these options.

For situations related to cause 2:

On the right panel of the SANN analysis dialog, you can select "Mean substitution" to replace MD with the sample mean for continuous variables. Click "?" at the top right to learn more.
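
As a rough illustration of what mean substitution does to a continuous variable, here is a minimal Python sketch. It is not Statistica code; the `mean_substitute` helper and the sample values are hypothetical, and missing values are represented as `None` for simplicity.

```python
from statistics import mean

def mean_substitute(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical continuous variable with two missing values.
x = [2.0, None, 4.0, 6.0, None]
filled = mean_substitute(x)

print(filled)  # prints: [2.0, 4.0, 4.0, 6.0, 4.0]
```

Because every case now has a value, none of the five cases would be dropped by casewise deletion.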

Alternatively, you can use other MD processing functions in Statistica. For example, "Filter cases/variables" is introduced in this article, and "Processing MD" is introduced in this article.