Algorithm of In-Database CHAID analysis node in workspace

Article ID: KB0080540

Updated On:

Products Versions
Spotfire Statistica 13.0 and higher

Description

Does the In-Database CHAID analysis node (In-DB CHAID) use the same algorithm as the Standard, Exhaustive, or Advanced I-Tree CHAID analysis?

Issue/Introduction

Algorithm of In-Database CHAID analysis node in workspace

Environment

Windows

Resolution

The Statistica workspace node browser contains three SVB nodes for I-Tree CHAID regression analysis: “CHAID Standard Regression (SVB)”, “Advanced Regression CHAID (SVB)”, and “Exhaustive Regression CHAID (SVB)”.

[Screenshot: the three I-Tree CHAID regression nodes in the workspace node browser]

In-DB CHAID performs the regular (non-exhaustive) type of I-Tree CHAID analysis. The Advanced CHAID node in the screenshot above is the WPF node version of the Standard CHAID algorithm.

The workspace In-DB CHAID node uses an algorithm similar to that of the Advanced CHAID node, with two minor differences that show up when comparing the parameters of the two nodes:

1. Intervals for continuous variables: the In-DB CHAID node bases them on percentiles (allowed value range: 2-50), while the Advanced I-Tree CHAID node uses General Chi-square statistics.

[Screenshot: In-DB CHAID interval options for continuous variables]

[Screenshot: Advanced I-Tree CHAID Chi-square statistics for continuous variables]

2. Stopping parameter: the In-DB CHAID node uses "minimum n of cases" (an absolute count), while the Advanced I-Tree CHAID node uses "minimum n (%) of cases" (a percentage of cases).

[Screenshot: In-DB CHAID stopping parameter, minimum n of cases]

[Screenshot: Advanced I-Tree CHAID stopping parameter, minimum n (%) of cases]
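To illustrate the first difference, the following is a minimal Python sketch of percentile-based intervals for a continuous predictor. It is not Statistica's implementation. It assumes that the 2-50 setting is the number of intervals and that cut points are plain equal-frequency percentiles. The function names and the sample data are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def percentile_cut_points(values, n_intervals):
    """Return the n_intervals - 1 percentile cut points for a continuous variable.

    Assumption: the 2-50 range in the In-DB CHAID node is the interval count.
    """
    if not 2 <= n_intervals <= 50:
        raise ValueError("n_intervals must be between 2 and 50")
    # method="inclusive" treats the data as the full population of observed values
    return quantiles(values, n=n_intervals, method="inclusive")


def assign_interval(value, cut_points):
    """Map a value to a 0-based interval index using the sorted cut points."""
    return bisect_right(cut_points, value)


# Hypothetical continuous predictor
ages = [23, 25, 31, 35, 38, 41, 44, 47, 52, 58, 61, 67]

cuts = percentile_cut_points(ages, 4)
binned = [assign_interval(age, cuts) for age in ages]

print("cut points:", cuts)
print("interval index per case:", binned)
```

Once a continuous predictor is binned this way, it can be treated as an ordered categorical predictor. Because the Advanced I-Tree CHAID node derives its categories differently, the two nodes can produce different splits on the same data.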

Note:

1. Because of the two differences above, users may not get similar results from In-DB CHAID and Advanced I-Tree CHAID.

2. If the analysis has only categorical predictors and no continuous predictors, the user may be able to get similar results by adjusting the stopping parameter (minimum number or % of cases) in the two nodes. In that situation, compute the minimum % of cases so that it matches the minimum number of cases used as the stopping parameter in the other node.
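The conversion in Note 2 is simple arithmetic. Here is a minimal Python sketch, assuming the percentage is taken relative to the total number of cases in the analysis. The function name and the example numbers are hypothetical, and how either node rounds the value is not covered in this article.

```python
def min_cases_to_percent(min_n, total_cases):
    """Convert an absolute "minimum n of cases" into a percentage of all cases.

    Assumption: the percentage is relative to the total number of cases.
    """
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    if not 0 <= min_n <= total_cases:
        raise ValueError("min_n must be between 0 and total_cases")
    return 100.0 * min_n / total_cases


# Hypothetical example: In-DB CHAID is set to a minimum of 100 cases
# and the data set has 2000 cases in total.
pct = min_cases_to_percent(100, 2000)
print(f"Set minimum n (%) of cases to {pct}")
```

Enter the resulting percentage in the Advanced I-Tree CHAID node so that both nodes stop splitting at the same node size.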