Constant misclassification rate from Random Forest Classification analysis

Article ID: KB0076593

Products: Spotfire Statistica
Versions: 13.3.1 and later

Description

In some cases, a Random Forest Classification analysis reports a constant misclassification rate for both the train and test datasets.

Issue/Introduction

Constant misclassification rate from Random Forest Classification analysis

Environment

Windows

Resolution

To troubleshoot, check the frequency of each category of the dependent variable in the dataset used for the Random Forest Classification analysis.
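The frequency check can also be done outside Statistica on exported data. The following is a minimal Python sketch, not a Statistica feature: the function name `class_frequencies` and the 95/5 example labels are assumptions made purely for illustration.

```python
from collections import Counter

def class_frequencies(labels):
    """Return {class: (count, share)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in counts.items()}

# Hypothetical dependent variable: 95 "no" cases and 5 "yes" cases.
labels = ["no"] * 95 + ["yes"] * 5

freqs = class_frequencies(labels)
for cls, (n, share) in sorted(freqs.items()):
    print(f"{cls}: {n} cases ({share:.1%})")
```

A class that holds only a few percent of the cases, as "yes" does here, is a sign that the imbalance described in this article may apply.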

If the class frequencies of the dependent variable are highly imbalanced, the majority class can dominate and the algorithm may not have enough data to learn rules that identify the minority class. In that case every leaf node predicts the majority class, so every observation of the minority class is misclassified. The misclassification rate then stays constant, at the share of observations that do not belong to the majority class.
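To see why the rate stays fixed, consider a model that always predicts the majority class: its error is simply the share of observations that belong to any other class. The sketch below computes that baseline in Python; `majority_baseline_error` and the example labels are hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import Counter

def majority_baseline_error(labels):
    """Misclassification rate of a model that always predicts the majority class."""
    counts = Counter(labels)
    majority_count = max(counts.values())
    # Every observation outside the majority class is misclassified.
    return (len(labels) - majority_count) / len(labels)

# Hypothetical dependent variable: 95 "no" cases and 5 "yes" cases.
labels = ["no"] * 95 + ["yes"] * 5

baseline = majority_baseline_error(labels)
print(f"Majority-class baseline error: {baseline:.1%}")
```

If the misclassification rate reported by the analysis matches this baseline on both the train and test data, the model is most likely predicting only the majority class.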

A suggested solution is to draw a stratified sample of the dataset so that the classes in the sampled data are more balanced.
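One way to build such a sample is to treat each class as a stratum and draw the same number of cases from each. The following Python sketch assumes that approach (downsampling every class to the size of the smallest one); the function `balanced_stratified_sample`, its parameters, and the example rows are hypothetical and do not reflect how Statistica implements its sampling options.

```python
import random
from collections import Counter, defaultdict

def balanced_stratified_sample(rows, label_of, per_class=None, seed=0):
    """Draw the same number of rows from each class (stratum).

    rows      -- sequence of records
    label_of  -- function returning the class label of a record
    per_class -- rows to draw per class; defaults to the smallest class size
    seed      -- seed for reproducible sampling
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    strata = defaultdict(list)
    for row in rows:
        strata[label_of(row)].append(row)

    if per_class is None:
        per_class = min(len(group) for group in strata.values())

    sample = []
    for group in strata.values():
        # Never request more rows than the stratum contains.
        k = min(per_class, len(group))
        sample.extend(rng.sample(group, k))

    rng.shuffle(sample)  # avoid rows being ordered by class
    return sample

# Hypothetical dataset: (case id, class label) with 95 "no" and 5 "yes" cases.
rows = [(i, "no") for i in range(95)] + [(i, "yes") for i in range(95, 100)]

sample = balanced_stratified_sample(rows, label_of=lambda r: r[1])
print(Counter(label for _, label in sample))
```

Downsampling discards most of the majority-class cases, so it is worth confirming that the balanced sample is still large enough to train on.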