Article ID: KB0080525
Description
Why do the following warning messages appear when running gam() with large weights?
fitted values close to 0 or 1 in: family$deviance(mu, y, w)
Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
Issue/Introduction
Warning messages from a gam() function call
Environment
Product: TIBCO Spotfire S+
Version: All supported versions
OS: All supported operating systems
--------------------
Resolution
This problem occurs in particular when the response variable takes only the values 0 and 1 and the weights are "large":
# Example data
sex <- factor(rep(c("M", "F"), c(6, 6)))
my.res <- factor(c(0,1,1,1,0,0,1,1,1,1,1,0))
# Fits model with weights=20
> gam(my.res~sex, weights=rep(20, 12), family="binomial")
Call:
gam(formula = my.res ~ sex, family = "binomial", weights = rep(20, 12))
Degrees of Freedom: 12 total; 10 Residual
Residual Deviance: 274.49
# Has problems with weights = 200
> gam(my.res~sex, weights=rep(200, 12), family="binomial")
Warning messages:
1: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
2: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
Call:
gam(formula = my.res ~ sex, family = "binomial", weights = rep(200, 12))
Degrees of Freedom: 12 total; 10 Residual
Residual Deviance: 2744.9
When the response vector is numeric, it is assumed to hold the data in ratio form, y(i) = s(i)/n(i), that is, the proportion of successes s(i) out of n(i) trials. The weights vector should then contain the n(i)'s.
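As a rough illustration of the ratio form, the sketch below aggregates the example data by sex (3 successes out of 6 for "M", 5 out of 6 for "F") and fits the same binomial model two ways. It is written for R and uses glm() from base R as a stand-in for gam(), which is an assumption made only to keep the example self-contained; the variable names s, n and y are hypothetical.

```r
# Example data aggregated by group: s successes out of n trials
sex <- factor(c("M", "F"))
s <- c(3, 5)   # number of successes per group
n <- c(6, 6)   # number of trials per group
y <- s / n     # response in ratio form, y(i) = s(i)/n(i)

# Assumption: glm() stands in for gam(); the model has no smooth terms
# Ratio form: proportions as the response, trial counts as weights
fit.ratio <- glm(y ~ sex, weights = n, family = binomial)

# Equivalent two-column form: successes and failures
fit.counts <- glm(cbind(s, n - s) ~ sex, family = binomial)

# Both parameterizations should give the same coefficients
print(coef(fit.ratio))
print(coef(fit.counts))
```

Under this reading, a weight of 200 attached to a response of exactly 0 or 1 claims 200 failures or 200 successes out of 200 trials for that row.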
With weights as large as those in the example above, gam() runs into numerical difficulties, and you get the warning messages. The gam() function does not rescale the weights, because if they really are numbers of observations in a group, they must be left as is for the tests and p-values to work out.
However, if your weights are correct according to the definition above and you still encounter these warnings, you may want to try renormalizing the weights vector. For the above model with weights of 200, the data say that there were 4 cases with 200 failures out of 200 trials, 8 cases with 200 successes out of 200 trials, and no cases in which a mixture of successes and failures was observed out of 200 trials.
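One possible way to renormalize is to divide the weights by their mean so that they average 1. The sketch below is written for R and again uses glm() from base R in place of gam() (an assumption; behaviour in S+ has not been checked here). The name w.scaled and the choice of dividing by the mean are illustrative, not a prescribed method.

```r
# Example data from above, with the response stored as numeric 0/1
sex <- factor(rep(c("M", "F"), c(6, 6)))
my.res <- c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)

w <- rep(200, 12)
w.scaled <- w / mean(w)   # hypothetical normalization: weights now average 1

# Assumption: glm() stands in for gam(); the model has no smooth terms
fit <- glm(my.res ~ sex, weights = w.scaled, family = binomial)

print(coef(fit))                # coefficients do not depend on the weight scale
print(deviance(fit))            # deviance on the rescaled weights
print(deviance(fit) * mean(w))  # deviance converted back to the original scale
```

Multiplying the rescaled deviance by the scaling factor recovers the residual deviance reported for the original weights (2744.9 for weights of 200). Standard errors, tests and p-values change with the weight scale, so results from the rescaled fit should be interpreted with that in mind.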