How does Spotfire S+ calculate the upper and lower hinges in the boxplot() command?

Article ID: KB0080877

Products: Spotfire S+
Versions: All supported versions

Description

In the output of a boxplot() command, the upper and lower hinges are not the same as the 25% and 75% quantiles. Why is this?

Issue/Introduction

How does Spotfire S+ calculate the upper and lower hinges in the boxplot() command?

Environment

Product: TIBCO Spotfire S+
Version: All supported versions
OS: All supported operating systems

Resolution

The boxplot() function uses hinges, as originally defined by Tukey, for the lower and upper limits of the box. The overall median splits the data into two halves, and each hinge is the median of one half. Hinges are similar to quartiles. The main difference is that the depth of a hinge (its distance, counted in data points, from the lower or upper end of the sorted data) is calculated from the depth of the median. Hinges often lie slightly closer to the median than textbook quartiles do, although the comparison depends on which quartile definition is used, and the difference is usually quite small. If you are interested in quantiles, use the quantile() or summary.default() functions instead of the stats component returned by boxplot().
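The depth rule described above can be sketched in a few lines of S code. This is a minimal illustration of Tukey's definition, not the code that boxplot() itself runs. The function name tukey.hinges is hypothetical, and the internal routine may treat missing values or very small samples differently.

```r
# Hypothetical helper (not part of Spotfire S+): Tukey hinges computed by depth.
tukey.hinges <- function(x)
{
    x <- sort(x[!is.na(x)])                      # drop missing values, then order the data
    n <- length(x)
    median.depth <- (n + 1)/2                    # depth of the overall median
    hinge.depth <- (floor(median.depth) + 1)/2   # hinge depth is derived from the median depth
    lo <- floor(hinge.depth)
    hi <- ceiling(hinge.depth)
    # A fractional depth always ends in .5, so the hinge is then
    # halfway between two neighbouring data points.
    lower <- (x[lo] + x[hi])/2                   # counted in from the smallest value
    upper <- (x[n + 1 - lo] + x[n + 1 - hi])/2   # counted in from the largest value
    c(lower = lower, upper = upper)
}

tukey.hinges(1:10)   # lower = 3, upper = 8
```

For the ten values 1 to 10, the median has depth 5.5, so the hinge depth is (5 + 1)/2 = 3 and the hinges are the third values counted from each end.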

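Here is the formula for the conf component that boxplot() returns alongside stats, translated from the Fortran code used in boxplot() to Spotfire S+ code. It takes the quartiles (or hinges) as input, so its result depends on which definition is used.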

Note that the definition of quantile that boxplot() uses differs slightly from that of the Spotfire S+ quantile() function: if the boxplot quantile does not fall exactly on a data point, it is constrained to be halfway between two data points. The Spotfire S+ quantile() function instead uses a weighted average of the two surrounding data points, with the weight depending on the quantile.

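f <- function(x, quartiles = quantile(x, (4:0)/4), n = length(x))
{
    # With the default, rows run from the maximum (row 1) to the minimum (row 5).
    quartiles <- as.matrix(quartiles)
    conf <- matrix(NA, nrow = 2, ncol = ncol(quartiles))
    # Rows 1 and 5 (the maximum and minimum) are not used here.
    # Upper limit: the median (row 3) plus a multiple of the spread between rows 2 and 4.
    conf[1, ] <- quartiles[3, ] + 1.7 * ((1.25 * (quartiles[2, ] - quartiles[4, ]))/(1.35 * sqrt(max(c(n, 1)))))
    # Lower limit: the upper limit reflected about the median.
    conf[2, ] <- quartiles[3, ] * 2 - conf[1, ]
    conf
}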

For example:

> data<-fuel.frame$Fuel[fuel.frame$Type=="Compact"]
> z<-boxplot(data,plot=F)
> z$conf
         [,1]
[1,] 4.370558
[2,] 3.962775
> f(data) # this is a little different, because of difference in quartile computations
         [,1]
[1,] 4.339295
[2,] 3.994038
> f(quartiles=z$stats,n=length(data)) # use boxplot's def. of quartiles.
         [,1]
[1,] 4.370558
[2,] 3.962775
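
Written as an equation, the body of f above amounts to the following. The symbols are notation introduced here for illustration: M is row 3 of quartiles (the median), Q_U and Q_L are rows 2 and 4 (the upper and lower quartiles, or the hinges when z$stats is passed in), and n is the number of observations.

```latex
\[
\mathrm{conf}_{1} = M + 1.7 \cdot \frac{1.25\,(Q_U - Q_L)}{1.35\,\sqrt{\max(n, 1)}},
\qquad
\mathrm{conf}_{2} = 2M - \mathrm{conf}_{1}
\]
```

Since 1.7 x 1.25 / 1.35 is approximately 1.574, the two limits sit symmetrically about the median at a distance of roughly 1.574 (Q_U - Q_L) / sqrt(n). This is why the two calls to f in the example differ: the only thing that changes between them is the spread Q_U - Q_L.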