[R] Bootstrap 95% confidence intervals for splines

Tim Hesterberg timhesterberg at gmail.com
Sun Mar 27 17:06:48 CEST 2011


You're mixing up two concepts here,
  - splines
  - bootstrap confidence intervals
Separating them may help cut the confusion.

First, to do a bootstrap confidence interval for a difference in predictions
in the linear regression case, do:

repeat 10^4 times
  draw a bootstrap sample of the observations (subjects, keeping x & y together)
  fit the linear regression to the bootstrap sample
  record the difference in predictions at the two x values
end loop
The bootstrap percentile confidence interval is the range of the middle 95%
of the recorded differences, i.e. from their 2.5th to their 97.5th percentile.
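
In R, the loop above might look roughly like the sketch below. The data are
simulated purely for illustration, the names dat, newx, B and diffs are my own
placeholders, and the model direction (HD ~ WL) simply follows the quoted
question. B is set below 10^4 only to keep the example quick.

```r
# Simulated data for illustration only (not from the original question)
set.seed(1)
n   <- 200
WL  <- rnorm(n, mean = 90, sd = 12)        # waistline in cm
HD  <- 2 + 0.1 * WL + rnorm(n, sd = 2)     # hot dogs per week
dat <- data.frame(WL = WL, HD = HD)

newx  <- data.frame(WL = c(80, 100))       # the two x values to compare
B     <- 2000                              # number of bootstrap samples (assumed)
diffs <- numeric(B)

for (b in seq_len(B)) {
  idx <- sample(nrow(dat), replace = TRUE) # resample rows, keeping x & y together
  fit <- lm(HD ~ WL, data = dat[idx, ])    # refit on the bootstrap sample
  p   <- predict(fit, newdata = newx)
  diffs[b] <- p[2] - p[1]                  # difference in predictions, 100 cm vs 80 cm
}

# Percentile interval: the middle 95% of the recorded differences
ci <- quantile(diffs, c(0.025, 0.975))
print(ci)
```

Resampling row indices, rather than x and y separately, is what keeps each
subject's x and y together.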

For a spline, the procedure is the same except for fitting a spline regression:

repeat 10^4 times
  draw a bootstrap sample of the observations (subjects, keeping x & y together)
  fit the SPLINE regression to the bootstrap sample
  record the difference in predictions at the two x values
end loop
The bootstrap percentile confidence interval is the range of the middle 95%
of the recorded differences, i.e. from their 2.5th to their 97.5th percentile.
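
A sketch of the spline version follows, again on simulated data with
placeholder names of my own (dat, newx, B, diffs). It assumes the
ns(WL, df = 4) fit from the quoted question and lets ns() pick its knots anew
in each bootstrap sample; holding the knots fixed would be a different design
choice.

```r
library(splines)

# Simulated, deliberately nonlinear data for illustration only
set.seed(2)
n   <- 200
WL  <- runif(n, min = 60, max = 130)            # waistline in cm
HD  <- 5 + 3 * sin(WL / 15) + rnorm(n, sd = 1.5) # hot dogs per week
dat <- data.frame(WL = WL, HD = HD)

newx  <- data.frame(WL = c(80, 100))            # the two x values to compare
B     <- 1000                                   # number of bootstrap samples (assumed)
diffs <- numeric(B)

for (b in seq_len(B)) {
  idx <- sample(nrow(dat), replace = TRUE)      # resample rows, keeping x & y together
  fit <- lm(HD ~ ns(WL, df = 4), data = dat[idx, ])  # SPLINE regression
  p   <- predict(fit, newdata = newx)           # predict() reuses the fitted knots
  diffs[b] <- p[2] - p[1]
}

# Point estimate from the original data, and the percentile interval
fit0 <- lm(HD ~ ns(WL, df = 4), data = dat)
est  <- unname(diff(predict(fit0, newdata = newx)))
ci   <- quantile(diffs, c(0.025, 0.975))
print(est)
print(ci)
```

The only line that changes from the linear case is the model formula.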

Tim Hesterberg

P.S. I think you're mixing up the response and explanatory variables.
I'd think of eating hot dogs as the cause (explanatory variable),
and waistline as the effect (response, or outcome).

P.P.S.  I don't like the terms "independent" and "dependent" variables,
as that conflicts with the concept of independence in probability.
"Independent" variables are generally not independent, and the "dependent"
variable may be independent of the others.

>There appear to be reports in the literature that transform continuous
>independent variables by the use of splines, e.g.,  assume the dependent
>variable is hot dogs eaten per week (HD) and the independent variable is
>waistline (WL), a normal linear regression model would be:
>
>nonconfusing_regression  <- lm(HD ~ WL)
>
>One might use a spline,
>
>confusion_inducing_regression_with_spline <- lm(HD ~ ns(WL, df = 4) )
>
>Now is where the problem starts.
>
>From nonconfusing_regression, I get, say 2 added hot dogs per week for each
>centimeter of waistline along with a s.e. of 0.5 hot dogs per week, which I
>multiply by 1.96 to garner each side of the 95% c.i.
>If I want to show what the difference between the 75th percentile (say 100
>cm) and 25th percentile (say 80 cm) waistlines are, I multiply 2 by
>100-80=20 and get 40 hot dogs per week as the point estimate with a similar
>bumping of the s.e. to 10 hot dogs per week.
>
>What do I do to get the point estimate and 95% confidence interval for the
>difference between 100 cm persons and 80 cm persons with
>confusion_inducing_regression_with_spline ?
>
>Best regards.
>
>Mitchell S. Wachtel, MD


