[R] Quadratic regression: estimating the maximizing value
Mike Marchywka
marchywka at hotmail.com
Sat Feb 5 14:17:10 CET 2011
----------------------------------------
> From: Greg.Snow at imail.org
> To: gheine at blm.gov; r-help at r-project.org
> Date: Fri, 4 Feb 2011 17:49:51 -0700
> Subject: Re: [R] Quadratic regression: estimating the maximizing value
>
> No, your approach is not correct. For one you have not taken the covariance between B and C into account, another thing is that the ratio of 2 normal random variables is not necessarily normal, in some cases it can even follow a Cauchy distribution. Also note that with only 1 degree of freedom the Central Limit Theorem is not justified for using normal theory for non normal distributions. So the normal based confidence intervals on the coefficients are only reasonable if you are certain that the true error distribution is normal or nearly normal.
To make this df issue more clear, take some code like this,
> for ( i in 1:4)
+ {
+ Ldat <- data.frame("X"=c(3,7,14,24), "Y"=c(1,5,8,0))
+ Ldat=Ldat[which( 1:4 != i),]
+ LM<-lm(formula = Y ~ X + I(X^2), data = Ldat)
+ print ( DZ(LM$coefficients[2],LM$coefficients[3]))
+ }
X
13.46512
X
13.1519
X
13.11364
X
14.625
and try to compute "confidence intervals" with 3 data points fitted to 3 parameters.
What exactly does this mean? The result is exact if you only have 3 points or
that there is really no way to tell until you have a lot more data than parameters?
With your original data, look at residuals,
plot(LM)
and note the residuals are all zero with any 3 data points.
There is only so much you can do with limited data esp as number of points
approaches number of fitting parameters ( this is a common joke, always have
more parameters than data points ). Whatever reasons you have for expecting
the form to fit and things like measurement noise may be important to include
in your analysis to get anything meaningful out of it.
>
> Some possibilities that may work for you:
>
> Use the nls function with the curve parameterized to use the vertex point as one of the parameters (still need normality).
>
> Do bootstrapping (with only 4 points you are not going to get much with non-parametric bootstrap, but parametric with some assumptions may work).
>
> Use a Bayesian approach (this may be best if you can find some meaningful prior information).
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of gheine at blm.gov
> > Sent: Friday, February 04, 2011 2:34 PM
> > To: r-help at r-project.org
> > Subject: [R] Quadratic regression: estimating the maximizing value
> >
> >
> > A bioligist colleague sent me the following data.
> >
> > x Y
> > 3 1
> > 7 5
> > 14 8
> > 24 0
> >
> > (Yes, only four data points.) I don't know much about the
> > application, but apparently there are good empirical
> > reasons to use a quadratic model.
> >
> > The goal is to find the X value which maximizes the
> > response Y, and to find a confidence interval for this X
> > value.
> >
> > Finding the maximizing X value is pretty straightforward:
> >
> > >Ldat <- data.frame("X"=c(3,7,14,24), "Y"=c(1,5,8,0))
> > >(LM<-lm(formula = Y ~ X + I(X^2), data = Ldat))
> > Call:
> > lm(formula = Y ~ X + I(X^2), data = Ldat)
> >
> > Coefficients:
> > (Intercept) X I(X^2)
> > -3.86978 1.77762 -0.06729
> >
> > > DZ<-function(B,C) { (-B)/(2*C) } # Solve d/dx(A + Bx + Cx^2) = 0
> > > DZ(LM$coefficients[2],LM$coefficients[3])
> > X
> > 13.20961
> >
> >
> > To find a confidence interval, I used "confint()".
> > Default confidence level of 95% was not useful; used 80% instead,
> > and then computed DZ with the extreme X and I(X^2) coefficients:
> >
> > >(CI80<-confint(LM,level=0.8))
> >
> > 10 % 90 %
> > (Intercept) -5.6147948 -2.12476138
> > X 1.4476460 2.10759306
> > I(X^2) -0.0790312 -0.05553898
> >
> > > DZ(CI80[2,1],CI80[3,1])
> > [1] 9.1587
> > > DZ(CI80[2,2],CI80[3,2])
> > [1] 18.97400
> >
> > Conclusion: the 80% confidence level for the maximizing X value is
> > included in the range (9.158, 18.974)
> >
> > #################
> > Questions:
> >
> > 1) Is this the right procedure, or totally off base?
> >
> > 2) The coefficient of the "Intercept" is irrelevant to calculating
> > the maximizing value of X. Is there a way to reduce the size of
> > the confidence interval by doing a computation that leaves out this
> > parameter?
> >
> > 3) I believe that confint() indicates the axes of an ellipsoid,
> > rather than the corners of a box, in parameter space;
> > so that the above procedure is (slightly) too conservative.
> >
> > 4) Where are the calculations for confint() documented ?
> >
> > Thanks,
> > George Heine
> >
> > (Embedded image moved to file: pic23206.gif)Please consider the
> > environment
> > before printing this page
> >
> >
> >
> > <>=<>=<>=<>=<>=<>
> > =<>=<>=<>=<>=<>
> >
> > George Heine, PhD
> >
> > Mathematical Analyst
> >
> > National Operations
> > Center
> >
> > Bureau of Land Management
> >
> > voice (303) 236-0099
> >
> > fax (303) 236-1974
> >
> > cell (303) 905-5296
> >
> > <>=<>=<>=<>=<>=<>
> > =<>=<>=<>=<>=<>
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list