[R] predict.lm(...,type="terms") question
Rui Barradas
ruipbarradas at sapo.pt
Wed Aug 29 18:59:35 CEST 2012
Hello,
predict(..., type = "terms") returns the predicted regressors multiplied
by the respective beta centered. The centering constant is also returned:
model <- lm(area ~ concn, data) # Already run
predict(model, type = "terms")
concn
1 -11542.321
2 -8244.515
3 1648.903
4 18137.934
attr(,"constant")
[1] 16416.75
So reversing these operations would return the predictor.
pred.center <- function(x) attr(x, "constant") # Just return the
centering constant
predictor <- function(model, newdata = NULL){
if(is.null(newdata))
newdata <- model.frame(model)
pred <- predict(model, newdata = newdata, type = "terms")
(pred + pred.center(pred)) / coef(model)[-1]
}
predictor(model)
concn
1 36.95206
2 61.95206
3 136.95206
4 261.95206
attr(,"constant")
[1] 16416.75
So now the problem is the new data. For that you cannot use the
regression made one way to "predict" the terms, it doesn't make sense.
Anyway, since yor fit in the original direction is good, you can reverse
the regression and expect to obtain usable results. (Maybe not
confidence intervals). In your model
y = mx + b + epsilon --> x' = (y'-b)/m + eps'/m
the residuals should also be included, which you didn't in your op. This
is really the problem, in the linear model x is not a random variable,
it often is an experiment design decision, and it is now becoming one.
That's why predict.lm doesn't allow to "predict" terms with new data.
I would stick to reversing the regression. And expect the residuals to
be scaled accordingly.
resid(model) / coef(model)["concn"]
resid(inv.model)
sd( resid(model) )
sd( resid(inv.model) )
sd( resid(model) ) / coef(model)["concn"]
Rui Barradas
Em 29-08-2012 15:16, John Thaden escreveu:
> I think I may be misreading the help pages, too, but misreading how?
>
> I agree that inverting the fitted model is simpler, but I worry that I'm
> misusing ordinary least squares regression by treating my response, with
> its error distribution, as a predictor with no such error. In practice,
> with my real data that includes about six independent peak area
> measurements per known concentration level, the diagnostic plots from
> plot.lm(inv.model) look very strange and worrisome.
>
> Certainly predict.lm(..., type = "terms") must be able to do what I need.
>
> -John
>
> On Wed, Aug 29, 2012 at 6:50 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> Hello,
>>
>> You seem to be misreading the help pages for lm and predict.lm, argument
>> 'terms'.
>> A much simpler way of solving your problem should be to invert the fitted
>> model using lm():
>>
>>
>> model <- lm(area ~ concn, data) # Your original model
>> inv.model <- lm(concn ~ area, data = data) # Your problem's model.
>>
>> # predicts from original data
>> pred1 <- predict(inv.model)
>> # predict from new data
>> pred2 <- predict(inv.model, newdata = new)
>>
>> # Let's see it.
>> plot(concn ~ area, data = data)
>> abline(inv.model)
>> points(data$area, pred1, col="blue", pch="+")
>> points(new$area, pred2, col="red", pch=16)
>>
>>
>> Also, 'data' is a really bad variable name, it's already an R function.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 28-08-2012 23:30, John Thaden escreveu:
>>
>>> Hello all,
>>>
>>> How do I actually use the output of predict.lm(..., type="terms") to
>>> predict new term values from new response values?
>>>
>>> I'm a chromatographer trying to use R (2.15.1) for one of the most
>>> common calculations in that business:
>>>
>>> - Given several chromatographic peak areas measured for control
>>> samples containing a molecule at known (increasing) concentrations,
>>> first derive a linear regression model relating the known
>>> concentration (predictor) to the observed peak area (response)
>>> - Then, given peak areas from new (real) samples containing
>>> unknown amounts of the molecule, use the model to predict
>>> concentrations of the
>>> molecule in the unknowns.
>>>
>>> In other words, given y = mx +b, I need to solve x' = (y'-b)/m for new
>>> data y'
>>>
>>> and in R, I'm trying something like this
>>>
>>> require(stats)
>>> data <- data.frame(area = c(4875, 8172, 18065, 34555), concn = c(25,
>>> 50, 125, 250))
>>> new <- data.frame(area = c(8172, 10220, 11570, 24150))
>>> model <- lm(area ~ concn, data)
>>> pred <- predict(model, type = "terms")
>>> #predicts from original data
>>> pred <- predict(model, type = "terms", newdata = new)
>>> #error
>>> pred <- predict(model, type = "terms", newdata = new, se.fit = TRUE)
>>> #error
>>> pred <- predict(model, type = "terms", newdata = new, interval =
>>> "prediction") #error
>>> new2 <- data.frame(area = c(8172, 10220, 11570, 24150), concn = 0)
>>> new2
>>> pred <- predict(model, type = "terms", newdata = new2)
>>> #wrong results
>>>
>>> Can someone please show me what I'm doing wrong?
>>>
>>> ______________________________**________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help>
>>> PLEASE do read the posting guide http://www.R-project.org/**
>>> posting-guide.html <http://www.R-project.org/posting-guide.html>
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
More information about the R-help
mailing list