[R] Newb Prediction Question using stepAIC and predict(), is R wrong?

Bill.Venables at csiro.au Bill.Venables at csiro.au
Thu Feb 10 07:49:07 CET 2011


Using complex names, like res[, 3+i] or res$var, in the formula for a model is a very bad idea, especially if you eventually want to predict to new data.  (In fact it won't work, so that makes it very bad indeed.)  So do not use '$' or '[..]' terms in model formulae - this is going to cause problems when it comes to predict, because R will not be able to match the terms in your formula to columns in the new data frame.  When you think about it, this is obvious.
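
A small sketch of the failure may help make this concrete. The data frame d and its columns x and y are made-up names for illustration only, not the poster's data. With '$' in the formula the term is literally called "d$x", no column of that name exists in newdata, and predict() quietly falls back on the original data (with a warning about the row counts):

```r
# Hypothetical toy data; 'd', 'x' and 'y' are illustration-only names.
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 + 3 * d$x + rnorm(10, sd = 0.1)

# Fitted with '$' in the formula: the model terms are "d$y" and "d$x".
fm_bad <- lm(d$y ~ d$x)

# One new row, but it has no variable that matches the term "d$x",
# so R evaluates d$x again and finds the original 10 values.
new <- data.frame(x = 100)
p_bad <- suppressWarnings(predict(fm_bad, newdata = new))

length(p_bad)    # 10 values rather than 1: newdata was not used
all.equal(unname(p_bad), unname(fitted(fm_bad)))    # TRUE: just the fitted values
```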

In your case you will have to identify the actual names and build the formula that way.

So your model will be fitted with a call something like

fm <- lm(paid ~ x3i + xi + Sun + Fri + Sat, data = reservesub)

(but you will have to use the real names for the first two, of course).
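
As a rough sketch of the same idea on made-up data (the data frame d and the variables x, Sun and paid here are simulated stand-ins, not the real reservesub columns): when the formula uses plain names and the data are supplied through the data argument, predict() can look those names up in newdata, and the result agrees with the hand calculation from the coefficients.

```r
# Hypothetical toy data; names and values are for illustration only.
set.seed(2)
d <- data.frame(x = 1:10, Sun = rep(c(0, 1), 5))
d$paid <- 5 + 1 * d$x + 8 * d$Sun + rnorm(10)

# Plain variable names in the formula, data frame given via 'data'.
fm <- lm(paid ~ x + Sun, data = d)

# newdata only needs columns with the same names as the predictors.
new <- data.frame(x = 84, Sun = 1)
p <- predict(fm, newdata = new)

# The same prediction worked out by hand from the coefficients.
b <- coef(fm)
by_hand <- b[["(Intercept)"]] + b[["x"]] * 84 + b[["Sun"]] * 1
all.equal(unname(p), by_hand)    # TRUE
```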

If you are doing this in some kind of loop, there are ways to handle it without using terms such as reservesub[, 3+i] but they are not all that simple.  Still, if you want to predict from the model to new data, there is no way round it.
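
One possible way to do this in a loop, offered only as a sketch, is to build each formula from the column names with reformulate(). The column names below (V4, V5, grads.1, grads.2) and the simulated values are assumptions for illustration; the real names have to come from your own data frame. The eval(bquote(...)) step is one option for getting the actual formula, rather than the symbol f, stored in the model call, which tends to matter for functions such as update() that re-evaluate the call.

```r
# Hypothetical toy data with assumed column names.
set.seed(3)
d <- data.frame(paid = rnorm(20), V4 = rnorm(20), V5 = rnorm(20),
                grads.1 = rnorm(20), grads.2 = rnorm(20),
                Sun = rep(c(0, 1), 10))

fits <- vector("list", 2)
for (i in 1:2) {
  # Build the right-hand side from names, not from d[, 3 + i].
  rhs <- c(paste0("V", 3 + i), paste0("grads.", i), "Sun")
  f <- reformulate(rhs, response = "paid")
  # Substitute the formula itself into the call before evaluating it.
  fits[[i]] <- eval(bquote(lm(.(f), data = d)))
}

formula(fits[[1]])                      # paid ~ V4 + grads.1 + Sun
predict(fits[[2]], newdata = d[1, ])    # a single prediction for one row
```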

Interactions are generally included with the * or the / linear model operators.
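
On the question of whether interaction variables have to be created in the prediction data frame: as far as I know they do not, provided the formula uses plain names. A sketch on simulated data (d, x, Sun and paid are made-up names and values) shows predict() constructing the x:Sun column itself from the x and Sun columns of newdata:

```r
# Hypothetical toy data; names and values are for illustration only.
set.seed(4)
d <- data.frame(x = rnorm(30), Sun = rep(c(0, 1), 15))
d$paid <- 1 + 2 * d$x + 3 * d$Sun + 1.5 * d$x * d$Sun + rnorm(30, sd = 0.1)

# x * Sun expands to x + Sun + x:Sun.
fm_int <- lm(paid ~ x * Sun, data = d)
names(coef(fm_int))    # "(Intercept)" "x" "Sun" "x:Sun"

# newdata holds only the basic variables; no product column is needed.
new <- data.frame(x = 2, Sun = 1)
p <- predict(fm_int, newdata = new)

# Hand calculation including the interaction term.
b <- coef(fm_int)
by_hand <- b[["(Intercept)"]] + b[["x"]] * 2 + b[["Sun"]] * 1 +
  b[["x:Sun"]] * 2 * 1
all.equal(unname(p), by_hand)    # TRUE
```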

Bill Venables. 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of BSanders
Sent: Thursday, 10 February 2011 2:49 PM
To: r-help at r-project.org
Subject: [R] Newb Prediction Question using stepAIC and predict(), is R wrong?


I'm using stepAIC to fit a model.  Then I'm trying to use that model to
predict future happenings.

My first few variables are labeled as their column. (Is this a problem?)
The dataframe that I use to build the model is the same as the data I'm
using to predict with.

Here is a portion of what is happening..


This is the value it is predicting  = > [1] 9.482975

Summary of the model
Call:
lm(formula = reservesub$paid ~ reservesub[, 3 + i] + reservesub$grads[, 
    i] + reservesub$Sun + reservesub$Fri + reservesub$Sat)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.447  -4.993  -1.090   3.910  27.454 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            5.71370    1.46449   3.902 0.000149 ***
reservesub[, 3 + i]    1.00868    0.01643  61.391  < 2e-16 ***
reservesub$grads[, i]  0.44649    0.12131   3.681 0.000333 ***
reservesub$Sun         8.63606    1.95100   4.426 1.93e-05 ***
reservesub$Fri         3.76928    2.00079   1.884 0.061682 .  
reservesub$Sat         4.03103    2.12754   1.895 0.060225 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 7.842 on 138 degrees of freedom
  (131 observations deleted due to missingness)
Multiple R-squared: 0.9794,     Adjusted R-squared: 0.9787 
F-statistic:  1312 on 5 and 138 DF,  p-value: < 2.2e-16 


Here is the data that is being fed into
predicted[p] = predict(stepsaicguess[[p]], newdata = reservesubpred[p,])

           V1  V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 paid Mon Tue Wed Thu
276 10/3/2010 155 84 76 68 64 63 53 42  42  42  42  38  38  38  35  31  31  NA   84   0   0   0   0
    Fri Sat Sun grads.1 grads.2 grads.3 grads.4 grads.5 grads.6 grads.7
276   0   0   1       8       4       1      10      11       0       0
    grads.8 grads.9 grads.10 grads.11 grads.12 grads.13 grads.14
276       0       4        0        0        3        4        0


In this case, i = 1, so I calculate the predicted value should be 
5.7137+1.00868*84+.44649*8+1*8.636+0*3.769+0*4.03=102

But, R is giving me 9.482975 for a predicted value .. (Which, interestingly
is 5.7137+3.769*1) (Intercept+Fri)

Another question I have is, if I were to include interactions in this model,
would I have to make those variables in my prediction dataframe, or would R
'know' what to do?

Thanks in advance for your expert assistance.
-- 
View this message in context: http://r.789695.n4.nabble.com/Newb-Prediction-Question-using-stepAIC-and-predict-is-R-wrong-tp3298569p3298569.html
Sent from the R help mailing list archive at Nabble.com.
