[R] Help with predict function in glm

Rui Barradas ruipbarradas at sapo.pt
Mon Nov 26 12:21:06 CET 2012


Hello,

Why mail a question just to me? Post to the list and the odds of getting 
more (and better) answers are higher.
As for your question, the problem is in the call to glm. You don't need 
the prefix 'train$' in the formula; the argument 'data' takes care of that. 
When predicting, R looks in 'newdata' for columns with the names used in 
the formula. It is unable to find columns called train$Outcome and 
train$Weight in the new data.frame 'test', so it falls back on the 
original 'train' and returns predictions for its 85 rows. Corrected:

mylogit <- glm(Outcome ~ Weight, data=train, family = binomial("logit"))
predictions <- predict(mylogit, newdata = test, type= "response")
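
If you want to see the difference for yourself, here is a small sketch with 
simulated data. The data, the seed and the object names 'bad', 'good', 
'p_bad' and 'p_good' are made up for illustration; only the row counts 
(85 and 55) and the two glm calls come from your post.

```r
# Simulated stand-ins for the real data: 85 training rows, 55 test rows.
set.seed(1)
train <- data.frame(Weight = runif(85))
train$Outcome <- rbinom(85, 1, plogis(-1 + 2 * train$Weight))
test <- data.frame(Weight = runif(55))

# With the 'train$' prefix the model term is literally called "train$Weight".
bad <- glm(train$Outcome ~ train$Weight, data = train,
           family = binomial("logit"))
names(coef(bad))

# 'test' has no such column, so predict() evaluates train$Weight from the
# original data.frame and warns about the row mismatch (suppressed here).
p_bad <- suppressWarnings(predict(bad, newdata = test, type = "response"))
length(p_bad)   # 85, one value per training row

# Without the prefix the term is "Weight", which predict() finds in 'test'.
good <- glm(Outcome ~ Weight, data = train, family = binomial("logit"))
p_good <- predict(good, newdata = test, type = "response")
length(p_good)  # 55, one value per test row
```

Both fits give the same coefficients; only the names of the terms differ, 
and those names are what predict uses to look up columns in 'newdata'.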


Hope this helps,

Rui Barradas
On 26-11-2012 01:42, somnath bandyopadhyay wrote:
>
> Hi,
> I am trying some basic logistic regression analysis using glm. I just have one dependent variable (Outcome) which is binary in nature and one independent variable (Weight). I fit a model using a training data set (train) which has 85 observations and try to apply it on an independent dataset (test) which has 55 observations. When I apply the predict function on the fitted model for the new dataset, I get the following warning "Warning message: 'newdata' had 55 rows but variable(s) found have 85 rows" and the predict works on the training observations and not on the test observations.
>
> Following is the session info, code and the training and test datasets I am using.
>
> What am I doing wrong? Any help would be greatly appreciated.
>
> Thanks,
> S.
>
>> train <- read.table("train_data.txt", header=T, row.names=1, sep="\t")
>> test<- read.table("test_data.txt", header=T, row.names=1, sep="\t")
>> mylogit <- glm(train$Outcome ~ train$Weight, data=train, family = binomial("logit"))
>> predictions <- predict(mylogit, newdata = test, type= "response")
> Warning message:
> 'newdata' had 55 rows but variable(s) found have 85 rows
>
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
>
>
>> train
> Outcome Weight
> AB256939_21 0 0.331
> AB257076_21 0 0.308
> AB257079_21 0 0.453
> AB415508_21 0 0.303
> AB700497_21 0 0.354
> AB904508_21 0 0.336
> AC048719_21 0 0.420
> AC185939_21 0 0.249
> AC185940_21 0 1.525
> AC445840_21 0 0.261
> E7490523_21 0 0.269
> E7490524_21 0 0.213
> E7659579_21 0 0.360
> E7661528_21 0 0.271
> E7781094_21 0 0.156
> E7781095_21 0 0.221
> E7781096_21 0 0.098
> E7969081_21 0 0.430
> E8117594_21 0 0.321
> E8133295_21 0 0.166
> E8161578_22 0 0.269
> E8483037_21 0 0.162
> E8559720_21 0 0.226
> L1065550_18 0 0.396
> L1065607_17 0 0.541
> L1065944_24 0 0.131
> L1066017_20 0 0.421
> L1069261_12 0 0.357
> L1069262_14 0 0.309
> L1069263_27 0 0.283
> L1069297_24 0 0.620
> L1081528_21 0 0.561
> L1084066_21 0 0.564
> L1086090_21 0 0.649
> L1104280_17 0 0.181
> L1111362_22 0 0.199
> L1118063_15 0 0.369
> L1133550_21 0 0.302
> L1144201_14 0 0.249
> L1155023_7 0 0.257
> L1158386_21 0 0.470
> L1163051_4 0 0.446
> ...........................
> ...........................
> ...........................
>
>
>> test
> Weight
> AB256870_21 0.364
> AB256873_21 0.329
> AB415518_21 0.219
> AB460669_21 0.481
> AB609036_21 0.313
> AB609038_21 0.196
> AB700495_21 0.402
> AB700498_21 0.343
> AC112834_21 0.372
> AC185937_21 0.270
> AC269527_21 0.285
> E7352023_21 0.358
> E7661554_21 0.471
> E7750502_21 0.437
> E7845183_21 0.232
> E7854155_21 0.474
> E7854156_21 0.121
> E7924877_21 0.312
> E7969079_21 0.423
> E8139256_21 0.329
> E8161577_22 1.060
> E8161580_21 0.157
> E8364473_21 0.227
> E8364474_21 0.069
> L1065940_14 0.256
> L1065946_10 0.184
> L1066018_25 0.282
> L1069260_15 1.094
> ................................
> ................................
>
>
>
>
>


