[R] Training a model using glm
Mohan Radhakrishnan
radhakrishnan.mohan at gmail.com
Thu Sep 18 09:13:25 CEST 2014
Oh, I understand now. There was nothing wrong with the logic; the problem
was the syntax of the model formula.
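A minimal sketch of why the `$` form breaks prediction. It uses base R's glm() and the built-in mtcars data rather than caret and the Alzheimer's data; both substitutions, and all object names, are only for illustration.

```r
# Illustration only: mtcars stands in for the Alzheimer's data in this thread.
train_df <- mtcars[1:24, ]
test_df  <- mtcars[25:32, ]

# Formula written with $: the predictor term is literally "train_df$wt".
fit_dollar <- glm(train_df$am ~ train_df$wt, family = binomial, data = train_df)

# Formula written with bare column names, resolved through data =.
fit_plain <- glm(am ~ wt, family = binomial, data = train_df)

# predict() looks up "train_df$wt", which is not a column of test_df, so it
# finds the training vector instead and warns that the row counts differ.
p_dollar <- suppressWarnings(
  predict(fit_dollar, newdata = test_df, type = "response")
)
p_plain <- predict(fit_plain, newdata = test_df, type = "response")

length(p_dollar)  # one value per training row
length(p_plain)   # one value per test row
```

The first fit returns one prediction per training row because `train_df$wt` is looked up outside `newdata`; the second returns one per test row, which is what a confusion matrix against the test labels needs.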
> library(caret)  # provides createDataPartition(), train(), confusionMatrix()
> library(AppliedPredictiveModeling)
Warning message:
package ‘AppliedPredictiveModeling’ was built under R version 3.1.1
> set.seed(3433)
> data(AlzheimerDisease)
> adData = data.frame(diagnosis,predictors)
> inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
> training = adData[ inTrain,]
> testing = adData[-inTrain,]
> training1 <- training[,grepl("^IL|^diagnosis",names(training))]
> test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
> modelFit <- train(diagnosis ~ .,method="glm",data=training1)
> confusionMatrix(test1$diagnosis,predict(modelFit, test1))
Confusion Matrix and Statistics

          Reference
Prediction Impaired Control
  Impaired        2      20
  Control         9      51

               Accuracy : 0.6463
                 95% CI : (0.533, 0.7488)
    No Information Rate : 0.8659
    P-Value [Acc > NIR] : 1.00000

                  Kappa : -0.0702
 Mcnemar's Test P-Value : 0.06332

            Sensitivity : 0.18182
            Specificity : 0.71831
         Pos Pred Value : 0.09091
         Neg Pred Value : 0.85000
             Prevalence : 0.13415
         Detection Rate : 0.02439
   Detection Prevalence : 0.26829
      Balanced Accuracy : 0.45006

       'Positive' Class : Impaired
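
As a cross-check on the statistics above, the four cell counts can be recomputed by hand in base R. This is only a sketch: the object names are made up for illustration, and it assumes, following caret's documented signature confusionMatrix(data, reference), that the first argument is treated as the predictions. Because the call above passes test1$diagnosis first, the rows of the table are the observed classes and the columns are the model's predictions.

```r
# Cell counts copied from the table above. Rows come from the first argument
# (test1$diagnosis, the observed classes); columns come from the second
# argument (the model's predictions).
cm <- matrix(c(2, 9, 20, 51), nrow = 2,
             dimnames = list(observed  = c("Impaired", "Control"),
                             predicted = c("Impaired", "Control")))

# Accuracy does not depend on which margin is called the reference.
accuracy <- sum(diag(cm)) / sum(cm)

# As reported above: proportions within each predicted class (columns).
sens_reported <- cm["Impaired", "Impaired"] / sum(cm[, "Impaired"])
spec_reported <- cm["Control", "Control"] / sum(cm[, "Control"])

# With the observed classes as the reference: proportions within each row.
sens_observed <- cm["Impaired", "Impaired"] / sum(cm["Impaired", ])
spec_observed <- cm["Control", "Control"] / sum(cm["Control", ])

round(c(accuracy = accuracy,
        sens_reported = sens_reported, spec_reported = spec_reported,
        sens_observed = sens_observed, spec_observed = spec_observed), 4)
```

The row-wise values (2/22 and 51/60) match the Pos Pred Value and Neg Pred Value lines above, which is what one would expect if the two arguments were swapped. Calling confusionMatrix(predict(modelFit, test1), test1$diagnosis) should report them as sensitivity and specificity instead; the accuracy is the same either way.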
Thanks,
Mohan
On Thu, Sep 18, 2014 at 12:21 AM, Max Kuhn <mxkuhn at gmail.com> wrote:
> You have not shown all of your code and it is difficult to diagnose the
> issue.
>
> I assume that you are using the data from:
>
> library(AppliedPredictiveModeling)
> data(AlzheimerDisease)
>
> If so, there is example code to analyze these data in that package. See
> ?scriptLocation.
>
> We have no idea how you got to the `training` object (package versions
> would be nice too).
>
> I suspect that Dennis is correct. Try using more normal syntax without the
> $ indexing in the formula. I wouldn't say it is (absolutely) wrong but it
> doesn't look right either.
>
> Max
>
>
> On Wed, Sep 17, 2014 at 2:04 PM, Mohan Radhakrishnan <
> radhakrishnan.mohan at gmail.com> wrote:
>
>> Hi Dennis,
>>
>> Why is there that warning? I think my syntax is right, isn't it?
>> So can the warning be ignored?
>>
>> Thanks,
>> Mohan
>>
>> On Wed, Sep 17, 2014 at 9:48 PM, Dennis Murphy <djmuser at gmail.com> wrote:
>>
>> > No reproducible example (i.e., no data) supplied, but the following
>> > should work in general, so I'm presuming this maps to the caret
>> > package as well. Thoroughly untested.
>> >
>> > library(caret) # something you failed to mention
>> >
>> > ...
>> > # presumably a logistic regression
>> > modelFit <- train(diagnosis ~ ., data = training1)
>> > confusionMatrix(test1$diagnosis,
>> >                 predict(modelFit, newdata = test1, type = "response"))
>> >
>> > For GLMs, several types of prediction are possible. The default is
>> > 'link', which is on the scale of the linear predictor. caret may use
>> > a different syntax, so you should check its help pages for the
>> > supported predict methods.
>> >
>> > Hint: If a function takes a data = argument, you don't need to specify
>> > the variables as components of the data frame - the variable names are
>> > sufficient. You should also do some reading to understand why the
>> > model formula I used is correct if you're modeling one variable as
>> > response and all others in the data frame as covariates.
>> >
>> > Dennis
>> >
>> > On Tue, Sep 16, 2014 at 11:15 PM, Mohan Radhakrishnan
>> > <radhakrishnan.mohan at gmail.com> wrote:
>> > > I answered this question, which was part of an online course,
>> > > correctly by executing some commands and guessing.
>> > >
>> > > But I didn't get the gist of this approach, though my R code works.
>> > >
>> > > I have a training and test dataset.
>> > >
>> > > > nrow(training)
>> > > [1] 251
>> > > > nrow(testing)
>> > > [1] 82
>> > > > head(training1)
>> > >    diagnosis    IL_11    IL_13    IL_16   IL_17E IL_1alpha      IL_3     IL_4
>> > > 6   Impaired 6.103215 1.282549 2.671032 3.637051 -8.180721 -3.863233 1.208960
>> > > 10  Impaired 4.593226 1.269463 3.476091 3.637051 -7.369791 -4.017384 1.808289
>> > > 11  Impaired 6.919778 1.274133 2.154845 4.749337 -7.849364 -4.509860 1.568616
>> > > 12  Impaired 3.218759 1.286356 3.593860 3.867347 -8.047190 -3.575551 1.916923
>> > > 13  Impaired 4.102821 1.274133 2.876338 5.731246 -7.849364 -4.509860 1.808289
>> > > 16  Impaired 4.360856 1.278484 2.776394 5.170380 -7.662778 -4.017384 1.547563
>> > >          IL_5       IL_6 IL_6_Receptor     IL_7     IL_8
>> > > 6  -0.4004776  0.1856864   -0.51727788 2.776394 1.708270
>> > > 10  0.1823216 -1.5342758    0.09668586 2.154845 1.701858
>> > > 11  0.1823216 -1.0965412    0.35404039 2.924466 1.719944
>> > > 12  0.3364722 -0.3987186    0.09668586 2.924466 1.675557
>> > > 13  0.0000000  0.4223589   -0.53219115 1.564217 1.691393
>> > > 16  0.2623643  0.4223589    0.18739989 1.269636 1.705116
>> > >
>> > > The testing dataset is similar, with 13 columns; only the number of rows differs.
>> > >
>> > >
>> > > training1 <- training[,grepl("^IL|^diagnosis",names(training))]
>> > > test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
>> > >
>> > > modelFit <- train(training1$diagnosis ~ training1$IL_11 + training1$IL_13 +
>> > >     training1$IL_16 + training1$IL_17E + training1$IL_1alpha +
>> > >     training1$IL_3 + training1$IL_4 + training1$IL_5 + training1$IL_6 +
>> > >     training1$IL_6_Receptor + training1$IL_7 + training1$IL_8,
>> > >     method="glm", data=training1)
>> > >
>> > > confusionMatrix(test1$diagnosis,predict(modelFit, test1))
>> > >
>> > > I get this error when I run the above command to get the confusion matrix:
>> > >
>> > > 'newdata' had 82 rows but variables found have 251 rows
>> > >
>> > > I thought this was simple: I train a model using the training dataset,
>> > > predict using the test dataset, and get the accuracy.
>> > >
>> > > Am I missing something obvious here?
>> > >
>> > > Thanks,
>> > >
>> > > Mohan
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-help at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>
>