[R] How to do cross validation with glm?

Frank Harrell f.harrell at vanderbilt.edu
Wed Aug 24 19:39:32 CEST 2011


What is your sample size?  I've had trouble getting reliable estimates using
simple data splitting when N < 20,000.

Note that the following functions in the rms package facilitate
cross-validation and bootstrapping for validating models: ols, validate,
calibrate.
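
A minimal sketch of how validate and calibrate might be used, under stated assumptions. The post names ols (linear models); because the thread concerns glm, this sketch assumes a binary outcome and swaps in lrm, the logistic-regression fitter in rms. The data are simulated, and the variable names (x1, x2, y) and resample counts are hypothetical choices, not anything from the original message.

```r
# Sketch only: simulated data; x1, x2, y are hypothetical names.
library(rms)

set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 + 0.5 * d$x2))

# x = TRUE, y = TRUE store the design matrix and response in the fit,
# which validate() and calibrate() need in order to resample.
fit <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)

# 10-fold cross-validation of the performance indexes
# (with method = "crossvalidation", B is the number of folds).
val_cv <- validate(fit, method = "crossvalidation", B = 10)

# Bootstrap optimism correction with 100 resamples.
val_boot <- validate(fit, method = "boot", B = 100)

# Bootstrap overfitting-corrected calibration curve.
cal <- calibrate(fit, method = "boot", B = 100)

print(val_cv)
print(val_boot)
```

The index.corrected column of each validate result holds the estimates adjusted for overfitting; plot(cal) draws the calibration curve.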

Frank

Andra Isan wrote:
> 
> Hi,
> 
> Thanks for the reply. What I meant is that I would like to partition my
> data (a data frame called dat) into training and test sets and then
> evaluate the performance of the model on the test data. So I thought
> cross-validation was the natural choice to see how the prediction works
> on the hold-out data. Is there an example I can look at to see how to do
> cross-validation and get the prediction results on my data?
> 
> Thanks a lot,
> Andra
> 
> --- On Wed, 8/24/11, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> 
>> From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>> Subject: Re: [R] How to do cross validation with glm?
>> To: "Andra Isan" <andra_isan at yahoo.com>
>> Cc: r-help at r-project.org
>> Date: Wednesday, August 24, 2011, 10:11 AM
>> 
>> What you describe is not cross-validation, so I am afraid we do not
>> know what you mean.  And cv.glm does 'prediction for the hold-out
>> data' for you: you can read the code to see how it does so.
>> 
>> I suspect you mean you want to do validation on a test set, but that
>> is not what you actually claim.  There are lots of examples of this
>> sort of thing in MASS (the book, scripts in the package).
>> 
>> On Wed, 24 Aug 2011, Andra Isan wrote:
>> 
>> > Hi All,
>> > 
>> > I have a model called glm.fit that I fitted with glm, and dat is my
>> > data frame. I used
>> > 
>> > pred = predict(glm.fit, newdata = dat, type = "response")
>> > 
>> > to see how it predicts on my whole data, but obviously I have to do
>> > cross-validation to train the model on one part of my data and
>> > predict on the other part. So I searched for it and found a function
>> > cv.glm, which is in package boot. I tried to use it as:
>> > 
>> > cv.glm = (cv.glm(dat, glm.fit, cost, K=nrow(dat))$delta)
>> > 
>> > but I am not sure how to do the prediction for the hold-out data. Is
>> > there any better way for cross-validation to learn a model on
>> > training data and test it on test data in R?
>> > 
>> > Thanks,
>> > Andra
>> > 
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> > 
>> 
>> -- Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
> 
> 
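
The test-set validation discussed in the thread can be sketched as follows. This is an illustration under assumptions, not the method of any poster: the data are simulated, the 70/30 split ratio, the variable names (x1, x2, y), and the two performance measures are all hypothetical choices.

```r
# Sketch only: simulated stand-in for the poster's data frame 'dat'.
set.seed(42)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.3 + 0.8 * dat$x1 - 0.5 * dat$x2))

# Random 70/30 split into training and test rows (ratio is an assumption).
train_idx <- sample(nrow(dat), size = floor(0.7 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit on the training rows only.
glm.fit <- glm(y ~ x1 + x2, family = binomial, data = train)

# predict.glm takes 'newdata'; an argument named 'data' is not a formal
# argument, so it is silently ignored and fitted values for the training
# data are returned instead.
pred <- predict(glm.fit, newdata = test, type = "response")

# Two simple hold-out measures: Brier score and accuracy at a 0.5 cutoff.
brier    <- mean((test$y - pred)^2)
accuracy <- mean((pred > 0.5) == (test$y == 1))
print(c(brier = brier, accuracy = accuracy))
```

A single split like this gives one noisy estimate that depends on which rows land in the test set, which is the concern raised at the top of this message about simple data splitting.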

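For K-fold cross-validation with cv.glm from package boot, a minimal sketch follows. The data are again simulated, and the variable names, K = 10, and the cost function (misclassification rate at a 0.5 cutoff) are assumptions made for illustration. cv.glm refits the model on each set of K-1 folds and predicts the omitted fold internally, so no separate predict call is needed.

```r
# Sketch only: simulated stand-in for the poster's data frame 'dat'.
library(boot)

set.seed(42)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.3 + 0.8 * dat$x1 - 0.5 * dat$x2))

# cv.glm needs a model fitted to the full data frame it is given.
glm.fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# cost(observed, predicted probability): proportion misclassified
# when predictions are cut at 0.5 (a hypothetical choice of cost).
cost <- function(y, p) mean(abs(y - p) > 0.5)

# 10-fold cross-validation; K = nrow(dat) would give leave-one-out.
cv10 <- cv.glm(dat, glm.fit, cost = cost, K = 10)

# delta[1] is the raw cross-validation estimate of prediction error,
# delta[2] is the bias-adjusted estimate.
print(cv10$delta)
```

Storing the result under a name other than cv.glm (here cv10) avoids shadowing the function name in the workspace.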

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/How-to-do-cross-validation-with-glm-tp3765994p3766108.html
Sent from the R help mailing list archive at Nabble.com.


