[R-sig-Geo] To validate logistic regression
Tobias Erik Reiners
Tobias.Reiners at bio.uni-giessen.de
Wed Apr 27 09:01:09 CEST 2011
Dear Komine,
I have another, more sophisticated approach for you.
If you really want to validate your logistic model
with x-fold internal cross-validation, you should not
partition your data only once. I recommend
doing it 100 to 999 times to get a real estimate of
the stability of your data and model quality.
#### Repeated random 80/20 split (test share as in 5-fold cross-validation), 100 permutations
k <- 20 ## 20% of the dataset as test data
N <- 100 ## 100 permutations
permu <- paste("Permut_",1:N,sep="")
AUC_Results <- matrix(NA, 1, N, dimnames=list("AUC",permu))
n <- ncol(Dataset)
numrows <- nrow(Dataset)
learnDataSize <- round(numrows*(1-0.01*k))
testDataSize <- numrows-learnDataSize
##loop
for (j in 1:N){
  cat("calculating",((j/N)*100),"% \n")
  ## draw a new random learn/test split in every permutation
  learnIndex <- sample(nrow(Dataset))[1:learnDataSize]
  learnData <- Dataset[learnIndex,]
  testData <- Dataset[-learnIndex,]
  mg <- glm(formula = yourFormula,
            family = binomial(link = "logit"), data = learnData)
  bestmod_cv <- step(mg, direction="backward", trace=0)
  predicted_cv <- predict(bestmod_cv, newdata=testData, type="response")
  observed_cv <- testData[,"Y"]
  ## roc.auc() is not defined in this message; it has to return a list
  ## with the AUC in element A
  AUC_result <- roc.auc(observed_cv, predicted_cv)
  AUC_Results[1,j] <- AUC_result$A
}
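In case it helps, here is a self-contained sketch of the same loop that
runs as it stands. It rests on some assumptions that are not part of the
code above: Dataset is simulated (X1, X2 and Y are made-up variables), the
formula Y ~ X1 + X2 stands in for yourFormula, and auc_rank() is a
hypothetical base-R helper (rank-based Mann-Whitney AUC) used in place of
roc.auc().

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Simulated stand-in for Dataset: binary response Y, two predictors,
# only X1 actually carries signal
n_obs <- 200
Dataset <- data.frame(X1 = rnorm(n_obs), X2 = rnorm(n_obs))
Dataset$Y <- rbinom(n_obs, 1, plogis(-0.5 + 1.5 * Dataset$X1))

# Hypothetical helper: AUC from the Mann-Whitney rank statistic
auc_rank <- function(observed, predicted) {
  r  <- rank(predicted)            # average ranks, so ties count as 0.5
  n1 <- sum(observed == 1)
  n0 <- sum(observed == 0)
  (sum(r[observed == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

k <- 20    # 20% of the dataset as test data
N <- 100   # 100 random splits
learnDataSize <- round(nrow(Dataset) * (1 - 0.01 * k))
AUC_Results <- numeric(N)

for (j in seq_len(N)) {
  learnIndex <- sample(nrow(Dataset), learnDataSize)
  learnData  <- Dataset[learnIndex, ]
  testData   <- Dataset[-learnIndex, ]
  mg <- glm(Y ~ X1 + X2, family = binomial(link = "logit"),
            data = learnData)
  bestmod_cv   <- step(mg, direction = "backward", trace = 0)
  predicted_cv <- predict(bestmod_cv, newdata = testData,
                          type = "response")
  AUC_Results[j] <- auc_rank(testData$Y, predicted_cv)
}

# Spread of the AUC over the random splits
summary(AUC_Results)
quantile(AUC_Results, c(0.025, 0.975))
```

The spread of AUC_Results over the splits is what indicates stability: a
narrow range suggests that the model quality does not depend much on which
rows happen to land in the test data.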
Cheers,
Tobias Erik Reiners
Mammalian Ecology Group
Quoting Dylan Beaudette <debeaudette at ucdavis.edu>:
> Another approach:
>
> See ?lrm, ?validate, and ?calibrate from the rms package.
>
> Dylan
>
> On Tuesday, April 26, 2011, Bram Van Moorter wrote:
>> Dear Komine,
>> Not sure whether this is the easiest way, but it has worked for me:
>>
>> set.seed(0)
>> head(tab <- data.frame(Y=as.numeric(runif(100)>0.5), X=rnorm(100)))
>> # the 66% of data you want in one sample
>> subs <- sample(c(1:nrow(tab)), round(nrow(tab)*0.66), replace=F)
>> tab1 <- tab[subs, ] #the one sample
>> # the other sample: the data that do not fall in the first sample
>> tab2 <- tab[!c(1:nrow(tab)) %in% subs, ]
>>
>> rlog1 <- glm(Y~X,family=binomial,data=tab1)
>> summary(rlog1)
>> tab2$pred <-predict(rlog1, newdata=tab2, type="response")
>> hist(tab2$pred)
>>
>> # ROCR makes it easy to draw ROC curves for assessing your prediction
>> library(ROCR)
>> pred <- prediction(tab2$pred, tab2$Y)
>> perf <- performance(pred,"tpr","fpr")
>> # the diagonal line shows that the prediction is as good as random,
>> # which you would expect in this example
>> plot(perf); abline(0, 1, col="red")
>>
>> Best,
>> Bram
>>
>>
>> > Hi,
>> > I would like your help to validate my logistic regression. I know how to
>> > do logistic regression.
>> >
>> > rlog<-glm(Y~X,family=binomial,data=tab)
>> > summary(rlog)
>> > HLgof.test(fit = fitted(rlog), obs=Y)
>> >
>> > However, I would like to validate my model, for example by dividing my
>> > data into a sample for training (66%) and a sample for validation (34%).
>> > E.g. for my table:
>> > Area Y X
>> > 1 1 135
>> > 1 0 200
>> > 1 1 97
>> > 1 1 160
>> > 1 0 201
>> > 1 1 144
>> > 1 0 100
>> >
>> > But I don't know how to validate it.
>> > 1- My first problem: How to create my 2 samples from my variables Y and X
>> > using percentages of 66 and 34%?
>> >
>> > - How to get the percentage of good predictions and bad predictions?
>> >
>> > Thanks for your Help
>> > Komine
>> >
>>
>>
>> --
>> Bram Van Moorter
>> Centre for Conservation Biology (NTNU),
>> Norwegian Institute for Nature Research (NINA)
>> Trondheim (Norway)
>> email: Bram.Van.Moorter at gmail.com
>> website: http://ase-research.org/moorter
>> phone: +47 73596060
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>
> --
> Dylan E. Beaudette
> USDA-NRCS Soil Scientist
> California Soil Resource Lab
> http://casoilresource.lawr.ucdavis.edu/
>
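
Komine's second question in the quoted thread (the percentage of good and
bad predictions) can be approached with a confusion table on the validation
sample. The following is only a minimal sketch under stated assumptions:
tab is simulated here, the names train, valid, cutoff and pct_correct are
made up for illustration, and the 0.5 cutoff is an arbitrary choice that
may not suit real data.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Simulated stand-in for the table in the question: binary Y, numeric X
tab <- data.frame(X = rnorm(300, mean = 150, sd = 40))
tab$Y <- rbinom(300, 1, plogis((tab$X - 150) / 20))

# 66% training sample, 34% validation sample
train_idx <- sample(nrow(tab), round(nrow(tab) * 0.66))
train <- tab[train_idx, ]
valid <- tab[-train_idx, ]

# Fit on the training sample, predict probabilities for the validation sample
rlog <- glm(Y ~ X, family = binomial, data = train)
prob <- predict(rlog, newdata = valid, type = "response")

# Turn probabilities into classes with a hypothetical cutoff of 0.5;
# fixed factor levels keep the table 2 x 2 even if a class is never predicted
cutoff <- 0.5
pred_class <- factor(as.integer(prob > cutoff), levels = c(0, 1))
obs_class  <- factor(valid$Y, levels = c(0, 1))

conf_mat <- table(observed = obs_class, predicted = pred_class)
pct_correct <- 100 * sum(diag(conf_mat)) / sum(conf_mat)  # good predictions
pct_wrong   <- 100 - pct_correct                          # bad predictions

conf_mat
pct_correct
pct_wrong
```

The diagonal of conf_mat holds the correctly classified observations, so
pct_correct is the share of good predictions at the chosen cutoff. Unlike
the AUC, this number changes when the cutoff changes.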