[R-sig-Geo] To validate logistic regression
Bram Van Moorter
bram.van.moorter at gmail.com
Tue Apr 26 12:24:00 CEST 2011
Dear Komine,
Not sure whether this is the easiest way, but it has worked for me:
set.seed(0)
# simulated data: Y is drawn independently of X
tab <- data.frame(Y=as.numeric(runif(100)>0.5), X=rnorm(100))
head(tab)
# row indices of the 66% of the data you want in one sample
subs <- sample(seq_len(nrow(tab)), round(nrow(tab)*0.66), replace=FALSE)
tab1 <- tab[subs, ] # the one sample (training)
# the other sample (validation): the rows that do not fall in the first
tab2 <- tab[-subs, ]
rlog1 <- glm(Y~X,family=binomial,data=tab1)
summary(rlog1)
tab2$pred <-predict(rlog1, newdata=tab2, type="response")
hist(tab2$pred)
# ROCR makes it easy to draw ROC curves, which let you assess
# your predictions
library(ROCR)
pred <- prediction(tab2$pred, tab2$Y)
perf <- performance(pred,"tpr","fpr")
# the red diagonal is the ROC curve of a purely random prediction; the
# curve for this example stays close to it, which you would expect
# because Y was simulated independently of X
plot(perf); abline(0, 1, col="red")
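
For the percentage of good and bad predictions, one possible approach is to
turn the predicted probabilities into 0/1 classes and cross-tabulate them
against the observed values in the validation sample. The sketch below
repeats the simulated data and the split so that it runs on its own. The
cutoff of 0.5 and the names cutoff, obs, conf, good and bad are my own
assumptions for illustration; a different cutoff may suit your data better.

```r
set.seed(0)
# same simulated data and 66/34 split as in the example above
tab <- data.frame(Y = as.numeric(runif(100) > 0.5), X = rnorm(100))
subs <- sample(seq_len(nrow(tab)), round(nrow(tab) * 0.66))
tab1 <- tab[subs, ]   # training sample
tab2 <- tab[-subs, ]  # validation sample

# fit on the training sample, predict probabilities for the validation sample
rlog1 <- glm(Y ~ X, family = binomial, data = tab1)
tab2$pred <- predict(rlog1, newdata = tab2, type = "response")

# assumed cutoff: probabilities above it are classified as 1, others as 0
cutoff <- 0.5
tab2$class <- factor(as.numeric(tab2$pred > cutoff), levels = c(0, 1))
obs <- factor(tab2$Y, levels = c(0, 1))

# confusion table: rows are observed values, columns are predicted classes
conf <- table(observed = obs, predicted = tab2$class)
print(conf)

# the diagonal holds the correct predictions
good <- 100 * sum(diag(conf)) / sum(conf)  # % of good predictions
bad <- 100 - good                          # % of bad predictions
cat("good:", round(good, 1), "%  bad:", round(bad, 1), "%\n")
```

These percentages depend on the chosen cutoff. If you want a summary that
does not, ROCR can also return the area under the ROC curve with
performance(pred, "auc"), where pred is the prediction object created above.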
Best,
Bram
> Hi,
> I would like your help to validate my logistic regression. I know how to do
> logistic regression.
>
> rlog<-glm(Y~X,family=binomial,data=tab)
> summary(rlog)
> HLgof.test(fit = fitted(rlog), obs=Y)
>
> However, I would like to validate my model. For example, to divide my data
> into a sample for training (66%) and a sample for validation (34%),
> e.g. for my table:
> Area Y X
> 1 1 135
> 1 0 200
> 1 1 97
> 1 1 160
> 1 0 201
> 1 1 144
> 1 0 100
>
> But I don't know how to validate it.
> 1- My first problem: How to create my 2 samples from my variables Y and X
> using percentages of 66 and 34%?
>
> 2- How to get the percentage of good predictions and bad predictions?
>
> Thanks for your Help
> Komine
>
--
Bram Van Moorter
Centre for Conservation Biology (NTNU),
Norwegian Institute for Nature Research (NINA)
Trondheim (Norway)
email: Bram.Van.Moorter at gmail.com
website: http://ase-research.org/moorter
phone: +47 73596060