[R] confidence interval for the logit - predict.glm
Christine Sonvilla
csonvilla at gmx.at
Thu May 29 03:33:08 CEST 2008
Dear Brian and list members,
Thanks very much for your response Brian!
I applied the adjusted calculation that you advised me to use [1/(1+exp(-upperlogit))] and as a result I don't get any more NA values in my upper confidence interval values.
Yet, some outcomes are very akward, since for very small values such as 1.98E-10, the lower values very reasonably turns out to be 0, yet the upper value is simply set to 1, which kind of makes the confidence interval redundant and doesn't give any proper idea how good of a predicted value this 1.98E-10 value and other similar ones are.
In the following I try to retrace what excatly I am doing, yet I suppose the principle of "reproducible code" will be difficult to accomplish, since I can't add my input files here, I try my best and hope it kind of fits the "criteria":
My input txt.files are:
testfishes.txt (column names are the fish species, row names are the planning units, where fish surveys where conducted, so the matrix is binary, presence/absence data, 0s and 1s)
predy.txt (column names are the predictor variables, numerical, one is categorical, row names are again the planning units where fish surveys were conducted)
predallx.txt (column names are the predictor variables, row names now are ALL the planning units, also those where no fish survey was conducted)
# I do a glm analysis for my fish models, loading the files first:
predallx<-read.table("predallx.txt",header=TRUE,row.names=1,sep="\t")
predallx$exposure<-factor(predallx$exposure)
predallx$exposure<-ordered(predallx$exposure,levels=c("Sheltered","Medium","Medexposed","Exposed"))
attach(predallx)
predy<-read.table("predy.txt",header=TRUE, row.names=1,sep="\t")
predy$exposure<-factor(predy$exposure)
attach(predy)
testfishes<-read.table("testfishes.txt",header=TRUE,row.names=1,sep="\t")
attach(testfishes)
# Creating a table for my glm output, models are compared on the basis of AIC values:
cat('fish',' ','model',' ','AIC',' ','df.residual',' ','deviance',"\n",
file="AICfish.txt",append=FALSE)
# Creating a matrix for the predicted values (fishoccur) and a matrix for the upper and lower ci:
fishoccur<-matrix(0,nrow(predallx),ncol(testfishes))
colnames(fishoccur)<-colnames(testfishes)
rownames(fishoccur)<-rownames(predallx)
upperresponse<-matrix(0,nrow(predallx),ncol(testfishes))
colnames(upperresponse)<-colnames(testfishes)
rownames(upperresponse)<-rownames(predallx)
lowerresponse<-matrix(0,nrow(predallx),ncol(testfishes))
colnames(lowerresponse)<-colnames(testfishes)
rownames(lowerresponse)<-rownames(predallx)
# Now I run the anlysis in a loop + stepwise approach and enter results in the tables created above, "i" stands for every single fish that in turn should run through the loop:
for (i in 1:5)
{
predy$eachfish <- testfishes[,i]
modelfish <- glm(eachfish~depth500m + exposure + nearest.est + island + mangroves + seagrass,
family=binomial, data=predy)
stepmodfi <- step(modelfish, trace= 0)
cat(i, make.names(c(stepmodfi$call),unique=TRUE), AIC(stepmodfi),
df.residual(stepmodfi),deviance(stepmodfi),"\n", file="AICfish.txt",append=TRUE)
# Eventually I want to get predicted values for all those planning units where no fish surveys were conducted:
# First I get my predicted values from the response, this yields a predicted value for every single fish and every single planning unit, and these results will be fed into the previously constructed matrix "fishoccur":
predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE)
fishoccur[,i] <- predictionresponse
# Now I want to get confidence intervals for these predicted values. I figured out that I would need to get these from the logit scale first and only thereafter transfer them to the response scale. So, first I get again predicted values, but on the logit scale. This time though I get just ONE result for every planning unit, when actually I also expected to get a result for every fish and every planning unit. So, it seems that there is just one value for each planning unit, but I thought it would be nessesary to get one value for each fish and planning unit combination(?)
predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)
# Then, nevertheless, I calculated the upper and lower ci values and putting them into the previously created matrices of upperresponse and lowerresponse, the "i" stands for the number of fishes used in this run, so that I get one value for every combination of fish and planning unit. Yet, I wonder whether this can be done, if first I only get one predicted value from the logit scale:
upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))
# Or I use your adjusted formula:
upperresponse[,i] <- 1/(1+exp(-upperlogit))
lowerresponse[,i] <- 1/(1+exp(-lowerlogit))
# Close the loop:
}
# Finally extracting tables:
write.table(fishoccur,file="fishoccur.txt",row.names=TRUE,sep=",")
write.table(upperresponse,file="upperresponse.txt",row.names=TRUE,sep=",")
write.table(lowerresponse,file="lowerresponse.txt",row.names=TRUE,sep=",")
When I use the formula exp(upperlogit)/(1+exp(upperlogit)) I get quite many NAs in my upper ci values, the lower though seems fine.
Using your formular eliminated the NAs but produces some confidence intervals which - particularly for very small predicted values - stretch from 0 to 1, which of course doesn't qualify as a confidence interval.
I hope that the description here is more stringent and I would be very grateful if you had any further idea, why I either get NAs or for some values 0-1 confidence intervals and whether the prediction on the logit scale in the first place is conducted correctly.
Thank you very much!
Regards,
Christine
Your response:
Possibly your calculation overflows: exp(upperlogit)/(1+exp(upperlogit))
could be replaced by 1/(1+exp(-upperlogit)), or even better by
plogis(upperlogit). This could happen via the Hauck-Donner effect: the
fitted probabilities are very near one and the standard errors are very
large.
As for your other points, please follow the posting guide and
'provide commented, minimal, self-contained, reproducible code'.
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
More information about the R-help
mailing list