[R-sig-eco] Log odds / predicted values etc

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Aug 24 11:02:43 CEST 2010


On Tue, 2010-08-24 at 07:42 +0100, Chris Mcowen wrote:
> Dear list, 
> 
> I am relatively new to R and GLM"S so please bear with me!

I really don't see how this is working properly as you've left too much
out of your code. Your model (if I believe what you write) was not
fitted with a data argument, and your newdata only contains a single
variable when it should contain values for all variables.

So it looks like you are either getting the fitted values not the
predicted values for new values of SEA, or R is picking up the other
variables from the workspace.

This is why we ask for reproducible examples or consistent code as typed
exactly into R.

What predictions do you want; for levels of SEA conditional upon holding
the other variables at their observed mean? If you want the fitted
probabilities for your observed data to plot against the levels of SEA,
don't both with newdata, just use

pred <- predict(mod, type = "response")

As for plotting, you don't want to join across levels of SEA with lines.
I don't think that makes much sense. What is wrong with a boxplot or
stripplot of these results:

boxplot(pred ~ SEA, data = yourdata)

where yourdata is the data frame containing the data you used to fit the
model. If your variables were scattered in the workspace (as your model
fitting line suggests model1 <- glm(THREAT~HAB*FR+BS+SEA+PD+WO) ) you
need to address that point first. Don't do it!!!

yourdata <- data.frame(THREAT, HAB, FR, BS, SEA, PD, WO)
rm(THREAT, HAB, FR, BS, SEA, PD, WO)

Now refit model:

model1 <- glm(THREAT~HAB*FR+BS+SEA+PD+WO, data = yourdata,
              family = binomial) ## Yep your code forgot this bit
                                 ## otherwise you fitted a Gaussian
                                 ## model!

The collect fitted values:

pred <- predict(model1, type = "response")

The plot:

boxplot(pred ~ SEA, data = yourdata)
stripplot(pred ~ SEA, data = yourdata)

Is that better?

> I am trying to plot a graph of my binary response (threat) on the Y
> axis and the variable SEA on the X-
> 
> I have used the following code but the graph looks odd
> ( https://files.me.com/chrismcowen/xz47o4) i presume it is because i
> have categorical factors and R is plotting a box and whisker plot of
> the binary response?

I don't think it will do that, as this works:

plot(x = sample(c(0L,1L), 100, replace = TRUE), y = rnorm(100))

and 'x' here is a integer just like your THREAT.

The /problem/ is that SEA is a factor and plot is generic dispatching on
it's 'x' argument. As you supplied 'x' a factor, the plot.factor method
was used and ?plot.factor describes what will be done in those cases.

>  It has been suggested i look at the  interaction plots on the
> predicted values, however i am unsure how to do this?

I think you want to look at effects displays; see the effects package:

http://cran.r-project.org/web/packages/effects/index.html

and this paper on the package:

http://www.jstatsoft.org/v08/i15

Although I thought I'd seen a more recent account from John Fox. Ah,
found it, but this is for a different set of GLM:

http://www.jstatsoft.org/v32/i01

So earlier paper is fine.

> The SEA variable is categorical 1 - 4  
> 
> mydata <- data.frame(SEA)
> Pred <- predict(model1, newdata = mydata, type = "response"),
> plot(x = traits$SEA, y = traits$threat), 
> lines(mydata$SEA, Pred)
> 
> I also want to use the model to predict results - if i know a species
> is annual, is pollinated by insects and has unisexual flowers - for
> example - how do i predict the response, in this case threatened or
> not.

One assumes these variables are in the model? If so, you could compute
the linear equation by hand and then convert to the response scale.
Alternatively, set up a new data frame with the combination of predictor
variables you want. But make sure you have the correct levels:

newData <- data.frame(SEA = factor("annual", levels = yourdata$SEA),
                      ....)

adding each variable found within your model and taking the levels from
the observed data.

Then do

newPred <- predict(model1, newdata = newData, type = "response")

Which you'll find easier will depend on your R and logistic regression
skills. Given you have factors etc, it may be easier to let R work this
out for you if you supply the correct 'newdata' data frame.

> I was thinking i would need to calculate the log odds using the
> coefficients from the GLM for each factor convert them into a
> probability, then calculate the probability for the entire model and
> then,  using this value,  predict if a species is threatened or not.
> However i am not sure if this is the best way or if i am being
> longwinded?
> 
> Furthermore, i am next going to convert threat or not into a continuos
> response and develop a  ordinal model, with varying degrees of threat
> (1-6) i presume the aforementioned  approach is applicable under this
> scenario as well?

I don't follow exactly what you were doing "by hand"; the answer to the
above would be coef(model1) %*% X_new where X_new was a new design
matrix with the combination of predictors required for the case at hand.
But as you have factors and interactions, you'll need to marry up all
the columns in the design matrix (?model.matrix) to your covariates. As
I said, it would be easier getting R to do the heavy lifting and use
predict().

The newer Fox paper I mention shows how to do effects displays for
proportional odds models. By the way, I don't think you mean continuous
for THREAT. Do you mean an ordered factor (i.e. it is still a discrete
variable)? If THREAT is a bounded (0,1] continuous variable, perhaps
look at beta regression.

I think that's about as far as I can go with the information provided.
Sorry I haven't produced a reproducible example, but if the pseudo code
above doesn't work, let me know and either make up some data and I'll go
through the steps for you in R or send me your data and I'll see about
cooking something up. Feel free to send the data off-list if you would
prefer not sharing it with the world.

G

> Chris
> 
> model1 <- glm(THREAT~HAB*FR+BS+SEA+PD+WO)
> 
> coef(model1)
>                 (Intercept)                HABLithophyte 
>                 -0.11880089                   0.10498055 
>             HABSemi-aquatic               HABTerrestrial 
>                  0.36712439                   0.17641700 
>                FRNon_fleshy          BSUnisexual_flowers 
>                  0.35551197                  -0.00706299 
>           BSUnisexual_plant               SEAHapaxanthic 
>                  0.06857112                  -0.53281439 
>                SEAPerrenial               SEAPleonanthic 
>                  0.18798090                  -0.08547676 
>                    PDBiotic                      WOWoody 
>                  0.29155388                   0.12533079 
>  HABLithophyte:FRNon_fleshy HABSemi-aquatic:FRNon_fleshy 
>                  0.27595822                  -0.31031510 
> HABTerrestrial:FRNon_fleshy 
>                 -0.31065039 
> 
> 
> 
> 'data.frame':	1115 obs. of  20 variables:
> $ order     : Factor w/ 9 levels "Alismatales",..: 7 3 3 8 8 8 8 8 8 8 ...
> $ fam       : Factor w/ 42 levels "Agavaceae","Alliaceae",..: 40 34 34 36 11 11 11 11 11 36 ...
> $ genus     : Factor w/ 449 levels "Acanthochlamys",..: 1 2 2 3 4 4 4 4 4 5 ...
> $ species   : Factor w/ 1065 levels "abdulrahmanii",..: 123 429 851 1065 156 243 485 735 761 107 ...
> $ Tip       : Factor w/ 1115 levels "Acanthochlamys_bracteata",..: 1 2 3 4 5 6 7 8 9 10 ...
> $ HAB       : Factor w/ 4 levels "Epiphyte","Lithophyte",..: 4 1 1 4 1 1 1 1 1 4 ...
> $ STO       : Factor w/ 2 levels "No_Storage_organ",..: 2 2 1 2 1 1 1 1 1 2 ...
> $ BS        : Factor w/ 3 levels "Bisexual_flower",..: 1 1 1 3 1 1 1 1 1 1 ...
> $ PD        : Factor w/ 2 levels "Abiotic","Biotic": 2 2 2 1 2 2 2 2 2 1 ...
> $ FR        : Factor w/ 2 levels "Fleshy","Non_fleshy": 1 2 2 2 1 1 1 1 1 2 ...
> $ END       : Factor w/ 2 levels "Endosperm","No_endosperm": 1 2 2 1 1 1 1 1 1 1 ...
> $ WO        : Factor w/ 2 levels "Non_woody","Woody": 1 1 1 1 1 1 1 1 1 1 ...
> $ RG        : Factor w/ 2 levels "One","Two+": 1 1 1 2 1 1 1 1 1 2 ...
> $ SEA       : Factor w/ 4 levels "Annual","Hapaxanthic",..: 3 3 3 3 3 3 3 3 3 1 ...
> $ SE        : Factor w/ 3 levels "Few","Numerous",..: 2 2 2 3 2 2 2 2 2 3 ...
> $ FS        : Factor w/ 2 levels "Actinomorphic",..: 1 2 2 1 1 1 1 1 1 2 ...
> $ IUCN      : Factor w/ 6 levels "CR","EN","LC",..: 4 5 6 3 5 1 5 6 4 3 ...
> $ EXTINCTION: int  6 2 3 1 2 5 2 3 6 1 ...
> $ THREAT    : int  1 0 1 0 0 1 0 1 1 0 ...
> $ THREATtf  : logi  TRUE FALSE TRUE FALSE FALSE TRUE ...
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-sig-ecology mailing list