[R] Regression lines for differently-sized groups on the same plot

Wed Jul 20 04:56:06 CEST 2005

Laura M Marx wrote:
> Hi there,
>   I've looked through the very helpful advice about adding fitted lines to 
> plots in the r-help archive, and can't find a post where someone has offered 
> a solution for my specific problem.  I need to plot logistic regression fits 
> from three differently-sized data subsets on a plot of the entire dataset.  
> A description and code are below:
>   I have an unbalanced dataset consisting of three different species (hem, 
> yb, and sm), with unequal numbers of wood pieces in each species group.  I 
> am trying to generate a plot that will show the size of the wood piece on 
> the X axis, the probability of it having tree seedlings growing on it on the 
> Y (a binomial yes or no variable), and three fitted curves showing how the 
> probability of having tree seedlings changes with increasing wood piece size 
> for each species.
>   I have no problem generating fits using GLM, and no problem creating the 
> plot.  However, if I try to add a fitted curve based only on the hem data 
> subset to a plot that shows the entire dataset, I get an error message that 
> the lengths of those data sets differ. "Error in xy.coords(x,y) : x and y 
> lengths differ".  I could see R's point -- you can't plot a regression line 
> of babies born as a function of stork abundance on a graph of cherries 
> produced (Y) versus rainfall (X), which for all the program knows, I'm 
> trying to do.  As a temporary fix, I added NAs to the end of the hem, yb, 
> and sm subsets to make them the same length as the entire dataset.  I can 
> now add my fitted curves to the plot, but the lines are not connected.  That 
> is, if the hem group only contains wood pieces that are 1, 4, and 10 meters 
> long, the plot has an X axis that ranges from 1 to 10, but line segments for 
> the hem group regression line only appear above 1, 4, and 10.  How can I fix 
> this?  An ideal solution would not require me to make the hem subset of my 
> data the same length as the full dataset, either (although the summaries of 
> regressions with the NAs (or zeroes) added and taken away are identical).  
> I'd also settle for a work-around that would have R connect the pieces of 
> the curve so that I get a solid line rather than small dots and dashes where 
> actual data exist.  Thanks so much for your help!
>   Laura Marx
>   Michigan State University, Dept. of Forestry 
> 
> #Note: hemdata has all the rows that are not hemlock species replaced with 
> #"NA"s.
> hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
> hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
> hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata) 
> 
> attach(logreg) #logreg is the full dataset
> plot(logarea, hempresence, xlab = "Surface area of log (m2)", 
> ylab="Probability of hemlock seedling presence", type="n", font.lab=2, 
> cex.lab=1.5, axes=TRUE)
> lines(logarea,fitted(hemhem), lty=1, lwd=2)
> lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
> lines(logarea,fitted(hemsm), lty="dotted", lwd=2)
> 

Hi, Laura,

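Would predict() (see ?predict.glm) be better? fitted() returns values only for the rows each model was actually fit to. predict() with newdata = logreg and type = "response" instead evaluates each model at every logarea value in the full dataset, so x and y have the same length: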

o <- order(logarea)  # lines() joins points in the order given, so sort by x
plot(logarea, hempresence,
     xlab = "Surface area of log (m2)",
     ylab = "Probability of hemlock seedling presence",
     type = "n", font.lab = 2, cex.lab = 1.5, axes = TRUE)
lines(logarea[o], predict(hemhem, newdata = logreg, type = "response")[o],
      lty = 1, lwd = 2)
lines(logarea[o], predict(hemyb, newdata = logreg, type = "response")[o],
      lty = "dashed", lwd = 2)
lines(logarea[o], predict(hemsm, newdata = logreg, type = "response")[o],
      lty = "dotted", lwd = 2)
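The "x and y lengths differ" error from the original question can be reproduced with a small example. This is a sketch on hypothetical toy data (the data frame `toy` and the objects `fit`, `fit_ex` and `p` are made up for illustration). With the default na.action (na.omit), glm() drops NA rows, so fitted() is shorter than the full x vector. predict() with newdata returns one value per row of newdata, whichever rows were used in the fit.

```r
## Hypothetical toy data: 10 rows, but only the first 6 have a response;
## the last 4 are NA, like the NA-padded 'hemdata' in the question.
toy <- data.frame(logarea = 1:10,
                  hempresence = c(0, 0, 1, 0, 1, 1, NA, NA, NA, NA))
fit <- glm(hempresence ~ logarea, family = binomial, data = toy)

## With the default na.action (na.omit) the NA rows are dropped, so
## fitted() is shorter than toy$logarea; lines(toy$logarea, fitted(fit))
## would fail with "x and y lengths differ".
length(fitted(fit))     # 6
length(toy$logarea)     # 10

## na.exclude pads fitted() back to full length with NAs, but lines()
## leaves gaps wherever y is NA, so the curve is still broken up.
fit_ex <- update(fit, na.action = na.exclude)
length(fitted(fit_ex))  # 10

## predict() with newdata gives one prediction per row of newdata.
p <- predict(fit, newdata = toy, type = "response")
length(p)               # 10
```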

Without seeing more description of your data, this is still a guess.
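A fuller, self-contained sketch of the same idea follows. It is an illustration built on assumptions, not code from this thread. The data are simulated, and `species` is a hypothetical column name for the hem/yb/sm grouping. Each model is fit on its own rows only, with no NA padding. All three models are then evaluated on one evenly spaced grid of logarea values, so every curve is drawn as a solid line across the full x range.

```r
## Simulated stand-in for 'logreg' (hypothetical data): a 'species'
## factor with unequal group sizes, a log surface area, and a 0/1
## seedling-presence response drawn from species-specific logistic curves.
set.seed(1)
n <- c(hem = 60, yb = 35, sm = 20)
logreg <- data.frame(species = factor(rep(names(n), n), levels = names(n)),
                     logarea = runif(sum(n), 0, 10))
slope <- c(hem = 0.6, yb = 0.35, sm = 0.15)[as.character(logreg$species)]
logreg$hempresence <- rbinom(nrow(logreg), 1,
                             plogis(-2 + slope * logreg$logarea))

## One logistic regression per species, each fit only to that
## species' rows (no NA padding needed).
fits <- lapply(levels(logreg$species), function(sp)
  glm(hempresence ~ logarea, family = binomial(link = "logit"),
      data = logreg[logreg$species == sp, ]))
names(fits) <- levels(logreg$species)

## Evaluate every model on the same evenly spaced grid of x values;
## sapply() returns a 200 x 3 matrix with one column per species.
grid <- data.frame(logarea = seq(min(logreg$logarea), max(logreg$logarea),
                                 length.out = 200))
curves <- sapply(fits, predict, newdata = grid, type = "response")

## Plot the raw 0/1 data and overlay one connected curve per species.
plot(hempresence ~ logarea, data = logreg,
     xlab = "Surface area of log (m2)",
     ylab = "Probability of hemlock seedling presence",
     font.lab = 2, cex.lab = 1.5)
matlines(grid$logarea, curves, lty = c("solid", "dashed", "dotted"),
         lwd = 2, col = "black")
legend("topleft", legend = colnames(curves),
       lty = c("solid", "dashed", "dotted"), lwd = 2)
```

Because the grid is sorted and shared by all three models, the curves do not depend on how many observations each species contributed. Each model's fit still uses only its own rows.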

--sundar