[R] Regression lines for differently-sized groups on the same plot
Sundar Dorai-Raj
sundar.dorai-raj at pdf.com
Wed Jul 20 04:56:06 CEST 2005
Laura M Marx wrote:
> Hi there,
> I've looked through the very helpful advice about adding fitted lines to
> plots in the r-help archive, and can't find a post where someone has offered
> a solution for my specific problem. I need to plot logistic regression fits
> from three differently-sized data subsets on a plot of the entire dataset.
> A description and code are below:
> I have an unbalanced dataset consisting of three different species (hem,
> yb, and sm), with unequal numbers of wood pieces in each species group. I
> am trying to generate a plot that will show the size of the wood piece on
> the X axis, the probability of it having tree seedlings growing on it on the
> Y (a binomial yes or no variable), and three fitted curves showing how the
> probability of having tree seedlings changes with increasing wood piece size
> for each species.
> I have no problem generating fits using GLM, and no problem creating the
> plot. However, if I try to add a fitted curve based only on the hem data
> subset to a plot that shows the entire dataset, I get an error message that
> the lengths of those data sets differ. "Error in xy.coords(x,y) : x and y
> lengths differ". I could see R's point -- you can't plot a regression line
> of babies born as a function of stork abundance on a graph of cherries
> produced (Y) versus rainfall (X), which for all the program knows, I'm
> trying to do. As a temporary fix, I added NAs to the end of the hem, yb,
> and sm subsets to make them the same length as the entire dataset. I can
> now add my fitted curves to the plot, but the lines are not connected. That
> is, if the hem group only contains wood pieces that are 1, 4, and 10 meters
> long, the plot has an X axis that ranges from 1 to 10, but line segments for
> the hem group regression line only appear above 1, 4, and 10. How can I fix
> this? An ideal solution would not require me to make the hem subset of my
> data the same length as the full dataset, either (although the summaries of
> regressions with the NAs (or zeroes) added and taken away are identical).
> I'd also settle for a work-around that would have R connect the pieces of
> the curve so that I get a solid line rather than small dots and dashes where
> actual data exist. Thanks so much for your help!
> Laura Marx
> Michigan State University, Dept. of Forestry
>
> #Note: hemdata has all the rows that are not hemlock species replaced with
> #"NA"s.
> hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
> hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
> hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata)
>
> attach(logreg) #logreg is the full dataset
> plot(logarea, hempresence, xlab = "Surface area of log (m2)",
> ylab="Probability of hemlock seedling presence", type="n", font.lab=2,
> cex.lab=1.5, axes=TRUE)
> lines(logarea,fitted(hemhem), lty=1, lwd=2)
> lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
> lines(logarea,fitted(hemsm), lty="dotted", lwd=2)
>
Hi, Laura,
Would ?predict.glm be better?
plot(logarea, hempresence,
xlab = "Surface area of log (m2)",
ylab="Probability of hemlock seedling presence",
type="n", font.lab=2, cex.lab=1.5, axes=TRUE)
lines(logarea, predict(hemhem, logreg, "response"), lty=1, lwd=2)
lines(logarea, predict(hemyb, logreg, "response"), lty="dashed", lwd=2)
lines(logarea, predict(hemsm, logreg, "response"), lty="dotted", lwd=2)
Without seeing more description of your data, this is still a guess.
--sundar
More information about the R-help
mailing list