[R] Plotting fit marginals, multiple plots on same x-axis
Johann Hibschman
jhibschman at gmail.com
Thu Oct 8 16:20:39 CEST 2009
I'm trying to plot the "marginals" of a fit: the aggregated actual and
predicted values against a cut/bucketed dimension. (The data set
is huge, so just plotting the raw points would be unintelligible.) I'd
also like to plot the number of points in each bucket (or, rather, the
sum of the weights in each bucket), so I can mentally discount crazy
behavior at low weights.
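A minimal sketch of that bucketing, for concreteness. It assumes a per-row weight column, here called `w` and set to 1 for every row (a hypothetical stand-in, since mtcars has no weights), and reuses the 25-hp buckets from the example further down. Each bucket gets the weighted mean of the actual and predicted values plus the sum of the weights.

```r
# Sketch only: `w` is a hypothetical weight column (all ones here),
# and `bucket` is an illustrative name for the cut dimension.
mtc <- within(mtcars, mpg.pred <- predict(lm(mpg ~ wt)))
mtc$w <- rep(1, nrow(mtc))
mtc$bucket <- 25 * (mtc$hp %/% 25)

# Sum of weights per bucket; tapply orders buckets numerically.
wsum <- tapply(mtc$w, mtc$bucket, sum)

# Weighted means are sum(w * value) / sum(w) within each bucket.
marg <- data.frame(
  bucket    = as.numeric(names(wsum)),
  actual    = as.vector(tapply(mtc$w * mtc$mpg, mtc$bucket, sum) / wsum),
  predicted = as.vector(tapply(mtc$w * mtc$mpg.pred, mtc$bucket, sum) / wsum),
  weight    = as.vector(wsum)
)
```

With unit weights this reduces to the plain per-bucket means and counts; real weights would simply replace the `rep(1, ...)` line.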
To do this, I want a divided plot, with both panels sharing the same
x-axis. The larger top panel would show the predicted and actual lines;
the smaller bottom panel would show the count/weight data. (Alternative
suggestions for how to view this are welcome.)
If I use ggplot2, I can get a plot that mostly looks like what I want,
but I can't get one facet to be larger than the other. For time
series, using plot.zoo with a heights option gives the effect I'm
looking for, but this isn't a time series. Using layout, I end up
duplicating the x-axis and wasting a lot of space; also, as far as I
can tell, nothing actually guarantees that two plots aligned with
layout would be on the same x-coordinate axis, so if the y-axis label
of one ends up larger than the other, the curves won't line up.
Now, I'm sure that I can eventually hack up a layout-based solution so
that it works, by appropriate margin/axis/etc settings, but I thought
I'd ask if there's a better, elegant way. Also, I'm no master of R
graphics, so it would take a long time for me to figure out what to
do, so I'd want a bit of confirmation that that's the right way to go.
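For reference, a rough sketch of what such a layout-based version might look like. It rests on two assumptions: that base-graphics margins are set in lines of text via par(mar), so equal left/right margins give both panels the same horizontal plot region regardless of label width, and that passing the same xlim to both plots gives them the same x-coordinates. The names `top` and `bottom` are only there to record the resulting coordinates for comparison.

```r
# Sketch only: rebuild the aggregated sample data so the block is self-contained.
mtc <- within(mtcars, mpg.pred <- predict(lm(mpg ~ wt)))
hp.cut <- 25 * (mtc$hp %/% 25)
mtc.agg <- merge(
  aggregate(mtc[, c("mpg", "mpg.pred")], list(hp.cut = hp.cut), mean),
  aggregate(list(count = rep(1, nrow(mtc))), list(hp.cut = hp.cut), sum)
)
xlim <- range(mtc.agg$hp.cut)
ylim <- range(mtc.agg$mpg, mtc.agg$mpg.pred)

op <- par(no.readonly = TRUE)
layout(matrix(1:2, ncol = 1), heights = c(2, 1))

# Top panel: almost no bottom margin and no x-axis, so it is not duplicated.
par(mar = c(0.5, 4, 1, 1))
plot(mpg ~ hp.cut, data = mtc.agg, type = "b", xlim = xlim, ylim = ylim,
     xaxt = "n", xlab = "", ylab = "mpg")
lines(mpg.pred ~ hp.cut, data = mtc.agg, type = "b", col = "red")
legend("topright", legend = c("actual", "predicted"),
       col = c("black", "red"), lty = 1, pch = 1)
top <- list(usr = par("usr"), plt = par("plt"))

# Bottom panel: same left/right margins and xlim; it carries the only x-axis.
par(mar = c(4, 4, 0.5, 1))
plot(count ~ hp.cut, data = mtc.agg, type = "h", xlim = xlim,
     xlab = "hp.cut", ylab = "count")
bottom <- list(usr = par("usr"), plt = par("plt"))

par(op)  # restore the previous graphics settings (this also resets the layout)
```

Comparing `top$usr[1:2]` with `bottom$usr[1:2]`, and `top$plt[1:2]` with `bottom$plt[1:2]`, is one way to check that the two panels really do line up.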
So, any suggestions?
To give a concrete example, here's something based on the mtcars
dataset that more-or-less shows what I want, aside from the
complication that my dataset is much larger:
## Make some sample data.
mtc <- within(mtcars, mpg.pred <- predict(lm(mpg~wt)))
hp.cut <- 25 * (mtc$hp %/% 25)
mtc.agg <- merge(
  aggregate(mtc[, c("mpg", "mpg.pred")], list(hp.cut=hp.cut), mean),
  aggregate(list(count=rep(1, nrow(mtc))), list(hp.cut=hp.cut), sum))
## Is there an easier way to do this aggregation?
## Basic plot with layout.
## Not that pretty, wastes a lot of space by duplicating axes.
layout(1:2, heights=c(2, 1))
plot(mpg ~ hp.cut, data=mtc.agg, type='b')
lines(mpg.pred ~ hp.cut, data=mtc.agg, type='b', col='red')
legend("topright", legend=c("actual", "predicted"), col=c("black",
"red"), lty=1, pch=1)
plot(count ~ hp.cut, data=mtc.agg, type='l')
## Try to use ggplot2 for prettier plots.
## Very pretty, but the "secondary" variable of count gets equal billing
## with the "main" variables of mpg and mpg.pred.
library(ggplot2)
library(reshape2)  # provides melt(); older ggplot2 versions loaded reshape automatically
mtc.melt <- melt(mtc.agg[,c("hp.cut","mpg","mpg.pred","count")], id.vars=1)
mtc.melt$mpg.f <- factor(ifelse(mtc.melt$variable=="count", "Count",
"MPG"), levels=c("MPG", "Count"))
qplot(hp.cut, value, data=mtc.melt, geom=c("line","point"),
colour=variable) + facet_grid(mpg.f ~ ., scales="free")
Thanks,
Johann