[R] Plotting from different data sources on the same plot (with ggplot2)

hadley wickham h.wickham at gmail.com
Sun Sep 30 21:01:07 CEST 2007


> > There are a few ways you could describe the graph you want.  Here's
> > the one that I'd probably choose:
> >
> > ggplot(mapping = aes(x = lon, y = lat)) +
> > geom_path(data = coast) +
> > geom_point(data = coords) +
> > coord_equal()
> >
> > We don't define a default dataset in the ggplot call, but instead
> > explicitly define the dataset in each of the layers. By default,
> > ggplot will make sure that all the data is displayed on the plot -
> > i.e. the x and y scales show the union of the ranges over all
> > datasets.
> >
> > Does that make sense?
>
> It makes perfect sense indeed... unfortunately it does not work
> here ;) :
>
>  > p = ggplot(mapping = aes(x=lon, y=lat)) + geom_path(data = coast) +
>      geom_point(data = coords) + coord_equal()
>  > p
> Error in get("get_scales", env = .$.scales, inherits = TRUE)(.$.scales,  :
>          invalid subscript type

Oops, that's a bug. You can work around it either by making one of
the two datasets the default (ggplot(coast, ...)) or by manually adding
all the scales you need (+ scale_y_continuous() +
scale_x_continuous()).  I've made a note to fix this.
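
Both workarounds can be sketched as below. This is only an illustration:
the coast and coords data frames are made-up stand-ins for your data, and
the code assumes a current ggplot2 release rather than 0.5.4, so treat it
as a sketch and not as a tested fix for that version.

```r
library(ggplot2)

# Hypothetical stand-ins for the poster's data (not from the original thread).
coast  <- data.frame(lon = c(0, 1, 2, 3), lat = c(0, 1, 0.5, 2))
coords <- data.frame(lon = c(0.5, 4),     lat = c(0.2, 3))

# Workaround 1: make one dataset the plot default;
# the point layer overrides it with its own data.
p1 <- ggplot(coast, aes(x = lon, y = lat)) +
  geom_path() +
  geom_point(data = coords) +
  coord_equal()

# Workaround 2: keep the plot without a default dataset,
# but add the x and y scales explicitly.
p2 <- ggplot(mapping = aes(x = lon, y = lat)) +
  geom_path(data = coast) +
  geom_point(data = coords) +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_equal()
```

In both cases the scales should be trained on the union of the two
datasets, so points in coords that fall outside the coastline still show.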

> As expected there is nothing in the data part of the p object
>  > p$data
> NULL
>
> But there is no data specification either in the layers
>  > p$layers
> [[1]]
> geom_path: (colour=black, size=1, linetype=1) + ()
> stat_identity: (...=) + ()
> position_identity: ()
> mapping: ()
>
> [[2]]
> geom_point: (shape=19, colour=black, size=2) + ()
> stat_identity: (...=) + ()
> position_identity: ()
> mapping: ()

Compare geom_point(data = mtcars) with str(geom_point(data = mtcars))
(which throws an error, but you should be able to see enough).  So the
layers aren't printing their dataset when they have one, which is
another bug.  I'll add it to my todo list.

>   There are no scales either, which apparently causes the error
>  > p$scales
> Scales:   ->
>
> Should I get a newer version of ggplot? (I have version 0.5.4)
>
> About the other solution:
>
> >> When tinkering a bit more with this I thought that the more natural
> >> and "ggplot" way to do it, IMHO, would be to have a new addition
> >> (`+`) method for the ggplot class and be able to do:
> >>         p = p1 + p2
> >> and have p containing both plots, on the same scale (the union of the
> >
> > You were obviously pretty close to the solution already!  - you just
> > need to remove the elements that p2 already has in common with p1 and
> > just add on the components that are different.
>
> I would love to be able to do so because this way I can define custom
> plot functions that all return me a ggplot object and then combine
> these at will to get final plots (Ex: one function for the coastline,
> another for stations coordinates, another one which gets one data
> value, yet another for bathymetry contours etc etc.). This modular
> design would be more efficient than to have to predefine all
> combinations in ad hoc functions (e.g. one function for
> coast+bathy+stations, another for coast+stations only, another for
> coast+bathy+stations+data1, another for... you get the point).
> However I don't see what to add and what to remove from the objects.
> Specifically, there is only one "data" element in the ggplot object
> while my two objects (p1 and p2) both contain something different in
> $data. Should I define p$data as a list with p$data[[1]]=p1$data and
> p$data[[2]]=p2$data?

You can do this already:

sample <- c(geom_point(data = coast), geom_path(data = streams), coord_equal())
p + sample

I think the thing you are missing is that the elements in ggplot() are
just defaults that can be overridden in the individual layers
(although the bug above means that isn't working quite right at the
moment).  So just specify the dataset in the layer that you are
adding.
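
For the modular design you describe, one possible shape is a set of small
functions that each return layers (not whole plots) carrying their own
data. The sketch below assumes a current ggplot2 release, where a list of
components can be added with +. The helper names coast_layer and
station_layer and the toy data frames are hypothetical, made up for
illustration.

```r
library(ggplot2)

# Hypothetical stand-ins for the poster's data.
coast  <- data.frame(lon = c(0, 1, 2, 3), lat = c(0, 1, 0.5, 2))
coords <- data.frame(lon = c(0.5, 2.5),   lat = c(0.2, 1.5))

# Hypothetical helpers: each returns plot components, not a finished plot.
coast_layer <- function(coast) {
  # A list lets one helper bundle several components together.
  list(geom_path(data = coast), coord_equal())
}
station_layer <- function(coords) {
  geom_point(data = coords)
}

# The base plot holds only the default mapping; each layer supplies its data.
base <- ggplot(mapping = aes(x = lon, y = lat))

# Combine the pieces at will.
p_coast_only <- base + coast_layer(coast)
p_both       <- base + coast_layer(coast) + station_layer(coords)
```

Each combination is then one line, so there is no need to predefine a
separate function per combination.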

You can do things like:

p <- ggplot(mapping = aes(x = lon, y = lat)) + geom_point()
# no data so there's nothing to plot:
p

# add on data
p %+% coast
p %+% coords

The data is completely independent of the plot specification.  This is
very different from the other plotting models in R, so it may take a
while to get your head around it.
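
To make the separation concrete, here is a small sketch of one plot
specification reused with two datasets through %+%. It assumes a current
ggplot2 release (where %+% may be flagged as superseded but should still
replace the default data), and the two data frames are invented
placeholders.

```r
library(ggplot2)

# Hypothetical stand-ins for the poster's data.
coast  <- data.frame(lon = c(0, 1, 2, 3), lat = c(0, 1, 0.5, 2))
coords <- data.frame(lon = c(0.5, 2.5),   lat = c(0.2, 1.5))

# A plot specification with a mapping and a layer, but no data yet.
spec <- ggplot(mapping = aes(x = lon, y = lat)) + geom_point()

# Attach a default dataset; spec itself is left unchanged,
# so the same specification can be reused with other data.
p_coast  <- spec %+% coast
p_coords <- spec %+% coords
```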

Hadley

---
http://had.co.nz/


