[R] Plotting from different data sources on the same plot (with ggplot2)

jiho jo.irisson at gmail.com
Sun Sep 30 19:52:28 CEST 2007


On 2007-September-30, at 18:35, Hadley Wickham wrote:
>> The ggplot book specifies that "[ggplot] makes it easy to combine
>> data from multiple sources". Since I use ggplot2 as much as I can
>> (thanks, it's really really great!), I thought I would try producing
>> such a plot with ggplot2.
>>
>> NB: If this is possible/easy with another plotting package, please
>> let me know. I am not looking for something specific to maps but
>> rather for a generic mechanism to throw several pieces of data to a
>> graph and have the plotting routine take care of setting up axes that
>> will fit all data on the same scale.
>
> I don't think it's easy with any other plotting system (although I'd
> be happy to be proven wrong), and it was one of the motivations for the
> construction of ggplot.
>
>> So, now for the ggplot2 part. I have two data sources: the
>> coordinates of the coastlines in a region of interest and the
>> coordinates of sampling stations in a subset of this region. I want
>> to plot the coastline as a line and the stations as points, on the
>> same graph. I can plot them independently easily:
>>
>> p1 = ggplot(coast,aes(x=lon,y=lat)) + geom_path() + coord_equal(ratio=1)
>> p1$aspect.ratio = 1
>>
>> p2 = ggplot(coords,aes(x=lon,y=lat)) + geom_point() + coord_equal(ratio=1)
>> p2$aspect.ratio = 1
>
> There are a few ways you could describe the graph you want.  Here's
> the one that I'd probably choose:
>
> ggplot(mapping = aes(x = lon, y = lat)) +
> geom_path(data = coast) +
> geom_point(data = coords) +
> coord_equal()
>
> We don't define a default dataset in the ggplot call, but instead
> explicitly define the dataset in each of the layers. By default,
> ggplot will make sure that all the data is displayed on the plot;
> i.e., the x and y scales show the union of the ranges over all
> datasets.
>
> Does that make sense?
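The layered approach Hadley describes can be sketched as follows. This is a minimal sketch assuming a recent ggplot2 release, not the 0.5.4 version used in this thread, whose internals differ. The toy `coast` and `coords` data frames are made up for illustration, and `layer_scales()` is used only to inspect the trained x and y ranges, which should cover both datasets.

```r
library(ggplot2)

# Hypothetical toy data standing in for the coastline and the stations
coast  <- data.frame(lon = c(0, 1, 2, 3), lat = c(0, 1, 0.5, 2))
coords <- data.frame(lon = c(-1, 1.5), lat = c(0.5, 3))

# No default dataset: each layer brings its own data,
# while the aes() mapping set in ggplot() is shared by both layers
p <- ggplot(mapping = aes(x = lon, y = lat)) +
  geom_path(data = coast) +
  geom_point(data = coords) +
  coord_equal()

# Trained (unexpanded) scale ranges: the union over both datasets
x_range <- layer_scales(p)$x$range$range  # c(-1, 3)
y_range <- layer_scales(p)$y$range$range  # c(0, 3)

print(p)  # coastline and stations drawn on one set of axes
```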

It makes perfect sense indeed... unfortunately it does not work  
here ;) :

 > p = ggplot(mapping = aes(x=lon, y=lat)) + geom_path(data = coast) + geom_point(data = coords) + coord_equal()
 > p
Error in get("get_scales", env = .$.scales, inherits = TRUE)(.$.scales,  :
         invalid subscript type

As expected, there is nothing in the data part of the p object:
 > p$data
NULL

But there is no data specification in the layers either:
 > p$layers
[[1]]
geom_path: (colour=black, size=1, linetype=1) + ()
stat_identity: (...=) + ()
position_identity: ()
mapping: ()

[[2]]
geom_point: (shape=19, colour=black, size=2) + ()
stat_identity: (...=) + ()
position_identity: ()
mapping: ()

There are no scales either, which apparently causes the error:
 > p$scales
Scales:   ->

Should I get a newer version of ggplot? (I have version 0.5.4)

About the other solution:

>> When tinkering a bit more with this, I thought that the more natural
>> and "ggplot" way to do it, IMHO, would be to have a new addition (`+`)
>> method for the ggplot class and be able to do:
>>         p = p1 + p2
>> and have p containing both plots, on the same scale (the union of the
>
> You were obviously pretty close to the solution already! You just
> need to remove the elements that p2 already has in common with p1 and
> add on only the components that are different.

I would love to be able to do so, because this way I can define custom
plot functions that each return a ggplot object and then combine
these at will to get final plots (e.g. one function for the coastline,
another for station coordinates, another that gets one data
value, yet another for bathymetry contours, etc.). This modular
design would be more efficient than having to predefine all
combinations in ad hoc functions (e.g. one function for
coast+bathy+stations, another for coast+stations only, another for
coast+bathy+stations+data1, another for... you get the point).
However, I don't see what to add and what to remove from the objects.
Specifically, there is only one "data" element in the ggplot object, while
my two objects (p1 and p2) each contain something different in $data.
Should I define p$data as a list with p$data[[1]]=p1$data and
p$data[[2]]=p2$data?
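One way to approximate this modular design, assuming a recent ggplot2 rather than 0.5.4, is to have each helper return a list of layers (each carrying its own `data`) instead of a complete ggplot object; adding a list to a plot with `+` adds each of its elements in turn. The helper names `coast_layers()` and `station_layers()` are hypothetical, and the toy data is made up. This is a sketch, not the approach settled on in this thread.

```r
library(ggplot2)

# Hypothetical toy data
coast  <- data.frame(lon = c(0, 1, 2, 3), lat = c(0, 1, 0.5, 2))
coords <- data.frame(lon = c(-1, 1.5), lat = c(0.5, 3))

# Hypothetical helpers: each returns plot components rather than a full
# ggplot object, and each layer carries its own data and mapping
coast_layers <- function(d) {
  list(geom_path(data = d, mapping = aes(x = lon, y = lat)))
}
station_layers <- function(d) {
  list(geom_point(data = d, mapping = aes(x = lon, y = lat), colour = "red"))
}

# Combine any subset of helpers on an empty plot; the position scales
# are trained on the data of every layer that was added
p_coast_only <- ggplot() + coast_layers(coast) + coord_equal()
p_both <- ggplot() + coast_layers(coast) + station_layers(coords) + coord_equal()
```

A bathymetry or data-value helper would follow the same pattern, so each combination is a one-line sum of helpers rather than a dedicated function.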

> You also need to
> remember that the ggplot function just sets up a list of defaults that
> can be overridden within each layer; there is very little
> functionality provided by the ggplot object itself.
>
>> scales of p1 and p2), and just one set of axes. And even:
>>         p = add(p1, p2, drop=T)
>> which would give p1 and p2 plots clipped to the xlim and ylim of p2.
>
> Yes, it would be nice to have some syntax to overrule the default
> policy of showing all the data, although it gets a bit more
> complicated when you consider other scales like colour and size.
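In recent ggplot2 versions (an assumption; nothing like this existed in 0.5.4), the `add(p1, p2, drop=T)` behaviour can be roughly approximated by keeping every layer and zooming the coordinate system to the stations' extent. `coord_fixed()`, of which `coord_equal()` is an alias, accepts `xlim` and `ylim`, which zoom the view without removing data. Setting scale limits (e.g. with `xlim()`) would instead drop out-of-range observations. The toy data is made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data
coast  <- data.frame(lon = c(0, 1, 2, 3), lat = c(0, 1, 0.5, 2))
coords <- data.frame(lon = c(-1, 1.5), lat = c(0.5, 3))

# Zoom the view to the extent of the stations (the role of p2 above);
# the coastline layer keeps all of its rows and is only clipped when drawn
p_zoom <- ggplot(mapping = aes(x = lon, y = lat)) +
  geom_path(data = coast) +
  geom_point(data = coords) +
  coord_fixed(ratio = 1,
              xlim = range(coords$lon),
              ylim = range(coords$lat))
```

This sidesteps the question of other scales: colour or size scales still cover all the data, and only the visible x/y window changes.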

I understand. Anyway, ggplot2 is still in its early stages and this  
may come after some maturing. Thanks for your answers.

JiHO
---
http://jo.irisson.free.fr/


