[R] questions regarding stat_smooth in ggplot area plot

Werner Heijstek w.heijstek at gmail.com
Fri Mar 25 10:11:55 CET 2011


Hi Dennis,

Thanks a lot for your insights.

I 'solved' the negative smooth by using ylim() instead of xlim().
If I may, I'll ask a third question: how can I stack several of these
ggplot area plots on top of one another so that they share the same
x-axis?


library(grid)     # viewport(), grid.layout(), pushViewport(), grid.newpage()
library(ggplot2)

## Viewport for cell (x, y) of the current grid layout
vp.layout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)

## Draw several ggplot objects in one page-wide grid layout
arrange <- function(..., nrow = NULL, ncol = NULL, as.table = FALSE) {
  dots <- list(...)
  n <- length(dots)
  if (is.null(nrow) && is.null(ncol)) {
    nrow <- max(1, floor(n / 2))   # max() avoids nrow = 0 when n = 1
    ncol <- ceiling(n / nrow)
  }
  if (is.null(nrow)) nrow <- ceiling(n / ncol)
  if (is.null(ncol)) ncol <- ceiling(n / nrow)
  ## NOTE see n2mfrow in grDevices for possible alternative
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(nrow, ncol)))
  ii.p <- 1
  for (ii.row in seq_len(nrow)) {
    ii.table.row <- ii.row
    if (as.table) ii.table.row <- nrow - ii.table.row + 1
    for (ii.col in seq_len(ncol)) {
      if (ii.p > n) break
      print(dots[[ii.p]], vp = vp.layout(ii.table.row, ii.col))
      ii.p <- ii.p + 1
    }
  }
}

set  <- read.table(file = "http://www.jovian.nl/set.csv",  header = TRUE, sep = ",")
set2 <- read.table(file = "http://www.jovian.nl/set2.csv", header = TRUE, sep = ",")

s  <- ggplot(set, aes(x = time, y = hours)) +
      geom_area(colour = 'red', fill = 'red', alpha = 0.5) +
      geom_area(stat = 'smooth', span = 0.2, alpha = 0.3) + ylim(0, 40)
s1 <- ggplot(set2, aes(x = time, y = hours)) +
      geom_area(colour = 'red', fill = 'red', alpha = 0.5) +
      geom_area(stat = 'smooth', span = 0.2, alpha = 0.3) + ylim(0, 40)
arrange(s, s1, ncol = 1)
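
The NOTE in arrange() mentions grDevices::n2mfrow() as another way to choose the grid size. Here is a minimal sketch of that idea. layout_dims() is a hypothetical helper, not part of any package, and its result could be passed to arrange() as nrow and ncol.

```r
## Hypothetical helper: pick the number of rows and columns for n plots.
## Uses grDevices::n2mfrow() (which computes a sensible default for
## par(mfrow)) unless ncol is given explicitly.
layout_dims <- function(n, ncol = NULL) {
  if (!is.null(ncol)) {
    return(c(nrow = ceiling(n / ncol), ncol = ncol))
  }
  dims <- grDevices::n2mfrow(n)   # c(rows, columns)
  c(nrow = dims[1], ncol = dims[2])
}

layout_dims(2)            # two plots stacked vertically
layout_dims(4)            # a 2 x 2 grid
layout_dims(3, ncol = 1)  # one column, three rows
```

For example: d <- layout_dims(2); arrange(s, s1, nrow = d[["nrow"]], ncol = d[["ncol"]]).
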


The arrange() function was taken from
http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html.
In this example, the x-axes only line up because the two data sets
happen to cover the same range; in effect, the two plots are simply
drawn one above the other. How can I "merge" these two (and, later,
more) area plots so that they share the same x-axis, with a single
axis drawn at the bottom of the figure?
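
A common ggplot2 approach to this is sketched below. It assumes, as in the code above, that each file has time and hours columns. The idea is to stack the data sets into one data frame with an identifying column and let facet_grid() draw one panel per data set. With the default fixed scales, the panels share one x-axis, which is drawn only under the bottom panel. stack_sets() and plot_stacked() are hypothetical helpers, not ggplot2 functions.

```r
## Hypothetical helper: stack several data frames (each with 'time' and
## 'hours' columns) into one long data frame with a 'source' column
stack_sets <- function(sets) {
  pieces <- lapply(names(sets), function(nm) {
    d <- sets[[nm]][, c("time", "hours")]
    d$source <- nm
    d
  })
  out <- do.call(rbind, pieces)
  ## keep the panels in the order the data sets were given
  out$source <- factor(out$source, levels = names(sets))
  rownames(out) <- NULL
  out
}

## Hypothetical helper: one panel per data set, stacked vertically;
## facet_grid() with fixed scales draws a single x-axis under the bottom panel
plot_stacked <- function(combined, span = 0.2) {
  ggplot2::ggplot(combined, ggplot2::aes(x = time, y = hours)) +
    ggplot2::geom_area(colour = "red", fill = "red", alpha = 0.5) +
    ggplot2::geom_area(stat = "smooth", span = span, alpha = 0.3) +
    ggplot2::facet_grid(source ~ .)
}
```

Usage would be something like plot_stacked(stack_sets(list(set = set, set2 = set2))) + ggplot2::ylim(0, 40). If the y ranges differ a lot between data sets, facet_grid(source ~ ., scales = "free_y") keeps the shared x-axis while giving each panel its own y-axis.
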

Thanks,

Werner


On Thu, Mar 24, 2011 at 6:08 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> On Thu, Mar 24, 2011 at 7:21 AM, jovian <w.heijstek at gmail.com> wrote:
>>
>> Hello,
>>
>> I drew a simple area plot using ggplot2 using
>>
>> set <- read.table(file="http://www.jovian.nl/set.csv", head=1,  sep=",")
>> library(ggplot2)
>> ggplot() +
>> layer(
>>  data = set, mapping = aes(x = time, y = hours),
>>  geom = "area", stat="smooth", color="red"
>> ) +
>> layer(
>>  data = set, mapping = aes(x = time, y = hours),
>>  geom = "area", color="red", fill="red", alpha="0.5"
>> )
>>
>> I have two questions about this visualisation:
>>
>> - The smooth function is too "rough" right now, how do I make it follow
>> the original values more closely?
>
> On the contrary, the function is too smooth - if you want it to conform
> better to the observed data, you have to 'roughen' it, which means reducing
> the span argument (see stat_smooth for details). Higher spans (or
> equivalently, wider bandwidths) generate smoother curves.
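
The point about span can be illustrated in base R with stats::loess() directly, which is what stat_smooth() uses by default for small data sets. The sketch uses hypothetical, noise-free data rather than the set.csv data from this thread, and rss_for_span() is a hypothetical helper. It fits the same data with several spans and compares the residual sums of squares: the smaller the span, the more closely the fit follows the observations.

```r
## Hypothetical, noise-free wiggly data (not the set.csv data from the thread)
x <- 1:60
y <- 10 + 10 * sin(x / 3)
d <- data.frame(x = x, y = y)

## Residual sum of squares of a loess fit with the given span
rss_for_span <- function(span, data) {
  fit <- loess(y ~ x, data = data, span = span)  # local quadratic (degree = 2) by default
  sum(residuals(fit)^2)
}

rss <- sapply(c(0.2, 0.5, 0.75), rss_for_span, data = d)
names(rss) <- c("span=0.2", "span=0.5", "span=0.75")
print(rss)
```
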
>
>> - The smooth function turns up with negative values (e.g. at the beginning).
>> How do I prevent this (e.g. by using 0 instead of any negative value)?
>
> It appears that in ggplot2, loess fits a local quadratic function to the
> data within its span (essentially, the bandwidth of the x window that
> contains 100*span% of the data). The wider the bandwidth, the smoother the
> function. As the window moves from left to right, its width will change so
> that it contains 100*span% of the data. It does some other magic to smooth
> the individual local fits, but basically the degree of smoothness is a
> function of the span. Your times start at x = 27, but the first nonzero y
> (hours) doesn't occur until x = 40, so you could restrict the extent of
> x-values with the xlim() argument if you want to get rid of the visual
> anomaly on the left end of your plot. Here's one approach:
>
> set <- read.table(file="http://www.jovian.nl/set.csv", head=1,  sep=",")
> library(ggplot2)
> ggplot() +
> layer(
>  data = set, mapping = aes(x = time, y =hours),
>  geom = "area", stat="smooth", span = 0.3, color="black", alpha = 0.3
> ) +
> layer(
>  data = set, mapping = aes(x = time, y = hours),
>  geom = "area", color="red", fill="red", alpha = 0.5
> ) +
> xlim(40, 85)
>
> or
> ggplot(set, aes(x = time, y = hours)) +
>      geom_area(colour = 'red', fill = 'red', alpha = 0.5) +
>      geom_area(stat = 'smooth', span = 0.3, alpha = 0.3) + xlim(40, 85)
>
> I toyed with some of the plot parameters - you could do the same to get what
> you want. The two primary changes are the introduction of the span parameter
> in the first layer, associated with stat_smooth(), and the use of xlim() at
> the end to restrict the extent of the x-values to be displayed. You will get
> a warning about 'Removed 13 rows containing missing values', but those
> values are the times from 27-39 where hours = 0. If you need to have those
> times in the plot, then you'll have to live with the curve output by
> stat_smooth, even if it dips below zero. This is a consequence of the local
> quadratic fit. It's possible to get local linear fits (IIRC, that comes with
> degree = 1 in loess()), but I'll let you play with that if you so wish.
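
If all the times must stay in the plot, one alternative is to fit the loess curve yourself and clamp negative predictions to zero with pmax(). The result can then be drawn as an ordinary area layer, so neither xlim() nor ylim() is needed. This is a sketch under assumptions: the data below are hypothetical (zeros up to x = 39, then positive values, loosely mimicking the description above), and smooth_clamped() is a hypothetical helper. Its degree argument switches between the default local quadratic fit and the local linear fit mentioned above.

```r
## Hypothetical series: zeros up to time 39, then positive values
time  <- 27:85
hours <- ifelse(time < 40, 0, 20 + 10 * sin(time / 4))

## Fit loess, evaluate it on an evenly spaced grid of n points,
## and replace any negative predictions with 0
smooth_clamped <- function(x, y, span = 0.3, degree = 2, n = 80) {
  fit  <- loess(y ~ x, span = span, degree = degree)  # degree = 1: local linear
  grid <- seq(min(x), max(x), length.out = n)
  pred <- predict(fit, newdata = data.frame(x = grid))
  data.frame(time = grid, hours = pmax(pred, 0))
}

quad <- smooth_clamped(time, hours)              # local quadratic (loess default)
lin  <- smooth_clamped(time, hours, degree = 1)  # local linear
```

In ggplot2 the clamped curve could then be added with something like geom_area(data = quad, aes(x = time, y = hours), alpha = 0.3) in place of stat = 'smooth'. In more recent ggplot2 versions, the degree can also be passed through stat_smooth() via method.args = list(degree = 1).
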
>
> stat_smooth() also has a n = argument; if you want the smooth to be
> generated over a fixed number of points rather than a fixed percentage of
> points, you could use that in place of span.
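
A note on the n = argument: according to the ggplot2 documentation for stat_smooth(), n is the number of points at which the fitted smoother is evaluated (80 by default). It is not the number of observations in each local window; that is still controlled by span. The base-R sketch below illustrates the distinction using hypothetical data and a hypothetical helper, eval_on_grid(). To get a window of roughly k observations, request span = k / length(x) instead.

```r
## Hypothetical data, for illustration only
x <- 27:85
y <- 20 + 10 * sin(x / 4)

## The local window size is set by span (a fraction of the data) ...
fit <- loess(y ~ x, span = 0.3)

## ... while n only changes how many points the fitted curve is evaluated at
eval_on_grid <- function(fit, x, n) {
  grid <- seq(min(x), max(x), length.out = n)
  data.frame(x = grid, y = predict(fit, newdata = data.frame(x = grid)))
}
coarse <- eval_on_grid(fit, x, n = 10)   # same curve, 10 points
fine   <- eval_on_grid(fit, x, n = 80)   # same curve, 80 points

## A window of roughly k observations corresponds to span = k / length(x)
k <- 15
fit_k <- loess(y ~ x, span = k / length(x))
```
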
>
> HTH,
> Dennis
>
>> Thanks,
>>
>> Werner
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>


