[R] how to best present concentrated data points/ ggplot2

Joshua Wiley jwiley.psych at gmail.com
Wed Jul 6 06:10:56 CEST 2011


Hi Yang,

I would take a slightly different approach and use what Wilkinson
calls stripe density plots.  The idea: when you show a univariate
density on dimension 1 with many overlapping or extremely close
observations, space on dimension 1 is precious, while space on
dimension 2 is abundant.  Rather than use symbols like circles or
squares, which take up equal space on dimensions 1 and 2, use
something that takes up little space on dimension 1.  For human
perception you still want the marks to be visible, so extend the space
they use on dimension 2.  What I just described (in probably the most
obfuscated possible way) are lines.  Also, colour is sufficient to
distinguish the types, so I did not bother with different line
types.  Here is an example:

library(ggplot2)

set.seed(10)
x <- rnorm(10000)
a <- rnorm(5000)
b <- rnorm(5000)

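## rescale so the weights sum to 1 within each type
## (otherwise stats::density() warns that it is not a true density)
weights.x <- abs(a) / sum(abs(a))
weights.y <- abs(b) / sum(abs(b))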
weight <- c(weights.x, weights.y)

type <- c(rep("a", 5000), rep("b", 5000))
## make it so different types of points do not overlap
ze <- c(rep(0, 5000), rep(-.5, 5000))

d <- data.frame(expo = x, weight = weight, type = type, ze = ze)

m <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight))

## note: with this many observations and alpha, the plot may be slow to draw
m +
  geom_density() +
  geom_linerange(aes(x = expo, ymin = ze - .1, ymax = ze + .1), alpha = .25)
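
If you would rather have the stripes sit on the axes than in a band
below the curves, geom_rug() in ggplot2 draws the same kind of short
lines.  What follows is only a minimal sketch, not part of the code
above: the smaller n, the object name p, the rescaled weights, and the
choice of bottom axis for type "a" and top axis for type "b" are all my
assumptions for illustration.

```r
library(ggplot2)

set.seed(10)
n <- 500  # assumption: fewer points than the 5000 per type above, to draw quickly
a <- rnorm(n)
b <- rnorm(n)

d <- data.frame(
  expo   = rnorm(2 * n),
  ## assumption: weights rescaled to sum to 1 within each type
  weight = c(abs(a) / sum(abs(a)), abs(b) / sum(abs(b))),
  type   = rep(c("a", "b"), each = n)
)

p <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight)) +
  geom_density() +
  ## one rug layer per type so the two sets of stripes never overlap:
  ## type "a" along the bottom axis, type "b" along the top axis
  geom_rug(data = subset(d, type == "a"), sides = "b", alpha = .25) +
  geom_rug(data = subset(d, type == "b"), sides = "t", alpha = .25)

print(p)
```

The rug lines are shorter than the line ranges above, so with very
dense data the geom_linerange() version may still be easier to read.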

HTH,

Josh

On Tue, Jul 5, 2011 at 5:46 PM, Yang Lu <Yang.Lu at williams.edu> wrote:
> Hi all,
>
> I am trying to plot a weighted density plot for two different types and want to show the data points on the x axis.
>
> The code is as follows. The data points are very concentrated. Is there a better way to present them (should I set the alpha value or something else)?
>
> Thanks!
>
> YL
>
> library(ggplot2)
>
> x <- rnorm(10000)
>
> a <- rnorm(5000)
>
> b <- rnorm(5000)
>
> weights.x <- abs(a/sum(a))
>
> weights.y <- abs(b/sum(b))
>
> weight <- c(weights.x, weights.y)
>
> ze <- rep(0,10000)
>
> type <- c(rep("a",5000), rep("b",5000))
>
> d <- data.frame(expo = x, weight = weight, type = type, ze = ze)
>
> m <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight))
>
> m+geom_density()+geom_point(aes(x = expo, y = ze, shape = type))
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/


