[R] How do I do a pretty scatter plot using ggplot2?

Joshua Wiley jwiley.psych at gmail.com
Sat Mar 10 04:11:27 CET 2012


Hmm, "smooth the charts" makes me think you are trying to find the trends:


require(ggplot2)
ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  stat_smooth()

Try it out and see what you think---it adds a locally smoothed line
that does something like trace the means (that is a big
oversimplification, but it is the gist of it).
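
To make the "trace the means" remark a bit more concrete, here is a
rough sketch. It is only an illustration, not what stat_smooth() does
internally: it computes plain bin means of hp over mpg and overlays them
(in red) on an explicit loess smoother. The bin width of 5 and the names
mpg_bin and bin_means are arbitrary choices made up for the example.

```r
library(ggplot2)

# Bin mpg into width-5 intervals (arbitrary width, for illustration only)
mtcars$mpg_bin <- cut(mtcars$mpg, breaks = seq(10, 35, by = 5))

# Mean of mpg and hp within each bin
bin_means <- aggregate(cbind(mpg, hp) ~ mpg_bin, data = mtcars, FUN = mean)

p <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  # red points: the per-bin means, which the smooth line roughly follows
  geom_point(data = bin_means, colour = "red", size = 3)
print(p)
```

If the red points sit close to the line, the smoother is behaving
roughly like a moving average of the y-values.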

Cheers,

Josh
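
PS -- Going back to the original request (quantiles of y within bins of
x), here is a minimal sketch of one way to read it. It is not a standard
named procedure. It assumes simulated data like the DAT in Michael
Weylandt's example further down, picks the 25%, 50% and 75% quantiles
arbitrarily, and uses ggplot2's cut_interval() for equal-width bins and
cut_number() for equal-count bins. The helper bin_quantiles() and the
seed are made up for the example.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))

# Hypothetical helper: quantiles of y within each bin, in long format
bin_quantiles <- function(dat, bin, probs = c(0.25, 0.5, 0.75)) {
  pieces <- lapply(split(dat, bin), function(d) {
    data.frame(xmid = mean(d$x),          # where to draw the bin on the x-axis
               prob = factor(probs),      # which quantile this row holds
               yq   = unname(quantile(d$y, probs)))
  })
  do.call(rbind, pieces)
}

# Requirement 1: equal-width bins; requirement 2: equal-count bins
q_width <- bin_quantiles(DAT, cut_interval(DAT$x, n = 10))
q_count <- bin_quantiles(DAT, cut_number(DAT$x, n = 10))

# Raw points in the background, one line per quantile on top
p <- ggplot(DAT, aes(x, y)) +
  geom_point(alpha = 0.2) +
  geom_line(data = q_width, aes(xmid, yq, colour = prob))
print(p)
```

Passing q_count instead of q_width to geom_line() gives the equal-count
version of the same picture.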

On Fri, Mar 9, 2012 at 7:00 PM, Michael <comtech.usa at gmail.com> wrote:
> The origin of this problem was that a plain scatter plot with too many
> highly dispersed points ends up with points flying all over the
> place.
>
> We are trying to smooth the charts a bit...
>
> Any good recommendations?
>
> Thanks a lot!
>
> On Fri, Mar 9, 2012 at 8:59 PM, Michael <comtech.usa at gmail.com> wrote:
>
>> Sorry for the confusion Michael.
>>
>> I myself am trying to figure out what my boss is requesting:
>>
>> I am certain that I need to "plot the quantiles of each bin.  " ...
>>
>> But how are the quantiles plotted? Shall I specify 50% quantile, etc?
>>
>> Being a diligent guy, I am trying hard to do some homework and figure it
>> out myself...
>>
>> I thought there was a standard statistical procedure that everybody knows...
>>
>> Any more thoughts?
>>
>> Thanks a lot!
>>
>>
>> On Fri, Mar 9, 2012 at 8:51 PM, R. Michael Weylandt <
>> michael.weylandt at gmail.com> wrote:
>>
>>> On Fri, Mar 9, 2012 at 9:28 PM, Michael <comtech.usa at gmail.com> wrote:
>>> > Thanks a lot Mike!
>>> >
>>>
>>> Michael if you don't mind. (Though admittedly it leads to some degree
>>> of confusion in a conversation like this)
>>>
>>> > Could you please explain your code a bit?
>>>
>>> Which part?
>>>
>>> >
>>> > My imagination is that for each bin, I am plotting a line which is the
>>> > quantile of the y-values in that bin?
>>>
>>> Oh, so you want a qqnorm()-esque line? How is that like a scatterplot?
>>>
>>> ....yes, that's something else entirely (and not clear from your first
>>> post -- to my ear the "quantile" is a statistic tied to the [e]cdf)
>>> This is actually much easier in ggplot (and certainly doable in base
>>> as well)
>>>
>>> Try this,
>>>
>>> # Not so volatile this time
>>> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
>>> DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))
>>>
>>> library(ggplot2)
>>> p <- ggplot(DAT) + facet_wrap( ~ xbin) + stat_qq(aes(sample = y))
>>>
>>> print(p)
>>>
>>> If this isn't what you want, please spend some time to show an example
>>> of the sort of graph you desire (it can be a bit of code or a link to
>>> a picture or even a hand sketch hosted somewhere online)
>>>
>>> Out on a limb, I think you might really be thinking of something more
>>> like this:
>>>
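>>> # rank y within each bin so every facet shows its own ordered values
>>> DAT$yrank <- with(DAT, ave(y, xbin, FUN = rank))
>>> p <- ggplot(DAT) + facet_wrap( ~ xbin) +
>>>   geom_step(aes(x = yrank, y = y))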
>>>
>>> and see this for more: http://had.co.nz/ggplot2/geom_step.html
>>>
>>> Michael Weylandt
>>>
>>> >
>>> > I ran your program but couldn't figure out the meaning of the dots in
>>> > your plot?
>>> >
>>> > Thanks again!
>>> >
>>> > On Fri, Mar 9, 2012 at 7:07 PM, R. Michael Weylandt
>>> > <michael.weylandt at gmail.com> wrote:
>>> >>
>>> >> That doesn't really seem to make sense to me as a graphical
>>> >> representation (transforming adjacent y values differently), but if
>>> >> you really want to do so, here's what I'd do if I understand your goal
>>> >> (the preprocessing is independent of the graphics engine):
>>> >>
>>> >> # Nice and volatile!
>>> >> DAT <- data.frame(x = runif(1000, 0, 20), y = rcauchy(1000)^2)
>>> >>
>>> >> # split y based on some x binning and assign empirical quantiles of
>>> >> # each group
>>> >>
>>> >> DAT$yquant <- with(DAT, ave(y, cut(x, seq(0, 20, 5)),
>>> >>                             FUN = function(x) ecdf(x)(x)))
>>> >>
>>> >> # BASE
>>> >> plot(yquant ~ x, data = DAT)
>>> >>
>>> >>  # ggplot2
>>> >> library(ggplot2)
>>> >>
>>> >> p <- ggplot(DAT, aes(x = x, y = yquant)) + geom_point()
>>> >> print(p)
>>> >>
>>> >> Michael Weylandt
>>> >>
>>> >> PS -- I see Josh Wiley just responded pointing out your requirements
>>> >> #1 and #2 are incompatible: I've used 1 here.
>>> >>
>>> >> On Fri, Mar 9, 2012 at 7:37 PM, Michael <comtech.usa at gmail.com> wrote:
>>> >> > Hi all,
>>> >> >
>>> >> > I am trying hard to do the following and have already spent a few
>>> >> > hours in vain:
>>> >> >
>>> >> > I want to make a scatter plot.
>>> >> >
>>> >> > But given the high dispersion of the dots, I would like to bin the
>>> >> > x-axis and then, for each bin, plot the quantiles of the y-values
>>> >> > of the data points in that bin:
>>> >> >
>>> >> > 1. Uniform bin size on the x-axis;
>>> >> > 2. Equal number of observations in each bin;
>>> >> >
>>> >> > How to do that in R? I guess for the sake of prettiness, I'd better
>>> >> > do it in ggplot2?
>>> >> >
>>> >> > Thank you!
>>> >> >
>>> >> >        [[alternative HTML version deleted]]
>>> >> >
>>> >> > ______________________________________________
>>> >> > R-help at r-project.org mailing list
>>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> > PLEASE do read the posting guide
>>> >> > http://www.R-project.org/posting-guide.html
>>> >> > and provide commented, minimal, self-contained, reproducible code.
>>> >
>>> >
>>>
>>
>>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


