[R] find maximum values on the density function of a bimodal distribution
David Carlson
dcarlson at tamu.edu
Tue Apr 22 15:25:06 CEST 2014
Following on Bert's comments, you can see some of the issues by plotting histograms of the original data:
> hist(rv, breaks=15)
> abline(v= d.rv$x[c(mode1, mode2)], lty=2)
> hist(rv, breaks=25)
> abline(v= d.rv$x[c(mode1, mode2)], lty=2)
The density plot smooths the values into two peaks, but the histogram hints at the possibility that the underlying values are tri-modal. On the kernel density plot, different bandwidths will give different pictures. The default bandwidth is about 3.3. The manual page recommends bw="SJ" which gives about 3.8. Larger bandwidths smooth more and smaller bandwidths smooth less.
> plot(density(rv, bw=5)) # Unimodal
> plot(density(rv, bw=4)) # Barely bimodal
> plot(density(rv, bw=3)) # Bimodal
> plot(density(rv, bw=2)) # Bimodal with hints of more
> plot(density(rv, bw=1)) # Multimodal
David C
-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Monday, April 21, 2014 4:17 PM
To: David Carlson
Cc: Luigi Marongiu; r-help at r-project.org
Subject: Re: [R] find maximum values on the density function of a bimodal distribution
Well, yes and no, David.
Yours is one possible interpretation -- just work verbatim with the
finite sample and, as you say, hope it's well behaved.
But if the data are a noisy *sample* from a bimodal distribution and
the purpose is to estimate the *population* modes, then, of course,
it's a different kettle of fish. As what was desired wasn't clear to
me, and I wasn't going to wade through his HTML text mess anyway(note:
you are asked to post in **plain text** for this reason), I didn't
reply. However, if this is relevant, a search on "bump hunting" or
"1-d clustering" might bring up something useful. I'm sure there's an
R package or 20 that addresses this.
Cheers,
Bert
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch
On Mon, Apr 21, 2014 at 1:38 PM, David Carlson <dcarlson at tamu.edu> wrote:
> This will work, as long as there are exactly 2 distinct modes:
>
>> runs <- rle(sign(diff(d.rv$y)))
>> length(runs$lengths) # There should be 4 runs if there are
> exactly 2 modes
> [1] 4
>> mode1 <- runs$lengths[1]+1
>> mode2 <- length(d.rv$x)- runs$lengths[4]
>> d.rv$y[c(mode1, mode2)] # The 2 modes:
> [1] 0.04012983 0.04049404
>
> --------------------------------------------------
> David L Carlson
> Listowner, Webmaster
> http://people.tamu.edu/~dcarlson/nar/
>
> ----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Luigi
> Marongiu
> Sent: Monday, April 21, 2014 1:24 PM
> To: r-help at r-project.org
> Subject: [R] find maximum values on the density function of a
> bimodal distribution
>
> *dear all,I have a bimodal distribution and i would like to
> identify the
> two most frequent values. Using unimodal values i found from the
> R archive
> that is possible to identify the most frequent value, which
> corresponds to
> the peak of the density function:from Bert Gunter, Fri Mar 13
> 04rv <-
> rbinom(10000,1,0.1) + rnorm(10000)d.rv = density(rv)d.x =
> d.rv$xd.y =
> d.rv$yd.rv.max =
> d.rv$x[which.max(d.rv$y)]plot(d.rv)abline(v=d.rv.max)The
> following is instead a bimodal distribution, so how to identify
> the two
> peak values?rv <-c(-0.161948986691092, 6.25551346546337,
> 14.4449902518518, 5.71035851092391, 14.2659690175213,
> 13.4326694168529, 2.25851352934142, 19.0283322756163,
> 18.7578638985237, 3.6079681449722, 12.8469831785319,
> 16.8172446560292, 0.586180794938773, -2.78468096182699,
> 6.41435801855101, 5.40404032358274, 16.422996950473,
> 0.0146677573458032, 6.42413783291763, 16.9133720373153,
> 21.2106745914211, 4.04126872437582, 10.8506381906698,
> 8.74969277743266, 14.2820697977719, 15.9914414284944,
> 8.35395871093583, 22.322063211793, 14.3922587603495,
> -0.889640152783791, 15.5590006991235, 13.8636215566311,
> 8.04175502056292, -1.85932938527655, 3.0791620072771,
> 20.5240745935955, 3.68821147789088, 7.38116287748327,
> 8.13585591202069, 5.94733543506892, 7.77891267272758,
> 14.4774347329043, 0.309628817532129, -9.0260821589798,
> 12.4102239527495, -4.64595423395751, 17.3141235128072,
> 20.1952807795295, -8.18424754421263,
> -1.58917260547841)d.rv =
> density(rv)d.x = d.rv$xd.y = d.rv$yd.rv.max =
> d.rv$x[which.max(d.rv$y)]plot(d.rv)Thank you.Best wishes,Luigi*
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible
> code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list