[R] find maximum values on the density function of a bimodal distribution

David Carlson dcarlson at tamu.edu
Tue Apr 22 15:25:06 CEST 2014


Following on Bert's comments, you can see some of the issues by plotting histograms of the original data:

> hist(rv, breaks=15)
> abline(v= d.rv$x[c(mode1, mode2)], lty=2)
> hist(rv, breaks=25)
> abline(v= d.rv$x[c(mode1, mode2)], lty=2)

The density plot smooths the values into two peaks, but the histogram hints at the possibility that the underlying values are tri-modal. On the kernel density plot, different bandwidths will give different pictures. The default bandwidth is about 3.3. The manual page recommends bw="SJ" which gives about 3.8. Larger bandwidths smooth more and smaller bandwidths smooth less.

> plot(density(rv, bw=5)) # Unimodal
> plot(density(rv, bw=4)) # Barely bimodal
> plot(density(rv, bw=3)) # Bimodal
> plot(density(rv, bw=2)) # Bimodal with hints of more
> plot(density(rv, bw=1)) # Multimodal

David C

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com] 
Sent: Monday, April 21, 2014 4:17 PM
To: David Carlson
Cc: Luigi Marongiu; r-help at r-project.org
Subject: Re: [R] find maximum values on the density function of a bimodal distribution

Well, yes and no, David.

Yours is one possible interpretation -- just work verbatim with the
finite sample and, as you say, hope it's well behaved.

But if the data are a noisy  *sample* from a bimodal distribution and
the purpose is to estimate the *population* modes, then, of course,
it's a different kettle of fish. As what was desired wasn't clear to
me, and I wasn't going to wade through his HTML text mess anyway(note:
you are asked to post in **plain text** for this reason), I didn't
reply.  However, if this is relevant, a search on "bump hunting"  or
"1-d clustering" might bring up something useful. I'm sure there's an
R package or 20 that addresses this.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Mon, Apr 21, 2014 at 1:38 PM, David Carlson <dcarlson at tamu.edu> wrote:
> This will work, as long as there are exactly 2 distinct modes:
>
>> runs <- rle(sign(diff(d.rv$y)))
>> length(runs$lengths) # There should be 4 runs if there are
> exactly 2 modes
> [1] 4
>> mode1 <- runs$lengths[1]+1
>> mode2 <- length(d.rv$x)- runs$lengths[4]
>> d.rv$y[c(mode1, mode2)] # The 2 modes:
> [1] 0.04012983 0.04049404
>
> --------------------------------------------------
> David L Carlson
> Listowner, Webmaster
> http://people.tamu.edu/~dcarlson/nar/
>
> ----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Luigi
> Marongiu
> Sent: Monday, April 21, 2014 1:24 PM
> To: r-help at r-project.org
> Subject: [R] find maximum values on the density function of a
> bimodal distribution
>
> *dear all,I have a bimodal distribution and i would like to
> identify the
> two most frequent values. Using unimodal values i found from the
> R archive
> that is possible to identify the most frequent value, which
> corresponds to
> the peak of the density function:from Bert Gunter, Fri Mar 13
> 04rv <-
> rbinom(10000,1,0.1) + rnorm(10000)d.rv = density(rv)d.x =
> d.rv$xd.y =
> d.rv$yd.rv.max =
> d.rv$x[which.max(d.rv$y)]plot(d.rv)abline(v=d.rv.max)The
> following is instead a bimodal distribution, so how to identify
> the two
> peak values?rv <-c(-0.161948986691092,    6.25551346546337,
> 14.4449902518518,    5.71035851092391,    14.2659690175213,
> 13.4326694168529,    2.25851352934142,    19.0283322756163,
> 18.7578638985237,    3.6079681449722,    12.8469831785319,
> 16.8172446560292,    0.586180794938773,    -2.78468096182699,
> 6.41435801855101,    5.40404032358274,    16.422996950473,
> 0.0146677573458032,    6.42413783291763,    16.9133720373153,
> 21.2106745914211,    4.04126872437582,    10.8506381906698,
> 8.74969277743266,    14.2820697977719,    15.9914414284944,
> 8.35395871093583,    22.322063211793,    14.3922587603495,
> -0.889640152783791,    15.5590006991235,    13.8636215566311,
> 8.04175502056292,    -1.85932938527655,    3.0791620072771,
> 20.5240745935955,    3.68821147789088,    7.38116287748327,
> 8.13585591202069,    5.94733543506892,    7.77891267272758,
> 14.4774347329043,    0.309628817532129,    -9.0260821589798,
> 12.4102239527495,    -4.64595423395751,    17.3141235128072,
> 20.1952807795295,    -8.18424754421263,
> -1.58917260547841)d.rv =
> density(rv)d.x = d.rv$xd.y = d.rv$yd.rv.max =
> d.rv$x[which.max(d.rv$y)]plot(d.rv)Thank you.Best wishes,Luigi*
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible
> code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list