[R] How to find maximum values on the density function of arandom variable

David Winsemius dwinsemius at comcast.net
Fri Mar 13 19:18:45 CET 2009


In discrete math, first differences are the analogues of first  
derivatives and second differences are the analogues of second  
derivatives. (I thought I learned this in Knuth, vol 1 25 years ago,  
but I cannot find it, so maybe it was in some the time series stuff I  
read 20 years ago). So a negative second difference may signal a local  
maximum ( and a positive second difference would signal a local  
minimum). You may need to also check for plateaus that are not really  
local maxima. They would result in a signal like (-1 0 0 1)
 > vec3 <- c(1:5,5,5:10)
 > diff(diff(vec3))
  [1]  0  0  0 -1  0   1  0  0  0  0  # a discrete inflection

I think the -2's would not need to be checked further, but the -1 's  
would need to be further verified. Since the differenced sequences get  
shortened with each application you also need to adjust before for  
using the results as indexes.
 > vec
  [1] 1 2 3 4 5 6 5 4 3 2 1
 >which(diff(diff(vec))==-2)
[1] 5
 > vec[which(diff(diff(vec))==-2)]
[1] 5        # "off" by one
 > vec[which(diff(diff(vec))==-2)+1]
[1] 6        # correct maximum

There must be worked out algorithms for peak finding in the R package  
universe.

-- 
David Winsemius

On Mar 13, 2009, at 1:38 PM, guox at ucalgary.ca wrote:

> Thanks for your ideas.
> I would like to find all possible maximums - "mountains" on a graph  
> of a
> given density function but I have no ideas. In calculus, there is a
> general approach for a given function f(x): Find derivative of f(x)  
> and
> estimate all zeros of f'(x). These zeros give us locations of  
> mountains if
> f(x) is differentiable. If f(x) is not differentiable (f(x) is not
> continuous in this case), we cannot use this approach. Bur in discrete
> case, we can still talk about local maximums, I guess. Your ideas  
> are very
> helpful.
> -james
>> If you are trying to build your own function then presumably you do
>> not want the global maximum, since that is trivially returned by max.
>> So what do you really want? Is this a programming question or just a
>> general statistics question?
>>
>> If you want to search along a (specific) sequence for local maxima,
>> then you could try programming with "second differences" looking for
>> shifts in sign from positive to negative.
>>
>> vec <- c(1:5, 6, 5:1)
>>
>>> diff(diff(vec))
>> [1]  0  0  0  0 -2  0  0  0  0
>>
>> It is a bit ambiguous what you would do with this vector:
>>
>>> vec2 <- c(1:5,rep(6,3),5:1)
>>> diff(diff(vec2))
>>  [1]  0  0  0  0 -1  0 -1  0  0  0  0
>>
>> --
>> David Winsemius
>>
>>
>>
>> On Mar 13, 2009, at 12:48 PM, guox at ucalgary.ca wrote:
>>
>>> Yes, a random variable, discrete or continuous one, should associate
>>> with
>>> a probability space and a measurable space.
>>> I thought that graph of density(rv) below could give us an example
>>> of a
>>> density function. I am very sorry for confusing you.
>>>
>>> My question is how to find/estimate maximum values of a given  
>>> density
>>> function (even any given function within a given domain).
>>> The number of these maximum values might be > 1 but the global one
>>> is unique.
>>> Any ideas and references?
>>> Thanks,
>>> -james
>>>> There is some considerable confusion in both the question and the
>>>> reply.
>>>>
>>>> rv is **not**  a random variable. It is an (iid) sample from  
>>>> (i.e. a
>>>> "realization" of) a random variable. It has *no* "density function"
>>>> and
>>>> the
>>>> density() function is simply a procedure to **estimate** the
>>>> density of
>>>> the
>>>> underlying random variable from which rv was sampled at a finite
>>>> number
>>>> of
>>>> points. The result of density()and the max given in the reply will
>>>> depend
>>>> on
>>>> the particular parameters given to density()(see ?density for
>>>> details),
>>>> as
>>>> well as the data. In other words, both the question and answer
>>>> posted are
>>>> nonsense.
>>>>
>>>> Now let me contradict what I just said. **If** you consider rv a
>>>> finite,
>>>> discrete distribution (i.e. the whole population), then, in fact,
>>>> it does
>>>> have a discrete density, with point mass j(i)/n at each unique  
>>>> sample
>>>> value
>>>> i, where n is the total sample size (= 10000 in the example) and
>>>> j(i) is
>>>> the
>>>> number of samples values == i, which would probably be 1 for all i.
>>>> Then,
>>>> of
>>>> course, one can talk about the density of this finite distribution
>>>> in the
>>>> obvious way and its maximum or maxima, occur at those i for which
>>>> n(i) is
>>>> largest.
>>>>
>>>> But of course that's not what the poster really meant, so that
>>>> brings us
>>>> back to the nonsense question and answer. What James probably meant
>>>> to
>>>> ask
>>>> was: "How can the maximum of the underlying population density
>>>> function
>>>> be
>>>> estimated?" Well, that's a complicated issue. One could, of course,
>>>> use
>>>> some
>>>> sort of density estimate -- there are tons -- and find its max;
>>>> that was
>>>> the
>>>> approach taken in the answer, but it's not so simple as it appears
>>>> because
>>>> of the need to choose the **appropriate** estimate (including the
>>>> parameters
>>>> of the statistical algorithm doing the estimating ). This is the
>>>> sort of
>>>> thing that actually requires some careful thought and statistical
>>>> expertise.
>>>> You will find, I believe, that the prescription for finding the max
>>>> suggested below can give quite different answers depending on the
>>>> parameters
>>>> chosen for this estimate, and on the estimate used. So if you need
>>>> to do
>>>> this right, may I suggest consulting the literature on density
>>>> estimation
>>>> or
>>>> perhaps talking with your local statistician?
>>>>
>>>> -- Bert Gunter
>>>> Genentech Nonclinical Statistics
>>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org
>>>> ]
>>>> On
>>>> Behalf Of Mike Lawrence
>>>> Sent: Thursday, March 12, 2009 5:40 PM
>>>> To: guox at ucalgary.ca
>>>> Cc: r-help at r-project.org
>>>> Subject: Re: [R] How to find maximum values on the density function
>>>> of
>>>> arandom variable
>>>>
>>>> rv <- rbinom(10000,1,0.1) + rnorm(10000)
>>>>
>>>> d.rv = density(rv)
>>>> d.x = d.rv$x
>>>> d.y = d.rv$y
>>>>
>>>> d.rv.max = d.rv$x[which.max(d.rv$y)]
>>>>
>>>> plot(d.rv)
>>>> abline(v=d.rv.max)
>>>>
>>>> #that what you want?
>>>>
>>>> On Thu, Mar 12, 2009 at 6:28 PM,  <guox at ucalgary.ca> wrote:
>>>>> I would like to find the maximum values on the density function  
>>>>> of a
>>>>> random variable. For example, I have a random variable
>>>>>
>>>>> rv <- rbinom(10000,1,0.1) + rnorm(10000)
>>>>>
>>>>> Its density function is given by density(rv) and can be  
>>>>> displayed by
>>>>> plot(density(rv)). How to calculate its maximum values?
>>>>> A density function may have a few (global and local) maximum  
>>>>> values.
>>>>> Please help. Thanks,
>>>>> -james
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Mike Lawrence
>>>> Graduate Student
>>>> Department of Psychology
>>>> Dalhousie University
>>>>
>>>> Looking to arrange a meeting? Check my public calendar:
>>>> http://tinyurl.com/mikes-public-calendar
>>>>
>>>> ~ Certainty is folly... I think. ~
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>>
>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list