[R] "Re-creating" distributions

R. Michael Weylandt michael.weylandt at gmail.com
Fri Jun 8 16:30:20 CEST 2012


I wouldn't go quite so far as to say there's absolutely nothing else
-- one could, e.g., also fit a lognormal, gamma, beta or almost any other
two-parameter distribution to the supplied data [assuming the
support matches].

What I did say is that you need domain-specific knowledge to pick a
distribution to which to fit: then, if the moments are known in closed
form from the parameters, moment matching comes down to simultaneous
non-linear equations. I'm not aware of a unified infrastructure for
this in R [so I'm cc'ing the list in case someone else is], but it's
not a terribly difficult problem for the low dimensions we're talking
about.
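
As a rough sketch of what "simultaneous non-linear equations" can look like in practice, here is a Weibull fit in R. The Weibull is only an illustration (it is not mentioned elsewhere in this thread), and weibull_mm() and the search interval for the shape are made-up choices, not part of any package. For this family the two moment equations reduce to a single equation in the shape, which uniroot() can solve; the scale then follows directly from the mean.

```r
# Moment matching for a Weibull(shape, scale) distribution, using
#   mean = scale * gamma(1 + 1/shape)
#   var  = scale^2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)^2)
weibull_mm <- function(target_mean, target_var) {
  stopifnot(target_mean > 0, target_var > 0)
  target_cv2 <- target_var / target_mean^2

  # The squared coefficient of variation depends on the shape only;
  # lgamma() avoids overflow for small shapes.
  cv2_gap <- function(shape) {
    exp(lgamma(1 + 2 / shape) - 2 * lgamma(1 + 1 / shape)) - 1 - target_cv2
  }

  # Assumed search interval: covers CVs from roughly 0.03 up to a few hundred.
  # uniroot() stops with an error if the target CV falls outside that range.
  shape <- uniroot(cv2_gap, interval = c(0.1, 50), tol = 1e-12)$root
  scale <- target_mean / gamma(1 + 1 / shape)
  c(shape = shape, scale = scale)
}

fit <- weibull_mm(target_mean = 10, target_var = 20)

# Check: recompute the moments from the fitted parameters
fit_mean <- unname(fit["scale"] * gamma(1 + 1 / fit["shape"]))
fit_var  <- unname(fit["scale"]^2 *
  (gamma(1 + 2 / fit["shape"]) - gamma(1 + 1 / fit["shape"])^2))
```

Recomputing the moments from the fitted parameters is a cheap sanity check. When the equations do not decouple like this, a general-purpose optimizer such as optim() on the squared moment errors is one possible fallback.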

E.g.,

If you know your data has a gamma distribution with mean 10 and
variance 20, you look at the Wikipedia gamma distribution page to find

Mean = k * theta
Variance = k * theta * theta

So Variance / Mean = theta, which gives theta = 20 / 10 = 2 for your
problem. Then k = Mean / theta = 5.
Similarly, the all-great Wikipedians provide closed form solutions to
get the lognormal parameters back from observed sample moments:
http://en.wikipedia.org/wiki/Lognormal_distribution#Arithmetic_moments
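
The two closed-form cases above can be sketched in R roughly as follows. The names gamma_mm() and lnorm_mm() are made up for illustration, and the choice of a gamma or lognormal family is itself an assumption that the summary statistics cannot confirm.

```r
# Closed-form moment matching; gamma_mm() and lnorm_mm() are illustrative names.

# Gamma: mean = k * theta, variance = k * theta^2
gamma_mm <- function(m, v) {
  theta <- v / m        # scale
  k     <- m / theta    # shape, equal to m^2 / v
  c(shape = k, scale = theta)
}

# Lognormal: mean     = exp(mu + sigma^2 / 2)
#            variance = (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)
lnorm_mm <- function(m, v) {
  sigma2 <- log(1 + v / m^2)
  mu     <- log(m) - sigma2 / 2
  c(meanlog = mu, sdlog = sqrt(sigma2))
}

g <- gamma_mm(10, 20)      # the gamma example above: shape 5, scale 2

# The summary statistics quoted in this thread: mean 0.007, SD 0.011
ln <- lnorm_mm(0.007, 0.011^2)
implied_median <- unname(exp(ln["meanlog"]))   # a lognormal's median is exp(mu)

# Simulated draws from each fitted distribution
set.seed(1)
x_gamma <- rgamma(1e5, shape = g["shape"], scale = g["scale"])
x_lnorm <- rlnorm(1e5, meanlog = ln["meanlog"], sdlog = ln["sdlog"])
```

Comparing implied_median with the reported median of 0.003 gives a rough check on whether a lognormal is even plausible for those numbers; it says nothing about whether it is the right family.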

As Bert rightly cautions, this is far outside the realm of good
practice and your energies would be better served if you could get a
better picture of the underlying data.

Best,
Michael

On Fri, Jun 8, 2012 at 9:13 AM, Bert Gunter <gunter.berton at gene.com> wrote:
>
> Andras:
> I realize my comment was rather cryptic, but which part of Michael's "You can't" did you not understand? Other than
>
> ?dnorm
>
> which, as Michael said, is probably not a good thing, you can do nothing. You need to refocus your efforts on changing the system to get useful data, not trying to make a silk purse out of a sow's ear. Or, as John Tukey said many years ago:
>
> "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."
> -- John Tukey
>
> -- Bert
>
>
>
>
> On Fri, Jun 8, 2012 at 5:14 AM, Andras Farkas <motyocska at yahoo.com> wrote:
>>
>>
>> Dear Bert and Michael
>>
>> thank you for your note below. Based on Michael's input and the lack of a covariance matrix available to me (for the most part), moment matching sounds like the best option. I have searched the internet for discussions on this using R but did not find much useful information. I also have to apologize, but I am somewhat new to the software and this level of statistics. I am usually pretty good at figuring things out, but this one is probably way over my head. I was wondering if you could point me in the right direction using R to "re-build" the distribution that has the following parameters:
>>
>> mean: 0.007, median: 0.003, SD:0.011.
>>
>> I greatly appreciate your help,
>>
>> Sincerely,
>>
>> Andras
>>
>> Bert Gunter <gunter.berton at gene.com> wrote:
>>
>>
>> From: Bert Gunter <gunter.berton at gene.com>
>> Subject: Re: [R] "Re-creating" distributions
>> To: "R. Michael Weylandt" <michael.weylandt at gmail.com>
>> Cc: "Andras Farkas" <motyocska at yahoo.com>, r-help at r-project.org
>> Date: Friday, June 8, 2012, 12:29 AM
>>
>> Related comment:
>>
>> "Even the data aren't sufficient." -- Brian Joiner (some years ago).
>>
>> Explanation: See W.E. Deming on "analytic" vs "enumerative" statistics.
>>
>> --- Bert
>>
>> On Thu, Jun 7, 2012 at 8:06 PM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>> > Short answer: no, those are (in general) insufficient parameters to
>> > characterize a distribution.
>> >
>> > Long answer: unfortunately, it's not uncommon that those "summary
>> > statistics" are the only ones reported based on someone or other's
>> > limited experience with the Gaussian. There are a few things you could
>> > try, but each of them has problems:
>> >
>> > i) Pretend like your data is in fact normal and use those parameters
>> > because they do uniquely characterize a normal distribution. MASS
>> > (among others) provides a multivariate normal distribution [mvrnorm]
>> > if you have a covariance matrix available.
>> >
>> > ii) If you have reason to imagine another distribution [guided by
>> > domain knowledge], try to get its parameters in so far as possible by
>> > moment matching. Covariance structures are much harder for the general
>> > case though.
>> >
>> > iii) If you can get something that resembles original data, simply
>> > work by bootstrapping / imputation.
>> >
>> > Hope this helps,
>> > Michael
>> >
>> > On Thu, Jun 7, 2012 at 3:34 PM, Andras Farkas <motyocska at yahoo.com> wrote:
>> >> Dear All,
>> >>
>> >> I often have to work with certain models in which I try to "reproduce" a distribution the best I can with very little known information available. Is there a package or function in R that could best reproduce a probability distribution using only the mean, median and SD values available without knowing the actual distribution type to begin with and/or the covariance matrix (for more than 1 data set)? All I usually have reported available is mean, median and SD. I hope I made my question clear enough...
>> >>
>> >> thanks,
>> >>
>> >> Andras
>> >>
>> >>
