[R] Method for checking automatically which distributions fit a data set
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Mon Jul 7 19:00:54 CEST 2008
David Reinke wrote:
> The function ks.test(x, y, ...) performs a Kolmogorov-Smirnov test of a
> numeric vector of sample values x against y. Here y is either a second
> numeric vector of sample values (two-sample test) or a cumulative
> distribution function such as pnorm, given as a function or by name
> (one-sample test).
>
> David Reinke
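
A minimal sketch of the ks.test() call described above, assuming simulated
data (the sample sizes, seed, and parameter values are illustrative and not
from the thread). The first argument is a numeric sample; the second is
either a CDF (by name or as a function) or a second sample.

```r
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)        # simulated sample (assumption)

# One-sample test against a fully specified normal CDF.
# Extra arguments (mean, sd) are passed on to pnorm().
one_sample <- ks.test(x, "pnorm", mean = 10, sd = 2)

# Two-sample test: compare x with a second numeric sample.
y <- rgamma(200, shape = 2, rate = 0.5)   # simulated sample (assumption)
two_sample <- ks.test(x, y)

one_sample$p.value
two_sample$p.value
```

Note that the one-sample p-value is only valid when the parameters of the
reference distribution are specified in advance, not estimated from x itself.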
If you select whichever distribution best fits the empirical distribution,
the resulting estimates will have variances (once model uncertainty is taken
into account through bootstrapping) that are equal to those from the
empirical CDF, so nothing is gained. You can use the empirical CDF as
the "final answer" unless prior knowledge of the distributional shape is
available.
Frank Harrell
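
One way to examine this point on your own data is sketched below. It is an
illustration under stated assumptions, not a definitive implementation: the
data are simulated, the two candidate families (normal and lognormal) and
the target quantity (the 90th percentile) are arbitrary choices, and
q90_selected() is a hypothetical helper. The key step is that the choice of
distribution is repeated inside every bootstrap resample, so the resulting
spread includes model uncertainty.

```r
library(MASS)  # for fitdistr()

set.seed(42)
x <- rgamma(100, shape = 2, rate = 0.5)   # simulated positive data (assumption)

# Empirical CDF: a step function that needs no distributional assumption.
Fn <- ecdf(x)
Fn(4)               # estimated P(X <= 4)
quantile(x, 0.9)    # empirical 90th percentile

# Hypothetical helper: choose normal or lognormal by AIC, then return the
# 90th percentile of the chosen fitted distribution.
q90_selected <- function(d) {
  fit_norm  <- fitdistr(d, "normal")
  fit_lnorm <- fitdistr(d, "lognormal")
  if (AIC(fit_norm) <= AIC(fit_lnorm)) {
    qnorm(0.9, fit_norm$estimate["mean"], fit_norm$estimate["sd"])
  } else {
    qlnorm(0.9, fit_lnorm$estimate["meanlog"], fit_lnorm$estimate["sdlog"])
  }
}

# Bootstrap both estimators; model selection is redone in each resample.
B <- 500
boot <- replicate(B, {
  d <- sample(x, replace = TRUE)
  c(selected  = unname(q90_selected(d)),
    empirical = unname(quantile(d, 0.9)))
})

apply(boot, 1, sd)  # bootstrap standard errors of the two estimators
```

Comparing the two standard errors shows how much, if anything, the
select-then-fit procedure gains over the empirical estimate for this sample.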
>
> Senior Transportation Engineer/Economist
> Dowling Associates, Inc.
> 180 Grand Avenue, Suite 250
> Oakland, California 94612-3774
> 510.839.1742 x104 (voice)
> 510.839.0871 (fax)
> www.dowlinginc.com
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of hadley wickham
> Sent: Monday, July 07, 2008 8:10 AM
> To: Ben Bolker
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Method for checking automatically which distributions fit
> a data set
>
>>> Suppose I have a vector of data.
>>> Is there a method in R that can automatically
>>> suggest which distribution fits that data
>>> (e.g. normal, gamma, multinomial, etc.)?
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>> See
>>
>> https://stat.ethz.ch/pipermail/r-help/2008-June/166259.html
>>
>> For example, normal vs. gamma might be a sensible question
>> (for which you can use fitdistr() as suggested above), but
>> "multinomial" implies a very specific kind of response --
>> discrete data with a specified number of possible outcomes.
>
> Yes - the question as it stands is poorly stated. If you have a small
> (finite) choice of possible distributions you can use some kind of
> likelihood-based statistic to determine which fits the data best. But
> what is the population of distributions in this case? All
> distributions that you see in stats101? All distributions that have
> names? All continuous distributions?
>
> Hadley
>
>
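
For a small, fixed set of candidates, the likelihood-based comparison
mentioned above might look like the following sketch. The data are
simulated, the candidate list is an arbitrary assumption, and AIC is just
one possible likelihood-based statistic; fitdistr() is from the MASS
package.

```r
library(MASS)  # for fitdistr()

set.seed(7)
x <- rgamma(200, shape = 3, rate = 1)   # simulated positive data (assumption)

# Finite set of candidate families, chosen in advance (assumption).
candidates <- c("normal", "lognormal", "gamma", "exponential")

# Fit each candidate by maximum likelihood. Warnings are suppressed because
# the numerical optimizer used for the gamma fit may try invalid parameter
# values along the way.
fits <- lapply(candidates, function(dist) suppressWarnings(fitdistr(x, dist)))
names(fits) <- candidates

# AIC = -2 * log-likelihood + 2 * (number of parameters); smaller is better.
aic <- sapply(fits, AIC)
sort(aic)
names(which.min(aic))   # candidate with the lowest AIC
```

The ranking only says which candidate in the list fits best relative to the
others; it does not show that any of them fits well in absolute terms.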
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University