[R] Which distribution best fits the data?

Thomas Adams Thomas.Adams at noaa.gov
Mon Jun 30 16:46:36 CEST 2008


Jenny,

You may try here: http://en.wikipedia.org/wiki/Normality_test, which
mentions the R package nortest,

and here:

The Probability Plot Correlation Coefficient Test for Normality, James 
J. Filliben:

http://www.jstor.org/sici?sici=0040-1706(197502)17%3A1%3C111%3ATPPCCT%3E2.0.CO%3B2-6&cookieSet=1
http://www.minitab.com/resources/articles/normprob.pdf
http://engineering.tufts.edu/cee/people/vogel/publications/probability1986.pdf
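
As a rough illustration of the checks these references describe, here is a
minimal sketch in R. The data are simulated stand-ins (36 hypothetical
seasonal values), not Jenny's series, and ppcc_normal() is a helper name made
up for this example. It computes a probability plot correlation coefficient
using Filliben's approximation to the uniform order-statistic medians. The
Anderson-Darling test runs only if the nortest package happens to be
installed.

```r
# Hypothetical stand-in data: 36 seasonal averages (not the poster's data).
set.seed(1)
sst  <- rnorm(36, mean = 27, sd = 0.5)            # area-averaged sea-surface temperature
rain <- 50 + 12 * (sst - 27) + rnorm(36, sd = 5)  # seasonal rainfall

fit <- lm(rain ~ sst)
res <- residuals(fit)

# Visual check: points should fall close to the reference line.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test from base R (package stats).
sw <- shapiro.test(res)
print(sw)

# Probability plot correlation coefficient (PPCC): the correlation between
# the ordered residuals and the normal quantiles of Filliben's approximate
# uniform order-statistic medians.
ppcc_normal <- function(x) {
  n <- length(x)
  m <- (seq_len(n) - 0.3175) / (n + 0.365)
  m[n] <- 0.5^(1 / n)
  m[1] <- 1 - m[n]
  cor(sort(x), qnorm(m))
}
r <- ppcc_normal(res)
print(r)

# Optional: Anderson-Darling test, if the nortest package is installed.
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(res))
}
```

A PPCC close to 1 is consistent with normal residuals; the critical values
for a given sample size are tabulated in Filliben's paper.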

Regards,
Tom

Jenny Barnes wrote:
> Hi Ben and R-help community,
>
> More specifics:
>
> I am using sea-surface temperature (averaged over an area) and also
> winds (averaged over an area) as predictors in a linear regression
> model for rainfall over a small region of Africa. So I have one
> time series of sea temperature and one time series of rainfall
> (seasonal averages over 36 years), and I have performed the linear
> regression between the two. I now want to check whether the residuals
> are normally distributed. If they are not, I want an R function that
> will tell me which distribution they are most similar to, so that I
> can apply a suitable transformation to make the data normal.
>
> Any more tips now that you have a few more details perhaps? :o)
>
> Thanks for your time,
>
> Jenny
>
> On Mon, 30 Jun 2008, Ben Bolker wrote:
>
>> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
>>
>>>
>>> Dear R-help community,
>>>
>>> Does anybody know of a stats function in R that tells you which
>>> distribution best fits your data? I have tried looking through the
>>> archives but have only found functions that tell you whether it is
>>> normal or log etc. specifically. I am looking for a function that
>>> tells you (given a time series) what the distribution is.
>>>
>>> Any help/advice will be greatly appreciated,
>>>
>>> All the best,
>>>
>>> Jenny Barnes
>>>
>>> jmb <at> mssl.ucl.ac.uk
>>
>>   The problem is that it's not generally a good
>> idea to data-dredge in this way. Your best bet is
>> to think about the characteristics of the
>> data (discrete or continuous, non-negative or real,
>> symmetric or skewed) and try to narrow it down to
>> a few distributions -- then you can use fitdistr()
>> (from the MASS package) or something similar
>> to compare among them.
>>
>>  If you say a little bit more about what
>> you're trying to do with the data you might
>> get some more specific advice.
>>
>>  Ben Bolker
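
The approach Ben describes (pick a few plausible candidates first, then
compare them) might look roughly like the sketch below. The sample is
simulated and purely hypothetical, and the three candidate distributions are
an assumption for illustration; they should come from what is known about the
real data. fitdistr() is from the MASS package, and AIC() works on its
results.

```r
library(MASS)  # provides fitdistr()

# Hypothetical positive, right-skewed sample (not the poster's data).
set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)

# Candidate distributions chosen in advance from what is known about the data.
candidates <- c("normal", "lognormal", "gamma")
fits <- lapply(candidates, function(d) fitdistr(x, d))
names(fits) <- candidates

# Compare the maximum-likelihood fits by AIC (lower is better).
aic <- sapply(fits, AIC)
print(sort(aic))
```

A lower AIC only ranks the candidates against each other; it does not show
that the best of them fits well, so a quantile plot of the chosen fit is
still worth a look.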
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


-- 
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL:	thomas.adams at noaa.gov

VOICE:	937-383-0528
FAX:	937-383-0033


