[R] Which distribution best fits the data?
Thomas Adams
Thomas.Adams at noaa.gov
Mon Jun 30 16:46:36 CEST 2008
Jenny,
You could start here: http://en.wikipedia.org/wiki/Normality_test (which
mentions the R package nortest),
and here:
The Probability Plot Correlation Coefficient Test for Normality, James
J. Filliben:
http://www.jstor.org/sici?sici=0040-1706(197502)17%3A1%3C111%3ATPPCCT%3E2.0.CO%3B2-6&cookieSet=1
http://www.minitab.com/resources/articles/normprob.pdf
http://engineering.tufts.edu/cee/people/vogel/publications/probability1986.pdf
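As a rough sketch of how these checks might look in R for regression residuals: the data below are simulated, and the names sst, rain and fit are made up for illustration only. The correlation is computed with ppoints() as a convenient approximation to the plotting positions used in Filliben's paper, so treat it as an assumption rather than his exact statistic. Critical values would still have to come from the published tables.

```r
# Minimal sketch with simulated data; sst and rain are hypothetical
# stand-ins for the real sea-surface temperature and rainfall series.
set.seed(1)
n    <- 36                                   # 36 seasonal averages
sst  <- rnorm(n, mean = 27, sd = 0.5)        # simulated predictor
rain <- 100 + 20 * (sst - 27) + rnorm(n, sd = 10)

# Fit the linear regression and extract the residuals
fit <- lm(rain ~ sst)
res <- residuals(fit)

# Shapiro-Wilk normality test (stats package, part of base R)
sw <- shapiro.test(res)
print(sw)

# Probability plot correlation coefficient: correlation between the
# ordered residuals and approximate normal quantiles.
# Assumption: ppoints() stands in for Filliben's plotting positions.
ppcc <- cor(sort(res), qnorm(ppoints(n)))
print(ppcc)

# Optional: Anderson-Darling test, only if nortest is installed
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(res))
}
```

A value of ppcc close to 1 is consistent with normal residuals. Running qqnorm(res) followed by qqline(res) draws the matching normal probability plot.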
Regards,
Tom
Jenny Barnes wrote:
> Hi Ben and R-help community,
>
> More specifics:
>
> I am using sea-surface temperature (averaged over an area) and also
> winds (averaged over an area) as predictors in a linear regression
> model for rainfall over a small region of Africa. So I have one
> time series of sea temperature and one time series of rainfall (seasonal
> averages over 36 years), and I have performed the linear regression between
> the two. I now want to check whether the residuals are normally distributed.
> If they are not, I want an R function that will tell me which
> distribution they are most similar to, so that I can apply a suitable
> transformation to make the data normal.
>
> Any more tips now that you have a few more details perhaps? :o)
>
> Thanks for your time,
>
> Jenny
>
> On Mon, 30 Jun 2008, Ben Bolker wrote:
>
>> Jenny Barnes <jmb <at> mssl.ucl.ac.uk> writes:
>>
>>>
>>> Dear R-help community,
>>>
>>> Does anybody know of a stats function in R that tells you which
>>> distribution best fits your data? I have tried looking through the
>>> archives but have only found functions that tell you whether it's
>>> normal or log etc. specifically. I am looking for a function that
>>> tells you (given a time series) what the distribution is.
>>>
>>> Any help/advice will be greatly appreciated,
>>>
>>> All the best,
>>>
>>> Jenny Barnes
>>>
>>> jmb <at> mssl.ucl.ac.uk
>>
>> The problem is that it's not generally a good
>> idea to data-dredge in this way. Your best bet is
>> to think about the characteristics of the
>> data (discrete or continuous, non-negative or real,
>> symmetric or skewed) and try to narrow it down to
>> a few distributions -- then you can use fitdistr()
>> (from the MASS package) or something similar
>> to compare among them.
>>
>> If you say a little bit more about what
>> you're trying to do with the data you might
>> get some more specific advice.
>>
>> Ben Bolker
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177
EMAIL: thomas.adams at noaa.gov
VOICE: 937-383-0528
FAX: 937-383-0033