[R] fitting a gaussian to some x,y data
Rolf Turner
rolf at erdos.math.unb.ca
Fri Aug 25 19:12:19 CEST 2006
Michael Koppelman wrote:
> I apologize if this is redundant. I've been Googling, searching the
> archive and reading the help all morning and I am not getting closer
> to my goal.
>
> I have a series of data( xi, yi). It is not evenly sampled and it is
> messy (meaning that there is a lot of scatter in the data). I want to
> fit a normal distribution (i.e. a gaussian) to the data in order to
> find the center. (The data has a loose "normal" look to it.) I have
> done this with polynomial fitting in R with nls but want to try it
> with a Gaussian (I am a student astronomer and have not a lot of
> experience in statistics).
>
> In looking at the fitdistr function, it seems to take as input a
> bunch of x values and it will fit a gaussian to the histogram. That
> is not what I need to do, I want to fit a normal distribution to the
> x,y values and get out the parameters of the fit. I'm fooling with
> nls but it seems to want the data in some other format or something
> because it is complaining about "singular gradient".
>
> I'm sure this isn't hard and the answer must be out there somewhere
> but I can't find it. I appreciate any assistance.
Fitting a distribution simply means estimating
the parameters of that distribution.
For a Gaussian distribution this problem is trivial.
The parameters are the mean vector mu and the
covariance matrix Sigma.
The (optimal; maximum-likelihood-adjusted-to-be-unbiased)
estimates of these are the obvious ones --- the sample
mean and the sample covariance. In R you most easily
get these by
o cbinding your ``x'' and ``y'' vectors into
matrix:
> xy <- cbind(x,y)
o applying mean() to this matrix:
> mu.hat <- apply(xy,2,mean)
o calling upon var():
> Sigma.hat <- var(xy)
That is all there is to it.
If all you really want is an estimate of the ``centre''
then this estimate is just mu.hat; you don't need to
``fit a distribution'' at all.
From your description of the problem there may well be
other issues --- lack of independence of your data,
biased sample, outliers, skewness, god knows what. Such
issues should be dealt with as carefully as possible.
It may be the case that you should be doing some sort
of robust estimation. Or it may be the case that this
data set is hopeless and should be thrown away and a
new data set collected in a manner that will actually
yield some information.
But ``fitting a distribution'' is *NOT* the issue.
I don't mean to be rude, but this question illustrates the
point that you really shouldn't be *doing* statistics unless
you *understand* some statistics. At the very best you'll
waste a great deal of time.
cheers,
Rolf Turner
rolf at math.unb.ca
More information about the R-help
mailing list