[R] missing values imputation
A.J. Rossini
rossini at blindglobe.net
Wed May 12 19:23:05 CEST 2004
(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
> On 12-May-04 Rolf Turner wrote:
>> Anne Piotet wrote:
>>
>>> What R functionalities are there to do missing-values imputation
>>> (substantial proportion of missing data)? I would prefer to use
>>> maximum likelihood methods; is the EM algorithm implemented? In
>>> which package?
>>
>> The so-called ``EM algorithm'' is ***NOT*** an
>> algorithm. It is a methodology or a unifying concept.
>> It would be impossible to ``implement'' it. (Except
>> possibly by means of some extremely advanced and
>> sophisticated Artificial Intelligence software.)
>
> Do we understand the same thing by "EM Algorithm"?
>
> The one I'm thinking of -- formulated under that name by Dempster,
> Laird and Rubin in 1977 ("Maximum likelihood estimation from incomplete
> data via the EM algorithm", JRSS(B) 39, 1-38) -- is indeed an algorithm
> in exactly the same sense as any iterative search for the maximum of a
> function.
>
> Essentially, in the context of data modelled by an underlying exponential
> family distribution where there is incomplete information about the
> values which have this distribution, it proceeds by
>
> Start: Choose starting estimates for the parameters of the distribution
> E: Using the current parameter values, compute the expected values
> of the sufficient statistics conditional on the observed information
> M: Solve the maximum-likelihood equations (which are functions of the
> sufficient statistics) using the expected values computed in (E)
> If sufficiently converged, stop. Otherwise, make the current parameter
> values equal to the values estimated in (M) and return to (E).
>
> Algorithm, this, or not????
>
> And where does "extremely advanced and sophisticated Artificial
> Intelligence software" come into it? You can, in some cases, perform
> the above EM algorithm by hand.
>
> Which "EM Algorithm" are you thinking of?
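The Start/E/M loop quoted above can be made concrete for the simplest imputation case. Here (x, y) is bivariate normal, x is always observed, and some y values are missing. This is a minimal sketch, not a reference to any R package. The function name, the use of None for a missing value, the tolerance and the starting values are all hypothetical choices. The complete-data sufficient statistics are the sums of x, y, x^2, y^2 and xy. The E step replaces the missing y and y^2 terms with their conditional expectations given x.

```python
# Hypothetical sketch: EM for a bivariate normal (x, y) where x is always
# observed and some y values are missing (None). Pure Python, no packages.


def em_bivariate_normal(x, y, tol=1e-10, max_iter=1000):
    """Return ML estimates (mx, my, sxx, syy, sxy) computed by EM."""
    n = len(x)
    ys = [v for v in y if v is not None]
    # x is fully observed, so its sufficient statistics never change.
    t_x = sum(x)
    t_xx = sum(v * v for v in x)
    # Start: mean/variance of all x, complete-case mean/variance of y,
    # and zero covariance (an assumed, simple choice of starting values).
    mx = t_x / n
    sxx = t_xx / n - mx * mx
    my = sum(ys) / len(ys)
    syy = sum((v - my) ** 2 for v in ys) / len(ys)
    sxy = 0.0
    for _ in range(max_iter):
        # E step: expected sufficient statistics given x and current params.
        beta = sxy / sxx                 # regression slope of y on x
        resid_var = syy - beta * sxy     # Var(y | x)
        t_y = t_yy = t_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = my + beta * (xi - mx)   # E[y | x]
                eyy = ey * ey + resid_var    # E[y^2 | x]
            else:
                ey, eyy = yi, yi * yi
            t_y += ey
            t_yy += eyy
            t_xy += xi * ey                  # E[x*y | x] = x * E[y | x]
        # M step: complete-data ML estimates from the expected statistics.
        new_my = t_y / n
        new_syy = t_yy / n - new_my * new_my
        new_sxy = t_xy / n - mx * new_my
        change = max(abs(new_my - my), abs(new_syy - syy), abs(new_sxy - sxy))
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:   # sufficiently converged: stop
            break
    return mx, my, sxx, syy, sxy
```

Assume y is missing at random given x. In that case the fixed point should agree with the closed-form ML answer: fit the complete-case regression of y on x, then evaluate it at the mean of all x. That gives a convenient check on the iteration.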
Thanks, Ted :-) -- to extend it a bit, one can imagine the use of
approximate solutions to the 2 steps (simulation methods to get
expected values, similar range of approaches for the maximization) and
get a general (but possibly not robust) computational solution for
the parametric problem. Just plug in a formula for the likelihood and
the sufficient statistics...
Of course, thousands of papers have been written on these variations
(likelihood, specific implementations of the E and M steps).
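The idea of using simulation to get the expected values in the E step is often called Monte Carlo EM. The conditional expectations are replaced by averages over draws of the missing values from their current conditional distribution. A minimal, hypothetical sketch follows for a bivariate normal (x, y) with x always observed and some y missing (None). The names n_draws, n_iter and seed and the starting values are assumptions. Because each E step is noisy, the estimates fluctuate around the exact ML answer rather than converging to it exactly.

```python
import random

# Hypothetical sketch: Monte Carlo EM for a bivariate normal (x, y) where x is
# always observed and some y values are missing (None).


def mcem_bivariate_normal(x, y, n_draws=2000, n_iter=50, seed=0):
    """Return approximate ML estimates (mx, my, sxx, syy, sxy)."""
    rng = random.Random(seed)
    n = len(x)
    ys = [v for v in y if v is not None]
    mx = sum(x) / n
    sxx = sum((v - mx) ** 2 for v in x) / n   # x is complete: stays fixed
    my = sum(ys) / len(ys)
    syy = sum((v - my) ** 2 for v in ys) / len(ys)
    sxy = 0.0
    for _ in range(n_iter):
        beta = sxy / sxx
        sd = max(syy - beta * sxy, 0.0) ** 0.5   # sd of y given x
        t_y = t_yy = t_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                # Simulated E step: average over draws from y | x
                # instead of computing E[y | x] and E[y^2 | x] exactly.
                mean = my + beta * (xi - mx)
                draws = [rng.gauss(mean, sd) for _ in range(n_draws)]
                ey = sum(draws) / n_draws
                eyy = sum(d * d for d in draws) / n_draws
            else:
                ey, eyy = yi, yi * yi
            t_y += ey
            t_yy += eyy
            t_xy += xi * ey
        # M step: the usual closed-form complete-data ML estimates.
        my = t_y / n
        syy = t_yy / n - my * my
        sxy = t_xy / n - mx * my
    return mx, my, sxx, syy, sxy
```

The fixed iteration count stands in for a convergence test. With simulated E steps, the change between iterations never drops to zero, so a stopping rule has to allow for Monte Carlo noise.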
best,
-tony
--
rossini at u.washington.edu http://www.analytics.washington.edu/
Biomedical and Health Informatics University of Washington
Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email