[R] Imputation

Wed May 4 14:48:35 CEST 2005

On 05/04/05 11:13, Ramesh Kolluru wrote:

 I have time-series data for several factors, and some of those factors contain
 missing values. I want to impute those missing values without disturbing the
 distribution of each factor, while maintaining its correlation with the other
 factors. Please suggest some imputation methods.
 I tried some functions in R, such as aregImpute and transcan. After the imputation I
 am unable to retrieve the data with the imputed values. Please give me some way to
 get the data with imputed values.

Here is one way to do it with transcan(), but I'm looking forward
to seeing other answers.  The data are in s.m, and the missing
values are NA.  The imputed values are in s.imp$imputed, in
order, and the third line simply replaces the NAs with these
values.  (I posted this before.  You might have found it by
searching the R search page below.)  This is for the simplest
possible sort of imputation.  I'm not sure that it meets your
requirements.  (In fact, I'm pretty sure it doesn't.)  So you'd
have to change the options for transcan, or do something else.

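library(Hmisc) # provides transcan()
# s.m is assumed here to be a numeric matrix, with NA for missing values
s.imp <- transcan(s.m, asis="*", data=s.m, imputed=TRUE, long=TRUE, pl=FALSE)
s.na <- is.na(s.m) # which data are imputed
s.m[which(s.na)] <- unlist(s.imp$imputed) # fill the NAs, column by column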
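
To see why the replacement line puts each value in the right place, here is a small base-R sketch that does not call transcan() at all. The matrix m and the list imp are made up for illustration; imp only mimics the general shape of s.imp$imputed (one vector of imputed values per variable, in column order), which is an assumption on my part.

```r
# Toy matrix with one NA per column (made-up numbers).
# matrix() fills column by column, so a = 1, NA, 3; b = 4, 5, NA; c = NA, 8, 9.
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              NA, 8, 9),
            nrow = 3, dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical stand-in for s.imp$imputed: one entry per variable,
# holding that variable's imputed values in row order.
imp <- list(a = 2, b = 6, c = 7)

m.na <- is.na(m)       # logical matrix marking the missing cells
idx  <- which(m.na)    # positions in column-major order: all of a, then b, then c

# unlist() flattens the list in the same variable-by-variable order,
# so the i-th imputed value lands in the i-th missing cell.
m[idx] <- unlist(imp)
```

The matching works only because both which() on a matrix and unlist() on the per-variable list run through the variables in column order. With a data frame instead of a matrix, indexing by which() would select whole columns, so the data would need as.matrix() first or a different way of indexing.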

As for aregImpute(), that has to be more difficult, because
aregImpute() does multiple imputation.  Very roughly, it produces
a whole set of imputed values, for the purpose of statistical
inference.  I don't know how to get a single best estimate out of
this set, or even whether this is a good idea.
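
One possibility, offered as a sketch rather than a recommendation: Hmisc's impute.transcan() also accepts an aregImpute() fit, and it can fill in the values from one chosen imputation at a time. The data frame d and its variables x1, x2, x3 below are simulated purely for illustration. The result is one of several equally plausible completed data sets, not a single best estimate.

```r
library(Hmisc)  # aregImpute(), impute.transcan()

# Simulated example data (hypothetical, not the poster's data)
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n))
d$x2 <- d$x1 + rnorm(n)
d$x3 <- d$x1 - d$x2 + rnorm(n)
d$x2[sample(n, 20)] <- NA   # knock out 20 values of x2

# Multiple imputation: 5 sets of imputed values for each missing cell
f <- aregImpute(~ x1 + x2 + x3, data = d, n.impute = 5, pr = FALSE)

# Pull out the first of the 5 imputations as a list of filled-in variables
imp1 <- impute.transcan(f, imputation = 1, data = d,
                        list.out = TRUE, pr = FALSE, check = FALSE)

# Copy the filled-in variables into a completed copy of the data
completed <- d
completed[names(imp1)] <- imp1
```

Changing imputation = 1 to 2, 3, and so on gives the other completed data sets. For inference, the usual approach is to fit the model on each one and combine the results (Hmisc has fit.mult.impute() for this) rather than to collapse them into one.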

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/