[R] Imputing missing values

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.ac.be
Wed Sep 1 11:33:17 CEST 2004


Hi Jan,

You could try the following:

dat <- data.frame(Price=c(10,12,NA,8,7,9,NA,9,NA),
                  Crop=c(rep("Rice", 5), rep("Wheat", 4)),
                  Season=c(rep("Summer", 3), rep("Winter", 4),
                           rep("Summer", 2)))
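######
# Sort by Season, then Crop, so that the rows are in the same order as
# the cells returned by tapply() (Crop varies fastest within Season).
dat <- dat[order(dat$Season, dat$Crop),]
# Within each Crop-Season cell, replace NAs by the mean of the observed prices.
dat$Price.imp <- unlist(tapply(dat$Price, list(dat$Crop, dat$Season),
                               function(x){
                                 mx <- mean(x, na.rm=TRUE)
                                 ifelse(is.na(x), mx, x)
                               }))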

dat
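
As a possible alternative to the tapply() approach above, the same
substitution can be sketched with ave(), which returns its result in the
original row order, so the data do not need to be sorted first. The
names impute_mean and Price.imp2 are made up for this illustration.

```r
# Example data, repeated so that the block runs on its own.
dat <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                  Season = c(rep("Summer", 3), rep("Winter", 4),
                             rep("Summer", 2)))

# Hypothetical helper: replace NAs in x by the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() applies the helper within each Crop-Season cell and keeps row order.
dat$Price.imp2 <- ave(dat$Price, dat$Crop, dat$Season, FUN = impute_mean)
dat
```

Note that a cell with no observed prices at all would get NaN, since
there is nothing to average.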

However, you should be careful with this imputation technique, since it
does not take into account the extra variability introduced by imputing
values into your data set: an analysis that treats the imputed values as
if they had been observed will tend to understate the uncertainty. I
don't know what analysis you are planning to do, but in any case I would
recommend reading some standard references on missing values, e.g.,
Little, R. and Rubin, D. (2002). Statistical Analysis with Missing Data,
New York: Wiley.
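
To illustrate the point about variability, here is a small sketch on the
example data. Filling the gaps with cell means leaves each cell mean
unchanged but shrinks the spread, because the imputed values sit exactly
at the centre of the cell. The helper name impute_mean is made up for
this illustration.

```r
# Example data, repeated so that the block runs on its own.
dat <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                  Season = c(rep("Summer", 3), rep("Winter", 4),
                             rep("Summer", 2)))

# Hypothetical helper: replace NAs in x by the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
imp <- ave(dat$Price, dat$Crop, dat$Season, FUN = impute_mean)

# Standard deviation of the Rice/Summer prices: observed values only
# versus observed plus imputed values.
cell   <- dat$Crop == "Rice" & dat$Season == "Summer"
sd_obs <- sd(dat$Price[cell], na.rm = TRUE)
sd_imp <- sd(imp[cell])
c(observed = sd_obs, imputed = sd_imp)
```

In this cell the standard deviation drops from about 1.41 to 1 after
imputation, although no new information has been added.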

I hope this helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "Jan Smit" <janpsmit at yahoo.co.uk>
To: <R-help at stat.math.ethz.ch>
Sent: Wednesday, September 01, 2004 10:43 AM
Subject: [R] Imputing missing values


> Dear all,
>
> Apologies for this beginner's question. I have a
> variable Price, which is associated with factors
> Season and Crop, each of which have several levels.
> The Price variable contains missing values (NA), which
> I want to substitute by the mean of the remaining
> (non-NA) Price values of the same Season-Crop
> combination of levels.
>
> Price     Crop    Season
> 10        Rice    Summer
> 12        Rice    Summer
> NA        Rice    Summer
> 8         Rice    Winter
> 9         Wheat   Summer
>
> Price[is.na(Price)] gives me the missing values, and
> by(Price, list(Crop, Season), mean, na.rm = T) the
> values I want to impute. What I've not been able to
> figure out, by looking at by and the various
> incarnations of apply, is how to do the actual
> substitution.
>
> Any help would be much appreciated.
>
> Jan Smit



