[R] missing and replace
Fraser D. Neiman
fneiman at monticello.org
Thu Apr 27 13:20:33 CEST 2017
Dear All,
Replacing missing values with means is generally not a good idea:
"Perhaps the easiest way to impute is to replace each missing
value with the mean of the observed values for that variable. Unfortunately, this
strategy can severely distort the distribution for this variable, leading to complications
with summary measures including, notably, underestimates of the standard
deviation. Moreover, mean imputation distorts relationships between variables by
“pulling” estimates of the correlation toward zero."
That's from Gelman and Hill -- more here: http://www.stat.columbia.edu/~gelman/arm/missing.pdf
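As a rough illustration of the standard deviation point in that quote, here is a minimal sketch in base R using the five-row sample data from the original post. The names y_obs, y_imp, sd_obs and sd_imp are mine, chosen only for this example. Because the imputed value sits exactly at the observed mean, it adds nothing to the sum of squared deviations while still increasing n, so the standard deviation can only shrink.

```r
# Sample data from the original post
DF1 <- read.table(header = TRUE, text = "ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360")

# Observed values of y only (NA dropped)
y_obs <- DF1$y[!is.na(DF1$y)]

# y with the missing value replaced by the observed mean
y_imp <- ifelse(is.na(DF1$y), mean(DF1$y, na.rm = TRUE), DF1$y)

# Compare the spread before and after imputation
sd_obs <- sd(y_obs)
sd_imp <- sd(y_imp)
c(observed = sd_obs, imputed = sd_imp)
```

With one imputed value out of five, sd_imp equals sd_obs * sqrt(3/4), since the sum of squares is unchanged while the divisor n - 1 goes from 3 to 4. The effect on correlations can be examined the same way with cor(), although five rows are far too few to show it reliably.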
best, Fraser
________________________________________
From: Val [valkremk at gmail.com]
Sent: Wednesday, April 26, 2017 8:45 PM
To: r-help at R-project.org (r-help at r-project.org)
Subject: [R] missing and replace
Hi all,
I have a data frame with three variables. Some of the variables
have missing values, and I want to replace those missing values
(represented by NA) with the mean value of that variable. In this
sample data, variables y and z have missing values. The mean values
of y and z are 152.25 and 359.5, respectively. I want to replace those
missing values with the respective mean value (rounded to the nearest
whole number).
DF1 <- read.table(header=TRUE, text='ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360')
mean x = 36.2
mean y = 152.25
mean z = 359.5
output
ID1 x y z
1 25 122 352
2 30 135 376
3 40 152 350
4 26 157 360
5 60 195 360
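One possible way to produce the output above in base R is sketched below. It is only an illustration of the replacement described in the post: DF2, v and m are names made up for this example, and the loop assumes every column other than ID1 is numeric.

```r
# Sample data from the post
DF1 <- read.table(header = TRUE, text = "ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360")

# Work on a copy so the original data (with its NAs) is preserved
DF2 <- DF1

# For each variable, replace NA with the rounded mean of the observed values
for (v in c("x", "y", "z")) {
  m <- round(mean(DF2[[v]], na.rm = TRUE))
  DF2[[v]][is.na(DF2[[v]])] <- m
}

DF2
```

Note that round() in R follows the IEC 60559 convention of rounding halves to the even digit, so 359.5 becomes 360 here, but a mean of 358.5 would become 358. If halves should always round up, floor(m + 0.5) applied to the unrounded mean is one alternative.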
Thank you in advance