[R] Creating missing values.

Marc Feldesman feldesmanm at pdx.edu
Sun Mar 24 18:12:31 CET 2002


I'm trying to figure out whether there is a simple one- or two-pass approach 
to randomly creating missing values for a set of existing (complete) 
data.  For example, I want to randomly make 10% of the entries in the Iris 
dataset missing (i.e. NA).  I don't want any case to have all missing 
values and I don't want any case to be missing the classification 
variable.  I can do this in about 3 passes, but I haven't figured out 
whether there is an efficient way to do this in one or two passes through 
the data.

My approach involves creating a dummy vector with a length equal to the 
full length of the Iris data (750 elements):
> sample(1:10, 750, replace=TRUE)
I then set all values of 2 to 0 and all others to 1.  This left me with 
approximately 10% of the entries flagged as "missing".  I reshaped this 
into a 150 x 5 matrix.  From here, things were pretty straightforward.
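
The dummy-vector steps above can be sketched roughly as follows.  This is 
only an illustration: the seed, the names draws, flag, mask and iris_na, 
and the last two steps (protecting Species and applying the mask) are 
assumptions, since the later passes are not spelled out here.

```r
set.seed(1)                    # assumption: seed only for reproducibility

n <- nrow(iris)                # 150 cases
p <- ncol(iris)                # 5 variables; Species is the classification

# Dummy vector: one draw from 1:10 for each of the 750 cells.
draws <- sample(1:10, n * p, replace = TRUE)

# Recode: a draw of 2 becomes 0 ("missing"), anything else becomes 1.
flag <- ifelse(draws == 2, 0, 1)

# Reshape into a 150 x 5 indicator matrix.
mask <- matrix(flag, nrow = n, ncol = p)

# Assumed follow-up: never blank the classification variable.
# Because Species is always kept, no case can end up entirely missing.
mask[, p] <- 1

# Apply the indicator matrix to a copy of the data.
iris_na <- iris
iris_na[mask == 0] <- NA
```

Because each cell is flagged independently, the share of missing entries 
is only approximately 10% and varies from run to run.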

Is there any way to bypass the dummy vector and operate directly on a copy 
of the original Iris matrix and get to the point above without the 
intermediate steps?
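
For concreteness, one possible shape for such a direct approach is 
sketched below.  It is a guess at what "operating directly on a copy" 
might look like, not a tested answer: the names iris_na, meas, m and hit 
are hypothetical, and the 10% is taken over the four measurement columns 
only, which is an assumption.

```r
set.seed(1)                    # assumption: seed only for reproducibility

iris_na <- iris                # work on a copy of the data
meas <- 1:4                    # measurement columns; Species is left alone

# Pull the measurements out as a 150 x 4 numeric matrix.
m <- as.matrix(iris_na[meas])

# Choose exactly 10% of the 600 measurement cells, without replacement.
hit <- sample(length(m), round(0.10 * length(m)))

# Blank the chosen cells and write the columns back into the copy.
m[hit] <- NA
iris_na[meas] <- m
```

Sampling cell positions without replacement gives an exact count of 
missing entries rather than an approximate one.  Species is never touched, 
so no case is entirely missing, although a case could still lose all four 
measurements and would need a separate check if that matters.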

Thanks.


=====================
Dr. Marc R. Feldesman
Professor and Chairman
Anthropology Department
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email:  feldesmanm at pdx.edu
phone:  503-725-3081
fax:    503-725-3905
http://web.pdx.edu/~h1mf
PGP Key Available On Request
======================

"Beyond every credibility gap lies a gullibility fill"

Powered by  Latochoerus and Windows 2000, SP1
