[R] replacing all NA's in a dataframe with zeros...
Gavin Simpson
gavin.simpson at ucl.ac.uk
Thu Mar 15 09:08:42 CET 2007
On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
> Since you can index a matrix or dataframe with
> a matrix of logicals, you can use is.na()
> to index all the NA locations and replace them
> all with 0 in one command.
>
A quicker solution, which, IIRC, was posted to the list by Peter
Dalgaard several years ago, is:
sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
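One thing worth checking with the one-liner above is that sapply() simplifies its result to a matrix, so the data frame class is lost. The sketch below illustrates this on a tiny made-up data frame (the names df, m and df2 are assumptions for illustration, not from the thread) and shows an lapply() variant, not part of the original post, that writes the columns back into the data frame:

```r
# Tiny illustration data frame with some NAs (made-up values)
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# Column-wise replacement with sapply(); the result is simplified to a matrix
m <- sapply(df, function(x) {x[is.na(x)] <- 0; x})

# Variant (an assumption, not from the thread): assigning into df2[]
# replaces the columns in place and keeps the data.frame class
df2 <- df
df2[] <- lapply(df2, function(x) {x[is.na(x)] <- 0; x})

print(class(m))
print(class(df2))
```

If a data frame is needed downstream, wrapping the sapply() result in as.data.frame() is another option.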
Some timings on a larger problem with 100 columns:
> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                           size = 1000*100, replace = TRUE),
                                    nrow = 1000))
> system.time(retval <- sapply(mydata.df,
                               function(x) {x[is.na(x)] <- 0; x}))
[1] 0.108 0.008 0.120 0.000 0.000
> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 2.460 0.028 2.498 0.000 0.000
And a larger problem still, with 1000 columns:
> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                           size = 1000*1000, replace = TRUE),
                                    nrow = 1000))
> system.time(retval <- sapply(mydata.df,
                               function(x) {x[is.na(x)] <- 0; x}))
[1] 0.908 0.068 2.657 0.000 0.000
> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 43.127 0.332 46.440 0.000 0.000
Profiling mydata.df[is.na(mydata.df)] <- 0 shows that it spends most of
its time subsetting the individual cells of the data frame in turn
and setting the NA ones to 0.
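For anyone wanting to repeat the comparison, here is a minimal sketch that times both approaches on the 100-column case and checks that they produce the same values. The fixed seed and the names big, by_col and by_idx are assumptions added for illustration; the timings will depend on the R version and the hardware, so they need not resemble the figures above.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

# 1000 x 100 data frame with NAs scattered at random, as in the timings above
big <- as.data.frame(matrix(sample(c(NA_real_, 1),
                                   size = 1000 * 100, replace = TRUE),
                            nrow = 1000))

# Column-wise replacement; note the result is a matrix
t_col <- system.time(
    by_col <- sapply(big, function(x) {x[is.na(x)] <- 0; x})
)

# Logical-matrix indexing, done on a copy so 'big' is left untouched
by_idx <- big
t_idx <- system.time(by_idx[is.na(by_idx)] <- 0)

# Both approaches should hold exactly the same values
same <- all(by_col == as.matrix(by_idx))

print(t_col)
print(t_idx)
print(same)
```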
HTH
G
> > mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30, replace = TRUE), nrow = 6))
> > mydata.df
>   V1 V2 V3 V4 V5
> 1  1 NA  1  1  1
> 2  1 NA NA NA  1
> 3 NA NA  1 NA NA
> 4 NA NA NA NA  1
> 5 NA  1 NA NA  1
> 6  1 NA NA  1  1
> > is.na(mydata.df)
>      V1    V2    V3    V4    V5
> 1 FALSE  TRUE FALSE FALSE FALSE
> 2 FALSE  TRUE  TRUE  TRUE FALSE
> 3  TRUE  TRUE FALSE  TRUE  TRUE
> 4  TRUE  TRUE  TRUE  TRUE FALSE
> 5  TRUE FALSE  TRUE  TRUE FALSE
> 6 FALSE  TRUE  TRUE FALSE FALSE
> > mydata.df[is.na(mydata.df)] <- 0
> > mydata.df
>   V1 V2 V3 V4 V5
> 1  1  0  1  1  1
> 2  1  0  0  0  1
> 3  0  0  1  0  0
> 4  0  0  0  0  1
> 5  0  1  0  0  1
> 6  1  0  0  1  1
> >
>
> Steven McKinney
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney at bccrc.ca
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch on behalf of David L. Van Brunt, Ph.D.
> Sent: Wed 3/14/2007 5:22 PM
> To: R-Help List
> Subject: [R] replacing all NA's in a dataframe with zeros...
>
> I've seen how to replace the NA's in a single column within a data frame
>
> > mydata$ncigs[is.na(mydata$ncigs)]<-0
>
> But this is just one column... I have thousands of columns (!) that I need
> to do this for, and I can't figure out a way, outside of the dreaded loop, to
> replace all NA's in an entire data frame (all vars) without naming each var
> separately. Yikes.
>
> I'm racking my brain on this, seems like I must be staring at the obvious,
> but it eludes me. Searches have come up CLOSE, but not quite what I need..
>
> Any pointers?
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%