[R] Efficiency question: replacing all NAs with a zero

Dimitri Liakhovitski ld7631 at gmail.com
Tue Mar 30 02:21:50 CEST 2010


Dear R'ers,

I have a very large data frame (over 4,000 rows and 2,500 columns). My
task is very simple - I have to replace all NAs with zeros. My code
works fine on smaller data frames - but I have to deal with a huge one
that has many NAs in each column, and R runs out of memory on me
("Reached total allocation of 1535Mb: see help(memory.size)").
Is there any other, more efficient way of doing it?
Thanks a lot for any hints!
Dimitri


# Building an example frame (seed set first so the data are reproducible):
set.seed(1234)
frame<-data.frame(a=rnorm(100),b=rnorm(100),c=rnorm(100),d=rnorm(100),e=rnorm(100),f=rnorm(100),g=rnorm(100))
# Set 60 randomly chosen values in each column to NA
for(i in names(frame)){
	i.for.NA<-sample(1:100,60)
	frame[[i]][i.for.NA]<-NA
}

# Replacing all NAs in "frame" with zeros - is of course fast in this
# example, because this data frame is very small
system.time({
# frame[] keeps "frame" a data frame; plain frame<-lapply(...) would turn it into a list
frame[]<-lapply(frame,function(x){
	x[is.na(x)]<-0
	return(x)
})})
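
For comparison, here is a minimal sketch of two vectorized ways to do the
same replacement. It assumes every column is numeric (as in the example
frame above); the names by_index and as_mat are made up for illustration.
Whether either one fits in memory for the 4,000 x 2,500 case has not been
measured here and would need checking with system.time() on the real data.

```r
# Sketch only: assumes every column of `frame` is numeric.
set.seed(1234)
frame <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
for (i in names(frame)) {
  frame[[i]][sample(100, 60)] <- NA
}

# Option 1: index the data frame with the logical matrix from is.na()
# and assign once; the result is still a data frame.
by_index <- frame
by_index[is.na(by_index)] <- 0

# Option 2: convert to a numeric matrix (a single block of doubles)
# and replace there; the result is a matrix, not a data frame.
as_mat <- as.matrix(frame)
as_mat[is.na(as_mat)] <- 0
```

Option 2 only makes sense when all columns share one type, since
as.matrix() on mixed columns would coerce everything to character.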


-- 
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com


