[R] replacing all NA's in a dataframe with zeros...

Thu Mar 15 11:31:09 CET 2007

On Thu, 2007-03-15 at 10:21 +0100, Peter Dalgaard wrote:
> Gavin Simpson wrote:
> > On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
> >   
> >> Since you can index a matrix or dataframe with
> >> a matrix of logicals, you can use is.na()
> >> to index all the NA locations and replace them
> >> all with 0 in one command.
> >>
> >>     
> >
> > A quicker solution that, IIRC, was posted to the list by Peter
> > Dalgaard several years ago is:
> >
> > sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
> >   
> I hope your memory fails you, because it doesn't actually work.....

Ah, yes, apologies Peter. I have the sapply version embedded in a
package function that I happened to be working on (where I wanted the
result to be a matrix), and I pasted it directly from there rather than
from my crib sheet of useful R-help snippets, where I do have it as
lapply(...). I'd forgotten I'd changed Peter's suggestion slightly in my
function.

That'll teach me to reply before my morning cup of Earl Grey.

All the best,

G
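Steven McKinney's suggestion quoted above is to index the data frame with the logical matrix that is.na() returns and assign 0 in one step. That approach is never written out as code in this thread, so here is a minimal sketch. It assumes all columns are numeric. A factor column will not accept 0 unless 0 is already a level; R warns and stores NA instead. The values in test.df are hypothetical. The original test.df is never shown, so these values were chosen to reproduce Peter's printed output.

```r
# Hypothetical test data: chosen so the result matches Peter's output,
# since the thread never shows the original test.df.
test.df <- data.frame(x1 = c(NA, 2, 3, NA),
                      x2 = c(1, 2, 3, 4),
                      x3 = c(1, NA, NA, 4))

# is.na() on a data frame returns a logical matrix of the same shape.
# Indexing with that matrix selects every NA cell, so one assignment
# replaces them all. Assumes every column is numeric.
test.df[is.na(test.df)] <- 0

# Same values as Peter's lapply() result, and still a data frame.
print(test.df)
```

Unlike the sapply() version, this leaves test.df as a data frame.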

> 
> > sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
>      x1 x2 x3
> [1,]  0  1  1
> [2,]  2  2  0
> [3,]  3  3  0
> [4,]  0  4  4
> 
> is a matrix, not a data frame.
> 
> Instead:
> 
> > test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
> > test.df
>   x1 x2 x3
> 1  0  1  1
> 2  2  2  0
> 3  3  3  0
> 4  0  4  4
> 
> Speedwise, sapply() is doing lapply() internally, and the assignment
> overhead should be small, so I'd expect similar timings.
> 
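A rough sketch like the one below can check Peter's expectation that the two idioms run in similar time. Several choices here are arbitrary assumptions made only for illustration: the data frame size, the roughly 10% NA rate, and the helper name fill0. Timings depend on the machine and R version, so none are quoted. Only the lapply() version leaves a data frame behind.

```r
# Rough timing comparison of the sapply() and lapply() idioms above.
# Hypothetical data: 100,000 rows x 10 numeric columns, about 10% NA.
set.seed(1)
n <- 1e5
big <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
big[matrix(runif(n * 10) < 0.1, ncol = 10)] <- NA

# Hypothetical helper: replace the NAs in one column with 0.
fill0 <- function(x) { x[is.na(x)] <- 0; x }

# sapply() simplifies the list of filled columns into a matrix.
t_sapply <- system.time(m <- sapply(big, fill0))

# lapply() plus d[] <- keeps the data frame structure
# (work on a copy so 'big' still holds its NAs).
d <- big
t_lapply <- system.time(d[] <- lapply(d, fill0))

print(t_sapply)
print(t_lapply)
```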
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%