[R] replacing all NA's in a dataframe with zeros...
Gavin Simpson
gavin.simpson at ucl.ac.uk
Thu Mar 15 11:31:09 CET 2007
On Thu, 2007-03-15 at 10:21 +0100, Peter Dalgaard wrote:
> Gavin Simpson wrote:
> > On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
> >
> >> Since you can index a matrix or dataframe with
> >> a matrix of logicals, you can use is.na()
> >> to index all the NA locations and replace them
> >> all with 0 in one command.
> >>
> >>
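A minimal sketch of the logical-matrix indexing described above. It uses a small hypothetical data frame `test.df` (the thread never shows how it was built; these values are only chosen to match the output printed later in the thread) and assumes every column is numeric:

```r
# Hypothetical example data, reconstructed to match the printed results
test.df <- data.frame(x1 = c(NA, 2, 3, NA),
                      x2 = c(1, 2, 3, 4),
                      x3 = c(1, NA, NA, 4))

# is.na() on a data frame returns a logical matrix with the same dimensions
na.mask <- is.na(test.df)

# Indexing the data frame with that logical matrix selects every NA cell,
# so a single assignment replaces all of them with 0
test.df[na.mask] <- 0

test.df
```

Because only the selected cells are assigned, the result is still a data frame rather than a matrix.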
> >
> > A quicker solution that, IIRC, was posted to the list by Peter
> > Dalgaard several years ago is:
> >
> > sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
> >
> I hope your memory fails you, because it doesn't actually work.....
Ah, yes, apologies Peter. I have the sapply version embedded in a
package function that I happened to be working on (where I wanted the
result to be a matrix), and I pasted directly from there rather than
from my crib sheet of useful R-help snippets, where I do have it as
lapply(...). I'd forgotten I'd changed Peter's suggestion slightly in
my function.
That'll teach me to reply before my morning cup of Earl Grey.
All the best,
G
>
> > sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
> x1 x2 x3
> [1,] 0 1 1
> [2,] 2 2 0
> [3,] 3 3 0
> [4,] 0 4 4
>
> is a matrix, not a data frame.
>
> Instead:
>
> > test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
> > test.df
> x1 x2 x3
> 1 0 1 1
> 2 2 2 0
> 3 3 3 0
> 4 0 4 4
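The matrix-versus-data-frame difference can be checked directly. This sketch redefines the same hypothetical `test.df` so it runs on its own, and wraps the anonymous function in a helper named `zero.na` (a name introduced here purely for illustration):

```r
test.df <- data.frame(x1 = c(NA, 2, 3, NA),
                      x2 = c(1, 2, 3, 4),
                      x3 = c(1, NA, NA, 4))

# Hypothetical helper: zero the NAs in a single column (vector)
zero.na <- function(x) { x[is.na(x)] <- 0; x }

# sapply() simplifies its list result, here to a numeric matrix
res.sapply <- sapply(test.df, zero.na)
class(res.sapply)  # "matrix" (plus "array" in R >= 4.0.0)

# lapply() returns a plain list; assigning it into res.df[] writes the
# columns back into the existing data frame, keeping class and row names
res.df <- test.df
res.df[] <- lapply(res.df, zero.na)
class(res.df)      # "data.frame"

# Without the empty brackets the object would become a bare list
res.list <- lapply(test.df, zero.na)
class(res.list)    # "list"
```

The empty brackets in `res.df[] <-` are what preserve the data frame: they replace the contents of every column while leaving the object's class and attributes in place.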
>
> Speedwise, sapply() is doing lapply() internally, and the assignment
> overhead should be small, so I'd expect similar timings.
>
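A rough way to check the expectation about timings is to time the three approaches on a larger frame. The data frame `big`, its size and the roughly 10% NA rate are arbitrary assumptions for illustration. Elapsed times will vary with machine and R version, so no figures are quoted here:

```r
set.seed(1)

# Hypothetical test data: 20,000 rows x 50 numeric columns, ~10% NA
n <- 20000; p <- 50
big <- as.data.frame(matrix(rnorm(n * p), nrow = n))
big[matrix(runif(n * p) < 0.1, nrow = n)] <- NA

zero.na <- function(x) { x[is.na(x)] <- 0; x }

# sapply(): returns a matrix
t.sapply <- system.time(m <- sapply(big, zero.na))
# lapply() with d[] <-: keeps the data frame
t.lapply <- system.time({ d <- big; d[] <- lapply(d, zero.na) })
# logical-matrix indexing with is.na()
t.mask   <- system.time({ d2 <- big; d2[is.na(d2)] <- 0 })

c(sapply = t.sapply[["elapsed"]],
  lapply = t.lapply[["elapsed"]],
  mask   = t.mask[["elapsed"]])
```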
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%