[R] searching and replacing in a data frame.
Joshua Wiley
jwiley.psych at gmail.com
Mon Jul 18 10:04:06 CEST 2011
On Mon, Jul 18, 2011 at 12:22 AM, Ashim Kapoor <ashimkapoor at gmail.com> wrote:
> ttt <- data.frame(A = c(Inf, 0, 0), B = c(1, 2, 3))
>>
>> apply(ttt, 2, function(x) {x[is.infinite(x)] <- 0; x})
>>
>
> Ok thank you. That does work. What does
>
> apply(ttt, 1, function(x) x[is.infinite(x)] <- 0 )
>
> this return? I get all 0's, but can you explain why?
I think so, though it gets a bit messy. First we can simplify things
by getting rid of apply for now and just dealing with a simple vector.
x <- c(Inf, 1)
When you type:
x[is.infinite(x)] <- 0
This expression has the side effect of altering the object 'x', but
its value is not x. As with any assignment in R, the value of the
expression is the right-hand side, returned invisibly, whatever the
class of the object being modified (this holds for data frames too).
Let's see what apply() gets to work with:
## simple example vector
x <- c(Inf, 1)
## store output of subassignment function
test <- x[is.infinite(x)] <- 0
## look at test and x
> test
[1] 0
> x
[1] 0 1
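A couple of further variations, as a rough sketch (the names y, res, df, and res_df are made up here for illustration). In each case the stored value is the right-hand side, while the object on the left is modified as a side effect:

```r
## numeric vector, right-hand side of length 2
y <- c(Inf, 1, Inf)
res <- y[is.infinite(y)] <- c(10, 20)
print(res)  # the right-hand side: 10 20
print(y)    # y itself was modified: 10 1 20

## same pattern with a data frame on the left
df <- data.frame(A = c(Inf, 0))
res_df <- df[1, "A"] <- 99
print(res_df)  # the right-hand side again: 99
print(df)      # df$A is now 99, 0
```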
If you try different examples, you will see that 'test' will be
whatever the object on the right of the assignment operator was. In
your case, it is a singleton 0. Now we can look at the
documentation (?apply), specifically the "Value" section, which
describes what is returned:
If each call to 'FUN' returns a vector of length 'n', then 'apply'
returns an array of dimension 'c(n, dim(X)[MARGIN])' if 'n > 1'.
If 'n' equals '1', 'apply' returns a vector if 'MARGIN' has length
1 and an array of dimension 'dim(X)[MARGIN]' otherwise. If 'n' is
'0', the result has length 0 but not necessarily the 'correct'
dimension.
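Since n = 1, apply returns a vector if MARGIN has length 1 (three 0's
for apply(ttt, 1, ...), one per row) and an array of dimension
dim(X)[MARGIN] otherwise, which for MARGIN = c(1, 2), as in your
original case, is equivalent to: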
> dim(ttt)[c(1, 2)]
[1] 3 2
so a 3 x 2 array is returned, populated with whatever value you were
using to replace Inf. You might think that because ttt is a data
frame, the data frame method for `[<-` would get dispatched, but this
is not the case, because what you are actually passing is rows or
columns of the data frame, which are just vectors:
> class(ttt)
[1] "data.frame"
> apply(ttt, 2, class)
        A         B
"numeric" "numeric"
> apply(ttt, 1, class)
[1] "numeric" "numeric" "numeric"
> apply(ttt, 1:2, class)
     A         B
[1,] "numeric" "numeric"
[2,] "numeric" "numeric"
[3,] "numeric" "numeric"
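To tie this back to the "Value" section, here is a short sketch comparing the shape of the result for each MARGIN when the function returns a single value. The names f, by_row, by_col, and by_cell are made up for illustration; ttt is recreated as in the thread:

```r
ttt <- data.frame(A = c(Inf, 0, 0), B = c(1, 2, 3))

## the value of this function is the right-hand side of the assignment, 0
f <- function(x) x[is.infinite(x)] <- 0

by_row  <- apply(ttt, 1, f)    # MARGIN of length 1: plain vector, one 0 per row
by_col  <- apply(ttt, 2, f)    # MARGIN of length 1: plain vector, one 0 per column
by_cell <- apply(ttt, 1:2, f)  # MARGIN of length 2: array of dim(ttt)[1:2]

str(by_row)
str(by_col)
dim(by_cell)
```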
The simple way around all of this is to be clear about what you want
the anonymous function (function(x) ) to return.
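As one possible sketch of that, the function below ends with x so that the modified vector is what gets returned (fix_inf and m are names made up here). Note that apply() hands back a matrix; assigning the result of lapply() into ttt[] is one way to keep a data frame, if that is what you need:

```r
ttt <- data.frame(A = c(Inf, 0, 0), B = c(1, 2, 3))

## replace infinite values with 0 and explicitly return the vector
fix_inf <- function(x) {
  x[is.infinite(x)] <- 0
  x
}

m <- apply(ttt, 2, fix_inf)    # 3 x 2 numeric matrix
ttt[] <- lapply(ttt, fix_inf)  # column-wise, ttt stays a data frame

print(m)
print(ttt)
```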
People better versed in the more inner workings of R may have some
corrections to how I have explained it.
HTH,
Josh
>
> Thank you.
> Ashim
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/