[R] Using sapply to build a count matrix
Marc Schwartz
marc_schwartz at me.com
Thu Jul 2 04:38:48 CEST 2009
On Jul 1, 2009, at 9:15 PM, Murray Cooper wrote:
> Dear All,
>
> I am new to R and slowly learning how to use the system.
>
> The following code is an exercise I was trying.
> The intent is to generate 10 random samples of size 5 from
> a vector with integers 1:10 and 2 missing values. I then want
> to generate a matrix showing, for each sample, the frequency
> of missing values (NA) in that sample. My solution, using sapply,
> is at the end.
>
> If anyone has the time and/or interest to critique my method, I'd
> be very grateful. I'm especially interested in knowing if there is
> a better way to accomplish this problem.
>
>> (x<-replicate(10,sample(c(1:10,rep(NA,2)),5)))
> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,] 3 NA 3 4 2 10 NA 4 5 4
> [2,] 5 7 7 3 9 2 8 NA 7 9
> [3,] NA 8 1 5 NA 7 10 2 NA 6
> [4,] 2 NA 6 10 8 4 4 7 4 7
> [5,] 7 9 10 8 3 6 1 NA 9 NA
>> # Since table() will return only a single cell, FALSE,
>> # if there are no missing values (NA) in a sample, sapply
>> # will return a list and not a matrix.
>> # So to get a matrix, factor() needs to be used to declare
>> # both possible results (FALSE, TRUE) as levels for
>> # table().
>> sapply(1:10,function(i) table(factor(is.na(x[,i]),c(FALSE,TRUE))))
> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> FALSE 4 3 5 5 4 5 4 3 4 4
> TRUE 1 2 0 0 1 0 1 2 1 1
>
> Thanks for your thoughts.
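The point made in the comments above, that sapply() falls back to a list when table() drops the TRUE cell, can be checked directly. This is a minimal sketch that rebuilds x by hand from the printed output (rather than calling sample(), so the result is reproducible) and compares the two variants; the names without_levels and with_levels are illustrative only.

```r
# x as printed in the question: 5 draws (rows) for each of 10 samples (columns)
x <- matrix(c( 3, NA,  3,  4,  2, 10, NA,  4,  5,  4,
               5,  7,  7,  3,  9,  2,  8, NA,  7,  9,
              NA,  8,  1,  5, NA,  7, 10,  2, NA,  6,
               2, NA,  6, 10,  8,  4,  4,  7,  4,  7,
               7,  9, 10,  8,  3,  6,  1, NA,  9, NA),
            nrow = 5, byrow = TRUE)

# Without factor(): a column with no NA yields a table holding only a FALSE cell,
# so the per-column results differ in length and sapply() cannot simplify them
without_levels <- sapply(1:ncol(x), function(i) table(is.na(x[, i])))

# With both levels declared, every table has length 2, so sapply() returns a 2 x 10 matrix
with_levels <- sapply(1:ncol(x), function(i)
  table(factor(is.na(x[, i]), levels = c(FALSE, TRUE))))

class(without_levels)  # "list"
dim(with_levels)       # 2 10
```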
Murray, if I correctly understand what you want as an end result, then:
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 NA 3 4 2 10 NA 4 5 4
[2,] 5 7 7 3 9 2 8 NA 7 9
[3,] NA 8 1 5 NA 7 10 2 NA 6
[4,] 2 NA 6 10 8 4 4 7 4 7
[5,] 7 9 10 8 3 6 1 NA 9 NA
> colSums(is.na(x))
[1] 1 2 0 0 1 0 1 2 1 1
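If the two-row FALSE/TRUE layout from the original sapply() output is still wanted, one possible sketch is to derive the FALSE row from the NA counts, since each column has nrow(x) entries. This assumes x is the 5 x 10 matrix shown above; it is re-entered here so the snippet runs on its own, and na_counts and counts are illustrative names.

```r
x <- matrix(c( 3, NA,  3,  4,  2, 10, NA,  4,  5,  4,
               5,  7,  7,  3,  9,  2,  8, NA,  7,  9,
              NA,  8,  1,  5, NA,  7, 10,  2, NA,  6,
               2, NA,  6, 10,  8,  4,  4,  7,  4,  7,
               7,  9, 10,  8,  3,  6,  1, NA,  9, NA),
            nrow = 5, byrow = TRUE)

# Number of NA values in each column (sample)
na_counts <- colSums(is.na(x))

# Non-missing count = column length minus NA count; stack both rows as in the question
counts <- rbind("FALSE" = nrow(x) - na_counts, "TRUE" = na_counts)
counts
```

This avoids building one table() per column; the counts should match the FALSE and TRUE rows in the question.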
To take that in stages:
> is.na(x)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
[3,] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
[4,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
The above gives you either TRUE or FALSE at each position in the
matrix: TRUE where the value is NA, FALSE otherwise.
The colSums() function calculates the sum of the values in each
column and is implemented in C for speed. Since TRUE is coerced to 1
and FALSE to 0, applying colSums() to the intermediate logical matrix
above gives you a column-by-column count of the NA values.
> as.numeric(TRUE)
[1] 1
> as.numeric(FALSE)
[1] 0
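To make the coercion argument concrete, a short sketch: summing a logical vector counts its TRUE values, and colSums() on the logical matrix agrees with the more general apply(..., 2, sum) form. The comparison uses == rather than identical() because apply() with sum() on logicals returns integers while colSums() returns doubles. The matrix is again the one from the question; via_colsums and via_apply are illustrative names.

```r
x <- matrix(c( 3, NA,  3,  4,  2, 10, NA,  4,  5,  4,
               5,  7,  7,  3,  9,  2,  8, NA,  7,  9,
              NA,  8,  1,  5, NA,  7, 10,  2, NA,  6,
               2, NA,  6, 10,  8,  4,  4,  7,  4,  7,
               7,  9, 10,  8,  3,  6,  1, NA,  9, NA),
            nrow = 5, byrow = TRUE)

# Logical values are coerced to 0/1 when summed, so this counts the TRUEs
sum(c(TRUE, FALSE, TRUE))  # 2

# Same per-column NA counts, computed two ways
via_colsums <- colSums(is.na(x))          # double vector, computed in C
via_apply   <- apply(is.na(x), 2, sum)    # integer vector, one sum() call per column
all(via_colsums == via_apply)             # TRUE
```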
See ?colSums for more information and the sister function rowSums().
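The same idiom works across rows with rowSums(). In this sketch, again assuming the matrix x from the question, it counts how many of the 10 samples had an NA at each of the 5 draw positions; the row and column totals must agree, since both count every NA exactly once.

```r
x <- matrix(c( 3, NA,  3,  4,  2, 10, NA,  4,  5,  4,
               5,  7,  7,  3,  9,  2,  8, NA,  7,  9,
              NA,  8,  1,  5, NA,  7, 10,  2, NA,  6,
               2, NA,  6, 10,  8,  4,  4,  7,  4,  7,
               7,  9, 10,  8,  3,  6,  1, NA,  9, NA),
            nrow = 5, byrow = TRUE)

# NA count in each row (draw position) instead of each column (sample)
rowSums(is.na(x))  # 2 1 3 1 2

# Both margins account for the same total number of NA values
sum(rowSums(is.na(x))) == sum(colSums(is.na(x)))  # TRUE
```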
HTH,
Marc Schwartz