[R] Need help with table() and apply()
jim holtman
jholtman at gmail.com
Sun Nov 20 23:43:49 CET 2011
It might help if you told us the problem you are trying to solve.
Why do you have factors in the data frame? Could you just use the
values? Do you want to count the 'levels' of the factors in a row, or
do you want to count the numeric values they represent? (In your case
they are the same, so I wonder why you used factors.)
Here is one way to count the 'level' values in each row (each column of the result corresponds to one row of df):
> apply(df, 1, function(x) tabulate(as.integer(x), nbins = 4))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    3    2    2    1    2    1    2    2     2
[2,]    1    4    7    3    6    5    0    1    1     2
[3,]    3    1    1    4    2    1    6    5    5     3
[4,]    4    2    0    1    1    2    3    2    2     3
>
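If the goal is one row of counts per observation, with levels that never occur in a row reported as 0, one possible sketch is to rebuild each row as a factor inside apply() and transpose the result. It assumes every column is a factor with levels 1 to 4, as in the example; the small data frame and the name `counts` are illustrative only.

```r
# Small illustrative stand-in for the example data (3 rows, 4 rating columns)
df <- data.frame(
  rating.1 = factor(c(4, 1, 2), levels = 1:4),
  rating.2 = factor(c(2, 3, 2), levels = 1:4),
  rating.3 = factor(c(3, 1, 1), levels = 1:4),
  rating.4 = factor(c(4, 2, 2), levels = 1:4)
)

# apply() passes each row as a character vector, so rebuild a factor with
# the full set of levels before calling table(); zero counts are then kept.
# t() turns the 4 x nrow result into one row per observation.
counts <- t(apply(df, 1, function(x) table(factor(x, levels = 1:4))))
counts
```

Passing `levels = 1:4` is what keeps the zero counts; without it, table() only sees the values that occur in that row.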
So tell us what you want to do, not how you want to do it.
2011/11/20 jim holtman <jholtman at gmail.com>:
> The answer to your question about why you had to convert back to
> factors is that you "undid" the factors when you used 'cbind' to
> create the data frame. Here is what you should have done:
>
>> df <- data.frame(rating.1 , rating.2 , rating.3 , rating.4 ,
> + rating.5 , rating.6 , rating.7 , rating.8 ,
> + rating.9 , rating.10)
>>
>> str(df)
> 'data.frame': 10 obs. of 10 variables:
> $ rating.1 : Factor w/ 4 levels "1","2","3","4": 4 1 2 4 3 2 4 1 2 1
> $ rating.2 : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 2 2 1 3 3 3
> $ rating.3 : Factor w/ 4 levels "1","2","3","4": 3 1 1 3 2 1 3 3 1 3
> $ rating.4 : Factor w/ 4 levels "1","2","3","4": 4 2 2 2 2 4 3 3 3 4
> $ rating.5 : Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 2 3 3 4 4
> $ rating.6 : Factor w/ 4 levels "1","2","3","4": 3 2 2 1 2 2 3 3 3 2
> $ rating.7 : Factor w/ 4 levels "1","2","3","4": 3 4 2 2 4 3 4 4 4 4
> $ rating.8 : Factor w/ 4 levels "1","2","3","4": 4 1 3 1 3 1 4 4 3 3
> $ rating.9 : Factor w/ 4 levels "1","2","3","4": 4 4 2 3 2 4 3 2 3 2
> $ rating.10: Factor w/ 4 levels "1","2","3","4": 1 2 1 3 2 2 3 1 1 1
>
> Notice that the factors are maintained.
>
> When you have problems, break the work into steps and see what happens
> at each one. Here is the output of your 'cbind':
>
>> x <- (cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
> + rating.5 , rating.6 , rating.7 , rating.8 ,
> + rating.9 , rating.10)
> + )
>> str(x)
> int [1:10, 1:10] 4 1 2 4 3 2 4 1 2 1 ...
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : chr [1:10] "rating.1" "rating.2" "rating.3" "rating.4" ...
>>
>
> Notice that it is just an integer matrix.
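A quick way to confirm this is to compare what the two constructors return. This sketch uses two small made-up factors, `a` and `b`, as stand-ins for the rating.* columns:

```r
# Two small factors standing in for rating.1 and rating.2 (illustrative data)
a <- factor(c(4, 1, 2), levels = 1:4)
b <- factor(c(2, 3, 2), levels = 1:4)

m <- cbind(a, b)        # cbind() drops the factor class and keeps the codes
d <- data.frame(a, b)   # data.frame() keeps each column as a factor

is.matrix(m)            # TRUE
typeof(m)               # "integer"
sapply(d, is.factor)    # both columns TRUE
```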
>
> Also, if you had looked at the help page for 'cbind', you would have seen:
>
> In the default method, all the vectors/matrices must be atomic (see
> vector) or lists. Expressions are not allowed. Language objects (such
> as formulae and calls) and pairlists will be coerced to lists: other
> objects (such as names and external pointers) will be included as
> elements in a list result. Any classes the inputs might have are
> discarded (in particular, factors are replaced by their internal
> codes).
>
> Notice the last sentence.
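In the example data this is easy to miss, because the levels are 1 to 4 and so the internal codes happen to equal the labels. With any other levels they diverge; a small made-up factor shows the difference:

```r
# Illustrative factor whose labels are not 1, 2, 3, ...
f <- factor(c(10, 20, 10, 30))   # levels "10", "20", "30"

cbind(f)                         # internal codes 1, 2, 1, 3 (labels are lost)
as.integer(f)                    # the same internal codes
as.numeric(as.character(f))      # the values the labels represent
```

After cbind() the levels attribute is gone, so the codes can no longer be mapped back to 10, 20 and 30.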
>
> 2011/11/20 Stuart Luppescu <slu at ccsr.uchicago.edu>:
>> Hello, I am having trouble getting counts of values in rows of a data
>> frame. I'm trying to use apply, but it's not working.
>>
>> This gives a sample of the kind of data I'm working with:
>>
>> rating.1 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
>> rating.2 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
>> rating.3 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
>> rating.4 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
>> rating.5 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
>> rating.6 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
>> rating.7 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
>> rating.8 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
>> rating.9 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
>> rating.10 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
>>
>> df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
>> rating.5 , rating.6 , rating.7 , rating.8 ,
>> rating.9 , rating.10))
>>
>> for(i in 1:10) {
>> df[,i] <- factor(df[,i], levels=1:4)
>> }
>>
>> [Aside: why does the original df have columns of class "integer" when
>> the original data are factors? Why is it necessary to reconvert them
>> into factors? Also, is it possible to do this without a for loop?]
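On the last question: the loop can be replaced with lapply(). Assigning into `df[]` rather than `df` keeps the result a data frame with the original column names. A minimal sketch, using a small integer data frame as a stand-in for the one built with as.data.frame(cbind(...)):

```r
# Small stand-in for the integer data frame produced by as.data.frame(cbind(...))
df <- as.data.frame(cbind(rating.1 = c(4L, 1L, 2L),
                          rating.2 = c(2L, 3L, 2L)))

# Convert every column to a factor with levels 1:4, without an explicit loop.
# lapply() returns a plain list; df[] <- ... writes it back into the data frame.
df[] <- lapply(df, factor, levels = 1:4)
str(df)
```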
>>
>> If I do this:
>>
>> apply(df[,1:10], 1, table)
>>
>> I get a 4x10 array, the contents of which I do not understand.
>>
>> apply(df[,1:10], 2, table)
>>
>> gives 10 tables for the columns, but it leaves out factor levels which
>> do not occur. For example,
>>
>> rating.6 : 'table' int [1:3(1d)] 7 1 2
>> ..- attr(*, "dimnames")=List of 1
>> .. ..$ : chr [1:3] "1" "2" "3"
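The reason apply() drops the unused levels is that it first converts the data frame with as.matrix(). For factor columns that gives a character matrix, so table() only sees the values that actually occur. With MARGIN = 1, apply() also places each row's result in a column, which is why the row output is 4x10 rather than 10x4. A small sketch with made-up data:

```r
# Made-up data frame with factor columns that share levels 1:4
d <- data.frame(x = factor(c(1, 2, 1), levels = 1:4),
                y = factor(c(3, 3, 2), levels = 1:4))

m <- as.matrix(d)      # apply() does this conversion internally
typeof(m)              # "character": the factor levels are gone
table(m[, "x"])        # counts only "1" and "2", the values that occur
table(d$x)             # the factor itself still reports all four levels
```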
>>
>> lapply(df[, 1:10], table)
>>
>> gives tables of the columns keeping the levels with 0 counts:
>>
>> $ rating.6 : 'table' int [1:4(1d)] 7 1 2 0
>> ..- attr(*, "dimnames")=List of 1
>> .. ..$ : chr [1:4] "1" "2" "3" "4"
>>
>> But I really want tables of the rows. Do I have to write my own function
>> to count the numbers of values?
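A custom counting function is not strictly needed. One vectorised alternative (a swapped-in technique, not the apply() approach discussed in this thread) is to stack all the columns into a single factor with unlist() and cross-tabulate it against the row number. This sketch assumes all columns are factors with the same levels; the small data frame and the names `values`, `row_id` and `counts` are illustrative:

```r
# Illustrative data: three rows, factor columns sharing levels 1:4
df <- data.frame(
  rating.1 = factor(c(4, 1, 2), levels = 1:4),
  rating.2 = factor(c(2, 3, 2), levels = 1:4),
  rating.3 = factor(c(3, 1, 1), levels = 1:4)
)

# unlist() stacks the columns one after another; because every column is a
# factor with the same levels, the result is a single factor with levels 1:4.
values <- unlist(df, use.names = FALSE)

# Row number of each stacked value: 1..nrow repeated once per column.
row_id <- rep(seq_len(nrow(df)), times = ncol(df))

# One row per observation, one column per level; unused levels show as 0.
counts <- table(row = row_id, value = values)
counts
```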
>>
>> Thanks in advance.
>>
>> --
>> Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
>> University of Chicago -=- CCSR
>> Father of 才文 and 智奈美 -=- Kernel 3.0.6-gentoo
>> You say yourself it wasn't reproducible. So it could have been anything
>> that "crashed" your R, cosmic radiation, a bolt of lightning reversing a
>> bit in your computer memory, ... :-) -- Martin Maechler (replying to a
>> bug report) R-devel (July 2005)
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>