[R] Tabulating using arbitrary numbers of factors

Erik Iverson eiverson at NMDP.ORG
Fri Oct 2 22:08:25 CEST 2009


Andrew, 

Is this what you're looking for?  Most likely a more elegant solution exists... but maybe this is good enough. 

## BEGIN R SAMPLE CODE
## sample data frame, 3 factors
tmp <- data.frame(f1 = sample(gl(2, 50, labels = c("Male", "Female"))),
                  f2 = sample(gl(4, 25, labels =
                    c("White", "Black", "Hispanic", "Other"))),
                  f3 = sample(gl(4, 25, labels =
                    c("0-20", "21-40", "41-60", "61-80"))))

summary(tmp)

## the function
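test <- function(...) {
  ## count every combination of levels; "!" joins the level names,
  ## so this assumes no level itself contains "!"
  tbl <- table(interaction(..., sep = "!"))
  ## drop combinations that never occur
  tbl.nozero <- tbl[tbl > 0]

  ## split the joined names back into one piece per factor
  nms <- strsplit(names(tbl.nozero), "!", fixed = TRUE)

  ## one row per combination: level names, then the count
  cb <- cbind(do.call(rbind, nms), tbl.nozero)
  dimnames(cb) <- NULL
  cb
}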

## test calling the function, does this produce what you want? 
with(tmp, test(f1, f2, f3))

## END R SAMPLE CODE 
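
One candidate for the more elegant route, offered as a sketch rather than something checked against your data: as.data.frame() on a table already returns one row per combination with the level names spelled out, so filtering on the Freq column drops the zeros. It also sidesteps the "!" separator, which would misbehave if a level name contained that character. The name ft_long and the toy horse/surface vectors below are made up for illustration.

```r
## Hypothetical helper (the name ft_long is illustrative, not from the thread).
## Tabulates any number of factors, keeps only observed combinations,
## and reports level names rather than integer codes.
ft_long <- function(...) {
  ## one row per combination of levels, counts in the "Freq" column
  counts <- as.data.frame(table(...), stringsAsFactors = FALSE)
  ## keep only the combinations that actually occur
  counts <- counts[counts$Freq > 0, , drop = FALSE]
  rownames(counts) <- NULL
  counts
}

## toy data, assumed purely for illustration
horse   <- factor(c("Bedevil", "Bedevil", "Prairie Wolf"))
surface <- factor(c("TURF", "TURF", "DIRT"), levels = c("DIRT", "TURF"))
print(ft_long(horse, surface))
```

The result is a data frame rather than a character matrix, so the counts stay numeric. With the sample data frame above, the call would be with(tmp, ft_long(f1, f2, f3)).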

Best Regards,
Erik Iverson

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Andrew Spence
> Sent: Friday, October 02, 2009 1:15 PM
> To: r-help at r-project.org
> Subject: [R] Tabulating using arbitrary numbers of factors
> 
> Dear R-help,
> 
> 
> 
> First of all, thank you VERY much for any help you have time to offer. I
> greatly appreciate it.
> 
> 
> 
> I would like to write a function that, given an arbitrary number of factors
> from a data frame, tabulates the number of occurrences of each unique
> combination of the factors. Clearly, this works:
> 
> 
> 
> > table(horse,date,surface)
> 
> <SNIP>
> 
> , , surface = TURF
> 
>                    date
> horse               20080404 20080514 20081015 20081025 20081120 20081203 20090319
>   Bedevil                  0        0        0        0        0        0        0
>   Cut To The Point       227        0        0        0        0        0        0
> 
> <SNIP>
> 
> 
> 
> But I would prefer output that skips all the zeros, flattens any dimensions
> greater than 2, and gives the level names rather than codes. I can write
> code for a specific number of factors, like this (here 2 factors):
> 
> 
> 
> ft <- function(x,y) {cbind(
> levels(x)[unique(cbind(x,y))[,1]],levels(y)[unique(cbind(x,y))[,2]],
> table(x,y)[unique(cbind(x,y))])}
> 
> 
> 
> which gives the lovely output I'm looking for:
> 
> 
> 
> #      [,1]                [,2]       [,3]
> # [1,] "Cut To The Point"  "20080404" "227"
> # [2,] "Prairie Wolf"      "20080404" "364"
> # [3,] "Bedevil"           "20080514" "319"
> # [4,] "Prairie Wolf"      "20080514" "330"
> 
> 
> 
> But my attempts to make this into a function that handles arbitrary numbers
> of factors as separate input arguments have failed. The closest I can get
> is:
> 
> 
> 
> ft2 <- function (...) { cbind( unique(cbind(...)),
> table(...)[unique(cbind(...))] ) }
> 
> 
> 
> giving:
> 
> > ft2(horse,date)
>       horse date
>  [1,]     2    1 227
>  [2,]     9    1 364
>  [3,]     1    2 319
>  [4,]     9    2 330
>  [5,]     9    3 291
>  [6,]    12    3 249
>  [7,]    10    3 286
>  [8,]     5    4 217
>  [9,]     3    4 426
> [10,]     8    4 468
> [11,]     9    5 319
> [12,]    13    5 328
> [13,]    12    5 138
> [14,]     7    6 375
> [15,]    11    6 366
> [16,]     4    7 255
> [17,]     6    7 517
> 
> 
> 
> I would be greatly in debt to anyone willing to show me how to make the
> above function take arbitrary inputs and still produce output displaying
> factor level names instead of the underlying coded numbers.
> 
> 
> 
> Cheers and thanks for your time!
> 
> 
> 
> Andrew Spence
> RCUK Academic Research Fellow
> Structure and Motion Laboratory
> Royal Veterinary College
> Hawkshead Lane
> North Mymms, Hatfield
> Hertfordshire AL9 7TA
> +44 (0) 1707 666988
> 
> mailto:aspence at rvc.ac.uk
> 
> http://www.rvc.ac.uk/sml/People/andrewspence.cfm



