[R] Tabulating using arbitrary numbers of factors
Erik Iverson
eiverson at NMDP.ORG
Fri Oct 2 22:08:25 CEST 2009
Andrew,
Is this what you're looking for? Most likely a more elegant solution exists... but maybe this is good enough.
## BEGIN R SAMPLE CODE
## sample data frame, 3 factors
tmp <- data.frame(f1 = sample(gl(2, 50, labels = c("Male", "Female"))),
                  f2 = sample(gl(4, 25, labels =
                      c("White", "Black", "Hispanic", "Other"))),
                  f3 = sample(gl(4, 25, labels =
                      c("0-20", "21-40", "41-60", "61-80"))))
summary(tmp)
## the function
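test <- function(...) {
  ## count every combination; "!" joins the level names, so it must not
  ## occur inside any level name
  tbl <- table(interaction(..., sep = "!"))
  ## drop the combinations that never occur
  tbl.nozero <- tbl[tbl > 0]
  ## split the joined names back into one level name per factor
  nms <- strsplit(names(tbl.nozero), "!")
  ## one row per combination: the level names, then the count
  cb <- cbind(t(do.call(data.frame, nms)), tbl.nozero)
  dimnames(cb) <- NULL
  cb
}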
## test calling the function, does this produce what you want?
with(tmp, test(f1, f2, f3))
## END R SAMPLE CODE
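
For comparison, here is a minimal sketch of one possible alternative that avoids pasting the level names together and splitting them again: flatten the table with as.data.frame() and keep the rows with a nonzero count. The function name tabulate_combos and the small sex/age example vectors are made up for illustration and are not from the original question. Note that it returns a data frame rather than a character matrix, and the factor columns are named after the arguments only where R can deduce the names.

```r
## Hypothetical helper (not from the original post): count every observed
## combination of an arbitrary number of factors.
tabulate_combos <- function(...) {
  ## table() cross-tabulates; as.data.frame() flattens the result to one
  ## row per cell, with level names as strings and the count in "Freq"
  counts <- as.data.frame(table(...), stringsAsFactors = FALSE)
  ## keep only the combinations that actually occur
  counts <- counts[counts$Freq > 0, , drop = FALSE]
  rownames(counts) <- NULL
  counts
}

## small deterministic example (made-up data)
sex <- factor(c("Male", "Female", "Male", "Male"))
age <- factor(c("0-20", "0-20", "21-40", "0-20"))
res <- tabulate_combos(sex, age)
res
```

With the sample data frame above, the call would be with(tmp, tabulate_combos(f1, f2, f3)).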
Best Regards,
Erik Iverson
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Andrew Spence
> Sent: Friday, October 02, 2009 1:15 PM
> To: r-help at r-project.org
> Subject: [R] Tabulating using arbitrary numbers of factors
>
> Dear R-help,
>
>
>
> First of all, thank you VERY much for any help you have time to offer. I
> greatly appreciate it.
>
>
>
> I would like to write a function that, given an arbitrary number of factors
> from a data frame, tabulates the number of occurrences of each unique
> combination of the factors. Clearly, this works:
>
>
>
> > table(horse,date,surface)
>
> <SNIP>
>
> , , surface = TURF
>
>                   date
> horse              20080404 20080514 20081015 20081025 20081120 20081203 20090319
>   Bedevil                 0        0        0        0        0        0        0
>   Cut To The Point      227        0        0        0        0        0        0
>
> <SNIP>
>
>
>
> But I would prefer output that skips all the zeros, flattens any dimensions
> greater than 2, and gives the level names rather than codes. I can write
> code for a specific number of factors like this (here, 2 factors):
>
>
>
> ft <- function(x,y) {cbind(
>   levels(x)[unique(cbind(x,y))[,1]],
>   levels(y)[unique(cbind(x,y))[,2]],
>   table(x,y)[unique(cbind(x,y))])}
>
>
>
> which gives the lovely output I'm looking for:
>
>
>
> #      [,1]               [,2]       [,3]
> # [1,] "Cut To The Point" "20080404" "227"
> # [2,] "Prairie Wolf"     "20080404" "364"
> # [3,] "Bedevil"          "20080514" "319"
> # [4,] "Prairie Wolf"     "20080514" "330"
>
>
>
> But my attempts to make this into a function that handles arbitrary numbers
> of factors as separate input arguments have failed. The closest I can get
> is:
>
>
>
> ft2 <- function (...) { cbind( unique(cbind(...)),
> table(...)[unique(cbind(...))] ) }
>
>
>
> giving:
>
> > ft2(horse,date)
>       horse date
>  [1,]     2    1 227
>  [2,]     9    1 364
>  [3,]     1    2 319
>  [4,]     9    2 330
>  [5,]     9    3 291
>  [6,]    12    3 249
>  [7,]    10    3 286
>  [8,]     5    4 217
>  [9,]     3    4 426
> [10,]     8    4 468
> [11,]     9    5 319
> [12,]    13    5 328
> [13,]    12    5 138
> [14,]     7    6 375
> [15,]    11    6 366
> [16,]     4    7 255
> [17,]     6    7 517
>
>
>
> I would be greatly indebted to anyone willing to show me how to make the
> above function take arbitrary inputs and still produce output displaying
> factor level names instead of the underlying integer codes.
>
>
>
> Cheers and thanks for your time!
>
>
>
> Andrew Spence
> RCUK Academic Research Fellow
> Structure and Motion Laboratory
> Royal Veterinary College
> Hawkshead Lane
> North Mymms, Hatfield
> Hertfordshire AL9 7TA
> +44 (0) 1707 666988
>
> mailto:aspence at rvc.ac.uk
>
> http://www.rvc.ac.uk/sml/People/andrewspence.cfm
>