[R] extracting non-NA entries from a two-way frequency table

William Dunlap wdunlap at tibco.com
Fri Dec 13 21:06:38 CET 2013


The following puts the data.frame into 'long' format and then drops
rows with NA's for 'n'.

f <- function(data){
    df <- data.frame(
        expand.grid(
            boys = data[["boys"]],
            girls = as.integer(sub("^g", "", colnames(data)[-1]))
        ),
        n = unlist(data[, -1])) # n is all but the 'boys', the first, column
    df[!is.na(df[["n"]]), ]
}

E.g.,

> f(Geissler)[c(2,30,75),]
    boys girls    n
2     11     0   24
34     5     2 9547
104    0     7  436
> Geissler[Geissler["boys"]==5, "g2"]
[1] 9547

reshape() can probably do this but I always get lost in its argument list.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Michael Friendly
> Sent: Friday, December 13, 2013 11:42 AM
> To: R-help
> Subject: [R] extracting non-NA entries from a two-way frequency table
> 
> I have data in the form of a two-way table recording the number of
> families with varying numbers
> of boys (rows) and girls (columns: g0 -- g12) below, also given in
> dput() format.
> 
> I want to convert this to a data frame containing only the non-NA
> entries, with columns
> boys, girls, Freq, where Freq is the table entry.  Can anyone help with
> this?
> I suppose that the steps are to transpose each row to a column
> identifying the number of
> girls, and then delete the NAs, but I can't quite see how to do this.
> 
> 
>  > Geissler
>     boys     g0     g1    g2    g3    g4   g5   g6   g7  g8 g9 g10 g11 g12
> 1    12      7     NA    NA    NA    NA   NA   NA   NA  NA NA  NA  NA  NA
> 2    11     24     45    NA    NA    NA   NA   NA   NA  NA NA  NA  NA  NA
> 3    10     30     93   181    NA    NA   NA   NA   NA  NA NA  NA  NA  NA
> 4     9     90    287   492   478    NA   NA   NA   NA  NA NA  NA  NA  NA
> 5     8    264    713  1027  1077   829   NA   NA   NA  NA NA  NA  NA  NA
> 6     7    631   1655  2418  2309  1801 1112   NA   NA  NA NA  NA  NA  NA
> 7     6   1579   3725  4948  4757  3470 2310 1343   NA  NA NA  NA  NA  NA
> 8     5   3666   7908  9547  8498  6436 3878 2161 1033  NA NA  NA  NA  NA
> 9     4   8628  16340 17332 14479 10263 5917 3072 1540 670 NA  NA  NA  NA
> 10    3  20540  31611 30175 22221 13972 7603 3895 1783 837 286  NA  NA  NA
> 11    2  47819  57179 44793 28630 15700 8171 3951 1776 722 275 104  NA  NA
> 12    1 114609  89213 53789 28101 13740 6233 2719 1152 432 151  72  24  NA
> 13    0     NA 108719 42860 17395  7004 2839 1096  436 161 66  30   8   3
> 
> Geissler <-
> structure(list(boys = c(12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L,
> 3L, 2L, 1L, 0L), g0 = c(7L, 24L, 30L, 90L, 264L, 631L, 1579L,
> 3666L, 8628L, 20540L, 47819L, 114609L, NA), g1 = c(NA, 45L, 93L,
> 287L, 713L, 1655L, 3725L, 7908L, 16340L, 31611L, 57179L, 89213L,
> 108719L), g2 = c(NA, NA, 181L, 492L, 1027L, 2418L, 4948L, 9547L,
> 17332L, 30175L, 44793L, 53789L, 42860L), g3 = c(NA, NA, NA, 478L,
> 1077L, 2309L, 4757L, 8498L, 14479L, 22221L, 28630L, 28101L, 17395L
> ), g4 = c(NA, NA, NA, NA, 829L, 1801L, 3470L, 6436L, 10263L,
> 13972L, 15700L, 13740L, 7004L), g5 = c(NA, NA, NA, NA, NA, 1112L,
> 2310L, 3878L, 5917L, 7603L, 8171L, 6233L, 2839L), g6 = c(NA,
> NA, NA, NA, NA, NA, 1343L, 2161L, 3072L, 3895L, 3951L, 2719L,
> 1096L), g7 = c(NA, NA, NA, NA, NA, NA, NA, 1033L, 1540L, 1783L,
> 1776L, 1152L, 436L), g8 = c(NA, NA, NA, NA, NA, NA, NA, NA, 670L,
> 837L, 722L, 432L, 161L), g9 = c(NA, NA, NA, NA, NA, NA, NA, NA,
> NA, 286L, 275L, 151L, 66L), g10 = c(NA, NA, NA, NA, NA, NA, NA,
> NA, NA, NA, 104L, 72L, 30L), g11 = c(NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, 24L, 8L), g12 = c(NA, NA, NA, NA, NA, NA,
> NA, NA, NA, NA, NA, NA, 3L)), .Names = c("boys", "g0", "g1",
> "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10", "g11",
> "g12"), class = "data.frame", row.names = c(NA, -13L))
> 
> 
> --
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele Street    Web:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list