[R] Summarizing factor data in table?

Tony Plate tplate at acm.org
Tue Apr 26 20:00:19 CEST 2005


Do you want to count the number of non-NA divisions and organizations in 
the data for each year (where duplicates are counted as many times as 
they appear)?

 > tapply(!is.na(foo$div), foo$yr, sum)
1998 1999 2000
    0    4    2
 > tapply(!is.na(foo$org), foo$yr, sum)
1998 1999 2000
    4    4    2
 >
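
If both counts are wanted in one object, the two calls above can probably be folded into a single sapply() over the columns. This is only a sketch: it rebuilds foo from the original post so that it runs on its own, and the name counts is just a placeholder of mine.

```r
# Rebuild the example data from the original post.
foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                  org = factor(c(1:4, 1:4, 1, 2)))

# For each column of interest, count the non-NA entries per year.
# Duplicates are counted as many times as they appear.
# 'counts' is a placeholder name: a matrix with one row per year
# and one column per variable.
counts <- sapply(foo[c("div", "org")],
                 function(col) tapply(!is.na(col), foo$yr, sum))

print(counts)
```

A single cell can then be picked out by name, e.g. counts["1999", "div"].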

Or perhaps the number of unique non-NA divisions and organizations in 
the data for each year?

 > tapply(foo$div, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
    0    4    2
 > tapply(foo$org, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
    4    4    2
 >
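
To get the unique counts as the two objects named in the question (ndiv and norgs), something along these lines might do. It is a sketch under my reading of the request: n_unique and summary_df are names I made up, not base R functions, and foo is rebuilt here so the snippet stands alone.

```r
# Rebuild the example data from the original post.
foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                  org = factor(c(1:4, 1:4, 1, 2)))

# Hypothetical helper: number of distinct non-NA values in x.
n_unique <- function(x) length(unique(x[!is.na(x)]))

# One count per year for each variable.
ndiv  <- tapply(foo$div, foo$yr, n_unique)
norgs <- tapply(foo$org, foo$yr, n_unique)

# Optionally collect both into one data frame (placeholder name).
summary_df <- data.frame(yr    = as.numeric(names(ndiv)),
                         ndiv  = as.vector(ndiv),
                         norgs = as.vector(norgs))

print(summary_df)
```

Dropping the NAs before calling unique() gives the same result as na.omit(unique(x)) here; it just avoids carrying the na.action attribute around.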

(I don't understand where the "3" for 1999 in your desired output comes 
from, though: the 1999 rows contain four distinct divisions, A to D. 
That may mean I have completely misunderstood your request.)

Andy Bunn wrote:
> I have a very simple query with regard to summarizing the number of factors
> present in a certain snippet of a data frame.
> Given the following data frame:
> 
> 	foo <- data.frame(yr = c(rep(1998,4), rep(1999,4), rep(2000,2)),
> 	                  div = factor(c(rep(NA,4),"A","B","C","D","A","C")),
> 	                  org = factor(c(1:4,1:4,1,2)))
> 
> I want to get two new variables. Object ndiv would give the number of
> divisions by year:
>      1998 0
>      1999 3
>      2000 2
> Object norgs would give the number of organizations
>      1998 4
>      1999 4
>      2000 2
> I figure xtabs should be able to do it, but I'm stuck without a for loop.
> Any suggestions? -Andy
> 



