[R] crosstabulation and unlist function
rmailbox at justemail.net
Mon Oct 12 22:23:17 CEST 2009
What you're really saying is that you don't care about the distinction between "aa", "bb" and "cc". In that case, a different arrangement of the data will be more useful:
library(reshape)
df.melt <- melt(df, id.vars = "dd")
with(df.melt, table(dd, value))
Eric
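For readers without the reshape package, the same long-format idea can be sketched in base R. This is only an illustration under the assumption that the three value columns are stacked in column order; the names `value`, `group` and `xt` are made up for the example.

```r
# Toy data from the original question
aa <- c(1:5)
bb <- c(NA, 2, NA, 4, 5)
cc <- c(1, 2, NA, 4, NA)
dd <- c("A", "B", "B", "A", "C")
df <- data.frame(aa, bb, cc, dd = as.factor(dd))

# unlist() stacks the columns aa, bb, cc end to end, so dd has to be
# repeated once per stacked column to stay aligned with the values.
value <- unlist(df[, 1:3], use.names = FALSE)
group <- rep(df$dd, times = 3)

# table() drops NA values by default, so only observed values are counted.
xt <- table(dd = group, value = value)
print(xt)
```

Because every value keeps its own copy of dd, the two vectors passed to table() always have the same length.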
----- Original message -----
From: "eugen pircalabelu" <eugen_pircalabelu at yahoo.com>
To: "David Winsemius" <dwinsemius at comcast.net>
Cc: "R-help" <r-help at stat.math.ethz.ch>
Date: Mon, 12 Oct 2009 13:05:33 -0700 (PDT)
Subject: Re: [R] crosstabulation and unlist function
Hello,
First of all, thank you David for your reply, but sadly this is not what I wanted (I am sorry for not being more specific about my problem!)
aa<-c(1:5)
bb<-c(NA,2,NA,4,5)
cc<-c(1,2,NA,4,NA)
dd<-c("A","B","B","A","C")
df<-data.frame(aa,bb,cc,dd=as.factor(dd))
table(unlist(df[,1:3]))
> df
  aa bb cc dd
1  1 NA  1  A
2  2  2  2  B
3  3 NA NA  B
4  4  4  4  A
5  5  5 NA  C
I do not want to get this:
> tapply(apply(df[,1:3],1,sum, na.rm=TRUE), df$dd, sum)
 A  B  C
14  9 10
but a crosstabulation between table(unlist(df[,1:3])) and df$dd, which should look something like this:
  1 2 3 4 5
A 2 0 0 3 0
B 0 3 1 0 0
C 0 0 0 0 2
meaning that when dd is A, the value 1 appears 2 times, 2 and 3 do not appear, 4 appears 3 times, and 5 does not appear; when dd is C, only 5 appears, 2 times (I am not really interested in the NA occurrences).
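The description above can be followed quite literally: split the value columns by dd, then count the values inside each group. The sketch below is one possible way to do that in base R; `vals`, `count_values` and `counts` are names invented for the illustration.

```r
# Toy data from the question
aa <- c(1:5)
bb <- c(NA, 2, NA, 4, 5)
cc <- c(1, 2, NA, 4, NA)
dd <- c("A", "B", "B", "A", "C")
df <- data.frame(aa, bb, cc, dd = as.factor(dd))

# All distinct non-NA values, used as a common set of columns
vals <- sort(unique(unlist(df[, 1:3])))

# Count how often each value occurs in one group of rows;
# factor() with fixed levels keeps zero counts, and NA is dropped.
count_values <- function(g) {
  tab <- table(factor(unlist(g), levels = vals))
  setNames(as.vector(tab), names(tab))
}

# One row of counts per level of dd
counts <- t(sapply(split(df[, 1:3], df$dd), count_values))
print(counts)
```

Fixing the factor levels up front is what lets the per-group counts line up as columns, even for values a group never contains.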
Hopefully, this time my question is much clearer.
Thank you very much !
----- Original Message ----
From: David Winsemius <dwinsemius at comcast.net>
To: David Winsemius <dwinsemius at comcast.net>
Cc: eugen pircalabelu <eugen_pircalabelu at yahoo.com>; R-help <r-help at stat.math.ethz.ch>
Sent: Mon, October 12, 2009 9:36:39 PM
Subject: Re: [R] crosstabulation and unlist function
On Oct 12, 2009, at 3:25 PM, David Winsemius wrote:
>
> On Oct 12, 2009, at 2:36 PM, eugen pircalabelu wrote:
>
>> Hello R-users,
>>
>> My toy example:
>> aa<-c(1:5)
>> bb<-c(NA,2,NA,4,5)
>> cc<-c(1,2,NA,4,NA)
>> dd<-c("A","B","B","A","C")
>> df<-data.frame(aa,bb,cc,dd=as.factor(dd))
>> table(unlist(df[,1:3]))
>>
>> Can anyone point me to what function lets me do a crosstabulation between table(unlist(df[,1:3])) and df$dd?
>> I want to find out, when dd==A (or B, or C), how many times the values 1, 2, 3, ... appear in df[,1:3].
>> Thank you very much!
>
> One way would be to collect the row sums of those columns first, and then sum by index:
>
> tapply(apply(df[,1:3],1,sum, na.rm=TRUE), df$dd, sum)
> A B C
> 14 9 10
This method is safer than working on table(unlist(df[, 1:3])) since it does not "break" when an entire row is empty.
> aa<-c(1,2,NA,4,5)
> bb<-c(NA,2,NA,4,5)
> cc<-c(1,2,NA,4,NA)
> dd<-c("A","B","B","A","C")
> df<-data.frame(aa,bb,cc,dd=as.factor(dd))
> table(unlist(df[,1:3]))
1 2 4 5
2 3 3 2 # missing row will no longer be aligned with "dd".
> tapply(table(unlist(df[,1:3])), df$dd, sum)
Error in tapply(table(unlist(df[, 1:3])), df$dd, sum) :
arguments must have same length
> tapply(apply(df[,1:3],1,sum, na.rm=TRUE), df$dd, sum)
A B C
14 6 10
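The length mismatch in the error above comes from table() collapsing the values into one count per distinct value, which has nothing to do with the number of rows. As a hedged sketch (not part of the original reply), pairing each value with its own copy of dd before tabulating avoids the problem even when a row is entirely NA; `values` and `xt2` are illustrative names, and the levels 1:5 are an assumption about which values should get a column.

```r
# Modified toy data: row 3 is NA in all three value columns
aa <- c(1, 2, NA, 4, 5)
bb <- c(NA, 2, NA, 4, 5)
cc <- c(1, 2, NA, 4, NA)
dd <- c("A", "B", "B", "A", "C")
df <- data.frame(aa, bb, cc, dd = as.factor(dd))

values <- unlist(df[, 1:3], use.names = FALSE)

# table() returns one entry per distinct value, not one per row
length(table(values))
length(df$dd)

# Repeat dd once per stacked column so it stays aligned with values;
# fixed factor levels keep a column for 3 even though it no longer occurs.
xt2 <- table(dd = rep(df$dd, times = 3),
             value = factor(values, levels = 1:5))
print(xt2)
```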
>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT