[R] count each answer category in each column

Sat Apr 20 00:44:51 CEST 2013

Hi,
Try this:
dat1<- read.csv("Ye.csv",header=TRUE,stringsAsFactors=FALSE,sep=",",na.strings="N/A")
library(plyr)
library(reshape2)

lst1 <- lapply(seq(2, ncol(dat1)), function(i) {
  x1 <- cbind(ID = dat1[, 1], dat1[i])
  x2 <- mutate(dcast(melt(x1, id.var = "ID"), value ~ ID, length),
               var = rowSums(cbind(A, B, C, D, E)))
  colnames(x2)[7] <- colnames(x1)[2]
  x2[, c(1, 7, 2:6)]
})
 names(lst1)<- colnames(dat1)[-1]
 lst1[1]
#$Size
#  value Size A B C D E
#1     1    3 1 1 1 0 0
#2     2    3 1 1 1 0 0
#3     3    4 1 1 0 1 1
#4     4    5 1 0 1 2 1
#5     5    5 0 1 1 2 1
#6     6    2 0 0 0 1 1

# or, for more IDs, sum all count columns instead of naming A to E:

lst2 <- lapply(seq(2, ncol(dat1)), function(i) {
  x1 <- cbind(ID = dat1[, 1], dat1[i])
  x2 <- dcast(melt(x1, id.var = "ID"), value ~ ID, length)
  x3 <- mutate(x2, var = rowSums(cbind(x2[!grepl("value", names(x2))])))
  colnames(x3)[7] <- colnames(x1)[2]
  x3[, c(1, 7, 2:6)]
})
names(lst2)<- colnames(dat1)[-1]

identical(lst1,lst2)
#[1] TRUE
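
If the number of IDs varies, a base R version avoids hard-coding the column positions 7 and 2:6. The following is only a sketch, run on the small example data quoted further down in this thread; `count_by_id` is a made-up helper name, and it relies on `table()` and `addNA()` rather than plyr/reshape2:

```r
# Small example data copied from the quoted messages below
dat1 <- read.table(text = "
ID Gender Age Rate
A Female 0-10 Good
A Male 0-10 Good
B Female 11-20 Bad
B Male 11-20 Bad
C Male >20 N/A
", header = TRUE, stringsAsFactors = FALSE, na.strings = "N/A")

# count_by_id is a hypothetical helper, not part of any package.
# For every non-ID column it returns: answer, total count, count per ID.
count_by_id <- function(dat, id = 1) {
  res <- lapply(names(dat)[-id], function(nm) {
    # addNA() makes missing answers a level of their own (only if present)
    f <- addNA(factor(dat[[nm]]), ifany = TRUE)
    tab <- table(f, dat[[id]])
    # plain integer matrix: one row per answer, one column per ID
    m <- matrix(tab, nrow = nrow(tab), dimnames = list(NULL, colnames(tab)))
    out <- data.frame(Var1 = levels(f), Total = rowSums(m), m,
                      check.names = FALSE, stringsAsFactors = FALSE)
    names(out)[2] <- nm  # name the total column after the question
    out
  })
  names(res) <- names(dat)[-id]
  res
}

res <- count_by_id(dat1)
res$Gender
```

Because the ID columns come straight from `table()`, nothing in the function depends on how many IDs there are or what they are called.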

A.K.

________________________________
 From: Ye Lin <yelin at lbl.gov>
To: arun <smartpink111 at yahoo.com> 
Sent: Friday, April 19, 2013 5:49 PM
Subject: Re: [R] count each answer category in each column

Hey A.K

I modified the script but it didn't work. Here is my script:

lst1<-lapply(list(c(3:25),c(2,4:25),c(2:3,5:25),c(2:4,6:25),c(2:5,7:25),(2:6,8:25),c(2:7,9:25),c(2:8,10:25),c(2:9,11:25),c(2:10,12:25),c(2:11,13:25),c(2:12,14:25),c(2:13,15:25),c(2:14,16:25),c(2:15,17:25),c(2:16,18:25),c(2:17,19:25),c(2:18,20:25),c(2:19,21:25),c(2:20,22:25),c(2:21,23:25),c(2:22,24,25),c(2:23,25),c(2:24)),function(i) mutate(dcast(melt(dat1[,-i],id.var="ID"),value~ID,length),var=rowSums(cbind(A,B,C,D,E))))

ERROR:<text>
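
The `ERROR:<text>` message suggests that R could not parse the command at all, rather than a problem with the data. In the list above, the sixth element is written `(2:6,8:25)` without the leading `c`, which would be a syntax error. A minimal sketch of that diagnosis, plus one way to generate the same index vectors without typing them out (the name `drop_cols` is made up for illustration):

```r
# A list element written without c() does not parse
bad <- try(parse(text = "list(c(2:5, 7:25), (2:6, 8:25))"), silent = TRUE)
inherits(bad, "try-error")

# Build the 24 index vectors programmatically: for each answer column k,
# drop every other answer column (column 1, the ID, is always kept)
drop_cols <- lapply(2:25, function(k) setdiff(2:25, k))
length(drop_cols)
```

Generating the vectors this way removes the chance of a typo in any one of them.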

lst2<-lapply(seq_along(colnames(dat1)[-1]),function(i) {x1<-lst1[[i]]; colnames(x1)[7]<- colnames(dat1)[i+1]; colnames(x1)[1]<-"Var1";x1[,c(1,7,2:6)]})

names(lst2)<- colnames(dat1)[-1]

I have attached a sample the same size as my data. I have 25 cols.

Could you help me take a look? Thanks so much!

Ye

On Fri, Apr 19, 2013 at 2:03 PM, arun <smartpink111 at yahoo.com> wrote:

Hi Ye,
>
>The numbers are column numbers of the data: each vector lists the columns to drop, so that only ID and one answer column remain. I also used column numbers to order the output the way you asked.
>For example:
> dat1[,-c(3,4)]
>#  ID Gender
>#1  A Female
>#2  A   Male
>#3  B Female
>#4  B   Male
>#5  C   Male
>
>
>dat1[,-c(2,4)]
>#  ID   Age
>#1  A  0-10
>#2  A  0-10
>#3  B 11-20
>#4  B 11-20
>#5  C   >20
>
>
>In your larger dataframe, it depends on how many columns you have.  I was doing this just for demonstration. 
>
>colnames(x1)[5] is the name of the last column of x1.  I changed it so that you get "Gender", "Age", etc.
>
>x1[,c(1,5,2:4)] rearranges the columns so that they match the output column order you wanted.
>
>If you find it difficult to apply this to your actual data, you can send me the dataset or a dataset that matches the original dataset.
>
>A.K.
>
>
>
>________________________________
> From: Ye Lin <yelin at lbl.gov>
>To: arun <smartpink111 at yahoo.com>
>Sent: Friday, April 19, 2013 4:40 PM
>
>Subject: Re: [R] count each answer category in each column
>
>
>
>Hey A.K,
>
>
>When I apply this to a larger data frame, I assume I should adjust these numbers highlighted below:
>
> lst1<-lapply(list(c(3,4),c(2,4),c(2,3)),function(i) mutate(dcast(melt(dat1[,-i],id.var="ID"),value~ID,length),var=rowSums(cbind(A,B,C))))
>
> lst2<-lapply(seq_along(colnames(dat1)[-1]),function(i) {x1<-lst1[[i]]; colnames(x1)[5]<- colnames(dat1)[i+1]; colnames(x1)[1]<-"Var1";x1[,c(1,5,2:4)]})
>
>
>Could you give me a little more instruction on the first command for lst1? What is this list used for? E.g. my data frame has m cols and n rows, and the first column is still the ID column.
>
>Sorry that I am confused by the long scripts.
>
>
>Thanks for your help!
>
>
>Ye
>
>
>
>
>
>On Fri, Apr 19, 2013 at 12:30 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi,
>>Try this:
>>dat1<- read.table(text="
>>
>>ID   Gender   Age   Rate
>> A       Female    0-10   Good
>>  A      Male        0-10   Good
>>  B       Female     11-20  Bad
>> B       Male         11-20  Bad
>>   C       Male         >20     N/A
>>",sep="",header=TRUE,stringsAsFactors=FALSE,na.strings="N/A")
>>library(plyr)
>>library(reshape2)
>>
>> lst1<-lapply(list(c(3,4),c(2,4),c(2,3)),function(i) mutate(dcast(melt(dat1[,-i],id.var="ID"),value~ID,length),var=rowSums(cbind(A,B,C))))
>> lst2<-lapply(seq_along(colnames(dat1)[-1]),function(i) {x1<-lst1[[i]]; colnames(x1)[5]<- colnames(dat1)[i+1]; colnames(x1)[1]<-"Var1";x1[,c(1,5,2:4)]})
>>names(lst2)<- colnames(dat1)[-1]
>> lst2
>>#$Gender
>> #   Var1 Gender A B C
>>#1 Female      2 1 1 0
>>#2   Male      3 1 1 1
>>
>>#$Age
>> #  Var1 Age A B C
>>#1  0-10   2 2 0 0
>>#2 11-20   2 0 2 0
>>#3   >20   1 0 0 1
>>
>>#$Rate
>> # Var1 Rate A B C
>>#1  Bad    2 0 2 0
>>#2 Good    2 2 0 0
>>#3 <NA>    1 0 0 1
>>
>>A.K.
>>
>>
>>
>>________________________________
>> From: Ye Lin <yelin at lbl.gov>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Friday, April 19, 2013 1:44 PM
>>
>>Subject: Re: [R] count each answer category in each column
>>
>>
>>
>>Yes, but I am wondering if I can calculate how many of each kind of answer there are and how many fall under each category together, so the results would be something like:
>>
>>
>>$Gender
>>
>>Var1     Gender   A   B   C
>>Male     3        1   1   1
>>Female   2        1   1   0
>>N/A      0        0   0   0
>>
>>
>>Thanks!
>>
>>
>>Ye
>>
>>
>>
>>
>>On Fri, Apr 19, 2013 at 10:36 AM, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi Ye,
>>>Just a question:
>>>In the example given , for ID "A", you have 1 female and 1 male.
>>>
>>>Do you want to categorize the same thing for each ID?
>>>For example:
>>>A:
>>>Var1 Gender
>>>Female 1
>>>Male 1
>>>NA  0
>>>
>>>Var1 Age
>>>0-10 2
>>> 
>>>Var1 Rate
>>>Good 2
>>>
>>>B:
>>>....................
>>>A.K.
>>>
>>>
>>>
>>>
>>>________________________________
>>> From: Ye Lin <yelin at lbl.gov>
>>>To: arun <smartpink111 at yahoo.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Friday, April 19, 2013 1:25 PM
>>>Subject: Re: [R] count each answer category in each column
>>>
>>>
>>>
>>>
>>>Thanks A.K
>>>
>>>
>>>Is it possible to apply this to a more complicated situation , for example, I have an ID column for each row, say:
>>>
>>>
>>>ID   Gender   Age   Rate
>>> A       Female    0-10   Good
>>>  A      Male        0-10   Good
>>>  B       Female     11-20  Bad
>>> B       Male         11-20  Bad
>>>   C       Male         >20     N/A
>>>
>>>
>>>
>>>Can the results also indicate how many answers come from each ID? Say for gender, we have 2 Female: 1 from ID A and 1 from ID B. Thanks.
>>>
>>>Ye
>>>
>>>
>>>
>>>
>>>On Thu, Apr 18, 2013 at 4:04 PM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>Hi,
>>>>Try this:
>>>>Assuming that "table" is "data.frame"
>>>>
>>>>
>>>>dat1<-read.table(text="
>>>>
>>>>Gender  Age  Rate
>>>>Female    0-10  Good
>>>>Male        0-10  Good
>>>>Female    11-20  Bad
>>>>Male        11-20  Bad
>>>>Male        >20    N/A
>>>>",sep="",header=TRUE,stringsAsFactors=FALSE,na.strings="N/A")
>>>>lapply(seq_len(ncol(dat1)),function(i) {x1<-as.data.frame(table(dat1[,i],useNA="always"));colnames(x1)[2]<-colnames(dat1)[i];x1})
>>>>#[[1]]
>>>> #   Var1 Gender
>>>>#1 Female      2
>>>>#2   Male      3
>>>>#3   <NA>      0
>>>>
>>>>#[[2]]
>>>> #  Var1 Age
>>>>#1  0-10   2
>>>>#2 11-20   2
>>>>#3   >20   1
>>>>#4  <NA>   0
>>>>
>>>>#[[3]]
>>>> # Var1 Rate
>>>>#1  Bad    2
>>>>#2 Good    2
>>>>#3 <NA>    1
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: Ye Lin <yelin at lbl.gov>
>>>>To: R help <r-help at r-project.org>
>>>>Cc:
>>>>Sent: Thursday, April 18, 2013 6:46 PM
>>>>Subject: [R] count each answer category in each column
>>>>
>>>>Hey,
>>>>
>>>>Is it possible for R to count each option under each column and
>>>>return a summary table?
>>>>
>>>>Suppose I have a table like this:
>>>>
>>>>Gender   Age   Rate
>>>>Female    0-10   Good
>>>>Male        0-10   Good
>>>>Female     11-20  Bad
>>>>Male         11-20  Bad
>>>>Male         >20     N/A
>>>>
>>>>I want a summary table showing how many answers fall in each
>>>>category, something like this:
>>>>
>>>>  X         Gender
>>>>Male       3
>>>>Female    2
>>>>N/A        0
>>>>
>>>>  X          Age
>>>>0-10         2
>>>>11-20         2
>>>>>20           1
>>>>N/A         0
>>>>
>>>>X          Rate
>>>>Good       2
>>>>Bad          2
>>>>N/A         1
>>>>
>>>>So basically I want to calculate, in each column, how many people choose
>>>>each answer, including N/A. I know I can do it in Excel in a very
>>>>visualized way, but is there any way to do it in R in a robust way if I have
>>>>a fairly large dataset?
>>>>
>>>>Thanks!