[R] count each answer category in each column
arun
smartpink111 at yahoo.com
Sat Apr 20 00:44:51 CEST 2013
Hi,
Try this:
dat1<- read.csv("Ye.csv",header=TRUE,stringsAsFactors=FALSE,sep=",",na.strings="N/A")
library(plyr)
library(reshape2)
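# For each answer column: count every answer per ID, add a row total, and
# name the total after the column. Indices 7 and 2:6 assume exactly five IDs (A-E).
lst1 <- lapply(seq(2, ncol(dat1)), function(i) {
  x1 <- cbind(ID = dat1[, 1], dat1[i])
  x2 <- mutate(dcast(melt(x1, id.vars = "ID"), value ~ ID, length),
               var = rowSums(cbind(A, B, C, D, E)))
  colnames(x2)[7] <- colnames(x1)[2]
  x2[, c(1, 7, 2:6)]
})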
names(lst1)<- colnames(dat1)[-1]
lst1[1]
#$Size
# value Size A B C D E
#1 1 3 1 1 1 0 0
#2 2 3 1 1 1 0 0
#3 3 4 1 1 0 1 1
#4 4 5 1 0 1 2 1
#5 5 5 0 1 1 2 1
#6 6 2 0 0 0 1 1
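# Or, without hard-coding the ID names or the number of IDs:
lst2 <- lapply(seq(2, ncol(dat1)), function(i) {
  x1 <- cbind(ID = dat1[, 1], dat1[i])
  x2 <- dcast(melt(x1, id.vars = "ID"), value ~ ID, length)
  x3 <- mutate(x2, var = rowSums(cbind(x2[!grepl("value", names(x2))])))
  n <- ncol(x3)  # position of the total column (7 when there are five IDs)
  colnames(x3)[n] <- colnames(x1)[2]
  x3[, c(1, n, 2:(n - 1))]
})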
names(lst2)<- colnames(dat1)[-1]
identical(lst1,lst2)
#[1] TRUE
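
The same kind of table can probably also be built with base R alone. What follows is only a sketch under assumptions: it uses the small toy data from earlier in this thread rather than Ye.csv, and `count_by_id` is a made-up helper name, not a function from plyr or reshape2.

```r
# Toy data copied from the small example earlier in the thread.
dat <- data.frame(
  ID     = c("A", "A", "B", "B", "C"),
  Gender = c("Female", "Male", "Female", "Male", "Male"),
  Age    = c("0-10", "0-10", "11-20", "11-20", ">20"),
  Rate   = c("Good", "Good", "Bad", "Bad", NA),
  stringsAsFactors = FALSE
)

# count_by_id() is a hypothetical helper: for every non-ID column it returns
# one data frame with the answer, the total count, and one count column per ID.
count_by_id <- function(dat, id_col = 1) {
  answer_cols <- names(dat)[-id_col]
  res <- lapply(answer_cols, function(nm) {
    # Cross-tabulate answers (rows) against IDs (columns); keep NA answers.
    tab <- table(dat[[nm]], dat[[id_col]], useNA = "ifany")
    # Plain count matrix without row names (an NA row name would be awkward).
    counts <- matrix(as.vector(tab), nrow = nrow(tab),
                     dimnames = list(NULL, colnames(tab)))
    out <- data.frame(value = rownames(tab), total = rowSums(counts), counts,
                      stringsAsFactors = FALSE, check.names = FALSE)
    names(out)[2] <- nm  # label the total column after the answer column
    out
  })
  names(res) <- answer_cols
  res
}

res <- count_by_id(dat)
res$Gender
```

Because the ID columns come straight from table(), nothing here depends on how many IDs there are or what they are called.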
A.K.
________________________________
From: Ye Lin <yelin at lbl.gov>
To: arun <smartpink111 at yahoo.com>
Sent: Friday, April 19, 2013 5:49 PM
Subject: Re: [R] count each answer category in each column
Hey A.K.,
I modified the script but it didn't work. Here is my script:
lst1<-lapply(list(c(3:25),c(2,4:25),c(2:3,5:25),c(2:4,6:25),c(2:5,7:25),(2:6,8:25),c(2:7,9:25),c(2:8,10:25),c(2:9,11:25),c(2:10,12:25),c(2:11,13:25),c(2:12,14:25),c(2:13,15:25),c(2:14,16:25),c(2:15,17:25),c(2:16,18:25),c(2:17,19:25),c(2:18,20:25),c(2:19,21:25),c(2:20,22:25),c(2:21,23:25),c(2:22,24,25),c(2:23,25),c(2:24)),function(i) mutate(dcast(melt(dat1[,-i],id.var="ID"),value~ID,length),var=rowSums(cbind(A,B,C,D,E))))
ERROR:<text>
lst2<-lapply(seq_along(colnames(dat1)[-1]),function(i) {x1<-lst1[[i]]; colnames(x1)[7]<- colnames(dat1)[i+1]; colnames(x1)[1]<-"Var1";x1[,c(1,7,2:6)]})
names(lst2)<- colnames(dat1)[-1]
I have attached a sample that is the same size as my data. It has 25 columns.
Could you help me take a look? Thanks so much!
Ye
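
Judging only from the command as posted, one plausible cause of the error is that the sixth element of the list is written (2:6,8:25) without the leading c, which R cannot parse. The index vectors can also be generated instead of typed by hand. A minimal sketch, assuming 25 columns with the ID in column 1 as the message states; `n_col`, `answer_cols` and `drop_list` are illustrative names that do not appear in the original script.

```r
# Assumption: 25 columns, ID in column 1, as described in the message above.
n_col <- 25
answer_cols <- 2:n_col

# For each answer column j, list every *other* answer column, so that
# dat1[, -drop_list[[k]]] would keep only the ID and one answer column.
drop_list <- lapply(answer_cols, function(j) setdiff(answer_cols, j))

drop_list[[1]]  # same indices as c(3:25)
drop_list[[3]]  # same indices as c(2:3, 5:25)

# A list element written without c() is a syntax error; try() captures it.
bad <- try(parse(text = "list(c(2:5,7:25),(2:6,8:25))"), silent = TRUE)
inherits(bad, "try-error")
```

With a list like this, lapply(drop_list, ...) could stand in for the hand-written list of 24 vectors.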
On Fri, Apr 19, 2013 at 2:03 PM, arun <smartpink111 at yahoo.com> wrote:
>Hi Ye,
>
>The numbers are the column indices of the data; each vector lists the columns to drop. I used them so that the output comes out in the order you asked for.
>For example:
> dat1[,-c(3,4)]
># ID Gender
>#1 A Female
>#2 A Male
>#3 B Female
>#4 B Male
>#5 C Male
>
>
>dat1[,-c(2,4)]
># ID Age
>#1 A 0-10
>#2 A 0-10
>#3 B 11-20
>#4 B 11-20
>#5 C >20
>
>
>In your larger data frame, the indices depend on how many columns you have. I used these ones only for demonstration.
>
>colnames(x1)[5] refers to the last column of x1. I renamed it so that the total column is labeled "Gender", "Age", etc.
>
>x1[,c(1,5,2:4)] rearranges the columns so that they match the output order you wanted.
>
>If you find it difficult to process this into your actual data, you can send me the dataset or a dataset that matches the original dataset.
>
>A.K.
>
>
>
>________________________________
> From: Ye Lin <yelin at lbl.gov>
>To: arun <smartpink111 at yahoo.com>
>Sent: Friday, April 19, 2013 4:40 PM
>
>Subject: Re: [R] count each answer category in each column
>
>
>
>Hey A.K,
>
>
>When I apply this to a larger data frame, I assume I should adjust the numbers in the commands below:
>
> lst1<-lapply(list(c(3,4),c(2,4),c(2,3)),function(i) mutate(dcast(melt(dat1[,-i],id.var="ID"),value~ID,length),var=rowSums(cbind(A,B,C))))
>
> lst2<-lapply(seq_along(colnames(dat1)[-1]),function(i) {x1<-lst1[[i]]; colnames(x1)[5]<- colnames(dat1)[i+1]; colnames(x1)[1]<-"Var1";x1[,c(1,5,2:4)]})
>
>
>Could you explain the first command, the one for lst1, a little more? What is this list used for? For example, my data frame has m columns and n rows, and the first column is still the ID column.
>
>Sorry that I am confused by the long scripts.
>
>
>Thanks for your help!
>
>
>Ye
>
>
>
>
>
>On Fri, Apr 19, 2013 at 12:30 PM, arun <smartpink111 at yahoo.com> wrote:
>
>>Hi,
>>Try this:
>>dat1<- read.table(text="
>>
>>ID Gender Age Rate
>> A Female 0-10 Good
>> A Male 0-10 Good
>> B Female 11-20 Bad
>> B Male 11-20 Bad
>> C Male >20 N/A
>>",sep="",header=TRUE,stringsAsFactors=FALSE,na.strings="N/A")
>>library(plyr)
>>library(reshape2)
>>
>> lst1<-lapply(list(c(3,4),c(2,4),c(2,3)),function(i) mutate(dcast(melt(dat1[,-i],id.var="ID"),value~ID,length),var=rowSums(cbind(A,B,C))))
>> lst2<-lapply(seq_along(colnames(dat1)[-1]),function(i) {x1<-lst1[[i]]; colnames(x1)[5]<- colnames(dat1)[i+1]; colnames(x1)[1]<-"Var1";x1[,c(1,5,2:4)]})
>>names(lst2)<- colnames(dat1)[-1]
>> lst2
>>#$Gender
>> # Var1 Gender A B C
>>#1 Female 2 1 1 0
>>#2 Male 3 1 1 1
>>
>>#$Age
>> # Var1 Age A B C
>>#1 0-10 2 2 0 0
>>#2 11-20 2 0 2 0
>>#3 >20 1 0 0 1
>>
>>#$Rate
>> # Var1 Rate A B C
>>#1 Bad 2 0 2 0
>>#2 Good 2 2 0 0
>>#3 <NA> 1 0 0 1
>>
>>A.K.
>>
>>
>>
>>________________________________
>> From: Ye Lin <yelin at lbl.gov>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Friday, April 19, 2013 1:44 PM
>>
>>Subject: Re: [R] count each answer category in each column
>>
>>
>>
>>Yes, but I am wondering if I can calculate how many of each kind of answer there are, and how many fall under each category, together. The results could then look something like:
>>
>>
>>$Gender
>>
>>Var1 Gender A B C
>>Male 3 1 1 1
>>Female 2 1 1 0
>>N/A 0 0 0 0
>>
>>
>>Thanks!
>>
>>
>>Ye
>>
>>
>>
>>
>>On Fri, Apr 19, 2013 at 10:36 AM, arun <smartpink111 at yahoo.com> wrote:
>>
>>>Hi Ye,
>>>Just to clarify:
>>>In the example given, for ID "A", you have 1 female and 1 male.
>>>
>>>Do you want to categorize the same thing for each ID?
>>>For example:
>>>A:
>>>Var1 Gender
>>>Female 1
>>>Male 1
>>>NA 0
>>>
>>>Var1 Age
>>>0-10 2
>>>
>>>Var1 Rate
>>>Good 2
>>>
>>>B:
>>>....................
>>>A.K.
>>>
>>>
>>>
>>>
>>>________________________________
>>> From: Ye Lin <yelin at lbl.gov>
>>>To: arun <smartpink111 at yahoo.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Friday, April 19, 2013 1:25 PM
>>>Subject: Re: [R] count each answer category in each column
>>>
>>>
>>>
>>>
>>>Thanks A.K
>>>
>>>
>>>Is it possible to apply this to a more complicated situation? For example, I have an ID column for each row, say:
>>>
>>>
>>>ID Gender Age Rate
>>> A Female 0-10 Good
>>> A Male 0-10 Good
>>> B Female 11-20 Bad
>>> B Male 11-20 Bad
>>> C Male >20 N/A
>>>
>>>
>>>
>>>When returning the results, can it indicate how many answers come from each ID? Say, for Gender, we have 2 Female: 1 from category A and 1 from category B. Thanks.
>>>
>>>Ye
>>>
>>>
>>>
>>>
>>>On Thu, Apr 18, 2013 at 4:04 PM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>>Hi,
>>>>Try this:
>>>>Assuming that "table" is "data.frame"
>>>>
>>>>
>>>>dat1<-read.table(text="
>>>>
>>>>Gender Age Rate
>>>>Female 0-10 Good
>>>>Male 0-10 Good
>>>>Female 11-20 Bad
>>>>Male 11-20 Bad
>>>>Male >20 N/A
>>>>",sep="",header=TRUE,stringsAsFactors=FALSE,na.strings="N/A")
>>>>lapply(seq_len(ncol(dat1)),function(i) {x1<-as.data.frame(table(dat1[,i],useNA="always"));colnames(x1)[2]<-colnames(dat1)[i];x1})
>>>>#[[1]]
>>>> # Var1 Gender
>>>>#1 Female 2
>>>>#2 Male 3
>>>>#3 <NA> 0
>>>>
>>>>#[[2]]
>>>> # Var1 Age
>>>>#1 0-10 2
>>>>#2 11-20 2
>>>>#3 >20 1
>>>>#4 <NA> 0
>>>>
>>>>#[[3]]
>>>> # Var1 Rate
>>>>#1 Bad 2
>>>>#2 Good 2
>>>>#3 <NA> 1
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: Ye Lin <yelin at lbl.gov>
>>>>To: R help <r-help at r-project.org>
>>>>Cc:
>>>>Sent: Thursday, April 18, 2013 6:46 PM
>>>>Subject: [R] count each answer category in each column
>>>>
>>>>Hey,
>>>>
>>>>Is it possible for R to count each option under each column and
>>>>return a summary table?
>>>>
>>>>Suppose I have a table like this:
>>>>
>>>>Gender Age Rate
>>>>Female 0-10 Good
>>>>Male 0-10 Good
>>>>Female 11-20 Bad
>>>>Male 11-20 Bad
>>>>Male >20 N/A
>>>>
>>>>I want a summary table showing how many
>>>>answers fall in each category, something like this:
>>>>
>>>> X Gender
>>>>Male 3
>>>>Female 2
>>>>N/A 0
>>>>
>>>> X Age
>>>>0-10 2
>>>>11-20 2
>>>>>20 1
>>>>N/A 0
>>>>
>>>>X Rate
>>>>Good 2
>>>>Bad 2
>>>>N/A 1
>>>>
>>>>So basically I want to count, in each column, how many people chose
>>>>each answer, including N/A. I know I can do it in Excel in a very
>>>>visual way, but is there any way to do it robustly in R if I have
>>>>a fairly large dataset?
>>>>
>>>>Thanks!
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>