[R] Generate Crosstab in R

arun smartpink111 at yahoo.com
Thu Apr 10 03:34:33 CEST 2014

Suppose your data is similar to the following:
dat <- structure(list(Custom = c("Judi", "Judi", "Ben", "Tom", "Tom", 
"Bill", "Lindy", "Shary", "Judu", "Judu", "Billy", "Tommy", "Tommy", 
"Benjum", "Linda", "Shiry", "Shiry", "Shiry", "Judu", "Billy", 
"Tommy", "Lindy"), Gender = c("Female", "Female", "Male", "Male", 
"Male", "Male", "Female", "Female", "Female", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Female", "Female", 
"Female", "Male", "Male", "Female"), Product = c("A", "B", "A", 
"A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B", "A", "B", 
"A", "C", "D", "E", "D", "C"), Payment = c("Credit Card", "Credit Card", 
"Cash", "Cash", "Cash", "Credit Card", "Cash", "Credit Card", 
"Credit Card", "Credit Card", "Cash", "Cash", "Cash", "Credit Card", 
"Cash", "Credit Card", "Credit Card", "Credit Card", "Credit Card", 
"Cash", "Cash", "Cash")), .Names = c("Custom", "Gender", "Product", 
"Payment"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22"))

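library(reshape2)  # acast()
library(stringr)   # str_count()

# Label each customer by the sorted set of products purchased
dat1 <- within(dat, Categ <- ave(Product, Custom, FUN = function(x) {
  if (length(x) > 1) {
    paste("Purchase", gsub("(.*)\\,(.*)$", "\\1 and \\2",
                           paste(sort(unique(x)), collapse = ",")))
  } else {
    paste("Purchase", x, "only")
  }
}))

res <- acast(dat1, Categ ~ Gender + Payment, length, value.var = "Categ")

# Each customer has one row per product, so divide every row of counts
# by the number of products named in its label
res1 <- res / str_count(gsub("Purchase|and|only|\\,", " ", rownames(res)), "\\w+")
res1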
#                    Female_Cash Female_Credit Card Male_Cash Male_Credit Card
# Purchase A and B             0                  1         1                0
# Purchase A and C             1                  0         0                0
# Purchase A and E             0                  0         1                0
# Purchase A,B and C           0                  1         0                0
# Purchase A,B and D           0                  1         1                0
# Purchase A only              1                  0         1                0
# Purchase B only              0                  1         0                2
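
On counting distinct customers directly: one possible alternative, sketched below, is to keep a single row per customer before tabulating, so that no division is needed. This assumes each customer has one Gender and one Payment type. The data frame `toy` and the helper `label_products` are made up for illustration and are not part of the original data or code; only base R is used.

```r
# Hypothetical toy data in the same shape as `dat` (not the original data)
toy <- data.frame(
  Custom  = c("Judi", "Judi", "Ben", "Shiry", "Shiry", "Shiry", "Bill"),
  Gender  = c("Female", "Female", "Male", "Female", "Female", "Female", "Male"),
  Product = c("A", "B", "A", "B", "A", "C", "B"),
  Payment = c("Credit Card", "Credit Card", "Cash", "Credit Card",
              "Credit Card", "Credit Card", "Credit Card"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one label from the sorted unique products
label_products <- function(x) {
  p <- sort(unique(x))
  if (length(p) == 1) return(paste("Purchase", p, "only"))
  paste("Purchase", paste(paste(head(p, -1), collapse = ","), "and", tail(p, 1)))
}

toy$Categ <- ave(toy$Product, toy$Custom, FUN = label_products)

# Keep one row per customer/gender/payment combination, then cross-tabulate
distinct <- unique(toy[, c("Custom", "Gender", "Payment", "Categ")])
tab <- xtabs(~ Categ + Gender + Payment, data = distinct)
ftable(tab, col.vars = c("Gender", "Payment"))
```

Because the labels are built from `unique(x)`, a customer who buys the same product twice is still counted once under "only". A customer who used two payment types would appear once in each payment column.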

Hello A.K.,

Thank you so much for your reply. The error message was fixed. There is one more thing I would like to ask about.

For "res[2,] <- res[2,]/2", I think you divide the count of customers who purchased both products A and B by 2. If there are more than two products or more payment methods, how can R handle that?

Is there any other way to get a distinct count of customers directly (counting customers who purchase both product A and B once rather than twice)? Thank you so much for your time and help.


On Wednesday, April 9, 2014 3:47 PM, arun <smartpink111 at yahoo.com> wrote:

datNew <- read.csv("customer_samples.csv",stringsAsFactors=FALSE)

#I could reproduce similar error message with:
dat[] <- lapply(dat,as.factor) 

dat1 <- within(dat, Categ <- ave(Product, Custom, FUN= function(x) if(length(x)>1) "A and B" else x)) 

#Warning messages:
#1: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
#2: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
#3: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
#4: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
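
The warnings arise because ave() tries to assign the new string "A and B" into a factor column that has no such level. If the data has already been read in with factor columns, one possible fix is to convert them back to character before calling ave(). The sketch below uses a made-up data frame `d`, not the original file.

```r
# Made-up example data with factor columns, as read.csv() produced by
# default before R 4.0.0
d <- data.frame(
  Custom  = factor(c("Judi", "Judi", "Ben")),
  Product = factor(c("A", "B", "A"))
)

# Convert every factor column to character; leave other columns untouched
d[] <- lapply(d, function(col) if (is.factor(col)) as.character(col) else col)

# ave() now works on a character vector, so "A and B" is a valid value
d1 <- within(d, Categ <- ave(Product, Custom,
                             FUN = function(x) if (length(x) > 1) "A and B" else x))
d1$Categ
```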


Hello A.K.,

Thank you very much for your reply. I tried the following code but got some warning messages:

------------------------- Code I tried --------------
dat <- read.csv ("customer samples.csv")
dat1 <- within(dat, Categ <- ave(Product, Custom, FUN= function(x) if(length(x)>1) "A and B" else x))
library(reshape2)
res <- acast(dat1,Categ~Gender+Payment,length,value.var="Categ") #or dcast()
res[2,] <- res[2,]/2
res
---------------------------------

Warning messages I got:
1: In '[<-.factor' ('*tmp*', i, value = "A and B"):  invalid factor level, NA generated
2: In '[<-.factor' ('*tmp*', i, value = "A and B"):  invalid factor level, NA generated
3: In '[<-.factor' ('*tmp*', i, value = "A and B"):  invalid factor level, NA generated
4: In '[<-.factor' ('*tmp*', i, value = "A and B"):  invalid factor level, NA generated
-------------------------------------------------

Could you please help me out? Thanks a lot!

On Wednesday, April 9, 2014 12:18 PM, arun <smartpink111 at yahoo.com> wrote:

dat <- structure(list(Custom = c("Judi", "Judi", "Ben", "Tom", "Tom", 
"Bill", "Lindy", "Shary", "Judu", "Judu", "Billy", "Tommy", "Tommy", 
"Benjum", "Linda", "Shiry"), Gender = c("Female", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Female", "Female", 
"Male", "Male", "Male", "Male", "Female", "Female"), Product = c("A", 
"B", "A", "A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B", 
"A", "B"), Payment = c("Credit Card", "Credit Card", "Cash", 
"Cash", "Cash", "Credit Card", "Cash", "Credit Card", "Credit Card", 
"Credit Card", "Cash", "Cash", "Cash", "Credit Card", "Cash", 
"Credit Card")), .Names = c("Custom", "Gender", "Product", "Payment"
), class = "data.frame", row.names = c(NA, -16L))

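library(reshape2)

dat1 <- within(dat, Categ <- ave(Product, Custom, FUN= function(x) if(length(x)>1) "A and B" else x))

res <- acast(dat1,Categ~Gender+Payment,length,value.var="Categ") #or dcast()

# Row 2 ("A and B") counts each customer twice (one row per product), so halve it
res[2,] <- res[2,]/2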


Hello experts,

I am a beginner in R and need your kind help with an R question. Any advice will be greatly appreciated.

I have a sample data set like the one attached: customers purchase either product A or B or both, using either credit card or cash. I would like to summarize the data as a crosstab in R, showing how many customers purchase product A only, product B only, or products A and B, using either credit card or cash. Is that possible in R?

Thank you very much for your time and help.

Customer_Sample.xlsx
