[R] Loop for CrossTable (gmodels)

Sun May 12 20:03:32 CEST 2013

On May 12, 2013, at 10:44 AM, David Winsemius wrote:

> 
> On May 12, 2013, at 10:30 AM, arun wrote:
> 
>> Hi,
>> According to the error, the variables should have the same length.
>> For example:
>> set.seed(24)
>> dat1<- cbind(RACE=sample(1:10,10,replace=TRUE),as.data.frame(matrix(sample(1:100,20*10,replace=TRUE),ncol=20)))
>> lapply(dat1[,-1],function(x) CrossTable(x,dat1$RACE,format="SPSS",prop.chisq=FALSE,digits=2,dnn=c("VAR","RACE"))) # prints cross tables.
>> 
>> #or
>> lapply(names(dat1)[-1],function(x) CrossTable(dat1[,x],dat1[,"RACE"],format="SPSS",prop.chisq=FALSE,digits=2,dnn=c(x,"RACE")))
>> A.K.
>>> Hi, 
>>> 
>>> I have 20 variables in a data frame (VAR1 ... VAR20) which I would like to crosstab against the variable RACE. 
>>> I would like to use a loop structure instead of 20 statements like: 
>>> CrossTable(VAR1, RACE, format = "SPSS", prop.chisq = FALSE, digits = 2) 
>>> 
>>> 
>>> I have tried following syntax, but failed: 
>>> 
>>> library(gmodels) 
>>> for(i in 1:20){ 
>>> columnname <- ("VAR",i) 
>>> CrossTable(columnname, RACE, format = "SPSS", prop.chisq = FALSE, digits = 2) 
>>> } 
>>> 
>>> I receive following Error: 
>>> Error in CrossTable(columnname, RACE, format = "SPSS", prop.chisq = FALSE,  : 
>>> x and y must have the same length 
> 
> It might be productive in learning R to understand what you were doing wrong and how you could have used that for-loop control structure. It does appear that you have `attach`-ed a data.frame and are referring to its column names. Yes? If so, you should realize that is not a particularly safe practice, but let's push on.
> 
> `columnname` is just a character vector with a single element, "VAR1" the first time around. R does not do a double evaluation, first working out that `columnname` is "VAR1" and then looking up the value of the object with that name. To do that you would need to add `get`:
> 
> for(i in 1:20){ 
>   columnname <- ("VAR",i) 
>   CrossTable( get(columnname), RACE, format = "SPSS", prop.chisq = FALSE, digits = 2) 
> } 
> 
> The get function does the extra step of converting the character value to an object name and returning the value of that named data-object.
> 
>>> Any idea how to get 20 crosstables within a loop? 
> 
> That should do it. It would have been better if you had used dput() to produce a workable small example of a few of the columns.

(I probably should have made it clear that the "you" I was addressing was Stefan, whose message has not yet shown up on my mail-client but to whom Arun was responding to the list. I see a lot of messages from Arun that are responses to messages that never make it to the list. I guessed (correctly) he was replying to Nabble postings that are blocked because of the filters on the Nabble-spam-conduit. So in this case the original message may never make it to the archives, because I just cleared the moderation queue of a single spam message and did not see the original posting.)

Actually that single revision to the argument to CrossTable won't do it. When I built my own data example, I also found that Stefan had failed to construct `columnname` properly (it needs paste0) and didn't do anything with the value of CrossTable. Since values computed inside a for() loop are not auto-printed, the value needs to either be printed or saved as something.
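
A minimal base-R sketch of those two points, kept separate from gmodels so it runs anywhere. VAR1 and VAR2 here are made-up stand-ins for attached columns, not Stefan's data:

```r
# Hypothetical stand-ins for the attached columns; not the original data.
VAR1 <- c(3, 4, 4)
VAR2 <- c(7, 7, 9)

# paste0() glues "VAR" and 1 into the single string "VAR1".
columnname <- paste0("VAR", 1)

is.character(columnname)          # TRUE: columnname is only a string
identical(get(columnname), VAR1)  # TRUE: get() looks the object up by name

# Inside a for loop a bare expression is evaluated but not auto-printed,
# so this loop shows nothing on the console:
for (i in 1:2) get(paste0("VAR", i))

# Wrapping the expression in print() (or assigning it) makes the value
# visible or keeps it for later:
for (i in 1:2) print(get(paste0("VAR", i)))
```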

res<-list()
for(i in 1:3){ 
  columnname <- paste0("VAR",i) 
  res[[i]] <- CrossTable( get(columnname), RACE, format = "SPSS", prop.chisq = FALSE, digits = 2) 
} 

res[[1]]

   Cell Contents
|-------------------------|
|                   Count |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|-------------------------|

Total Observations in Table:  10 

                | RACE 
get(columnname) |        1  |        2  |        3  | Row Total | 
----------------|-----------|-----------|-----------|-----------|
              3 |        0  |        0  |        1  |        1  | 
                |     0.00% |     0.00% |   100.00% |    10.00% | 
                |     0.00% |     0.00% |    20.00% |           | 
                |     0.00% |     0.00% |    10.00% |           | 
----------------|-----------|-----------|-----------|-----------|
              4 |        0  |        0  |        2  |        2  | 
                |     0.00% |     0.00% |   100.00% |    20.00% | 
                |     0.00% |     0.00% |    40.00% |           | 
                |     0.00% |     0.00% |    20.00% |           | 
----------------|-----------|-----------|-----------|-----------|
              7 |        0  |        0  |        1  |        1  | 
                |     0.00% |     0.00% |   100.00% |    10.00% | 

---- snipped rest of output.
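
The row label in that output is the literal text "get(columnname)". A possible variant, offered only as a sketch, avoids both attach() and get() by indexing the data frame with `[[` and passes the labels through `dnn`, as Arun's lapply() version does. The data frame `dat` and its contents are invented for illustration, and the table() fallback is only there so the code still runs where gmodels is not installed:

```r
# Hypothetical data frame standing in for the poster's data (not the original).
set.seed(1)
dat <- data.frame(
  RACE = sample(1:3, 10, replace = TRUE),
  VAR1 = sample(1:5, 10, replace = TRUE),
  VAR2 = sample(1:5, 10, replace = TRUE)
)

vars <- paste0("VAR", 1:2)
res <- setNames(vector("list", length(vars)), vars)

for (v in vars) {
  # dat[[v]] extracts the column named by the string v; no attach() or get().
  if (requireNamespace("gmodels", quietly = TRUE)) {
    res[[v]] <- gmodels::CrossTable(dat[[v]], dat$RACE, format = "SPSS",
                                    prop.chisq = FALSE, digits = 2,
                                    dnn = c(v, "RACE"))
  } else {
    # Fallback without gmodels: a plain table of counts with the same labels.
    res[[v]] <- table(dat[[v]], dat$RACE, dnn = c(v, "RACE"))
  }
}

names(res)
```

Storing the results under the variable names means a single table can be pulled out later as res[["VAR1"]] rather than by position.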


David Winsemius
Alameda, CA, USA