[R] apply on large arrays

Thu Feb 14 01:41:05 CET 2008

Hmm.  I think this could be faster still:

	tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
	tab3 <- rowSums(tab1 == 1)

but check it...
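One quick way to do that check is to run both versions on a small simulated table and compare the per-country counts. This is only a sketch: pisa1 is not available here, so the data frame sim, its factor levels and the sample size are made-up stand-ins that merely reuse the four column names.

```r
# Hypothetical stand-in for pisa1: four factors with made-up levels.
set.seed(1)
n <- 200
sim <- data.frame(
  CNT    = factor(sample(c("AUT", "DEU", "ITA"), n, replace = TRUE)),
  GENDER = factor(sample(c("F", "M"), n, replace = TRUE)),
  ISCOF  = factor(sample(1:6, n, replace = TRUE)),
  ISCOM  = factor(sample(1:6, n, replace = TRUE))
)

tab1 <- with(sim, table(CNT, GENDER, ISCOF, ISCOM))

# Original approach: call a function once per cell, then sum by CNT.
tab2_slow <- apply(tab1, 1:4, function(x) ifelse(sum(x) == 1, 1, 0))
tab3_slow <- apply(tab2_slow, 1, sum)

# One-liner: tab1 == 1 is a logical array with the same dimensions,
# and rowSums() sums over every dimension after the first.
tab3_fast <- rowSums(tab1 == 1)

print(all.equal(tab3_slow, tab3_fast))
```

The one-liner avoids the per-cell function calls entirely: the comparison is evaluated once over the whole array, and the summation is done in compiled code.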

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile:                         +61 4 8819 4402
Home Phone:                     +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/ 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Venables, Bill (CMIS, Cleveland)
Sent: Thursday, 14 February 2008 10:30 AM
To: erich.neuwirth at univie.ac.at; r-help at stat.math.ethz.ch
Subject: Re: [R] apply on large arrays

Your code is

	tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
	tab2 <- apply(tab1, 1:4, 
			function(x) ifelse(sum(x) == 1, 1, 0))
	tab3 <- apply(tab2, 1, sum)

As far as I can see, step 2 (the problematic one) merely replaces any
entries in tab1 that are not equal to one by zeros.  I think this would
do the same job a bit faster:

	tab2 <- tab1 <- with(pisa1, table(CNT,GENDER,ISCOF,ISCOM))
	tab2[] <- 0
	tab2[which(tab1 == 1, arr.ind = TRUE)] <- 1
	tab3 <- rowSums(tab2)

If you don't need to keep tab1, you can save memory by removing it
(rm(tab1)).
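The which(..., arr.ind = TRUE) version can be checked the same way. The sketch below, again on a hypothetical simulated table rather than pisa1, also compares it with an alternative that is not from this thread: a single vectorized ifelse() call on the whole array, which returns a result with the same dimensions as its logical test.

```r
# Hypothetical small table standing in for the real 60*2*500*500 one.
set.seed(2)
n <- 300
sim <- data.frame(
  CNT    = factor(sample(letters[1:4], n, replace = TRUE)),
  GENDER = factor(sample(c("F", "M"), n, replace = TRUE)),
  ISCOF  = factor(sample(1:5, n, replace = TRUE)),
  ISCOM  = factor(sample(1:5, n, replace = TRUE))
)

# Bill's version: copy tab1, zero the copy, then set to 1 the cells
# located by the (k x 4) index matrix that which() returns.
tab2 <- tab1 <- with(sim, table(CNT, GENDER, ISCOF, ISCOM))
tab2[] <- 0
tab2[which(tab1 == 1, arr.ind = TRUE)] <- 1

# Alternative (not from the thread): one vectorized ifelse() call;
# the result keeps the dim and dimnames of tab1 == 1.
tab2_alt <- ifelse(tab1 == 1, 1, 0)

# Both indicator arrays should give the same per-CNT counts.
tab3 <- rowSums(tab2)
tab3_alt <- rowSums(tab2_alt)
print(all.equal(tab3, tab3_alt))
```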

Bill Venables.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Erich Neuwirth
Sent: Thursday, 14 February 2008 9:52 AM
To: r-help
Subject: [R] apply on large arrays

I have a big contingency table, approximately of size 60*2*500*500,
and I need to count the number of cells containing a count of 1 for each
of the factor levels defining the first dimension.
Here is my attempt:

tab1<-with(pisa1,table(CNT,GENDER,ISCOF,ISCOM))
tab2<-apply(tab1,1:4,function(x)ifelse(sum(x)==1,1,0))
tab3<-apply(tab2,1,sum)

Computing tab2 is very slow.
Is there a faster and/or more elegant way of doing this?
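The slowness has a simple cause: with MARGIN = 1:4, apply() hands the function one cell at a time, so for a 60*2*500*500 table the anonymous function is called about 30 million times. The sketch below, on a small hypothetical random array x (not the real data), counts those calls and shows that one vectorized comparison gives the same indicator array and the same counts.

```r
# Small hypothetical array (3 x 2 x 4 x 4 = 96 cells) standing in for
# the real table; the Poisson counts are arbitrary.
set.seed(3)
x <- array(rpois(3 * 2 * 4 * 4, lambda = 1), dim = c(3, 2, 4, 4))

# Count how often apply() calls the function when all four dimensions
# are margins: once per cell.
calls <- 0
ind <- apply(x, 1:4, function(v) {
  calls <<- calls + 1
  ifelse(sum(v) == 1, 1, 0)
})
print(c(calls = calls, cells = length(x)))

# The same indicator array from one vectorized comparison.
ind_vec <- (x == 1) * 1
print(identical(dim(ind), dim(ind_vec)))
print(all(ind == ind_vec))

# Per-level counts for the first dimension, both ways.
print(apply(ind, 1, sum))
print(rowSums(x == 1))
```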
-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.