[R] How to "vectorize" subsetting

Rainer Schuermann Rainer.Schuermann at gmx.net
Wed Aug 14 16:44:31 CEST 2013


I'm sure there are better, more elegant ways of avoiding the nested loops I'm suggesting, but if it were my problem, here is what I would do (assuming I have understood your question correctly):

### separate function for 'doing something' with the data subset
do.something <- function( qA, qB )
{
	# printing the response subsets as a substitute for your "do something"
	print( qA )
	print( qB )
	# you could use a list here to organise or return your results
}

### vector containing the numbers of the areas
areas <- unique( yrData$Area )

### extract question headers
questions <- colnames( yrData )[ !colnames( yrData ) %in% c( "Area", "Facility" ) ]

### loop through your Areas
for( i in areas )
{
	# subset per Area
	yrSubData <- yrData[ yrData$Area == i, ]
	# vector containing the (unique) Facilities in that Area;
	# unique() avoids repeating a Facility that has several respondent rows
	facilities <- unique( yrSubData$Facility )
	# loop through your Facilities
	for( j in facilities )
	{
		# get subsets: A = rest of the Area, B = this Facility only
		A <- yrSubData[ yrSubData$Facility != j, ]
		B <- yrSubData[ yrSubData$Facility == j, ]
		# for each combination of subsets, loop through the questions
		for( q in questions )
		{
			do.something( A[ q ], B[ q ] )
		}
	}
}


Output:

  Q1
2  3
3  1
  Q1
1  2
  Q1
1  2
3  1
  Q1
2  3
  Q1
1  2
2  3
  Q1
3  1
  Q1
5  5
6  2
  Q1
4  4
  Q1
4  4
6  2
  Q1
5  5
  Q1
4  4
5  5
  Q1
6  2

which I think should meet your needs as a first step.
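For what it's worth, the same A/B logic can also be written without explicit nested loops, by applying a function over the facilities with lapply() and over the questions with vapply(). This is a minimal sketch, not a definitive implementation. It assumes that Facility IDs are unique across Areas, and it uses a hypothetical compare() (difference in means, facility minus rest of area) as a stand-in for your "do something". The sample data only mirrors the table in your question. Note that lapply()/vapply() still iterate internally, so this is tidier rather than necessarily faster. Because nothing refers to a specific Area or Facility by name, it should keep working when facilities are renamed from year to year.

```r
# Hypothetical sample data mirroring the table in the question
# (real survey data would typically have one row per respondent).
yrData <- data.frame(
  Area     = c(1, 1, 1, 2, 2, 2),
  Facility = c(1, 2, 3, 4, 5, 6),
  Q1       = c(2, 3, 1, 4, 5, 2)
)

# every column except the grouping columns is treated as a question
questions <- setdiff(colnames(yrData), c("Area", "Facility"))

# Hypothetical placeholder for "do something": mean of the facility (B)
# minus mean of the rest of its area (A). Replace with the real analysis.
compare <- function(A, B) mean(B, na.rm = TRUE) - mean(A, na.rm = TRUE)

# For one facility, return one result per question (a named numeric vector).
facility_results <- function(fac, data, questions) {
  area <- data$Area[data$Facility == fac][1]        # area of this facility
  inA  <- data$Area == area & data$Facility != fac  # subset A: rest of area
  inB  <- data$Facility == fac                      # subset B: the facility
  vapply(questions,
         function(q) compare(data[inA, q], data[inB, q]),
         numeric(1))
}

# One row per facility, one column per question.
facs <- unique(yrData$Facility)
results <- do.call(rbind, lapply(facs, facility_results,
                                 data = yrData, questions = questions))
rownames(results) <- facs
print(results)
```

If your "do something" returns more than a single number (e.g. a test object), you could drop the vapply() and keep the nested list that lapply()/sapply() return instead.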

Rgds,
Rainer





On Wednesday 14 August 2013 07:20:24 Derickson, Ryan, VHACIN wrote:
> Hello all, 
> 
> I've tried to solve this for weeks and posted to other forums with
> little success. I'd appreciate any help from anyone. 
> 
> I have survey data grouped by facility and area (area is a collection of
> facilities). Questions are q1-q10. 
> 
> For each facility, I need to subset each item into the facility's
> responses, and the facility's area responses excluding the facility.
> This might illustrate it better:
> 
> Area  Facility  Q1 ... Q10
> 1     1         2
> 1     2         3
> 1     3         1
> 2     4         4
> 2     5         5
> 2     6         2
> 
> A<- Select Q1 for all Area=1 and Facility!=1; B<- Select Q1 for all
> Facility=1; <do something with A and B>
> A<- Select Q1 for all Area=1 and Facility!=2; B<- Select Q1 for all
> Facility=2; <do something with A and B>
> A<- Select Q1 for all Area=1 and Facility!=3; B<- Select Q1 for all
> Facility=3; <do something with A and B>		
> ...
> A<- Select Q10 for all Area=2 and Facility!=6; B<- Select Q10 for all
> Facility=6; <do something with A and B>	 
> 
> I know how to write the code to manually pull each subset, but I have a
> lot of facilities and areas that get renamed from year to year so I need
> to "vectorize" my code so each subset doesn't have to be explicitly
> called by area or facility name.
> 
> Again, I would be incredibly appreciative of any help. I'm at a
> dead-end. 
> 
> 
> Ryan 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


