[R] data manipulation in R

Huntsinger, Reid reid_huntsinger at merck.com
Mon Apr 16 17:16:57 CEST 2001

I may not have properly understood your problem, but if what you want is to
arrive at is a grouping of observers into non-overlapping groups such that
two observers are in the same group exactly when they have observed a common
event, then it boils down to a basic graph problem: find the connected
components of a graph. There are efficient algorithms for this problem but
in my opinion they are more suited to C than R due to the (data-dependent)
way they trace through the data structure (i.e., I can't see a way to
vectorize it). Perhaps other algorithms exist which might be more suitable
for R but I am not aware of them.

What I do is just have a simple stand-alone program to do this, and
communicate with it via flat files or pipes. If the overhead started to
become a problem it might be worthwhile to explore a better "interface" but
I'm not there yet.

I don't know of an existing R package for this or for other common graph
problems, but there are fairly comprehensive C or C++ libraries available.
S. Skiena, The Algorithm Design Manual, is useful and has a lot of
references, see http://www.algorist.com/ for links.

Assumming I've correctly understood the problem, the algorithm you've
sketched seems not to take proper account of observers (nodes) separated by
more than one other observer. After the first two steps, you have for each i
the set N(i) of its neighbors (nodes connected to i by an edge) and from
this data it is essentially the same problem as before to decide which of
these sets overlap etc. In other words you need to iterate these steps as
many times as the length of the longest distance in the graph... This could
be made to work but isn't efficient.

Hope that helps,

Reid Huntsinger

-----Original Message-----
From: Patrick Ball [mailto:lookout_20005 at yahoo.com]
Sent: Sunday, April 15, 2001 5:02 PM
To: r-help at lists.R-project.org
Subject: [R] data manipulation in R

Dear List:

I have a data manipulation problem that I was unable
to solve in R.  I did it in SQL, and it may be that
the solution in R is to do it in SQL, but I wondered
if people could imagine a vector-based solution.  

Imagine a list A[i] of observers who observe some set
of events B[j].  Each observer i may observe one or
more events, and each event j may have been observed
by one or more observers.  Thus the data are a
lower-triangular array AxB where each cell [i,j] has a
zero or one indicating whether observer i saw event j.

I am interested in how observers cluster in circuits
whereby observer _a_ sees events _1,2,3_, observer _b_
sees events _2,4,5_, observer _c_ sees event _4_, and
observer _d_ sees _4,6,7_.  Observers a, b, c, d
comprise a circuit linked by the events they jointly

Given AxB, how can we use R to articulate the
circuits?  Pseudocode for my SQL solution is below.

For each observation i:  
	- get all the events [j1,j2,...jk] observed by i
	- get all the observations [i_m] which observe one or
more events in  [j1,j2,...jk]
	- count [i_m] 
 	- assign events [i_m] to i's circuit if i_m > i
	- end

What would an R solution look like?  The "get all the"
and the "assign events" are pure SQL reasoning
(select, update).  

Thanks in advance.


- PB

Patrick Ball, PhD.
Deputy Director
AAAS Science and Human Rights Program

Do You Yahoo!?
Get email at your own domain with Yahoo! Mail. 
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list