[R] Transforming relational data

mathijsdevaan mathijsdevaan at gmail.com
Thu Feb 17 17:31:29 CET 2011


Thanks for helping me out so generously. After reading the vignettes and the
other info I still have a question (sorry, I am an R novice):

I am not so much trying to construct a time series (although it comes very
close). Rather, for each pair (Bi,Bj) in project (An), I am trying to sum the
values of v for (Bi,Bj) where C < focal C. One remark: some pairs (Bi,Bj) are
involved in more than one project per year. Because I cannot see which of
these projects was initiated first, I only want to sum the values of v for
(Bi,Bj) where C < focal C (and exclude those where C = focal C). So far, I
have executed the first step and set the key. I don't think I have to permute
the project-people data again, because that is already in firststep. Ideally,
I would like to add a column to the firststep data.table containing the sum
of v for (Bi,Bj) where C < focal C. Any suggestions? Thanks in advance!
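
A minimal sketch of one way such a column could be added, not a definitive
answer. It assumes a data.table version that supports the on= argument (newer
than the v1.5.3 mentioned in the reply below), and it groups the first step by
both A and C so that two projects in the same year stay separate. The names
v_year, prior and team are made up for illustration.

```r
library(data.table)

# The nine example rows from the original question.
DT <- data.table(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  B = c("a", "b", "c", "d", "c", "d", "a", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004)
)

# First step as in the reply, but grouped by project and year (assumption).
# Each ordered pair in a project gets weight 1 / team size.
firststep <- DT[, cbind(expand.grid(Var1 = B, Var2 = B,
                                    stringsAsFactors = FALSE),
                        v = 1 / length(B)),
                by = .(A, C)][Var1 != Var2]

# Total v per pair per year; keyby sorts by pair and then by year.
yearly <- firststep[, .(v_year = sum(v)), keyby = .(Var1, Var2, C)]

# Running total minus the current year's total leaves the sum over
# strictly earlier years (C < focal C), so same-year projects are excluded.
yearly[, prior := cumsum(v_year) - v_year, by = .(Var1, Var2)]

# Copy the prior sum onto every matching row of firststep (update join).
firststep[yearly, prior := i.prior, on = .(Var1, Var2, C)]

# Team familiarity per project: sum of prior over all ordered pairs.
team <- firststep[, .(familiarity = sum(prior)), by = .(A, C)]
print(team)
```

Up to floating-point rounding, project 3 should come out at 2.5, the figure
worked out by hand in the original question, and project 1 at 0 because it has
no earlier projects.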

Best,

Mathijs      

>Hello. One (of many) solutions might be:

>require(data.table)
>DT = data.table(read.table(textConnection("    A  B  C
>1 1  a  1999
>2 1  b  1999
>3 1  c  1999
>4 1  d  1999
>5 2  c  2001
>6 2  d  2001"),head=TRUE,stringsAsFactors=FALSE))

>firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>setkey(firststep,Var1,Var2)
>grp3 = c("a","c","d")   # the members of project 3
>firststep[J(expand.grid(grp3,grp3)),nomatch=0][,sum(v)]
># 2.5

>If I guess the bigger picture correctly, this can be extended
>to make a time series of prior familiarity by including
>the year in the key.

>If you decide to try this, please make sure to grab the latest
>version of data.table from CRAN (v1.5.3). I suggest that
>you run it first to confirm it does return 2.5, then break it
>down and run it step by step to see how each part works. You
>will need some time to read the vignettes and ?data.table
>(which has recently been improved) but I hope you think it is
>worth it. Support is available at maintainer("data.table").

>HTH
>Matthew


>>On Mon, 14 Feb 2011 09:22:12 -0800, mathijsdevaan wrote:
>> Hi,
>>
>> I have a large dataset with info on individuals (B) that have been
>> involved in projects (A) during multiple years (C). The dataset contains
>> three columns: A, B, C. Example:
>>
>>    A  B  C
>> 1 1  a  1999
>> 2 1  b  1999
>> 3 1  c  1999
>> 4 1  d  1999
>> 5 2  c  2001
>> 6 2  d  2001
>> 7 3  a  2004
>> 8 3  c  2004
>> 9 3  d  2004
>>
>> I am interested in how well all the individuals in a project know each
>> other. To calculate this team familiarity measure I want to sum the
>> familiarity between all individual pairs in a team. The familiarity
>> between each individual pair in a team is calculated as the summation of
>> each pair's prior co-appearance in a project divided by the total number
>> of team members. So the team familiarity in project 3 = (1/4+1/4) +
>> (1/4+1/4+1/2) + (1/4+1/4+1/2) = 2.5: a has been in project 1 (of size
>> 4) with c and d (1/4+1/4); c has been in project 1 (of size 4) with a
>> and d (1/4+1/4) and in project 2 (of size 2) with d (1/2); and d
>> likewise has been in project 1 with a and c and in project 2 with c.
>>
>> I think that the best way to do it is to transform the data into an
>> edgelist (each pair in one row/two columns) and then create two
>> additional columns for the strength of the familiarity and the year of
>> the project in which the pair was active. The problem is that I am
>> already stuck at the first step. So the question is: how do I go from
>> the current data structure to a list of projects and the familiarity of
>> its team members?
>>
>> Your help is very much appreciated. Thanks!
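
The edgelist described in the original question above can be sketched in base
R roughly as follows. This is an illustration only: the column names from, to,
strength and year are invented here, each unordered pair appears once per
project, and every project is assumed to have at least two members (combn()
fails otherwise).

```r
# The nine example rows from the original question.
df <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  B = c("a", "b", "c", "d", "c", "d", "a", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004),
  stringsAsFactors = FALSE
)

# Build one block of edges per project, then stack the blocks.
edges <- do.call(rbind, lapply(split(df, df$A), function(p) {
  # All unordered pairs of members; t() gives one pair per row.
  pairs <- t(combn(sort(p$B), 2))
  data.frame(
    A        = p$A[1],
    from     = pairs[, 1],
    to       = pairs[, 2],
    strength = 1 / nrow(p),   # 1 / team size, as defined in the question
    year     = p$C[1],
    stringsAsFactors = FALSE
  )
}))
rownames(edges) <- NULL
print(edges)
```

Because each pair is listed once here rather than in both directions, summing
strength over earlier years for the pairs of a team gives half of the team
familiarity as defined in the question, which counts every pair from both
members' point of view.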
