[R] Transforming relational data

mathijsdevaan mathijsdevaan at gmail.com
Tue Feb 22 11:53:38 CET 2011


Hi Matthew, thanks for your help. Some things are still going wrong.
Consider this (slightly extended) example:

library(data.table) 
DT = data.table(read.table(textConnection("    A  B  C 
1 1  a  1999 
2 1  b  1999 
3 1  c  1999 
4 1  d  1999 
5 2  c  2001 
6 2  d  2001 
7 3  a  2004 
8 3  b  2004 
9 3  d  2004
10 4  c  2001
11 4  d  2001"),head=TRUE,stringsAsFactors=FALSE))
firststep = DT[,cbind(A,expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
firststep
      C A Var1 Var2         v
1  1999 1    b    a 0.2500000
2  1999 1    c    a 0.2500000
3  1999 1    d    a 0.2500000
4  1999 1    a    b 0.2500000
5  1999 1    c    b 0.2500000
6  1999 1    d    b 0.2500000
7  1999 1    a    c 0.2500000
8  1999 1    b    c 0.2500000
9  1999 1    d    c 0.2500000
10 1999 1    a    d 0.2500000
11 1999 1    b    d 0.2500000
12 1999 1    c    d 0.2500000
13 2001 2    b    a 0.2500000
14 2001 4    b    a 0.2500000
15 2001 2    a    b 0.2500000
16 2001 4    a    b 0.2500000
17 2001 2    b    a 0.2500000
18 2001 4    b    a 0.2500000
19 2001 2    a    b 0.2500000
20 2001 4    a    b 0.2500000
21 2004 3    b    a 0.3333333
22 2004 3    c    a 0.3333333
23 2004 3    a    b 0.3333333
24 2004 3    c    b 0.3333333
25 2004 3    a    c 0.3333333
26 2004 3    b    c 0.3333333

According to "firststep", projects 2 and 4 involved individuals a and b,
while actually c and d were involved. It seems that something goes wrong
in transforming the data.
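
A possible explanation, which is an assumption and not confirmed in this
thread: expand.grid() converts character vectors to factors by default, so
the codes for c and d within the 2001 group (1 and 2) may have been shown
against the levels of the first group (a, b, c, d). Grouping by C alone also
merges projects 2 and 4, since both ran in 2001. A minimal sketch that keeps
the names as character and groups by project as well as year (the data are
re-entered by hand here so that the block is self-contained):

```r
library(data.table)

# Same example data as above, entered directly
DT <- data.table(
  A = rep(1:4, c(4, 2, 3, 2)),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = rep(c(1999L, 2001L, 2004L, 2001L), c(4, 2, 3, 2))
)

# expand.grid() turns character vectors into factors unless told otherwise
g <- expand.grid(c("c", "d"), c("c", "d"))
is.factor(g$Var1)    # TRUE
as.integer(g$Var1)   # 1 2 1 2: read against levels a,b,c,d these would be a and b

# Group by project (A) and year (C) so projects 2 and 4 stay apart,
# and keep the member names as character
firststep <- DT[, cbind(expand.grid(B, B, stringsAsFactors = FALSE),
                        v = 1 / length(B)),
                by = list(A, C)][Var1 != Var2]

firststep[A == 4]    # the two ordered pairs of c and d
```

Note that v is now 1/2 for projects 2 and 4, because length(B) is taken per
project rather than per year.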

As for the final result, it lists years and sums of v rather than projects
and sums of v. I probably wasn't clear enough: I want to produce a list of
all projects, each with the familiarity of its members right before the
start of the project.

Example
project_id  familiarity
4  0.25

Members c and d were jointly involved in 3 projects: 1, 2 and 4. Project 4
took place in 2001, so only project 1 (1999) took place before that; project
2 took place in the same year and is therefore not included. The average
familiarity between the members in project 1 was 1/4, so:

project_id  familiarity
4  0.25
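
The calculation described above could be sketched as follows. This is only
one reading of the definition, under these assumptions: each pair of members
is counted once (unordered), v is 1 divided by the number of members of the
earlier project, and only projects from strictly earlier years count. The
names pairs, matched, prior and result are made up for this illustration.

```r
library(data.table)

# Same example data as above, entered directly
DT <- data.table(
  A = rep(1:4, c(4, 2, 3, 2)),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = rep(c(1999L, 2001L, 2004L, 2001L), c(4, 2, 3, 2))
)

# Unordered member pairs per project (Var1 < Var2), weighted by 1 / team size
pairs <- DT[, {
  g <- expand.grid(Var1 = B, Var2 = B, stringsAsFactors = FALSE)
  g <- g[g$Var1 < g$Var2, ]
  data.table(year = C[1], Var1 = g$Var1, Var2 = g$Var2, v = 1 / length(B))
}, by = A]

# Match each pair with the same pair in every project (.x = current, .y = other)
matched <- merge(pairs, pairs, by = c("Var1", "Var2"), allow.cartesian = TRUE)

# Keep strictly earlier projects only and sum their v per current project
prior <- matched[year.y < year.x,
                 list(familiarity = sum(v.y)),
                 by = list(project_id = A.x)]

# Projects without any earlier shared project get familiarity 0
result <- merge(data.table(project_id = unique(DT$A)), prior,
                by = "project_id", all.x = TRUE)
result[is.na(familiarity), familiarity := 0]
result
```

Under these assumptions project 4 gets 0.25, as in the example. If ordered
pairs or an average over pairs is wanted instead, the filter on Var1 < Var2
and the sum() would need to change accordingly.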

Thanks!


Matthew Dowle wrote:
> 
> 
> Thanks for the attempt and required output. How about this?
> 
> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
> setkey(firststep,Var1,Var2,C)
> firststep = firststep[,transform(.SD,cv=cumsum(v)),by=list(Var1,Var2)]
> setkey(firststep,Var1,Var2,C)
> DT[, {x=data.table(expand.grid(B,B),C[1]-1L)
>       firststep[x,roll=TRUE,nomatch=0][,sum(cv)]   # prior familiarity
>      },by=C]
>         C  V1
> [1,] 1999 0.0
> [2,] 2001 0.5
> [3,] 2004 2.5
> 
> I think you may have said you have large data. If so, this
> method should be fast. Please let us know how you get on.
> 
> HTH
> Matthew
> 
> 
> 
> On Thu, 17 Feb 2011 23:07:19 -0800, mathijsdevaan wrote:
> 
>> OK, for the last step I have tried this (among other things):
>> library(data.table)
>> DT = data.table(read.table(textConnection("    A  B  C
>> 1 1  a  1999
>> 2 1  b  1999
>> 3 1  c  1999
>> 4 1  d  1999
>> 5 2  c  2001
>> 6 2  d  2001
>> 7 3  a  2004
>> 8 3  b  2004
>> 9 3  d  2004"),head=TRUE,stringsAsFactors=FALSE))
>> 
>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>> setkey(firststep,Var1,Var2)
>> list1<-firststep[J(expand.grid(DT$B,DT$B),v=1/length(DT$B)),nomatch=0][,sum(v)]
>> list1
>> #27
>> 
>> What I would like to get:
>> list
>> 1  0
>> 2  0.5
>> 3  2.5
>> 
>> Thanks!
> 
