[R] Transforming relational data
Matthew Dowle
mdowle at mdowle.plus.com
Tue Feb 22 13:44:23 CET 2011
With the new example, what is the full output, and
what do you need instead? Was it correct for the
previous example?
Matthew
"mathijsdevaan" <mathijsdevaan at gmail.com> wrote in message
news:1298372018181-3318939.post at n4.nabble.com...
>
> Hi Matthew, thanks for your help. There are some things going wrong still.
> Consider this (slightly extended) example:
>
> library(data.table)
> DT = data.table(read.table(textConnection(" A B C
> 1 1 a 1999
> 2 1 b 1999
> 3 1 c 1999
> 4 1 d 1999
> 5 2 c 2001
> 6 2 d 2001
> 7 3 a 2004
> 8 3 b 2004
> 9 3 d 2004
> 10 4 c 2001
> 11 4 d 2001"),head=TRUE,stringsAsFactors=FALSE))
> firststep = DT[,cbind(A,expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
> firststep
> C A Var1 Var2 v
> 1 1999 1 b a 0.2500000
> 2 1999 1 c a 0.2500000
> 3 1999 1 d a 0.2500000
> 4 1999 1 a b 0.2500000
> 5 1999 1 c b 0.2500000
> 6 1999 1 d b 0.2500000
> 7 1999 1 a c 0.2500000
> 8 1999 1 b c 0.2500000
> 9 1999 1 d c 0.2500000
> 10 1999 1 a d 0.2500000
> 11 1999 1 b d 0.2500000
> 12 1999 1 c d 0.2500000
> 13 2001 2 b a 0.2500000
> 14 2001 4 b a 0.2500000
> 15 2001 2 a b 0.2500000
> 16 2001 4 a b 0.2500000
> 17 2001 2 b a 0.2500000
> 18 2001 4 b a 0.2500000
> 19 2001 2 a b 0.2500000
> 20 2001 4 a b 0.2500000
> 21 2004 3 b a 0.3333333
> 22 2004 3 c a 0.3333333
> 23 2004 3 a b 0.3333333
> 24 2004 3 c b 0.3333333
> 25 2004 3 a c 0.3333333
> 26 2004 3 b c 0.3333333
>
> Following "firststep", projects 2 and 4 involved individuals a and b, while
> actually c and d were involved. It seems that something is going wrong
> in transforming the data.
>
> Then going to the final result, a list is generated of years and sums of v,
> rather than a list of projects and sums of v. Probably I haven't been clear
> enough: I want to produce a list of all projects and the familiarity of all
> project members involved right before the start of the project.
>
> Example
> project_id familiarity
> 4 0.25
>
> Members c and d were jointly involved in 3 projects: 1, 2 and 4. Project 4
> took place in 2001, so only project 1 (1999) took place before that; project 2
> took place in the same year and is therefore not included. The average
> familiarity between the members in project 1 was 1/4, so:
>
> project_id familiarity
> 4 0.25
>
> Thanks!
>
>
> Matthew Dowle wrote:
>>
>>
>> Thanks for the attempt and required output. How about this?
>>
>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>> setkey(firststep,Var1,Var2,C)
>> firststep = firststep[,transform(.SD,cv=cumsum(v)),by=list(Var1,Var2)]
>> setkey(firststep,Var1,Var2,C)
>> DT[, {x=data.table(expand.grid(B,B),C[1]-1L)
>> firststep[x,roll=TRUE,nomatch=0][,sum(cv)] # prior familiarity
>> },by=C]
>> C V1
>> [1,] 1999 0.0
>> [2,] 2001 0.5
>> [3,] 2004 2.5
>>
>> I think you may have said you have large data. If so, this
>> method should be fast. Please let us know how you get on.
>>
>> HTH
>> Matthew
>>
>>
>>
>> On Thu, 17 Feb 2011 23:07:19 -0800, mathijsdevaan wrote:
>>
>>> OK, for the last step I have tried this (among other things):
>>> library(data.table)
>>> DT = data.table(read.table(textConnection(" A B C
>>> 1 1 a 1999
>>> 2 1 b 1999
>>> 3 1 c 1999
>>> 4 1 d 1999
>>> 5 2 c 2001
>>> 6 2 d 2001
>>> 7 3 a 2004
>>> 8 3 b 2004
>>> 9 3 d 2004"),head=TRUE,stringsAsFactors=FALSE))
>>>
>>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>>> setkey(firststep,Var1,Var2)
>>> list1<-firststep[J(expand.grid(DT$B,DT$B),v=1/length(DT$B)),nomatch=0][,sum(v)]
>>> list1
>>> #27
>>>
>>> What I would like to get:
>>> list
>>> 1 0
>>> 2 0.5
>>> 3 2.5
>>>
>>> Thanks!
>>
>
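The per-project familiarity described in the quoted message (project 4 -> 0.25) can be sketched in base R as below. This is not the data.table approach from the thread, only an illustration under stated assumptions: each earlier project contributes 1/(number of its members) to every pair of members it contains, only projects from a strictly earlier year count, and a project's familiarity is taken as the mean over its unordered member pairs (the message does not say whether a sum or a mean is wanted for projects with more than two members). The names members, year and prior_familiarity are made up for this sketch.

```r
# Toy data from the thread: A = project id, B = member, C = year
DT <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004, 2001, 2001),
  stringsAsFactors = FALSE
)

members <- split(DT$B, DT$A)                                  # project id -> members
year <- vapply(split(DT$C, DT$A), function(y) y[1], numeric(1))  # project id -> year

# Assumption: every project has at least two members (combn() needs a pair).
prior_familiarity <- function(p) {
  pairs   <- combn(members[[p]], 2, simplify = FALSE)  # unordered member pairs
  earlier <- names(year)[year < year[[p]]]             # strictly earlier projects
  per_pair <- vapply(pairs, function(pr) {
    # add 1/size for each earlier project that contains both members
    sum(vapply(earlier, function(q) {
      if (all(pr %in% members[[q]])) 1 / length(members[[q]]) else 0
    }, numeric(1)))
  }, numeric(1))
  mean(per_pair)  # assumption: average over pairs, not a sum
}

result <- data.frame(
  project_id  = names(members),
  familiarity = vapply(names(members), prior_familiarity, numeric(1)),
  row.names   = NULL
)
print(result)
```

Because the loop works per project (column A) rather than per year (column C), projects 2 and 4 are kept apart even though both took place in 2001. On large data the nested loops will be slow; the keyed rolling join in the quoted reply is the more scalable route.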