[R] data.table/ifelse conditional new variable question
Jorge I Velez
jorgeivanvelez at gmail.com
Sun Aug 17 03:53:08 CEST 2014
Perhaps I am missing something but I do not get the same result:
x <- read.table(textConnection("Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 4 mother
3064 5 sibling
3064 86 sibling
3064 87 sibling"), header = TRUE)
closeAllConnections()
xs <- with(x, split(x, Family.ID))
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  if(sum(father) == 0)
    l$PID[l$Relationship == 'sibling'] <- 0
  else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  if(sum(mother) == 0)
    l$MID[l$Relationship == 'sibling'] <- 0
  else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
res
#        Family.ID Sample.ID Relationship MID PID
# 2702.1      2702       349       mother   0   0
# 2702.2      2702      3456      sibling 349   0
# 2702.3      2702      9980      sibling 349   0
# 3064.4      3064         3       father   0   0
# 3064.5      3064         4       mother   0   0
# 3064.6      3064         5      sibling   4   3
# 3064.7      3064        86      sibling   4   3
# 3064.8      3064        87      sibling   4   3
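
For a larger file it may be simpler to skip the split/lapply step. The following is only a minimal sketch in base R (not the data.table solution originally asked about) that looks up each family's parents with match(). It assumes at most one father and one mother per family; with duplicates, match() silently uses the first one. The names fathers, mothers, pid, mid and sib are illustrative.

```r
# Same example data as above; read.table(text = ...) avoids the explicit
# textConnection()/closeAllConnections() pair.
x <- read.table(text = "Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 4 mother
3064 5 sibling
3064 86 sibling
3064 87 sibling", header = TRUE, stringsAsFactors = FALSE)

fathers <- x[x$Relationship == "father", ]
mothers <- x[x$Relationship == "mother", ]

# Parent ID for the family of every row; NA where the family has no such parent.
pid <- fathers$Sample.ID[match(x$Family.ID, fathers$Family.ID)]
mid <- mothers$Sample.ID[match(x$Family.ID, mothers$Family.ID)]

# Only siblings receive parent IDs; parents and missing parents get 0.
sib <- x$Relationship == "sibling"
x$PID <- ifelse(sib & !is.na(pid), pid, 0)
x$MID <- ifelse(sib & !is.na(mid), mid, 0)
x
```

Because the lookup is vectorised over the whole data frame, the original row order and row names are kept.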
HTH,
Jorge.-
On Sun, Aug 17, 2014 at 11:47 AM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:
> Yep - you're right - missing parents are indicated as zero in the M/PID
> field.
>
> The above code worked with a few errors:
>
> 1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
> number of items to replace is not a multiple of replacement length
> 2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
> number of items to replace is not a multiple of replacement length
> 3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
> number of items to replace is not a multiple of replacement length
> 4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
> number of items to replace is not a multiple of replacement length
>
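
This warning typically appears when the replacement has more than one element, for example when a family contains two rows labelled father. That is only a guess about your file. A quick way to check, sketched on a made-up data frame (ped and its contents are hypothetical):

```r
# Hypothetical pedigree: family 1 has two "father" rows, family 2 has none.
ped <- data.frame(
  Family.ID    = c(1, 1, 1, 1, 1, 2, 2, 2),
  Relationship = c("father", "father", "sibling", "sibling", "sibling",
                   "mother", "sibling", "sibling"),
  stringsAsFactors = FALSE
)

# Count the father rows within each family.
n_fathers <- tapply(ped$Relationship == "father", ped$Family.ID, sum)

# Families with more than one recorded father.
multi_father <- names(n_fathers)[as.vector(n_fathers) > 1]
multi_father
```

The same count with "mother" in place of "father" would flag families with more than one recorded mother.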
> looking at the output I get numbers where the father/mother ID should
> be in the M/PID field. For example:
>
> 2702 349 mother 0 0
> 2702 3456 sibling 0 842
> 2702 9980 sibling 0 842
> 3064 3 father 0 0
> 3064 4 mother 0 0
> 3064 5 sibling 879 880
> 3064 86 sibling 879 880
> 3064 87 sibling 879 880
>
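
One possible explanation for these unexpected numbers, which I cannot confirm without seeing your file, is that Sample.ID was read in as a factor. Assigning a factor into a numeric vector stores the internal level codes rather than the IDs. A small sketch with made-up values (ids, pid and pid_ok are illustrative names):

```r
# Hypothetical IDs stored as a factor, as read.table() may produce when the
# column has a non-numeric entry and stringsAsFactors = TRUE (the default
# before R 4.0.0).
ids <- factor(c("349", "3456", "9980"))

pid <- c(0, 0, 0)
pid[2:3] <- ids[1]   # stores the level code of "349", not the value 349
pid

# Converting through character recovers the real ID.
pid_ok <- c(0, 0, 0)
pid_ok[2:3] <- as.numeric(as.character(ids[1]))
pid_ok
```

If this is the cause, str(x) will show Sample.ID as a Factor. Any entry that is not a number becomes NA after the conversion, which would also point to the offending rows.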
> On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanvelez at gmail.com>
> wrote:
> > Dear Kate,
> >
> > Try this:
> >
> > res <- do.call(rbind, lapply(xs, function(l){
> > l$PID <- l$MID <- 0
> > father <- with(l, Relationship == 'father')
> > mother <- with(l, Relationship == 'mother')
> > if(sum(father) == 0)
> > l$PID[l$Relationship == 'sibling'] <- 0
> > else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
> > if(sum(mother) == 0)
> > l$MID[l$Relationship == 'sibling'] <- 0
> > else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
> > l
> > }))
> >
> > It is assumed that when either parent is not available the M/PID is 0.
> >
> > Best,
> > Jorge.-
> >
> >
> > On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius <kate.ignatius at gmail.com>
> > wrote:
> >>
> >> Actually - I didn't check this before, but these are not all nuclear
> >> families (as I assumed they were). That is, some don't have a father
> >> or don't have a mother.... Usually if this is the case PID or MID will
> >> become 0, respectively, for the child. How can the code be edited to
> >> account for this?
> >>
> >> On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius <kate.ignatius at gmail.com>
> >> wrote:
> >> > Thanks!
> >> >
> >> > I think I know what is being done here but not sure how to fix the
> >> > following error:
> >> >
> >> > Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
> >> > replacement has length zero
> >> >
> >> >
> >> >
> >> > On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez
> >> > <jorgeivanvelez at gmail.com> wrote:
> >> >> Dear Kate,
> >> >>
> >> >> Assuming you have nuclear families, one option would be:
> >> >>
> >> >> x <- read.table(textConnection("Family.ID Sample.ID Relationship
> >> >> 14 62 sibling
> >> >> 14 94 father
> >> >> 14 63 sibling
> >> >> 14 59 mother
> >> >> 17 6004 father
> >> >> 17 6003 mother
> >> >> 17 6005 sibling
> >> >> 17 368 sibling
> >> >> 130 202 mother
> >> >> 130 203 father
> >> >> 130 204 sibling
> >> >> 130 205 sibling
> >> >> 130 206 sibling
> >> >> 222 9 mother
> >> >> 222 45 sibling
> >> >> 222 34 sibling
> >> >> 222 10 sibling
> >> >> 222 11 sibling
> >> >> 222 18 father"), header = TRUE)
> >> >> closeAllConnections()
> >> >>
> >> >> xs <- with(x, split(x, Family.ID))
> >> >> res <- do.call(rbind, lapply(xs, function(l){
> >> >> l$PID <- l$MID <- 0
> >> >> father <- with(l, Relationship == 'father')
> >> >> mother <- with(l, Relationship == 'mother')
> >> >> l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
> >> >> l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
> >> >> l
> >> >> }))
> >> >> res
> >> >>
> >> >> HTH,
> >> >> Jorge.-
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Jorge.-
> >> >>
> >> >>
> >> >>
> >> >> On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius
> >> >> <kate.ignatius at gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> I have a data.table question (as well as if else statement query).
> >> >>>
> >> >>> I have a large list of families (file has 935 individuals that are
> >> >>> sorted by family of varying sizes). At the moment the file has the
> >> >>> columns:
> >> >>>
> >> >>> SampleID FamilyID Relationship
> >> >>>
> >> >>> To avoid making a pedigree file by hand - i.e. adding a PaternalID
> >> >>> and a MaternalID one by one - I want to write a script that will
> >> >>> do this quickly for me (I eventually want to run this through a
> >> >>> program such as plink). Is there a way to use data.table (maybe in
> >> >>> conjunction with ifelse) to do this effectively?
> >> >>>
> >> >>> An example of the file is something like:
> >> >>>
> >> >>> Family.ID Sample.ID Relationship
> >> >>> 14 62 sibling
> >> >>> 14 94 father
> >> >>> 14 63 sibling
> >> >>> 14 59 mother
> >> >>> 17 6004 father
> >> >>> 17 6003 mother
> >> >>> 17 6005 sibling
> >> >>> 17 368 sibling
> >> >>> 130 202 mother
> >> >>> 130 203 father
> >> >>> 130 204 sibling
> >> >>> 130 205 sibling
> >> >>> 130 206 sibling
> >> >>> 222 9 mother
> >> >>> 222 45 sibling
> >> >>> 222 34 sibling
> >> >>> 222 10 sibling
> >> >>> 222 11 sibling
> >> >>> 222 18 father
> >> >>>
> >> >>> But the goal is to have a file like this:
> >> >>>
> >> >>> Family.ID Sample.ID Relationship PID MID
> >> >>> 14 62 sibling 94 59
> >> >>> 14 94 father 0 0
> >> >>> 14 63 sibling 94 59
> >> >>> 14 59 mother 0 0
> >> >>> 17 6004 father 0 0
> >> >>> 17 6003 mother 0 0
> >> >>> 17 6005 sibling 6004 6003
> >> >>> 17 368 sibling 6004 6003
> >> >>> 130 202 mother 0 0
> >> >>> 130 203 father 0 0
> >> >>> 130 204 sibling 203 202
> >> >>> 130 205 sibling 203 202
> >> >>> 130 206 sibling 203 202
> >> >>> 222 9 mother 0 0
> >> >>> 222 45 sibling 18 9
> >> >>> 222 34 sibling 18 9
> >> >>> 222 10 sibling 18 9
> >> >>> 222 11 sibling 18 9
> >> >>> 222 18 father 0 0
> >> >>>
> >> >>> I've tried searches for this but with no luck. Greatly appreciate any
> >> >>> help - even if it's just a link to a great example/solution!
> >> >>>
> >> >>> Thanks!
> >> >>>