[R] data.table/ifelse conditional new variable question

Jorge I Velez jorgeivanvelez at gmail.com
Sun Aug 17 03:53:08 CEST 2014


Perhaps I am missing something but I do not get the same result:

x <- read.table(textConnection("Family.ID Sample.ID Relationship
2702  349       mother
2702  3456  sibling
2702  9980  sibling
3064  3  father
3064  4  mother
3064  5    sibling
3064  86   sibling
3064  87   sibling"), header = TRUE)
closeAllConnections()

xs <- with(x, split(x, Family.ID))
res <- do.call(rbind, lapply(xs, function(l){
  # default: 0 means the parent is not available
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  # fill in parent IDs for siblings only when that parent is present
  if(sum(father) == 0)
    l$PID[l$Relationship == 'sibling'] <- 0
  else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  if(sum(mother) == 0)
    l$MID[l$Relationship == 'sibling'] <- 0
  else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
#       Family.ID Sample.ID Relationship MID PID
#2702.1      2702       349       mother   0   0
#2702.2      2702      3456      sibling 349   0
#2702.3      2702      9980      sibling 349   0
#3064.4      3064         3       father   0   0
#3064.5      3064         4       mother   0   0
#3064.6      3064         5      sibling   4   3
#3064.7      3064        86      sibling   4   3
#3064.8      3064        87      sibling   4   3
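
One possible explanation for the difference, offered as a guess rather than a diagnosis: values such as 842 or 879 look like factor level codes, which is what you would see if Sample.ID is stored as a factor in the full file, and the warnings suggest that some families list more than one father or mother. The sketch below guards against both cases. Note that parent_id() and add_parents() are hypothetical helpers (not from any package), that the IDs are assumed to be numeric, and that taking the first match when several parents are listed is an assumption you may want to change.

```r
# Hypothetical helper: ID of the first row matching `role`,
# or 0 when the family has no such member.
parent_id <- function(ids, relationship, role) {
  # as.character() first, so a factor yields its label, not its level code
  ids <- as.numeric(as.character(ids))
  hit <- which(relationship == role)
  if (length(hit) == 0) return(0)
  if (length(hit) > 1)
    warning("more than one ", role, " found; using the first")
  ids[hit[1]]
}

# Hypothetical helper: add PID/MID to one family's rows
add_parents <- function(l) {
  l$PID <- l$MID <- 0
  sib <- l$Relationship == "sibling"
  l$PID[sib] <- parent_id(l$Sample.ID, l$Relationship, "father")
  l$MID[sib] <- parent_id(l$Sample.ID, l$Relationship, "mother")
  l
}

# Made-up example: Sample.ID stored as a factor, one family without a father
x <- data.frame(
  Family.ID    = c(2702, 2702, 2702, 3064, 3064, 3064),
  Sample.ID    = factor(c(349, 3456, 9980, 3, 4, 5)),
  Relationship = c("mother", "sibling", "sibling",
                   "father", "mother", "sibling"),
  stringsAsFactors = FALSE
)
res <- do.call(rbind, lapply(split(x, x$Family.ID), add_parents))
res
```

Running str() on your own data frame would show whether Sample.ID is in fact a factor there.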

HTH,
Jorge.-
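
P.S. Since the original question asked about data.table, here is a minimal sketch of the same idea done by group with `:=`. It assumes the data.table package is installed and that IDs are numeric; first_or_zero() is a hypothetical helper, and using the first match when a family lists several fathers or mothers is again an assumption.

```r
library(data.table)

dt <- data.table(
  Family.ID    = c(2702, 2702, 2702, 3064, 3064, 3064, 3064, 3064),
  Sample.ID    = c(349, 3456, 9980, 3, 4, 5, 86, 87),
  Relationship = c("mother", "sibling", "sibling", "father", "mother",
                   "sibling", "sibling", "sibling")
)

# Hypothetical helper: first ID with the given role, or 0 when there is none.
# as.numeric() keeps the result type the same in every group.
first_or_zero <- function(ids, is_role) {
  ids <- as.numeric(ids)
  if (any(is_role)) ids[is_role][1] else 0
}

# Per-family parent IDs, computed by group and recycled to every row
dt[, `:=`(
  PID = first_or_zero(Sample.ID, Relationship == "father"),
  MID = first_or_zero(Sample.ID, Relationship == "mother")
), by = Family.ID]

# The parents themselves get 0 in both columns
dt[Relationship != "sibling", `:=`(PID = 0, MID = 0)]
dt
```

The two-step form (fill every row by family, then reset the parents) avoids an ifelse() over the whole table, at the cost of a second pass.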




On Sun, Aug 17, 2014 at 11:47 AM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:

> Yep - you're right - missing parents are indicated as zero in the M/PID
> field.
>
> The above code ran, but with a few warnings:
>
> 1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>   number of items to replace is not a multiple of replacement length
> 2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>   number of items to replace is not a multiple of replacement length
> 3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>   number of items to replace is not a multiple of replacement length
> 4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
>   number of items to replace is not a multiple of replacement length
>
> Looking at the output, I get unexpected numbers where the father/mother ID
> should be in the M/PID field.  For example:
>
> 2702  349       mother   0   0
> 2702  3456  sibling   0 842
> 2702  9980  sibling   0 842
> 3064  3  father   0   0
> 3064  4  mother   0   0
> 3064  5    sibling 879 880
> 3064  86   sibling 879 880
> 3064  87   sibling 879 880
>
> On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanvelez at gmail.com>
> wrote:
> > Dear Kate,
> >
> > Try this:
> >
> > res <- do.call(rbind, lapply(xs, function(l){
> > l$PID <- l$MID <- 0
> > father <- with(l, Relationship == 'father')
> > mother <- with(l, Relationship == 'mother')
> > if(sum(father) == 0)
> > l$PID[l$Relationship == 'sibling'] <- 0
> > else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
> > if(sum(mother) == 0)
> > l$MID[l$Relationship == 'sibling'] <- 0
> > else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
> > l
> > }))
> >
> > It is assumed that when either parent is not available the M/PID is 0.
> >
> > Best,
> > Jorge.-
> >
> >
> > On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius <kate.ignatius at gmail.com>
> > wrote:
> >>
> >> Actually - I didn't check this before, but these are not all nuclear
> >> families (as I assumed they were).  That is, some don't have a father
> >> or don't have a mother.... Usually if this is the case PID or MID will
> >> become 0, respectively, for the child.  How can the code be edited to
> >> account for this?
> >>
> >> On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius <kate.ignatius at gmail.com>
> >> wrote:
> >> > Thanks!
> >> >
> >> > I think I know what is being done here but not sure how to fix the
> >> > following error:
> >> >
> >> > Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
> >> >   replacement has length zero
> >> >
> >> >
> >> >
> >> > On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez
> >> > <jorgeivanvelez at gmail.com> wrote:
> >> >> Dear Kate,
> >> >>
> >> >> Assuming you have nuclear families, one option would be:
> >> >>
> >> >> x <- read.table(textConnection("Family.ID Sample.ID Relationship
> >> >> 14           62  sibling
> >> >> 14          94  father
> >> >> 14           63  sibling
> >> >> 14           59 mother
> >> >> 17         6004  father
> >> >> 17           6003 mother
> >> >> 17         6005   sibling
> >> >> 17         368   sibling
> >> >> 130           202 mother
> >> >> 130           203  father
> >> >> 130           204   sibling
> >> >> 130           205   sibling
> >> >> 130           206   sibling
> >> >> 222         9 mother
> >> >> 222         45  sibling
> >> >> 222         34  sibling
> >> >> 222         10  sibling
> >> >> 222         11  sibling
> >> >> 222         18  father"), header = TRUE)
> >> >> closeAllConnections()
> >> >>
> >> >> xs <- with(x, split(x, Family.ID))
> >> >> res <- do.call(rbind, lapply(xs, function(l){
> >> >> l$PID <- l$MID <- 0
> >> >> father <- with(l, Relationship == 'father')
> >> >> mother <- with(l, Relationship == 'mother')
> >> >> l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
> >> >> l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
> >> >> l
> >> >> }))
> >> >> res
> >> >>
> >> >> HTH,
> >> >> Jorge.-
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Jorge.-
> >> >>
> >> >>
> >> >>
> >> >> On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius
> >> >> <kate.ignatius at gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> I have a data.table question (as well as an ifelse statement query).
> >> >>>
> >> >>> I have a large list of families (the file has 935 individuals, sorted
> >> >>> by family, with families of varying sizes).  At the moment the file has
> >> >>> the columns:
> >> >>>
> >> >>> SampleID FamilyID Relationship
> >> >>>
> >> >>> To avoid having to make a pedigree file by hand, i.e. adding a
> >> >>> PaternalID and a MaternalID one by one, I want to try to write a script
> >> >>> that will quickly do this for me (I eventually want to run this
> >> >>> through a program such as plink).  Is there a way to use data.table
> >> >>> (maybe in conjunction with ifelse) to do this effectively?
> >> >>>
> >> >>> An example of the file is something like:
> >> >>>
> >> >>> Family.ID Sample.ID Relationship
> >> >>> 14           62  sibling
> >> >>> 14          94  father
> >> >>> 14           63  sibling
> >> >>> 14           59 mother
> >> >>> 17         6004  father
> >> >>> 17           6003 mother
> >> >>> 17         6005   sibling
> >> >>> 17         368   sibling
> >> >>> 130           202 mother
> >> >>> 130           203  father
> >> >>> 130           204   sibling
> >> >>> 130           205   sibling
> >> >>> 130           206   sibling
> >> >>> 222         9 mother
> >> >>> 222         45  sibling
> >> >>> 222         34  sibling
> >> >>> 222         10  sibling
> >> >>> 222         11  sibling
> >> >>> 222         18  father
> >> >>>
> >> >>> But the goal is to have a file like this:
> >> >>>
> >> >>> Family.ID Sample.ID Relationship PID MID
> >> >>> 14           62  sibling 94 59
> >> >>> 14          94  father 0 0
> >> >>> 14           63  sibling 94 59
> >> >>> 14           59 mother 0 0
> >> >>> 17         6004  father 0 0
> >> >>> 17           6003 mother 0 0
> >> >>> 17         6005   sibling 6004 6003
> >> >>> 17         368   sibling 6004 6003
> >> >>> 130           202 mother 0 0
> >> >>> 130           203  father 0 0
> >> >>> 130           204   sibling 203 202
> >> >>> 130           205   sibling 203 202
> >> >>> 130           206   sibling 203 202
> >> >>> 222         9 mother 0 0
> >> >>> 222         45  sibling 18 9
> >> >>> 222         34  sibling 18 9
> >> >>> 222         10  sibling 18 9
> >> >>> 222         11  sibling 18 9
> >> >>> 222         18  father 0 0
> >> >>>
> >> >>> I've tried searches for this but with no luck.  Greatly appreciate any
> >> >>> help - even if it's just a link to a great example/solution!
> >> >>>
> >> >>> Thanks!
> >> >>>
> >> >>> ______________________________________________
> >> >>> R-help at r-project.org mailing list
> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >>> PLEASE do read the posting guide
> >> >>> http://www.R-project.org/posting-guide.html
> >> >>> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >>
> >
> >
>
