[R] data.table/ifelse conditional new variable question

Kate Ignatius kate.ignatius at gmail.com
Sun Aug 17 04:02:55 CEST 2014


Actually, your code is not wrong. Because this is a large file, I went
through it to see whether anything was wrong with the data, and it looks
like some families list two fathers or three mothers.
Taking these duplicates out fixed the problem.
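
For anyone who hits the same warnings: with two fathers in a family,
l$Sample.ID[father] has length 2, which is what triggers "number of items
to replace is not a multiple of replacement length". A check along the
following lines should flag such families. This is only a sketch: the data
below are made up for illustration (family 3064 is given two fathers on
purpose), and the names n_father, n_mother, bad and dup_rows are my own.

```r
# Hypothetical example data: family 3064 deliberately lists two fathers.
x <- read.table(text = "Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 7 father
3064 4 mother
3064 5 sibling", header = TRUE, stringsAsFactors = FALSE)

# Count fathers and mothers within each family.
n_father <- tapply(x$Relationship == "father", x$Family.ID, sum)
n_mother <- tapply(x$Relationship == "mother", x$Family.ID, sum)

# Families with more than one father or more than one mother.
bad <- names(n_father)[n_father > 1 | n_mother > 1]
bad

# Look at the parent rows of those families before deciding what to drop.
dup_rows <- x[x$Family.ID %in% bad & x$Relationship != "sibling", ]
dup_rows
```

Which of the duplicated parent rows to keep is a decision about the data,
so the sketch only reports them rather than removing anything.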

Sorry about the confusion!  And thanks so much for your help!

On Sat, Aug 16, 2014 at 9:53 PM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
> Perhaps I am missing something but I do not get the same result:
>
> x <- read.table(textConnection("Family.ID Sample.ID Relationship
> 2702  349       mother
> 2702  3456  sibling
> 2702  9980  sibling
> 3064  3  father
> 3064  4  mother
> 3064  5    sibling
> 3064  86   sibling
> 3064  87   sibling"), header = TRUE)
> closeAllConnections()
>
> xs <- with(x, split(x, Family.ID))
> res <- do.call(rbind, lapply(xs, function(l){
>   l$PID <- l$MID <- 0
>   father <- with(l, Relationship == 'father')
>   mother <- with(l, Relationship == 'mother')
>   if(sum(father) == 0)
>     l$PID[l$Relationship == 'sibling'] <- 0
>   else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
>   if(sum(mother) == 0)
>     l$MID[l$Relationship == 'sibling'] <- 0
>   else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
>   l
> }))
> #Family.ID Sample.ID Relationship MID PID
> #2702.1      2702       349       mother   0   0
> #2702.2      2702      3456      sibling 349   0
> #2702.3      2702      9980      sibling 349   0
> #3064.4      3064         3       father   0   0
> #3064.5      3064         4       mother   0   0
> #3064.6      3064         5      sibling   4   3
> #3064.7      3064        86      sibling   4   3
> #3064.8      3064        87      sibling   4   3
>
> HTH,
> Jorge.-
>
>
>
>
> On Sun, Aug 17, 2014 at 11:47 AM, Kate Ignatius <kate.ignatius at gmail.com>
> wrote:
>>
>> Yep - you're right - missing parents are indicated as zero in the M/PID
>> field.
>>
>> The above code ran, but with a few warnings:
>>
>> 1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>>   number of items to replace is not a multiple of replacement length
>> 2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>>   number of items to replace is not a multiple of replacement length
>> 3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>>   number of items to replace is not a multiple of replacement length
>> 4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
>>   number of items to replace is not a multiple of replacement length
>>
>> Looking at the output, I get unexpected numbers where the father/mother
>> ID should be in the M/PID field.  For example:
>>
>> 2702  349       mother   0   0
>> 2702  3456  sibling   0 842
>> 2702  9980  sibling   0 842
>> 3064  3  father   0   0
>> 3064  4  mother   0   0
>> 3064  5    sibling 879 880
>> 3064  86   sibling 879 880
>> 3064  87   sibling 879 880
>>
>> On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanvelez at gmail.com>
>> wrote:
>> > Dear Kate,
>> >
>> > Try this:
>> >
>> > res <- do.call(rbind, lapply(xs, function(l){
>> >   l$PID <- l$MID <- 0
>> >   father <- with(l, Relationship == 'father')
>> >   mother <- with(l, Relationship == 'mother')
>> >   if(sum(father) == 0)
>> >     l$PID[l$Relationship == 'sibling'] <- 0
>> >   else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
>> >   if(sum(mother) == 0)
>> >     l$MID[l$Relationship == 'sibling'] <- 0
>> >   else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
>> >   l
>> > }))
>> >
>> > It is assumed that when either parent is not available the M/PID is 0.
>> >
>> > Best,
>> > Jorge.-
>> >
>> >
>> > On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius
>> > <kate.ignatius at gmail.com>
>> > wrote:
>> >>
>> >> Actually - I didn't check this before, but these are not all nuclear
>> >> families (as I assumed they were).  That is, some don't have a father
>> >> or don't have a mother.... Usually if this is the case PID or MID will
>> >> become 0, respectively, for the child.  How can the code be edited to
>> >> account for this?
>> >>
>> >> On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius
>> >> <kate.ignatius at gmail.com>
>> >> wrote:
>> >> > Thanks!
>> >> >
>> >> > I think I know what is being done here, but I'm not sure how to fix
>> >> > the following error:
>> >> >
>> >> > Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>> >> >   replacement has length zero
>> >> >
>> >> >
>> >> >
>> >> > On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez
>> >> > <jorgeivanvelez at gmail.com> wrote:
>> >> >> Dear Kate,
>> >> >>
>> >> >> Assuming you have nuclear families, one option would be:
>> >> >>
>> >> >> x <- read.table(textConnection("Family.ID Sample.ID Relationship
>> >> >> 14           62  sibling
>> >> >> 14          94  father
>> >> >> 14           63  sibling
>> >> >> 14           59 mother
>> >> >> 17         6004  father
>> >> >> 17           6003 mother
>> >> >> 17         6005   sibling
>> >> >> 17         368   sibling
>> >> >> 130           202 mother
>> >> >> 130           203  father
>> >> >> 130           204   sibling
>> >> >> 130           205   sibling
>> >> >> 130           206   sibling
>> >> >> 222         9 mother
>> >> >> 222         45  sibling
>> >> >> 222         34  sibling
>> >> >> 222         10  sibling
>> >> >> 222         11  sibling
>> >> >> 222         18  father"), header = TRUE)
>> >> >> closeAllConnections()
>> >> >>
>> >> >> xs <- with(x, split(x, Family.ID))
>> >> >> res <- do.call(rbind, lapply(xs, function(l){
>> >> >>   l$PID <- l$MID <- 0
>> >> >>   father <- with(l, Relationship == 'father')
>> >> >>   mother <- with(l, Relationship == 'mother')
>> >> >>   l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
>> >> >>   l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
>> >> >>   l
>> >> >> }))
>> >> >> res
>> >> >>
>> >> >> HTH,
>> >> >> Jorge.-
>> >> >>
>> >> >>
>> >> >> Best regards,
>> >> >> Jorge.-
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius
>> >> >> <kate.ignatius at gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I have a data.table question (as well as an if/else statement query).
>> >> >>>
>> >> >>> I have a large list of families (the file has 935 individuals, sorted
>> >> >>> by family, and the families vary in size).  At the moment the file has
>> >> >>> the columns:
>> >> >>>
>> >> >>> SampleID FamilyID Relationship
>> >> >>>
>> >> >>> To avoid having to make a pedigree file by hand (i.e. adding a
>> >> >>> PaternalID and a MaternalID one by one), I want to try to write a
>> >> >>> script that will do this for me quickly.  (I eventually want to run
>> >> >>> this through a program such as plink.)  Is there a way to use
>> >> >>> data.table (maybe in conjunction with ifelse) to do this effectively?
>> >> >>>
>> >> >>> An example of the file is something like:
>> >> >>>
>> >> >>> Family.ID Sample.ID Relationship
>> >> >>> 14           62  sibling
>> >> >>> 14          94  father
>> >> >>> 14           63  sibling
>> >> >>> 14           59 mother
>> >> >>> 17         6004  father
>> >> >>> 17           6003 mother
>> >> >>> 17         6005   sibling
>> >> >>> 17         368   sibling
>> >> >>> 130           202 mother
>> >> >>> 130           203  father
>> >> >>> 130           204   sibling
>> >> >>> 130           205   sibling
>> >> >>> 130           206   sibling
>> >> >>> 222         9 mother
>> >> >>> 222         45  sibling
>> >> >>> 222         34  sibling
>> >> >>> 222         10  sibling
>> >> >>> 222         11  sibling
>> >> >>> 222         18  father
>> >> >>>
>> >> >>> But the goal is to have a file like this:
>> >> >>>
>> >> >>> Family.ID Sample.ID Relationship PID MID
>> >> >>> 14           62  sibling 94 59
>> >> >>> 14          94  father 0 0
>> >> >>> 14           63  sibling 94 59
>> >> >>> 14           59 mother 0 0
>> >> >>> 17         6004  father 0 0
>> >> >>> 17           6003 mother 0 0
>> >> >>> 17         6005   sibling 6004 6003
>> >> >>> 17         368   sibling 6004 6003
>> >> >>> 130           202 mother 0 0
>> >> >>> 130           203  father 0 0
>> >> >>> 130           204   sibling 203 202
>> >> >>> 130           205   sibling 203 202
>> >> >>> 130           206   sibling 203 202
>> >> >>> 222         9 mother 0 0
>> >> >>> 222         45  sibling 18 9
>> >> >>> 222         34  sibling 18 9
>> >> >>> 222         10  sibling 18 9
>> >> >>> 222         11  sibling 18 9
>> >> >>> 222         18  father 0 0
>> >> >>>
>> >> >>> I've searched for this but had no luck.  I'd greatly appreciate any
>> >> >>> help, even if it's just a link to a good example/solution!
>> >> >>>
>> >> >>> Thanks!
>> >> >>>
>> >> >>> ______________________________________________
>> >> >>> R-help at r-project.org mailing list
>> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >>> PLEASE do read the posting guide
>> >> >>> http://www.R-project.org/posting-guide.html
>> >> >>> and provide commented, minimal, self-contained, reproducible code.
>> >> >>
>> >> >>
>> >
>> >
>
>


