[R] create group variable -- family data -- for siblings

Gabor Grothendieck ggrothendieck at gmail.com
Sat Oct 25 18:52:40 CEST 2008


Create a distance metric that is 0 when two individuals share a mother
or a father and 1 otherwise, then use it to cluster your points:

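# Distance is 0 when a pair shares a mother or a father, 1 otherwise
dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!="))
# Comparisons involving a missing parent give NA; treat those pairs as unrelated
dd[is.na(dd)] <- 1
hc <- hclust(as.dist(dd))
# Cut just above height 0 so that only individuals merged at distance 0 share a label
cutree(hc, h = 0.1)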
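
For reference, here is a self-contained sketch of the same idea that rebuilds the example data and stores the labels in a sibid column. It is a variant of the code above, not a replacement, and it makes two assumptions of its own. First, a missing parent id never counts as shared, but a known shared parent still counts when the other parent is missing. Second, it uses method = "single", so chains of half-siblings (A shares a mother with B, B shares a father with C) would end up in one group. The names same_mom and same_dad are only illustrative.

```r
# Rebuild the example data from the original question.
famdat <- read.table(text = "ind momid dadid
1   18    19
2   18    19
3   18    19
4   21    22
5   21    22
6   23    25
7   23    27
8   29    30
9   31    30
10  40    41
11  NA    NA
12  50    51", header = TRUE)

# TRUE where a pair has the same (non-missing) mother / father.
same_mom <- with(famdat, outer(momid, momid, "=="))
same_dad <- with(famdat, outer(dadid, dadid, "=="))

# Assumption: a comparison involving NA means "not shared".
same_mom[is.na(same_mom)] <- FALSE
same_dad[is.na(same_dad)] <- FALSE

# Distance 0 if either parent is shared, 1 otherwise.
dd <- 1 - (same_mom | same_dad)

# Assumption: single linkage, so half-sibling chains are merged.
hc <- hclust(as.dist(dd), method = "single")

# Cut just above 0 so only individuals linked at distance 0 share a label.
famdat$sibid <- cutree(hc, h = 0.1)
famdat
```

On the example data this groups individuals 1-3, 4-5, 6-7 and 8-9 together and leaves 10, 11 and 12 on their own. With the default complete linkage a half-sibling chain would be split, because the two ends of the chain are at distance 1 from each other. Which behaviour is wanted depends on how siblings should be defined for such chains.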

On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
> For the following data:
>
> famdat <- read.table(textConnection("ind momid dadid
> 1   18    19
> 2   18    19
> 3   18    19
> 4   21    22
> 5   21    22
> 6   23    25
> 7   23    27
> 8   29    30
> 9   31    30
> 10  40    41
> 11  NA    NA
> 12  50    51"),header=TRUE)
> closeAllConnections();
>
> I would like to create a label (1, 2, 3, ...) for siblings. Siblings
> will be defined by those who have both the same momid and dadid, but
> also those who just have the same momid or the same dadid. In addition,
> there will be those without siblings and those whose parents are
> missing, and they will get unique ids. For the data above, the result
> would be:
>
>    ind momid dadid sibid
> 1    1    18    19     1
> 2    2    18    19     1
> 3    3    18    19     1
> 4    4    21    22     2
> 5    5    21    22     2
> 6    6    23    25     3
> 7    7    23    27     3
> 8    8    29    30     4
> 9    9    31    30     4
> 10  10    40    41     5
> 11  11    NA    NA     6
> 12  12    50    51     7
>
> Thanks!
>
> Juliet