[R] create group variable -- family data -- for siblings
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Oct 25 18:52:40 CEST 2008
Create a distance matrix that is 0 for any pair sharing a mother or a
father and 1 otherwise, then use it to cluster the individuals:
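# 0 if a pair shares a mother or a father, 1 otherwise (NA if a parent is missing)
dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!="))
# treat pairs involving missing parents as unrelated
dd[is.na(dd)] <- 1
# cluster on that distance; cutting just above 0 keeps only the sibling groups
hc <- hclust(as.dist(dd))
cutree(hc, h = 0.1)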
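
For what it's worth, here is a self-contained sketch of the same approach, run on the data from the question and storing the labels in a new sibid column. Two things in it are assumptions of mine rather than part of the code above: read.table(text = ...) is used in place of textConnection, and method = "single" is passed to hclust so that a chain of half-siblings (A shares a mother with B, B shares a father with C) ends up in one group. The default complete linkage would not merge such a chain at height 0. For the example data both linkages give the same labels.

```r
# Example data from the original question.
famdat <- read.table(text = "ind momid dadid
1 18 19
2 18 19
3 18 19
4 21 22
5 21 22
6 23 25
7 23 27
8 29 30
9 31 30
10 40 41
11 NA NA
12 50 51", header = TRUE)

# Pairwise distance: 0 if two individuals share a mother or a father,
# 1 if they share neither, NA if a missing parent is involved.
dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!="))
dd[is.na(dd)] <- 1  # missing parents: count the pair as unrelated

# Assumption: single linkage, so half-sibling chains form one group.
hc <- hclust(as.dist(dd), method = "single")

# Cut just above 0 so only pairs at distance 0 are grouped together.
famdat$sibid <- cutree(hc, h = 0.1)
print(famdat)
```

One caveat: because NA * 0 is NA in R, an individual with only one parent missing is also treated as unrelated to everyone, even if the known parent matches someone else's.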
On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
> For the following data:
>
> famdat <- read.table(textConnection("ind momid dadid
> 1 18 19
> 2 18 19
> 3 18 19
> 4 21 22
> 5 21 22
> 6 23 25
> 7 23 27
> 8 29 30
> 9 31 30
> 10 40 41
> 11 NA NA
> 12 50 51"),header=TRUE)
> closeAllConnections();
>
> I would like to create a label (1,2,3..) for siblings. Siblings will
> be defined by those who have both the same momid and dadid, but also
> those who just have the same momid or the same dadid. In addition,
> there will be those without siblings and those whose parents are
> missing, and they will get unique ids. For the data above, the result
> would be:
>
>    ind momid dadid sibid
> 1    1    18    19     1
> 2    2    18    19     1
> 3    3    18    19     1
> 4    4    21    22     2
> 5    5    21    22     2
> 6    6    23    25     3
> 7    7    23    27     3
> 8    8    29    30     4
> 9    9    31    30     4
> 10  10    40    41     5
> 11  11    NA    NA     6
> 12  12    50    51     7
>
> Thanks!
>
> Juliet
>