[R] create group variable -- family data -- for siblings
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Oct 25 19:28:19 CEST 2008
Here is one other solution. For each row it finds the
earliest row that has the same momid or popid:
f <- function(i) {
if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) {
i
} else {
i1 <- match(famdat[i, 1], famdat[1:i, 1])
i2 <- match(famdat[i, 2], famdat[1:i, 2])
min(i1, i2)
}
}
as.numeric(factor(sapply(1:nrow(famdat), f)))
On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Create a distance metric which is 0 if there are common mothers or
> fathers and 1 otherwise using that to cluster your points:
>
> dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!="))
> dd[is.na(dd)] <- 1
> hc <- hclust(as.dist(dd))
> cutree(hc, h = 0.1)
>
> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>> For the following data:
>>
>> famdat <- read.table(textConnection("ind momid dadid
>> 1 18 19
>> 2 18 19
>> 3 18 19
>> 4 21 22
>> 5 21 22
>> 6 23 25
>> 7 23 27
>> 8 29 30
>> 9 31 30
>> 10 40 41
>> 11 NA NA
>> 12 50 51"),header=TRUE)
>> closeAllConnections();
>>
>> I would like to create a label (1,2,3..) for siblings. Siblings will
>> be defined by those who have both the same momid and dadid, but also
>> those who
>> just have the same momid or the same dadid. In addition, there will be
>> those without siblings and those whose parents are missing, and they
>> will
>> get unique ids. For the data above, the result would be:
>>
>> ind momid dadid sibid
>> 1 1 18 19 1
>> 2 2 18 19 1
>> 3 3 18 19 1
>> 4 4 21 22 2
>> 5 5 21 22 2
>> 6 6 23 25 3
>> 7 7 23 27 3
>> 8 8 29 30 4
>> 9 9 31 30 4
>> 10 10 40 41 5
>> 11 11 NA NA 6
>> 12 12 50 51 7
>>
>> Thanks!
>>
>> Juliet
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
More information about the R-help
mailing list