[R] create group variable -- family data -- for siblings
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Oct 25 19:56:03 CEST 2008
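Correction and shortening (momid and dadid are columns 2 and 3 of
famdat, not 1 and 2, and a missing id is now handled per column):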
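f <- function(i) {
  # earliest row up to i with the same momid (column 2);
  # a missing id only matches the row itself
  i1 <- if (is.na(famdat[i, 2])) i else match(famdat[i, 2], famdat[1:i, 2])
  # likewise for dadid (column 3)
  i2 <- if (is.na(famdat[i, 3])) i else match(famdat[i, 3], famdat[1:i, 3])
  min(i1, i2)
}
# renumber the earliest-row indices as consecutive labels 1, 2, 3, ...
as.numeric(factor(sapply(seq_len(nrow(famdat)), f)))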
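
One caveat, offered as a sketch rather than a definitive fix: the
function above labels each row by the earliest row sharing its momid
or dadid, so a family that is only tied together by a later row can end
up split. If that matters for your data, one option is to treat rows as
nodes and label connected components. The helper name sib_groups and
the small chain data set below are my own illustrations, not part of the
original question, and the code assumes the columns are named momid and
dadid as in famdat.

```r
# Hypothetical helper: label sibling groups as connected components,
# linking any two rows that share a non-missing momid or dadid.
sib_groups <- function(dat) {
  n <- nrow(dat)
  parent <- seq_len(n)                # every row starts as its own root

  # follow parent links until a root (a row pointing at itself) is reached
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  for (col in c("momid", "dadid")) {
    ids <- dat[[col]]
    first <- match(ids, ids)          # earliest row carrying the same id
    for (i in which(!is.na(ids))) {   # rows with a missing id are never linked
      a <- find_root(i)
      b <- find_root(first[i])
      if (a != b) parent[max(a, b)] <- min(a, b)  # smaller row stays the root
    }
  }

  roots <- vapply(seq_len(n), find_root, integer(1))
  as.numeric(factor(roots))           # consecutive labels 1, 2, 3, ...
}

famdat <- read.table(text = "ind momid dadid
1 18 19
2 18 19
3 18 19
4 21 22
5 21 22
6 23 25
7 23 27
8 29 30
9 31 30
10 40 41
11 NA NA
12 50 51", header = TRUE)

sib_groups(famdat)
# 1 1 1 2 2 3 3 4 4 5 6 7

# Made-up chain: row 3 shares a mother with row 1 and a father with row 2
chain <- data.frame(ind = 1:3, momid = c(1, 2, 1), dadid = c(10, 20, 20))
sib_groups(chain)
# 1 1 1
```

On famdat this gives the same labels as requested. On the chain example
the earliest-row approach would give 1 2 1, since row 2 has no earlier
match of its own.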
On Sat, Oct 25, 2008 at 1:28 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Here is one other solution. For each row it finds the
> earliest row that has the same momid or dadid:
>
>
> f <- function(i) {
>   if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) {
>     i
>   } else {
>     i1 <- match(famdat[i, 1], famdat[1:i, 1])
>     i2 <- match(famdat[i, 2], famdat[1:i, 2])
>     min(i1, i2)
>   }
> }
> as.numeric(factor(sapply(1:nrow(famdat), f)))
>
>
> On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> Create a distance metric which is 0 if two individuals share a
>> mother or a father and 1 otherwise, then use it to cluster your points:
>>
>> dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!="))
>> dd[is.na(dd)] <- 1
>> hc <- hclust(as.dist(dd))
>> cutree(hc, h = 0.1)
>>
>> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>>> For the following data:
>>>
>>> famdat <- read.table(textConnection("ind momid dadid
>>> 1 18 19
>>> 2 18 19
>>> 3 18 19
>>> 4 21 22
>>> 5 21 22
>>> 6 23 25
>>> 7 23 27
>>> 8 29 30
>>> 9 31 30
>>> 10 40 41
>>> 11 NA NA
>>> 12 50 51"),header=TRUE)
>>> closeAllConnections();
>>>
>>> I would like to create a label (1,2,3..) for siblings. Siblings will
>>> be defined by those who have both the same momid and dadid, but also
>>> those who just have the same momid or the same dadid. In addition,
>>> there will be those without siblings and those whose parents are
>>> missing, and they will get unique ids. For the data above, the result
>>> would be:
>>>
>>>    ind momid dadid sibid
>>> 1    1    18    19     1
>>> 2    2    18    19     1
>>> 3    3    18    19     1
>>> 4    4    21    22     2
>>> 5    5    21    22     2
>>> 6    6    23    25     3
>>> 7    7    23    27     3
>>> 8    8    29    30     4
>>> 9    9    31    30     4
>>> 10  10    40    41     5
>>> 11  11    NA    NA     6
>>> 12  12    50    51     7
>>>
>>> Thanks!
>>>
>>> Juliet
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>