[R] Pedigree / Identifying Immediate Family of Index Animal

Charles C. Berry cberry at tajo.ucsd.edu
Thu Mar 18 18:32:22 CET 2010


On Thu, 18 Mar 2010, Ben Bimber wrote:

> I have a data frame containing the Id, Mother, Father and Sex from about
> 10,000 animals in our colony.  I am interested in graphing simple family
> trees for a given subject or small number of subjects.  The basic idea is:
> start with data frame from entire colony and list of index animals.  I need
> to identify all immediate relatives of these index animals and plot the
> pedigree for them.  We're not trying to do any sort of real analysis, just
> present a visualization of the family structure.  I have used the kinship
> and pedigree packages to plot the pedigree.  My question relates to
> efficiently identifying the animals to include in the pedigree:
>
> Starting with the data frame of ~10,000 records, I want to use a set of
> index animals to extract the immediate relatives and plot only a small
> number in the pedigree.  'Immediate relatives' is somewhat of an ambiguous
> term - I am currently defining it as 3 generations forward and 3 backward.
> Currently, I have a somewhat ugly approach where I recursively calculate
> each generation forward or backward and build a new dataframe.  Is there a
> better approach or package that does this?  I realize my code should be
> written better to get rid of the loops, so if anyone has suggestions there I
> would appreciate this as well.  Thanks in advance.
>

Using an indicator matrix for parent/child relations, you can identify 
earlier or later generations by matrix multiplication: each multiplication 
moves one generation, so k multiplications reach k generations.
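A minimal sketch of the idea in base R, using a hypothetical seven-animal pedigree. The data frame and its Id/Sire/Dam columns are assumptions that follow the names in the quoted code. Row i of P flags the parents of animal i, so t(P) %*% x (computed as crossprod(P, x)) steps one generation back from the animals flagged in x, and P %*% x steps one generation forward.

```r
# Toy pedigree (hypothetical data): A x B -> C, D;  C x E -> F, G
ped <- data.frame(
  Id   = c("A", "B", "C", "D", "E", "F", "G"),
  Sire = c(NA,  NA,  "A", "A", NA,  "C", "C"),
  Dam  = c(NA,  NA,  "B", "B", NA,  "E", "E"),
  stringsAsFactors = FALSE
)
n <- nrow(ped)

# Indicator matrix: P[i, j] = 1 when animal j is a parent (sire or dam) of animal i
P <- matrix(0, n, n)
for (parent_col in c("Sire", "Dam")) {
  j  <- match(ped[[parent_col]], ped$Id)  # NA for unknown parents
  ok <- !is.na(j)
  P[cbind(which(ok), j[ok])] <- 1
}

# Indicator vector for the index animal(s)
x <- as.numeric(ped$Id %in% "C")

parents      <- drop(crossprod(P, x))        # t(P) %*% x: one generation back
grandparents <- drop(crossprod(P, parents))  # two generations back
children     <- drop(P %*% x)                # P %*% x: one generation forward

ped$Id[parents > 0]       # "A" "B"
ped$Id[children > 0]      # "F" "G"
ped$Id[grandparents > 0]  # character(0): A and B are founders
```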

Since you have 10000 animals, the matrix indicating parents/children will 
be 10000 x 10000, but it will have at most 20000 non-zero elements, because 
each animal has at most two recorded parents.
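The arithmetic behind that claim, as a quick R check (assuming a dense base-R matrix of doubles at 8 bytes per entry):

```r
n <- 10000
cells       <- n^2               # 1e8 entries in a dense n x n matrix
dense_mb    <- cells * 8 / 1e6   # 8 bytes per double entry: 800 MB
max_nonzero <- 2 * n             # at most one sire and one dam per animal
density     <- max_nonzero / cells  # at most 2e-04 of the entries are non-zero
```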

To me, this sounds like a good candidate for a sparse matrix 
representation. Packages 'Matrix' and 'SparseM' provide these.
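A sketch of the sparse version using the 'Matrix' package, collecting the animals up to `gens` generations back and forward of a set of index animals. The helper names parent_matrix() and relatives() are hypothetical, and the toy data frame with Id/Sire/Dam columns is an assumption for illustration.

```r
library(Matrix)

# Hypothetical helper: sparse P with P[i, j] = 1 when animal j is a parent of animal i.
# Assumes columns Id, Sire, Dam, with NA for unknown parents.
parent_matrix <- function(ped) {
  n      <- nrow(ped)
  child  <- c(seq_len(n), seq_len(n))
  parent <- c(match(ped$Sire, ped$Id), match(ped$Dam, ped$Id))
  ok     <- !is.na(parent)
  sparseMatrix(i = child[ok], j = parent[ok], x = 1, dims = c(n, n))
}

# Hypothetical helper: rows for the index animals plus their direct ancestors
# and direct descendants within `gens` generations.
relatives <- function(ped, index_ids, gens = 3) {
  P  <- parent_matrix(ped)
  Pt <- t(P)
  up <- down <- as.numeric(ped$Id %in% index_ids)
  keep <- up > 0
  for (g in seq_len(gens)) {
    up   <- as.numeric(as.vector(Pt %*% up) > 0)  # parents of the current set
    down <- as.numeric(as.vector(P %*% down) > 0) # children of the current set
    keep <- keep | up > 0 | down > 0
  }
  ped[keep, ]
}

# Toy pedigree (hypothetical data): A x B -> C, D;  C x E -> F, G
ped <- data.frame(
  Id   = c("A", "B", "C", "D", "E", "F", "G"),
  Sire = c(NA,  NA,  "A", "A", NA,  "C", "C"),
  Dam  = c(NA,  NA,  "B", "B", NA,  "E", "E"),
  stringsAsFactors = FALSE
)

relatives(ped, "F", gens = 3)$Id  # "A" "B" "C" "E" "F"
relatives(ped, "A", gens = 2)$Id  # "A" "C" "D" "F" "G"
```

This sketch follows direct lineage only, so siblings and aunts/uncles are not picked up. Seeding the forward pass with the ancestors found in the backward pass, as the quoted code does, would add those collateral relatives.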

HTH,

Chuck



> Code to calculate generations forward and backward:
>
> # Build backwards.
> # ped starts out holding the index animals; allPed is the data frame with
> # Id, Dam, Sire and Sex for all animals in our colony.
> # queryIds holds the unique Ids of the parents of the current generation;
> # setdiff(..., NA) drops unknown (NA) parents and duplicates.
> queryIds <- setdiff(c(ped$Sire, ped$Dam), NA)
> for (i in seq_len(gens)) {
>   if (length(queryIds) == 0) break
>   newRows  <- subset(allPed, Id %in% queryIds)
>   queryIds <- setdiff(c(newRows$Sire, newRows$Dam), NA)
>   ped      <- unique(rbind(newRows, ped))
> }
>
> # Build forwards.
> # When calculating children, queryIds holds the Ids of the previous
> # generation.
> queryIds <- unique(ped$Id)
> for (i in seq_len(gens)) {
>   if (length(queryIds) == 0) break
>   newRows  <- subset(allPed, Sire %in% queryIds | Dam %in% queryIds)
>   queryIds <- newRows$Id
>   ped      <- unique(rbind(newRows, ped))
> }
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


