[Rd] Genetics package: Heterozygosity and PIC incorrect (PR#3639)

gerard.tromp at sanger.med.wayne.edu gerard.tromp at sanger.med.wayne.edu
Mon Aug 4 01:03:00 MEST 2003


Full_Name: Gerard Tromp
Version: 1.7.0
OS: Windows 2000
Submission from: (NULL) (146.9.5.86)


In function summary.genotype:
Heterozygosity (H) and PIC are computed using af (allele frequency), however, af
is derived as a frequency count whereas the computation of H and PIC are in
terms of frequency as a proportion. 

Using the example data provided (genetics.R) H and PIC return large negative
numbers whereas they are bounded 0,1.

Output from buggy version:
*************************************************************************
Marker: MBP2:C-104T (9q35:-104) Type: SNP

Number persons typed: 100 (100%)

Allele Frequency: (2 alleles)
  Count Proportion
C   137       0.69
T    63       0.32


Genotype Frequency:
    Count Proportion
C/C    59       0.59
C/T    19       0.19
T/T    22       0.22

Heterozygosity (Hu)  = -22967    <------ Impossible
Poly. Inf. Content   = -1.5e+08  <------ Impossible
***************************************************************************

Output from fixed version:
***************************************************************************
Marker: MBP2:C-104T (9q35:-104) Type: SNP

Number persons typed: 100 (100%)

Allele Frequency: (2 alleles)
  Count Proportion
C   137       0.69
T    63       0.32


Genotype Frequency:
    Count Proportion
C/C    59       0.59
C/T    19       0.19
T/T    22       0.22

Heterozygosity (Hu)  = 0.44     <--- Correct
Poly. Inf. Content   = 0.34     <--- Correct
***************************************************************************




Fix is detailed below.
***************** 2539,2554 c 2539,2565
********************************************
    ### from code submitted by David Duffy <davidD at qimr.edu.au>
    #
    n.typed<-sum(gf)
    correction<-n.typed/max(1,n.typed-1)
#  Gerard Tromp 20030803 
#  Heterozygosity and PIC reported are incorrect.
#  Problem due to use of af which is a count rather than prop.table(af) a 
#    proportion (fraction) less than 1. Definition of af changed?
#    
##  Original  
#    ehet<-(1-sum(af*af))
#   the following produces desired result
    ehet<-(1-sum(prop.table(af)*prop.table(af)))
##  Original  
#    matings<- (af %*% t(af))^2
#   the following produces desired result
    matings <- (prop.table(af) %*% t(prop.table(af)))^2
    uninf.mating.freq <- sum(matings)-sum(diag(matings))
    pic<- ehet - uninf.mating.freq

    retval$Hu <- correction * ehet
    retval$pic <- pic
    retval$n.typed <- n.typed
    retval$n.total <- length(object)
    retval$nallele <- nallele(object)
    #
    ###



More information about the R-devel mailing list