[Rd] R package {genetics} function summary.genotype -- H and PIC
return incorrect values. (PR#3642)
gerard.tromp at sanger.med.wayne.edu
gerard.tromp at sanger.med.wayne.edu
Mon Aug 4 05:45:09 MEST 2003
Greetings!
I found a bug in the function summary.genotype (details below). Essentially
it appears that the original code from David Duffy used af, allele
frequency, as a proportion but in the genetics package af is allele
frequency as a count.
I tried to submit a bug report together with the fix to CRAN (below). I am
unsure of the success since the bug report does not appear in the list. I am
therefore sending it by e-mail.
Sincerely,
Gerard
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Gerard Tromp, Ph.D. vox: 313-577-8773
Center for Molecular Medicine and Genetics fax: 313-577-7736
Wayne State Univ. Schl. of Medicine
540 E. Canfield, Detroit, MI 48201 gtromp at cmb.biosci.wayne.edu
mailto:tromp at sanger.med.wayne.edu
http://cmmg.biosci.wayne.edu
In function summary.genotype:
Heterozygosity (H) and PIC are computed using af (allele frequency),
however, af is derived as a frequency count whereas the computation of H and
PIC are in terms of frequency as a proportion.
Using the example data provided (genetics.R) H and PIC return large negative
numbers whereas they are bounded 0,1.
Output from buggy version:
*************************************************************************
Marker: MBP2:C-104T (9q35:-104) Type: SNP
Number persons typed: 100 (100%)
Allele Frequency: (2 alleles)
Count Proportion
C 137 0.69
T 63 0.32
Genotype Frequency:
Count Proportion
C/C 59 0.59
C/T 19 0.19
T/T 22 0.22
Heterozygosity (Hu) = -22967 <------ Impossible
Poly. Inf. Content = -1.5e+08 <------ Impossible
***************************************************************************
Output from fixed version:
***************************************************************************
Marker: MBP2:C-104T (9q35:-104) Type: SNP
Number persons typed: 100 (100%)
Allele Frequency: (2 alleles)
Count Proportion
C 137 0.69
T 63 0.32
Genotype Frequency:
Count Proportion
C/C 59 0.59
C/T 19 0.19
T/T 22 0.22
Heterozygosity (Hu) = 0.44 <--- Correct
Poly. Inf. Content = 0.34 <--- Correct
***************************************************************************
Fix is detailed below.
***************** 2539,2554 c 2539,2565
********************************************
### from code submitted by David Duffy <davidD at qimr.edu.au>
#
n.typed<-sum(gf)
correction<-n.typed/max(1,n.typed-1)
# Gerard Tromp 20030803
# Heterozygosity and PIC reported are incorrect.
# Problem due to use of af which is a count rather than prop.table(af) a
# proportion (fraction) less than 1. Definition of af changed?
#
## Original
# ehet<-(1-sum(af*af))
# the following produces desired result
ehet<-(1-sum(prop.table(af)*prop.table(af)))
## Original
# matings<- (af %*% t(af))^2
# the following produces desired result
matings <- (prop.table(af) %*% t(prop.table(af)))^2
uninf.mating.freq <- sum(matings)-sum(diag(matings))
pic<- ehet - uninf.mating.freq
retval$Hu <- correction * ehet
retval$pic <- pic
retval$n.typed <- n.typed
retval$n.total <- length(object)
retval$nallele <- nallele(object)
#
###
****************************************************************************
******
More information about the R-devel
mailing list