[R] R genetics package now available
Warnes, Gregory R
gregory_r_warnes at groton.pfizer.com
Wed Nov 27 15:34:31 CET 2002
The "genetics" package for handling single-locus genetic data is now
available on CRAN in both source and Windows binary formats. The purpose of
this package is to make it easy to create and manipulate genetic
information, and to facilitate use of this information in statistical models.
The library includes classes and methods for creating, representing, and
manipulating genotypes (unordered allele pairs) and haplotypes (ordered
allele pairs). Genotypes and haplotypes can be annotated with chromosome,
locus, gene, and marker information. Utility functions compute genotype and
allele frequencies, flag homozygotes or heterozygotes, flag carriers of
certain alleles, count the number of a specific allele carried by an
individual, extract one or both alleles, estimate and generate confidence
intervals for measures of single-marker disequilibrium, and test for
departure from Hardy-Weinberg equilibrium.
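To make the utility functions concrete, here is a minimal base-R sketch of the same kinds of computations on plain "allele1/allele2" strings. It does not use the package and is not how the package implements them; the variable names (alleles, allele.tab, geno.tab, is.homozygote, n.G) and the toy data are made up for illustration.

```r
## Base-R sketch only; NOT the genetics package's implementation.
## Assumption: genotypes are "allele1/allele2" character strings.
g <- c("G/G", "G/G", "A/A", "G/A", "A/A", "A/G")

## Split each genotype into its two alleles (2-column character matrix).
alleles <- do.call(rbind, strsplit(g, "/", fixed = TRUE))

## Allele counts and proportions over all 2 * length(g) allele copies.
allele.tab  <- table(alleles)
allele.freq <- prop.table(allele.tab)

## Genotypes are unordered pairs, so sort the alleles within each pair
## before tabulating ("G/A" and "A/G" are the same genotype).
unordered <- apply(alleles, 1, function(a) paste(sort(a), collapse = "/"))
geno.tab  <- table(unordered)

## Flag homozygotes and count copies of allele "G" per individual.
is.homozygote <- alleles[, 1] == alleles[, 2]
n.G <- rowSums(alleles == "G")
```

Per-individual vectors such as is.homozygote and n.G are the sort of thing that can go straight into a model formula as covariates.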
The package description file and a simple example are appended below.
Comments and contributions are, of course, welcome.
-Greg
DESCRIPTION
===========
Package: genetics
Title: Population Genetics
Version: 0.6.4
Date: 2002-11-13
Author: Gregory Warnes and Friedrich Leisch
Maintainer: Gregory Warnes <gregory_r_warnes at groton.pfizer.com>
Depends: combinat
Description: Classes and methods for handling genetic data. Includes
        classes to represent genotypes and haplotypes at single
        markers up to multiple markers on multiple chromosomes.
        Function include allele frequencies, flagging
        homo/heterozygotes, flagging carriers of certain alleles,
        computing disequlibrium, testing Hardy-Weinberg equilibrium,
        ...
License: GPL
Built: R 1.6.0; sparc-sun-solaris2.8; Tue Nov 12 15:43:20 EST 2002
Index:
HWE.test                Estimate Disequlibrium and Test for
                        Hardy-Weinberg Equilibrium
ci.balance              Experimental Function to Correct Confidence
                        Intervals At or Near Boundaries of the
                        Parameter Space by 'Sliding' the Interval on
                        the Quantile Scale.
diseq                   Estimate or Compute Confidence Interval for the
                        Disequlibrium Parameter
genotype                Genotype or Haplotype Objects.
homozygote              Extract Features of Genotype objects
locus                   Create and Manipulate Locus, Gene, and Marker
                        Objects
summary.genotype        Allele and Genotype Frequency from a Genotype
                        or Haplotype Object
undocumented            Undocumented functions
SIMPLE EXAMPLE
==============
> library(genetics)
Attaching package `genetics':
The following object(s) are masked from package:base :
as.factor
> ## Create a sample dataset with 3 SNP markers
>
> g1 <- sample( x=c('C/C', 'C/T', 'T/T'),
+ prob=c(.6,.2,.2), 20, replace=T)
> g2 <- sample( x=c('A/A', 'A/G', 'G/G'),
+ prob=c(.6,.1,.5), 20, replace=T)
> g3 <- sample( x=c('C/C', 'C/T', 'T/T'),
+ prob=c(.2,.4, 4), 20, replace=T)
>
> y <- rnorm(20) + (g1=='C/C') +
+ 0.25 * (g2=='A/A' | g2=='A/G')
>
> ## Form into a data frame
> data <- data.frame( y, g1, g2, g3)
>
> # Create marker labels for the data
[...]
> a1691g <- marker(name="A1691G",
+ type="SNP",
+ locus.name="MBP2",
+ chromosome=9,
+ arm="q",
+ index.start=35,
+ bp.start=1691,
+ relative.to="intron 1")
>
>
[...]
>
> data$g1 <- genotype(data$g1, locus=c104t)
> data$g2 <- genotype(data$g2, locus=a1691g)
> data$g3 <- genotype(data$g3, locus=c2249t)
>
> data
y g1 g2 g3
1 -0.084796634 T/T G/G T/C
2 1.454537575 C/C G/G T/T
3 -0.899625344 T/T G/G T/T
4 -1.980679630 C/T A/A T/T
5 0.231087028 C/T A/A T/T
6 2.588083646 C/C A/A T/C
7 0.209338731 C/C A/A T/T
8 1.435823157 C/T G/G T/T
9 -0.078796949 C/C G/G T/T
10 -2.091110058 C/T A/A T/T
11 -0.842655686 C/T G/G T/T
12 1.316828279 C/C G/G T/T
13 0.470126626 C/T A/A T/T
14 -0.364828611 T/T G/A T/T
15 -0.002438264 C/T A/A T/C
16 0.949432430 C/C G/G T/T
17 -0.096626850 C/T G/A T/T
18 1.065637984 T/T A/A T/T
19 0.817213289 C/C A/A T/T
20 0.644714638 C/T G/G T/T
>
> data$g2
Marker: MBP2:A1691G (9q35:1691) Type: SNP
[1] "G/G" "G/G" "G/G" "A/A" "A/A" "A/A" "A/A" "G/G" "G/G" "A/A" "G/G" "G/G"
[13] "A/A" "G/A" "A/A" "G/G" "G/A" "A/A" "A/A" "G/G"
Alleles: G A
>
> summary(data$g2)
Marker: MBP2:A1691G (9q35:1691) Type: SNP
Allele Frequency:
Count Proportion
A 20 0.5
G 20 0.5
Genotype Frequency:
Count Proportion
A/A 9 0.45
G/A 2 0.10
G/G 9 0.45
> HWE.test(data$g2)
-----------------------------------
Test for Hardy-Wienburg-Equilibrium
-----------------------------------
Call:
HWE.test.genotype(x = data$g2)
Raw Disequlibrium for each allele pair (D)
     G    A
G      -0.2
A -0.2
Scaled Disequlibrium for each allele pair (D')
     G    A
G      -0.8
A -0.8
Correlation coefficient for each allele pair (r)
    G   A
G 1.0 0.8
A 0.8 1.0
Overall Values (mean absolute-value weighted by expected allele frequency)
Value
D -0.2
D' -0.8
r 0.8
Confidence intervals computed via bootstrap using 1000 samples
Observed 95% CI NA's Contains Zero?
Overall D -0.2000000 (-0.2475000, -0.1093750) 0 *NO*
Overall D' -0.8000000 (-1.0000000, -0.4666667) 0 *NO*
Overall r 0.8000000 ( 0.4666667, 1.0000000) 0 *NO*
Significance Test:
Pearson's Chi-squared test with simulated p-value (based on 10000
replicates)
data: data$g2
X-squared = 12.8, df = NA, p-value = 7e-04
>
> summary(lm( y ~ homozygote(g1,'C') +
+ allele.count(g2, 'G') +
+ g3, data=data))
Call:
lm(formula = y ~ homozygote(g1, "C") + allele.count(g2, "G") +
g3, data = data)
Residuals:
Min 1Q Median 3Q Max
-1.6686 -0.6625 -0.0172 0.6973 1.6196
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3499 0.6229 0.562 0.5821
homozygote(g1, "C")TRUE 1.2124 0.4778 2.537 0.0220 *
allele.count(g2, "G") 0.1193 0.2429 0.491 0.6298
g3T/T -0.7724 0.6414 -1.204 0.2460
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1.013 on 16 degrees of freedom
Multiple R-Squared: 0.3405, Adjusted R-squared: 0.2169
F-statistic: 2.754 on 3 and 16 DF, p-value: 0.07661
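
As a rough cross-check of the HWE.test output above, the point estimates can be recomputed by hand from the genotype counts printed by summary(data$g2). This is a base-R sketch under assumed definitions: D as half the heterozygote frequency minus the product of the allele frequencies, D' as D scaled by its maximum attainable magnitude, and r as -D over the square root of the product of the allele variances. These formulas were chosen because they reproduce the printed values; the package's help pages are the authority on what it actually computes.

```r
## Genotype counts copied from the summary(data$g2) output above.
n <- c("A/A" = 9, "G/A" = 2, "G/G" = 9)
N <- sum(n)

## Allele frequencies: homozygotes carry two copies, heterozygotes one.
p.A <- (2 * n[["A/A"]] + n[["G/A"]]) / (2 * N)
p.G <- 1 - p.A

## ASSUMED definitions (see the text above), for the allele pair (G, A).
p.het <- n[["G/A"]] / N
D     <- p.het / 2 - p.A * p.G
D.max <- if (D < 0) {
  min(p.A * p.G, (1 - p.A) * (1 - p.G))
} else {
  min(p.A * (1 - p.G), (1 - p.A) * p.G)
}
D.prime <- D / D.max
r       <- -D / sqrt(p.A * (1 - p.A) * p.G * (1 - p.G))

## Pearson chi-squared statistic against Hardy-Weinberg expected counts.
expected <- N * c(p.A^2, 2 * p.A * p.G, p.G^2)
X2       <- sum((n - expected)^2 / expected)

print(c(D = D, D.prime = D.prime, r = r, X2 = X2))
```

With these counts the sketch gives D = -0.2, D' = -0.8, r = 0.8 and X-squared = 12.8, in agreement with the output above. The bootstrap confidence intervals and the simulated p-value are not reproduced here.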