[BioC] New SNPlocs data package for Human (dbSNP BUILD 130)
Hervé Pagès
hpages at fhcrc.org
Sat Jun 6 00:16:21 CEST 2009
Hi SNPlocs users,
I've added SNPlocs.Hsapiens.dbSNP.20090506 to the BioC repo (for now it is
in BioC release only, and as a source tarball only). It contains the SNP
locations and alleles for Homo sapiens extracted from dbSNP BUILD 130 (the
latest dbSNP build).
From within R-2.9:
> library(BSgenome)
> available.SNPs()
[1] "SNPlocs.Hsapiens.dbSNP.20071016" "SNPlocs.Hsapiens.dbSNP.20080617"
[3] "SNPlocs.Hsapiens.dbSNP.20090506"
Install with:
source("http://bioconductor.org/biocLite.R")
biocLite("SNPlocs.Hsapiens.dbSNP.20090506")
Then:
> library(SNPlocs.Hsapiens.dbSNP.20090506)
> ?SNPlocs.Hsapiens.dbSNP.20090506 # now there is a man page!
> getSNPcount()
  chr1   chr2   chr3   chr4   chr5   chr6   chr7   chr8   chr9  chr10  chr11
920233 933616 789121 798603 706109 760249 655873 612367 496064 583240 577300
 chr12  chr13  chr14  chr15  chr16  chr17  chr18  chr19  chr20  chr21  chr22
558759 427010 365742 331501 354239 316396 322866 268235 323041 160580 187392
  chrX   chrY
391414   6539
Overall, that's 10% more SNPs than in the previous build (BUILD 129).
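For anyone who wants the genome-wide total behind these per-chromosome counts, here is a minimal sketch. The vector is copied by hand from the getSNPcount() output above so the snippet runs without the package installed; the names snp_count, total and share are only illustrative. With the package loaded, sum(getSNPcount()) gives the same total.

```r
# Per-chromosome SNP counts, copied from the getSNPcount() output above.
snp_count <- c(
  chr1  = 920233L, chr2  = 933616L, chr3  = 789121L, chr4  = 798603L,
  chr5  = 706109L, chr6  = 760249L, chr7  = 655873L, chr8  = 612367L,
  chr9  = 496064L, chr10 = 583240L, chr11 = 577300L, chr12 = 558759L,
  chr13 = 427010L, chr14 = 365742L, chr15 = 331501L, chr16 = 354239L,
  chr17 = 316396L, chr18 = 322866L, chr19 = 268235L, chr20 = 323041L,
  chr21 = 160580L, chr22 = 187392L, chrX  = 391414L, chrY  = 6539L
)

# Total number of SNPs in the package.
total <- sum(snp_count)
print(total)

# Share of the total carried by each chromosome, in percent.
share <- round(100 * snp_count / total, 1)
print(sort(share, decreasing = TRUE))
```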
Note that, as with the previous builds, there are still distinct RefSNP
IDs that are mapped to the same location:
> chr1_snps <- getSNPlocs("chr1")
> sum(duplicated(chr1_snps$loc))
[1] 950
That's twice as many as with BUILD 129!
> which(duplicated(chr1_snps$loc))[1:10]
[1] 3142 3365 7835 8161 8327 10638 12113 14060 14640 15538
> chr1_snps[chr1_snps$loc == chr1_snps$loc[3142], ]
     RefSNP_id alleles_as_ambig     loc
3141   3766175                D 1476802
3142  59009700                W 1476802
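Note that duplicated() flags only the second and later occurrences of a location, so the first row of each group (row 3141 above) is not counted. To list every RefSNP ID at a shared location, something like the following might help. It is a sketch on a small made-up data frame with the same three columns as the getSNPlocs() output; only the two rows at location 1476802 come from the output above, and the other rows and the variable names are invented for illustration.

```r
# Toy stand-in for getSNPlocs("chr1"). Only the two rows at 1476802 are
# taken from the real output; the first and last rows are made up.
snps <- data.frame(
  RefSNP_id        = c(1001L, 3766175L, 59009700L, 1002L),
  alleles_as_ambig = c("R", "D", "W", "Y"),
  loc              = c(1000000L, 1476802L, 1476802L, 2000000L),
  stringsAsFactors = FALSE
)

# Combine a forward and a backward pass of duplicated() so that every
# row of a group sharing a location is marked, including the first one.
is_shared <- duplicated(snps$loc) | duplicated(snps$loc, fromLast = TRUE)
shared <- snps[is_shared, ]

# RefSNP IDs grouped by the location they share.
ids_by_loc <- split(shared$RefSNP_id, shared$loc)
print(ids_by_loc)
```

On the real data, the same two lines applied to chr1_snps should return all rows involved in a shared location rather than only the later occurrences.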
Please let me know if you find any problems with this new package.
Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319