[BioC] dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
Lin Tang
ltang at scmmlab.com
Thu Jun 4 20:04:53 CEST 2009
Thanks all for the discussion. Really looking forward to the updated package!
Lin
-----Original Message-----
From: Hervé Pagès [mailto:hpages at fhcrc.org]
Sent: Thursday, June 04, 2009 10:59 AM
To: James W. MacDonald
Cc: Lin Tang; bioconductor
Subject: Re: [BioC] dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617
Hi Jim,
James W. MacDonald wrote:
> Hi Herve,
>
> I've been dealing with these data myself recently, and can confirm that
> the data in March were build 129. They put the build 130 data up in
> early May.
>
> As a side note, build 129 is known to be problematic, as there are
> multiple RS numbers that map to the same location:
>
> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000082.html
>
Indeed:
> library(SNPlocs.Hsapiens.dbSNP.20080617)
> data(chr1_snplocs)
> sum(duplicated(chr1_snplocs$loc))
[1] 413
> which(duplicated(chr1_snplocs$loc))[1:10]
[1] 2822 3030 9547 10865 12604 12641 16854 17898 21175 21977
> chr1_snplocs[chr1_snplocs$loc == chr1_snplocs$loc[2822], ]
     RefSNP_id alleles_as_ambig     loc
2821   3766175                D 1476802
2822  59009700                W 1476802
Something that puzzled me when I first started to work on the SNPlocs.*
packages (I saw this in Build 128 too).
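
Note that duplicated() flags only the second and later occurrences of a
value, so the count above does not include the first SNP at each shared
location. A minimal sketch of how one might list every RefSNP id that shares
a location is given below. It runs on a toy data frame (the hypothetical
`snplocs`) rather than on the real chr1_snplocs object; the two middle rows
are copied from the output above, the other two rows are made up, and storing
RefSNP_id as character is an assumption.

```r
# Toy stand-in for chr1_snplocs: same column names as in the session above.
# Rows 2 and 3 are taken from that output; rows 1 and 4 are invented.
snplocs <- data.frame(
  RefSNP_id        = c("111", "3766175", "59009700", "222"),
  alleles_as_ambig = c("R", "D", "W", "Y"),
  loc              = c(1000L, 1476802L, 1476802L, 2000000L),
  stringsAsFactors = FALSE
)

# duplicated() alone misses the first occurrence of each repeated location;
# OR-ing it with the fromLast = TRUE variant flags every row in a group.
shared <- duplicated(snplocs$loc) | duplicated(snplocs$loc, fromLast = TRUE)
dups <- snplocs[shared, ]

# Group the RefSNP ids by the location they share.
ids_by_loc <- split(dups$RefSNP_id, dups$loc)
```

On the real data the same three statements should work with chr1_snplocs in
place of `snplocs`, provided the object still has a `loc` column as shown
above.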
>
> According to their help team, this problem has been resolved in build 130.
Good. I'll make a new SNPlocs.Hsapiens.dbSNP.* from this build.
Thanks!
H.
>
> Best,
>
> Jim
>
>
>
> Hervé Pagès wrote:
>> Hi Lin,
>>
>> I'm cc'ing the BioC list so other users might benefit from this.
>>
>> Lin Tang wrote:
>>> Dear Dr. Pages,
>>>
>>>
>>>
>>>
>>> I am currently using the R package SNPlocs.Hsapiens.dbSNP.20080617 and
>>> want to check with you whether this package corresponds to dbSNP build
>>> 129. Since the release date of this R package is two months after the
>>> release of dbSNP build 129, that would be logical, but I would like to
>>> have it confirmed by you. I'd appreciate your kind reply on this. Thanks!
>>
>> It's hard to tell.
>>
>> According to these pages:
>>
>> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000081.html
>>
>> http://www.ncbi.nlm.nih.gov/projects/SNP/buildhistory.cgi
>> Build 129 was released in April 2008 (note that the exact dates found on
>> these 2 pages don't match).
>>
>> A similar search shows that Build 130 was released about 1 month ago.
>>
>> So at the time I downloaded the ds_flat_ch*.flat files from here
>> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
>> in order to build SNPlocs.Hsapiens.dbSNP.20080617 (that was in March
>> 2009), I assume that these files were a dump from Build 129.
>>
>> Note that the files under
>> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
>> can change at any time (and today they are indeed different from what they
>> were back in March). It's a sad thing that the SNP team at NCBI doesn't
>> provide permanent URLs for their past builds. And it doesn't help that
>> the ds_flat_ch*.flat files they provide don't contain any information
>> about the build that they're coming from.
>>
>> Anyway, in the future I'll put the Build information in the DESCRIPTION
>> file of the SNPlocs packages.
>>
>> One last note. According to the SNP team at NCBI "Human SNPs in Build 129
>> are mapping to NCBI build 36.3". That is, to our BSgenome.Hsapiens.UCSC.hg18
>> package. According to UCSC, hg18 is NCBI Build 36.1 but NCBI Build 36.1 and
>> NCBI Build 36.3 are identical from a *sequence* point of view (I think what
>> makes them different are the annotations provided by NCBI).
>> This means that, if you are planning to inject SNPlocs.Hsapiens.dbSNP.20080617
>> in a genome, it only makes sense to do it with BSgenome.Hsapiens.UCSC.hg18.
>>
>> In the future we will put in place a mechanism to make this injection safer,
>> i.e. check that the injected stuff and the host are compatible.
>>
>> Cheers,
>> H.
>>
>>
>>>
>>>
>>> Regards,
>>>
>>> Lin Tang, Ph.D.
>>>
>>> Scientist , Informatics | Sequenom Inc.
>>>
>>> T: 1 858 202 9106 | F: 1 858 202 9084 | E: ltang at sequenom.com
>>>
>>>
>>>
>>>
>>>
>>> THIS EMAIL MESSAGE IS FOR THE SOLE USE OF THE INTENDED RECIPIENT(S)
>>> AND MAY CONTAIN CONFIDENTIAL INFORMATION. ANY UNAUTHORIZED REVIEW,
>>> USE, DISCLOSURE OR DISTRIBUTION IS PROHIBITED. IF YOU ARE NOT THE
>>> INTENDED RECIPIENT, PLEASE CONTACT THE SENDER BY REPLY EMAIL AND
>>> DESTROY ALL COPIES OF THE ORIGINAL MESSAGE.
>>>
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319