[BioC] where to get chr_rpts file for dbSNP human 36.3 assembly

Sean Davis sdavis2 at mail.nih.gov
Thu Nov 3 19:16:16 CET 2011


On Thu, Nov 3, 2011 at 2:05 PM, shirley zhang <shirley0818 at gmail.com> wrote:
> Dear Herve and Sean,
>
> Thanks for your reply.  May I ask for one more bit of help?
>
> Do you know where I can get the list of SNPs (rs#) that map to more
> than one location on the reference genome NCBI Build 36.3?

Hi, Shirley.  This level of detail might need to go to NCBI for an
answer if you REALLY need to use NCBI annotations directly.  That
said, UCSC does some reannotation before releasing dbSNP on their
site.  There is a table that describes dbSNP exceptions, including
Multiple Locations.  You can download that file here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130Exceptions.txt.gz

The table format is described here:

http://genome.ucsc.edu/cgi-bin/hgTables?hgta_doSchemaDb=hg18&hgta_doSchemaTable=snp130Exceptions
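
Once the file is downloaded, pulling out the multiply mapped rs IDs
could look something like the sketch below.  It rests on assumptions
that should be checked against the schema page above: that each row is
tab-separated with the columns chrom, chromStart, chromEnd, name,
exception in that order, and that multiple-location SNPs carry the
exception label "MultipleAlignments".  The function names and the
sample rows are made up for illustration.

```python
import gzip
from collections import defaultdict


def multi_location_snps(lines, exception="MultipleAlignments"):
    """Map rs ID -> sorted list of (chrom, start, end) for flagged rows.

    Assumes tab-separated columns: chrom, chromStart, chromEnd, name,
    exception (an assumption; verify against the UCSC schema page).
    """
    hits = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        chrom, start, end, name, exc = fields[:5]
        if exc == exception:
            hits[name].add((chrom, int(start), int(end)))
    return {name: sorted(locs) for name, locs in hits.items()}


def read_exceptions(path):
    """Run the filter over a local gzipped copy of the exceptions table."""
    with gzip.open(path, "rt") as handle:
        return multi_location_snps(handle)


# Made-up rows, only to show the expected input shape.
sample = [
    "chr1\t100\t101\trs0000001\tMultipleAlignments\n",
    "chr2\t500\t501\trs0000001\tMultipleAlignments\n",
    "chr3\t900\t901\trs0000002\tObservedMismatch\n",
]
result = multi_location_snps(sample)
```

For the real file, read_exceptions("snp130Exceptions.txt.gz") would
return the same kind of dictionary, keyed by rs ID.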

Sean


> Thanks,
> Shirley
>
> On Tue, Nov 1, 2011 at 9:20 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> 2011/11/1 shirley zhang <shirley0818 at gmail.com>:
>>> Dear Herve,
>>>
>>> Also, I just checked, and there is no liftOver function in the
>>> rtracklayer package. Does it go by a different name?  Thanks, Shirley
>>>
>>>> sessionInfo()
>>> R version 2.11.1 (2010-05-31)
>>
>> Hi, Shirley.
>>
>> You'll definitely need to update your R.  A new release just came out,
>> and R is now at version 2.14.0.  With the new version of R, you'll get
>> new versions of packages.  The most recent couple of versions of
>> rtracklayer include liftOver().
>>
>> Sean
>>
>>
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>>>  [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
>>>  [7] LC_PAPER=en_US.iso885915       LC_NAME=C
>>>  [9] LC_ADDRESS=C                   LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] rtracklayer_1.8.1 RCurl_1.4-3       bitops_1.0-4.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.8.0       Biostrings_2.16.9   BSgenome_1.16.5
>>> [4] GenomicRanges_1.0.9 IRanges_1.6.15      XML_3.1-1
>>>
>>>
>>> 2011/11/1 shirley zhang <shirley0818 at gmail.com>:
>>>> Dear Herve,
>>>>
>>>> Thanks for your quick response.
>>>>
>>>> I need to get the chromosome positions (hg18, build 36.3) for a huge
>>>> list of SNPs identified by rs#. As you suggested before, I first tried
>>>> the "SNPlocs.Hsapiens.dbSNP.20090506" package, and got positions for
>>>> 90% of my SNPs. For the remaining 10%, I would like to get the
>>>> positions from the NCBI dbSNP website (build 130, reference 36.3). I
>>>> understand that I could use the batch query. However, I have to do
>>>> this kind of mapping routinely for different sets of SNPs, so I am
>>>> thinking of downloading the chr_rpts files for the dbSNP human 36.3
>>>> assembly to our server and using them for the mapping.
>>>>
>>>> I don't know whether what I've tried, or what I plan to do, is the
>>>> right approach. Could you give me any comments or suggestions?
>>>>
>>>> Thanks a lot!
>>>> Shirley
>>>>
>>>> 2011/11/1 Hervé Pagès <hpages at fhcrc.org>:
>>>>> Hi Shirley,
>>>>>
>>>>> On 11-11-01 01:51 PM, shirley zhang wrote:
>>>>>>
>>>>>> Dear list,
>>>>>>
>>>>>> For the dbSNP database at NCBI, I can get the chr_rpts files for
>>>>>> the most recent 37.3 assembly from the following FTP site:
>>>>>>
>>>>>> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/chr_rpts/
>>>>>>
>>>>>> My question is how/where I can get these chr_rpts files for the
>>>>>> 36.3 assembly.
>>>>>
>>>>> Please don't cross post. This sounds like a question for the dbSNP
>>>>> folks.
>>>>>
>>>>> FWIW, right now it doesn't seem like those files have been updated yet:
>>>>> they are still from August 15 (i.e. dbSNP build 134, based on reference
>>>>> genome GRCh37.p2). AFAIK the last build based on the 36.3 assembly was
>>>>> dbSNP build 130.
>>>>>
>>>>> Not sure what you want to do with those files, but if you only need
>>>>> to access the genome coordinates and alleles of your SNPs, you might
>>>>> want to have a look at the SNPlocs.* packages.
>>>>>
>>>>> Alternatively, you could always use a tool like UCSC liftOver (also
>>>>> available in Bioconductor, in the rtracklayer package) to remap things
>>>>> between different genome assemblies.
>>>>>
>>>>> Cheers,
>>>>> H.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Shirley
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>> --
>>>>> Hervé Pagès
>>>>>
>>>>> Program in Computational Biology
>>>>> Division of Public Health Sciences
>>>>> Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N, M1-B514
>>>>> P.O. Box 19024
>>>>> Seattle, WA 98109-1024
>>>>>
>>>>> E-mail: hpages at fhcrc.org
>>>>> Phone:  (206) 667-5791
>>>>> Fax:    (206) 667-1319
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Xiaoling (Shirley) Zhang
>>>>
>>>> M.D., Ph.D. (Bioinformatics)
>>>> Boston University, Boston, MA
>>>> Tel: (857) 233-9862
>>>> Email: zhangxl at bu.edu
>>>>
>>
>


