[BioC] get chr position for a batch of human SNPs

shirley zhang shirley0818 at gmail.com
Fri Sep 23 02:48:59 CEST 2011


Dear Herve,

Thanks for your quick reply.

I've checked the link you suggested. The SNPs in
"SNPlocs.Hsapiens.dbSNP.20090506" do map the hg18 genome (NCBI Build
36.1). But when I manually check the hg18 positions for a few SNPs on
dbSNP at NCBI. I can only get position information for NCBI Build 36.3
and NCBI Build 37. What is the difference between NCBI Build 36.3 and
NCBI Build 36.1? Are they basically the same? Further, in dbSNP at
NCBI, even for NCBI Build 36.3, there are 3 versions of the hg18
position per SNP (each version has a different Group Label, i.e.
"reference", "Celera" or "HuRef"). Could you kindly tell me what's the
difference among these 3 versions?

For my 2nd question, getting SNP position for SNPs without knowing the
chr information, since I need to map SNPids to hg18 positions very
often in my research, after all SNPs from all of chromosomes  were
retrieved using "SNPlocs.Hsapiens.dbSNP.20090506",  I combined them
into one big data frame and saved it on the server. But later on if I
need to map SNPids to hg19, I will follow your suggestions by creating
a big GRanges object using more recent SNPlocs packages.

Thanks a lot for your detailed explanation. It is very helpful.

Shirley

2011/9/22 Hervé Pagès <hpages at fhcrc.org>:
> Hi Shirley,
>
> On 11-09-22 11:29 AM, shirley zhang wrote:
>>
>> Dear All,
>>
>> I am planing to map the SNPids to hg18 positions (chr and position)
>> for a huge list of human snps. I've tried the package
>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this
>> package:
>>
>> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3
>> with Group Label "reference" instead of "Celera" or "HuRef"?
>
> Yes, they are mapped to hg18. See:
>
>
> http://bioconductor.org/packages/release/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20090506.html
>
> and the man page of the package for additional details:
>
>  > library(SNPlocs.Hsapiens.dbSNP.20090506)
>  > ?SNPlocs.Hsapiens.dbSNP.20090506
>
>>
>> 2. If I don't know the chr information (seqname), can I obtain the
>> position with dbSNP Id only?
>
> Unfortunately, because SNPs are stored in one data frame per
> chromosome, if you don't know the chr then you need to load and
> query each data frame individually.
>
> With more recent SNPlocs packages (e.g.
> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to
> let the user load SNPs from more than 1 chromosome in a single
> GRanges object, so you can do something like:
>
>  ## Load all the SNPs in a big GRanges object (takes about
>  ## 13 minutes and requires 6GB of RAM!):
>  all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE)
>
>  ## Use the rs ids to set the names (takes about 6 minutes):
>  names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id,
>                           sep="")
>
>  ## Then extract your SNPs from the big GRanges object (again,
>  ## this can take a long time, depending on how many SNPs you
>  ## extract):
>  my_rs_ids <- sample(names(all_snps), 1000)
>  my_snps <- all_snps[my_rs_ids]
>
> However, please note that, starting with
> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131),
> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18.
>
> Hope this helps,
> H.
>
>>
>> Further, I find dbSNP batch queries a little more difficult to work
>> with because they map to different versions of the hg18 like Celera,
>> HumanRef, etc.Can anybody let me know a better option to get hg18 chr
>> position with the most popular or confident version of dbSNP?
>>
>> Thanks in advance
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>



More information about the Bioconductor mailing list