[BioC] get chr position for a batch of human SNPs

Hervé Pagès hpages at fhcrc.org
Thu Sep 22 23:24:19 CEST 2011


Hi Shirley,

On 11-09-22 11:29 AM, shirley zhang wrote:
> Dear All,
>
> I am planning to map the SNPids to hg18 positions (chr and position)
> for a huge list of human snps. I've tried the package
> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this
> package:
>
> 1. Do the SNPs in this package map to the hg18 genome (NCBI Build 36.3
> with Group Label "reference" instead of "Celera" or "HuRef")?

Yes, they are mapped to hg18. See:

 
http://bioconductor.org/packages/release/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20090506.html

and the man page of the package for additional details:

   > library(SNPlocs.Hsapiens.dbSNP.20090506)
   > ?SNPlocs.Hsapiens.dbSNP.20090506

>
> 2. If I don't know the chr information (seqname), can I obtain the
> position with dbSNP Id only?

Unfortunately, because SNPs are stored in one data frame per
chromosome, if you don't know the chr then you need to load and
query each data frame individually.
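
As a rough sketch of that per-chromosome loop (not taken from the
package documentation): the function below scans one chromosome at a
time and keeps only the requested ids. The names find_snp_positions,
toy_snps and toy_loader are hypothetical, and the RefSNP_id and loc
column names are an assumption about the data frames the package
returns. The toy loader stands in for getSNPlocs() so the example runs
without the package.

```r
## Toy stand-in for the per-chromosome data frames (made-up ids and
## positions; the column names are an assumption about the real ones).
toy_snps <- list(
  chr1 = data.frame(RefSNP_id = c("101", "102"),
                    loc = c(1000L, 2000L),
                    stringsAsFactors = FALSE),
  chr2 = data.frame(RefSNP_id = c("201", "202"),
                    loc = c(3000L, 3500L),
                    stringsAsFactors = FALSE)
)
toy_loader <- function(seqname) toy_snps[[seqname]]

## Load each chromosome in turn and keep only the requested rs ids.
## 'load_fun' is any function mapping a chromosome name to a data frame.
find_snp_positions <- function(rs_ids, seqnames, load_fun) {
  ids <- sub("^rs", "", rs_ids)          # stored ids carry no "rs" prefix
  hits <- lapply(seqnames, function(seqname) {
    df <- load_fun(seqname)
    df <- df[df$RefSNP_id %in% ids, , drop = FALSE]
    if (nrow(df) == 0L) return(NULL)     # nothing found on this chromosome
    data.frame(rs_id = paste0("rs", df$RefSNP_id),
               chr = rep(seqname, nrow(df)),
               pos = df$loc,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)                   # NULL entries are dropped by rbind
}

res <- find_snp_positions(c("rs102", "rs202"), names(toy_snps), toy_loader)
print(res)
```

With the real package, one would presumably call
find_snp_positions(my_rs_ids, names(getSNPcount()), getSNPlocs),
assuming getSNPlocs() accepts a single chromosome name and returns a
data frame with those columns.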

With more recent SNPlocs packages (e.g.
SNPlocs.Hsapiens.dbSNP.20100427), provision was added to
let the user load SNPs from more than 1 chromosome in a single
GRanges object, so you can do something like:

   ## Load all the SNPs in a big GRanges object (takes about
   ## 13 minutes and requires 6GB of RAM!):
   all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE)

   ## Use the rs ids to set the names (takes about 6 minutes):
   names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id,
                            sep="")

   ## Then extract your SNPs from the big GRanges object (again,
   ## this can take a long time, depending on how many SNPs you
   ## extract):
   my_rs_ids <- sample(names(all_snps), 1000)
   my_snps <- all_snps[my_rs_ids]
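
Once the SNPs are in a GRanges object, the chromosome and position can
be read off with seqnames() and start(). A minimal sketch on a toy
GRanges (made-up ids and positions, standing in for all_snps) follows.
The intersect() step drops ids that are missing from the object, which
would otherwise be expected to make the subsetting fail. The names
toy_gr, wanted, found and snp_table are hypothetical.

```r
library(GenomicRanges)

## Toy stand-in for 'all_snps': three single-base ranges named by rs id.
toy_gr <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
                  ranges = IRanges(start = c(100L, 200L, 300L), width = 1L))
names(toy_gr) <- c("rs101", "rs201", "rs202")

## Keep only the requested ids that are actually present.
wanted <- c("rs201", "rs999")
found <- intersect(wanted, names(toy_gr))
hits <- toy_gr[found]

## Flatten to a plain table of id, chromosome and position.
snp_table <- data.frame(rs_id = names(hits),
                        chr = as.character(seqnames(hits)),
                        pos = start(hits),
                        stringsAsFactors = FALSE)
print(snp_table)
```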

However, please note that, starting with
SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131),
SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18.

Hope this helps,
H.

>
> Further, I find dbSNP batch queries a little more difficult to work
> with because they map to different versions of the hg18 like Celera,
> HumanRef, etc. Can anybody let me know a better option to get hg18 chr
> position with the most popular or confident version of dbSNP?
>
> Thanks in advance


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


