[Bioc-devel] seqnames of SNPlocs.*

Hervé Pagès hpages at fhcrc.org
Wed Jun 18 23:48:06 CEST 2014


Hi Peter,

Just added support for "dbSNP" seqlevels style for Human (in
GenomeInfoDb 1.1.9, will become available tomorrow):

   library(SNPlocs.Hsapiens.dbSNP.20120608)
   myrsids <- c("rs2639606", "rs75264089", "rs73396229", "rs55871206",
                "rs10932221", "rs56219727", "rs73709730", "rs55838886",
                "rs3734153", "rs79381275", "rs1516535")
   gr <- rsidsToGRanges(myrsids)

Then:

   > seqnames(gr)
   factor-Rle of length 11 with 11 runs
     Lengths:    1    1    1    1    1    1    1    1    1    1    1
     Values :  ch9  ch6 ch11 ch13  ch2  ch4  ch7  ch2  ch5 ch11  ch4
   Levels(25): ch1 ch2 ch3 ch4 ch5 ch6 ch7 ... ch19 ch20 ch21 ch22 chX chY chMT

   > seqlevelsStyle(gr)
   [1] "dbSNP"

   > seqlevelsStyle(gr) <- "NCBI"

   > seqnames(gr)
   factor-Rle of length 11 with 11 runs
     Lengths:  1  1  1  1  1  1  1  1  1  1  1
     Values :  9  6 11 13  2  4  7  2  5 11  4
   Levels(25): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y MT

   > seqlevelsStyle(gr) <- "UCSC"

   > seqnames(gr)
   factor-Rle of length 11 with 11 runs
     Lengths:     1     1     1     1     1     1     1     1     1     1     1
     Values :  chr9  chr6 chr11 chr13  chr2  chr4  chr7  chr2  chr5 chr11  chr4
   Levels(25): chr1 chr2 chr3 chr4 chr5 chr6 ... chr20 chr21 chr22 chrX chrY chrM
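
For illustration only, the three naming styles shown above can be
approximated in base R. This is a minimal sketch, not how GenomeInfoDb
implements seqlevelsStyle(): dbsnp_to_style() is a hypothetical helper,
and it assumes human dbSNP-style names (ch1..ch22, chX, chY, chMT).
Note that UCSC names the mitochondrial chromosome chrM rather than
chrMT, so a plain prefix substitution is not quite enough.

```r
# Hypothetical helper (NOT part of GenomeInfoDb): convert dbSNP-style
# human chromosome names ("ch1", ..., "chMT") to NCBI or UCSC style.
dbsnp_to_style <- function(x, style = c("NCBI", "UCSC")) {
    style <- match.arg(style)
    # Assumption: input uses the dbSNP naming seen in the SNPlocs package.
    stopifnot(all(grepl("^ch([0-9]+|X|Y|MT)$", x)))
    ncbi <- sub("^ch", "", x)    # "ch9" -> "9", "chMT" -> "MT"
    if (style == "NCBI")
        return(ncbi)
    # UCSC uses "chrM" for the mitochondrial chromosome, not "chrMT".
    paste0("chr", ifelse(ncbi == "MT", "M", ncbi))
}

dbsnp_to_style(c("ch9", "chX", "chMT"), "NCBI")  # "9" "X" "MT"
dbsnp_to_style(c("ch9", "chX", "chMT"), "UCSC")  # "chr9" "chrX" "chrM"
```

On real GRanges objects, seqlevelsStyle() as shown above remains the
simpler route, since it also updates the seqlevels consistently.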

Making the seqlevelsStyle() setter work directly on the
SNPlocs.Hsapiens.dbSNP.20120608 object itself will take more time,
though. It will be part of a larger SNPlocs refactoring that has been
on my list for a while now, and it won't happen for at least a couple
of months.

Cheers,
H.

On 06/17/2014 10:37 PM, Hervé Pagès wrote:
> Hi Peter,
>
> Yes, as Vince said, the chromosome names are those used by dbSNP. For
> whatever reason, dbSNP, which is part of NCBI, felt the need to use
> a different naming convention than the rest of NCBI :-/
>
> On 06/17/2014 07:57 PM, Peter Hickey wrote:
>> Thanks for the explanation, Vincent. GenomeInfoDb has NCBI and UCSC
>> support, but doesn't seem to support the dbSNP format. Perhaps this
>> should be added?
>
> The seqlevelsStyle() setter first requires that the seqlevels() setter
> works on a SNPlocs object, which itself requires that the seqinfo()
> setter works. Unfortunately, it doesn't at the moment:
>
>    > library(SNPlocs.Hsapiens.dbSNP.20120608)
>
>    > snps <- SNPlocs.Hsapiens.dbSNP.20120608
>
>    > seqlevels(snps) <- sub("^ch", "chr", seqlevels(snps))
>    Error in (function (classes, fdef, mtable)  :
>      unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"SNPlocs"’
>
> Something I'm adding to my list.
>
> In the meantime you can do the renaming on the GRanges objects
> you extract with 'getSNPlocs(..., as.GRanges=TRUE)' or with
> 'rsidsToGRanges(...)'. Maybe it's not very convenient to have to do
> this each time you extract SNPs into a GRanges object, but OTOH it's
> really easy these days now that we have seqlevelsStyle().
>
> Hope this helps.
>
> Cheers,
> H.
>
>>
>>> seqlevelsStyle(seqnames(SNPlocs.Hsapiens.dbSNP.20120608))
>> Error in .guessSpeciesStyle(seqnames) :
>>    The style does not have a compatible entry for the species supported by Seqname. Please
>>    see genomeStyles() for supported species/style
>>
>> On 18/06/2014, at 12:40 PM, Vincent Carey <stvjc at channing.harvard.edu>
>> wrote:
>>
>>> it is the convention used in dbSNP, just propagated directly.  indeed
>>> one typically has to relabel, but there
>>> is seqnamesStyle infrastructure in GenomeInfoDb that may help.
>>>
>>>
>>> On Tue, Jun 17, 2014 at 8:17 PM, Peter Hickey <hickey at wehi.edu.au>
>>> wrote:
>>> Is there a reason why the seqnames of SNPlocs.Hsapiens.dbSNP.20120608
>>> (and possibly the other SNPlocs.*) use the prefix "ch" instead of
>>> "chr"? E.g. "ch1" instead of "chr1". It doesn't seem to fit with any
>>> standard way of naming chromosomes and means that these need to be
>>> renamed to use with most other Bioconductor data sources.
>>> Thanks,
>>> Pete
>>> --------------------------------
>>> Peter Hickey,
>>> PhD Student/Research Assistant,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> Ph: +613 9345 2324
>>>
>>> hickey at wehi.edu.au
>>> http://www.wehi.edu.au
>>>
>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and inte...{{dropped:28}}
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


