[BioC] From location to get the SNP rs ID

Hervé Pagès hpages at fhcrc.org
Fri May 31 13:01:54 CEST 2013


On 05/31/2013 03:31 AM, Fabrice Tourre wrote:
> Hervé Pagès,
>
> It helps. Thank you very much.
>
> But loc2rsid seems does not in version SNPlocs.Hsapiens.dbSNP.2010042
> and SNPlocs.Hsapiens.dbSNP.20120608.

Nope. That's why I sent you the code ;-)

H.

>
> On Fri, May 31, 2013 at 3:42 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>> Hi Fabrice,
>>
>> With the loc2rsid() function (see code at the bottom) and using
>> SNPlocs.Hsapiens.dbSNP.20100427 (I don't have
>> SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment):
>>
>>
>>    mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)),
>>                      IRanges(c(200, 16369861, 146, 73, 9411609, 16051107),
>> width=1))
>>
>>    > mylocs
>>    GRanges with 6 ranges and 0 metadata columns:
>>          seqnames               ranges strand
>>             <Rle>            <IRanges>  <Rle>
>>      [1]     ch22 [     200,      200]      *
>>      [2]     ch22 [16369861, 16369861]      *
>>      [3]      chM [     146,      146]      *
>>      [4]      chM [      73,       73]      *
>>      [5]     ch21 [ 9411609,  9411609]      *
>>      [6]     ch22 [16051107, 16051107]      *
>>      ---
>>      seqlengths:
>>       ch21 ch22  chM
>>         NA   NA   NA
>>
>> Then:
>>
>>    library(SNPlocs.Hsapiens.dbSNP.20100427)
>>
>>    > loc2rsid(mylocs)
>>    CharacterList of length 6
>>    [[1]] character(0)
>>    [[2]] rs75258394 rs78180314
>>    [[3]] character(0)
>>    [[4]] character(0)
>>    [[5]] rs76676778
>>    [[6]] rs6518357
>>
>> You need to be careful to use the same chromosome naming convention as
>> the SNPlocs package (which uses dbSNP naming convention). For example
>> in the 'mylocs' object, the mitochondrial chromosome is named chM but
>> in the SNPlocs package it's chMT:
>>
>>    > getSNPcount()
>>        ch1     ch2     ch3     ch4     ch5     ch6     ch7     ch8 ch9
>> ch10
>>    1369185 1422439 1186807 1191984 1064540 1040203  977275  919207 740516
>> 859433
>>       ch11    ch12    ch13    ch14    ch15    ch16    ch17    ch18 ch19
>> ch20
>>     855225  822825  615382  543041  499101  556394  464686  482359 375259
>> 450685
>>       ch21    ch22     chX     chY    chMT
>>     253480  244482  445596   53908     667
>>
>> This is why the locations on chM are not mapped to anything. But
>> after renaming:
>>
>>    > seqlevels(mylocs)[3] <- "chMT"
>>    > seqlevels(mylocs)
>>    [1] "ch21" "ch22" "chMT"
>>
>>    > loc2rsid(mylocs)
>>    CharacterList of length 6
>>    [[1]] character(0)
>>    [[2]] rs75258394 rs78180314
>>    [[3]] rs72619361
>>    [[4]] rs3087742
>>    [[5]] rs76676778
>>    [[6]] rs6518357
>>
>> they get mapped.
>>
>> The CharacterList object returned by loc2rsid() can be set as a
>> metadata column of the input GRanges object:
>>
>>    > mcols(mylocs)$rsid <- loc2rsid(mylocs)
>>    > mylocs
>>    GRanges with 6 ranges and 1 metadata column:
>>          seqnames               ranges strand |                  rsid
>>             <Rle>            <IRanges>  <Rle> |       <CharacterList>
>>      [1]     ch22 [     200,      200]      * |
>>      [2]     ch22 [16369861, 16369861]      * | rs75258394,rs78180314
>>      [3]     chMT [     146,      146]      * |            rs72619361
>>      [4]     chMT [      73,       73]      * |             rs3087742
>>      [5]     ch21 [ 9411609,  9411609]      * |            rs76676778
>>      [6]     ch22 [16051107, 16051107]      * |             rs6518357
>>      ---
>>      seqlengths:
>>       ch21 ch22 chMT
>>         NA   NA   NA
>>
>> Hope this helps,
>> H.
>>
>>
>> ## Maps the genomic locations in 'locs' to the corresponding rs ids.
>> ## 'locs' must be a GRanges object where all ranges are of with 1.
>> ## Because dbSNP can assign distinct ids to SNPs located at the same
>> ## position, the mapping is a 1-to-many relationship.
>> ## Returns a CharacterList of the same length as 'locs'.
>> loc2rsid <- function(locs)
>> {
>>      if (!is(locs, "GRanges"))
>>          stop("'locs' must be a GRanges object")
>>      if (!all(width(locs) == 1L))
>>          stop("all ranges in 'locs' must be of width 1")
>>      common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount()))
>>      if (length(common_seqlevels) == 0L)
>>          stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ",
>>               "be\n  compatible with the chromosome names in the SNPlocs ",
>>               "package. Maybe they\n  use a different naming convention? ",
>>               "If that's the case then you first need\n  to rename the ",
>>               "seqlevels in 'locs'. See '?seqlevels' for how to do this.")
>>      f <- as.factor(seqnames(locs))
>>      locs_by_chrom <- split(start(locs), f)
>>      rsids_by_chrom <- lapply(seq_along(locs_by_chrom),
>>             function(i)
>>             {
>>                 seqname <- levels(f)[i]
>>                 locs2 <- locs_by_chrom[[i]]
>>                 ans2 <- vector("list", length=length(locs2))
>>                 if (length(locs2) == 0L || !(seqname %in% common_seqlevels))
>>                     return(ans2)
>>                 snplocs <- getSNPlocs(seqname, as.GRanges=FALSE)
>>                 hits <- findMatches(locs2, snplocs$loc)
>>                 rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)])
>>                 q_hits <- queryHits(hits)
>>                 tmp <- split(rsids, q_hits)
>>                 ans2[as.integer(names(tmp))] <- tmp
>>                 ans2
>>             })
>>      CharacterList(unsplit(rsids_by_chrom, f))
>> }
>>
>> PS: This functionality will make it to the SNPlocs packages (in this
>> form or another).
>>
>>
>>
>> On 05/30/2013 09:54 PM, Fabrice Tourre wrote:
>>>
>>> Dear list,
>>>
>>> I have a list of genome positions. Is there is a package to get the
>>> snp name at this position?
>>>
>>> It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608
>>>
>>> Thank you.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list