[BioC] From location to get the SNP rs ID

Fri May 31 12:31:00 CEST 2013

Hervé Pagès,

It helps. Thank you very much.

But loc2rsid seems does not in version SNPlocs.Hsapiens.dbSNP.2010042
and SNPlocs.Hsapiens.dbSNP.20120608.

On Fri, May 31, 2013 at 3:42 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Fabrice,
>
> With the loc2rsid() function (see code at the bottom) and using
> SNPlocs.Hsapiens.dbSNP.20100427 (I don't have
> SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment):
>
>
>   mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)),
>                     IRanges(c(200, 16369861, 146, 73, 9411609, 16051107),
> width=1))
>
>   > mylocs
>   GRanges with 6 ranges and 0 metadata columns:
>         seqnames               ranges strand
>            <Rle>            <IRanges>  <Rle>
>     [1]     ch22 [     200,      200]      *
>     [2]     ch22 [16369861, 16369861]      *
>     [3]      chM [     146,      146]      *
>     [4]      chM [      73,       73]      *
>     [5]     ch21 [ 9411609,  9411609]      *
>     [6]     ch22 [16051107, 16051107]      *
>     ---
>     seqlengths:
>      ch21 ch22  chM
>        NA   NA   NA
>
> Then:
>
>   library(SNPlocs.Hsapiens.dbSNP.20100427)
>
>   > loc2rsid(mylocs)
>   CharacterList of length 6
>   [[1]] character(0)
>   [[2]] rs75258394 rs78180314
>   [[3]] character(0)
>   [[4]] character(0)
>   [[5]] rs76676778
>   [[6]] rs6518357
>
> You need to be careful to use the same chromosome naming convention as
> the SNPlocs package (which uses dbSNP naming convention). For example
> in the 'mylocs' object, the mitochondrial chromosome is named chM but
> in the SNPlocs package it's chMT:
>
>   > getSNPcount()
>       ch1     ch2     ch3     ch4     ch5     ch6     ch7     ch8 ch9
> ch10
>   1369185 1422439 1186807 1191984 1064540 1040203  977275  919207 740516
> 859433
>      ch11    ch12    ch13    ch14    ch15    ch16    ch17    ch18 ch19
> ch20
>    855225  822825  615382  543041  499101  556394  464686  482359 375259
> 450685
>      ch21    ch22     chX     chY    chMT
>    253480  244482  445596   53908     667
>
> This is why the locations on chM are not mapped to anything. But
> after renaming:
>
>   > seqlevels(mylocs)[3] <- "chMT"
>   > seqlevels(mylocs)
>   [1] "ch21" "ch22" "chMT"
>
>   > loc2rsid(mylocs)
>   CharacterList of length 6
>   [[1]] character(0)
>   [[2]] rs75258394 rs78180314
>   [[3]] rs72619361
>   [[4]] rs3087742
>   [[5]] rs76676778
>   [[6]] rs6518357
>
> they get mapped.
>
> The CharacterList object returned by loc2rsid() can be set as a
> metadata column of the input GRanges object:
>
>   > mcols(mylocs)$rsid <- loc2rsid(mylocs)
>   > mylocs
>   GRanges with 6 ranges and 1 metadata column:
>         seqnames               ranges strand |                  rsid
>            <Rle>            <IRanges>  <Rle> |       <CharacterList>
>     [1]     ch22 [     200,      200]      * |
>     [2]     ch22 [16369861, 16369861]      * | rs75258394,rs78180314
>     [3]     chMT [     146,      146]      * |            rs72619361
>     [4]     chMT [      73,       73]      * |             rs3087742
>     [5]     ch21 [ 9411609,  9411609]      * |            rs76676778
>     [6]     ch22 [16051107, 16051107]      * |             rs6518357
>     ---
>     seqlengths:
>      ch21 ch22 chMT
>        NA   NA   NA
>
> Hope this helps,
> H.
>
>
> ## Maps the genomic locations in 'locs' to the corresponding rs ids.
> ## 'locs' must be a GRanges object where all ranges are of with 1.
> ## Because dbSNP can assign distinct ids to SNPs located at the same
> ## position, the mapping is a 1-to-many relationship.
> ## Returns a CharacterList of the same length as 'locs'.
> loc2rsid <- function(locs)
> {
>     if (!is(locs, "GRanges"))
>         stop("'locs' must be a GRanges object")
>     if (!all(width(locs) == 1L))
>         stop("all ranges in 'locs' must be of width 1")
>     common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount()))
>     if (length(common_seqlevels) == 0L)
>         stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ",
>              "be\n  compatible with the chromosome names in the SNPlocs ",
>              "package. Maybe they\n  use a different naming convention? ",
>              "If that's the case then you first need\n  to rename the ",
>              "seqlevels in 'locs'. See '?seqlevels' for how to do this.")
>     f <- as.factor(seqnames(locs))
>     locs_by_chrom <- split(start(locs), f)
>     rsids_by_chrom <- lapply(seq_along(locs_by_chrom),
>            function(i)
>            {
>                seqname <- levels(f)[i]
>                locs2 <- locs_by_chrom[[i]]
>                ans2 <- vector("list", length=length(locs2))
>                if (length(locs2) == 0L || !(seqname %in% common_seqlevels))
>                    return(ans2)
>                snplocs <- getSNPlocs(seqname, as.GRanges=FALSE)
>                hits <- findMatches(locs2, snplocs$loc)
>                rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)])
>                q_hits <- queryHits(hits)
>                tmp <- split(rsids, q_hits)
>                ans2[as.integer(names(tmp))] <- tmp
>                ans2
>            })
>     CharacterList(unsplit(rsids_by_chrom, f))
> }
>
> PS: This functionality will make it to the SNPlocs packages (in this
> form or another).
>
>
>
> On 05/30/2013 09:54 PM, Fabrice Tourre wrote:
>>
>> Dear list,
>>
>> I have a list of genome positions. Is there is a package to get the
>> snp name at this position?
>>
>> It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608
>>
>> Thank you.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319