[BioC] From location to get the SNP rs ID
Fabrice Tourre
fabrice.ciup at gmail.com
Fri May 31 12:31:00 CEST 2013
Hervé Pagès,
It helps. Thank you very much.
But loc2rsid seems does not in version SNPlocs.Hsapiens.dbSNP.2010042
and SNPlocs.Hsapiens.dbSNP.20120608.
On Fri, May 31, 2013 at 3:42 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Fabrice,
>
> With the loc2rsid() function (see code at the bottom) and using
> SNPlocs.Hsapiens.dbSNP.20100427 (I don't have
> SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment):
>
>
> mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)),
> IRanges(c(200, 16369861, 146, 73, 9411609, 16051107),
> width=1))
>
> > mylocs
> GRanges with 6 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] ch22 [ 200, 200] *
> [2] ch22 [16369861, 16369861] *
> [3] chM [ 146, 146] *
> [4] chM [ 73, 73] *
> [5] ch21 [ 9411609, 9411609] *
> [6] ch22 [16051107, 16051107] *
> ---
> seqlengths:
> ch21 ch22 chM
> NA NA NA
>
> Then:
>
> library(SNPlocs.Hsapiens.dbSNP.20100427)
>
> > loc2rsid(mylocs)
> CharacterList of length 6
> [[1]] character(0)
> [[2]] rs75258394 rs78180314
> [[3]] character(0)
> [[4]] character(0)
> [[5]] rs76676778
> [[6]] rs6518357
>
> You need to be careful to use the same chromosome naming convention as
> the SNPlocs package (which uses dbSNP naming convention). For example
> in the 'mylocs' object, the mitochondrial chromosome is named chM but
> in the SNPlocs package it's chMT:
>
> > getSNPcount()
> ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9
> ch10
> 1369185 1422439 1186807 1191984 1064540 1040203 977275 919207 740516
> 859433
> ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19
> ch20
> 855225 822825 615382 543041 499101 556394 464686 482359 375259
> 450685
> ch21 ch22 chX chY chMT
> 253480 244482 445596 53908 667
>
> This is why the locations on chM are not mapped to anything. But
> after renaming:
>
> > seqlevels(mylocs)[3] <- "chMT"
> > seqlevels(mylocs)
> [1] "ch21" "ch22" "chMT"
>
> > loc2rsid(mylocs)
> CharacterList of length 6
> [[1]] character(0)
> [[2]] rs75258394 rs78180314
> [[3]] rs72619361
> [[4]] rs3087742
> [[5]] rs76676778
> [[6]] rs6518357
>
> they get mapped.
>
> The CharacterList object returned by loc2rsid() can be set as a
> metadata column of the input GRanges object:
>
> > mcols(mylocs)$rsid <- loc2rsid(mylocs)
> > mylocs
> GRanges with 6 ranges and 1 metadata column:
> seqnames ranges strand | rsid
> <Rle> <IRanges> <Rle> | <CharacterList>
> [1] ch22 [ 200, 200] * |
> [2] ch22 [16369861, 16369861] * | rs75258394,rs78180314
> [3] chMT [ 146, 146] * | rs72619361
> [4] chMT [ 73, 73] * | rs3087742
> [5] ch21 [ 9411609, 9411609] * | rs76676778
> [6] ch22 [16051107, 16051107] * | rs6518357
> ---
> seqlengths:
> ch21 ch22 chMT
> NA NA NA
>
> Hope this helps,
> H.
>
>
> ## Maps the genomic locations in 'locs' to the corresponding rs ids.
> ## 'locs' must be a GRanges object where all ranges are of with 1.
> ## Because dbSNP can assign distinct ids to SNPs located at the same
> ## position, the mapping is a 1-to-many relationship.
> ## Returns a CharacterList of the same length as 'locs'.
> loc2rsid <- function(locs)
> {
> if (!is(locs, "GRanges"))
> stop("'locs' must be a GRanges object")
> if (!all(width(locs) == 1L))
> stop("all ranges in 'locs' must be of width 1")
> common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount()))
> if (length(common_seqlevels) == 0L)
> stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ",
> "be\n compatible with the chromosome names in the SNPlocs ",
> "package. Maybe they\n use a different naming convention? ",
> "If that's the case then you first need\n to rename the ",
> "seqlevels in 'locs'. See '?seqlevels' for how to do this.")
> f <- as.factor(seqnames(locs))
> locs_by_chrom <- split(start(locs), f)
> rsids_by_chrom <- lapply(seq_along(locs_by_chrom),
> function(i)
> {
> seqname <- levels(f)[i]
> locs2 <- locs_by_chrom[[i]]
> ans2 <- vector("list", length=length(locs2))
> if (length(locs2) == 0L || !(seqname %in% common_seqlevels))
> return(ans2)
> snplocs <- getSNPlocs(seqname, as.GRanges=FALSE)
> hits <- findMatches(locs2, snplocs$loc)
> rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)])
> q_hits <- queryHits(hits)
> tmp <- split(rsids, q_hits)
> ans2[as.integer(names(tmp))] <- tmp
> ans2
> })
> CharacterList(unsplit(rsids_by_chrom, f))
> }
>
> PS: This functionality will make it to the SNPlocs packages (in this
> form or another).
>
>
>
> On 05/30/2013 09:54 PM, Fabrice Tourre wrote:
>>
>> Dear list,
>>
>> I have a list of genome positions. Is there is a package to get the
>> snp name at this position?
>>
>> It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608
>>
>> Thank you.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
More information about the Bioconductor
mailing list