[BioC] From location to get the SNP rs ID
Hervé Pagès
hpages at fhcrc.org
Fri May 31 13:01:54 CEST 2013
On 05/31/2013 03:31 AM, Fabrice Tourre wrote:
> Hervé Pagès,
>
> It helps. Thank you very much.
>
> But loc2rsid seems does not in version SNPlocs.Hsapiens.dbSNP.2010042
> and SNPlocs.Hsapiens.dbSNP.20120608.
Nope. That's why I sent you the code ;-)
H.
>
> On Fri, May 31, 2013 at 3:42 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>> Hi Fabrice,
>>
>> With the loc2rsid() function (see code at the bottom) and using
>> SNPlocs.Hsapiens.dbSNP.20100427 (I don't have
>> SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment):
>>
>>
>> mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)),
>> IRanges(c(200, 16369861, 146, 73, 9411609, 16051107),
>> width=1))
>>
>> > mylocs
>> GRanges with 6 ranges and 0 metadata columns:
>> seqnames ranges strand
>> <Rle> <IRanges> <Rle>
>> [1] ch22 [ 200, 200] *
>> [2] ch22 [16369861, 16369861] *
>> [3] chM [ 146, 146] *
>> [4] chM [ 73, 73] *
>> [5] ch21 [ 9411609, 9411609] *
>> [6] ch22 [16051107, 16051107] *
>> ---
>> seqlengths:
>> ch21 ch22 chM
>> NA NA NA
>>
>> Then:
>>
>> library(SNPlocs.Hsapiens.dbSNP.20100427)
>>
>> > loc2rsid(mylocs)
>> CharacterList of length 6
>> [[1]] character(0)
>> [[2]] rs75258394 rs78180314
>> [[3]] character(0)
>> [[4]] character(0)
>> [[5]] rs76676778
>> [[6]] rs6518357
>>
>> You need to be careful to use the same chromosome naming convention as
>> the SNPlocs package (which uses dbSNP naming convention). For example
>> in the 'mylocs' object, the mitochondrial chromosome is named chM but
>> in the SNPlocs package it's chMT:
>>
>> > getSNPcount()
>> ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9
>> ch10
>> 1369185 1422439 1186807 1191984 1064540 1040203 977275 919207 740516
>> 859433
>> ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19
>> ch20
>> 855225 822825 615382 543041 499101 556394 464686 482359 375259
>> 450685
>> ch21 ch22 chX chY chMT
>> 253480 244482 445596 53908 667
>>
>> This is why the locations on chM are not mapped to anything. But
>> after renaming:
>>
>> > seqlevels(mylocs)[3] <- "chMT"
>> > seqlevels(mylocs)
>> [1] "ch21" "ch22" "chMT"
>>
>> > loc2rsid(mylocs)
>> CharacterList of length 6
>> [[1]] character(0)
>> [[2]] rs75258394 rs78180314
>> [[3]] rs72619361
>> [[4]] rs3087742
>> [[5]] rs76676778
>> [[6]] rs6518357
>>
>> they get mapped.
>>
>> The CharacterList object returned by loc2rsid() can be set as a
>> metadata column of the input GRanges object:
>>
>> > mcols(mylocs)$rsid <- loc2rsid(mylocs)
>> > mylocs
>> GRanges with 6 ranges and 1 metadata column:
>> seqnames ranges strand | rsid
>> <Rle> <IRanges> <Rle> | <CharacterList>
>> [1] ch22 [ 200, 200] * |
>> [2] ch22 [16369861, 16369861] * | rs75258394,rs78180314
>> [3] chMT [ 146, 146] * | rs72619361
>> [4] chMT [ 73, 73] * | rs3087742
>> [5] ch21 [ 9411609, 9411609] * | rs76676778
>> [6] ch22 [16051107, 16051107] * | rs6518357
>> ---
>> seqlengths:
>> ch21 ch22 chMT
>> NA NA NA
>>
>> Hope this helps,
>> H.
>>
>>
>> ## Maps the genomic locations in 'locs' to the corresponding rs ids.
>> ## 'locs' must be a GRanges object where all ranges are of with 1.
>> ## Because dbSNP can assign distinct ids to SNPs located at the same
>> ## position, the mapping is a 1-to-many relationship.
>> ## Returns a CharacterList of the same length as 'locs'.
>> loc2rsid <- function(locs)
>> {
>> if (!is(locs, "GRanges"))
>> stop("'locs' must be a GRanges object")
>> if (!all(width(locs) == 1L))
>> stop("all ranges in 'locs' must be of width 1")
>> common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount()))
>> if (length(common_seqlevels) == 0L)
>> stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ",
>> "be\n compatible with the chromosome names in the SNPlocs ",
>> "package. Maybe they\n use a different naming convention? ",
>> "If that's the case then you first need\n to rename the ",
>> "seqlevels in 'locs'. See '?seqlevels' for how to do this.")
>> f <- as.factor(seqnames(locs))
>> locs_by_chrom <- split(start(locs), f)
>> rsids_by_chrom <- lapply(seq_along(locs_by_chrom),
>> function(i)
>> {
>> seqname <- levels(f)[i]
>> locs2 <- locs_by_chrom[[i]]
>> ans2 <- vector("list", length=length(locs2))
>> if (length(locs2) == 0L || !(seqname %in% common_seqlevels))
>> return(ans2)
>> snplocs <- getSNPlocs(seqname, as.GRanges=FALSE)
>> hits <- findMatches(locs2, snplocs$loc)
>> rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)])
>> q_hits <- queryHits(hits)
>> tmp <- split(rsids, q_hits)
>> ans2[as.integer(names(tmp))] <- tmp
>> ans2
>> })
>> CharacterList(unsplit(rsids_by_chrom, f))
>> }
>>
>> PS: This functionality will make it to the SNPlocs packages (in this
>> form or another).
>>
>>
>>
>> On 05/30/2013 09:54 PM, Fabrice Tourre wrote:
>>>
>>> Dear list,
>>>
>>> I have a list of genome positions. Is there is a package to get the
>>> snp name at this position?
>>>
>>> It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608
>>>
>>> Thank you.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list