[BioC] A basic question about retrieve the absolute genome position from the output of of function matchPWM .
Steve Lianoglou
mailinglist.honeypot at gmail.com
Sun Oct 28 02:19:16 CEST 2012
Hi Holly,
On Sat, Oct 27, 2012 at 7:23 PM, Holly <xyang2 at uchicago.edu> wrote:
> Dear all,
> I am learning how to "Finding Candidate Binding Sites for Known
> Transcription Factors via Sequence Matching" from the Bioconductor
> online help file at
> http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/
>
> When I ran the following codes, I can not match the results back to the
> UCSC genome browser results.
>
> ||>|orfs <- as.character(mget("RHOA", org.Hs.egSYMBOL2EG))
> ||||>|| pfm.grhl <- query(MotifDb,"GRHL")[[1]]
>> pcm.grhl <- round(100 * pfm.grhl)
>> chromosomal.loc <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene") [orfs]
>> promoter.seqs <- getPromoterSeq(chromosomal.loc, Hsapiens, upstream=1000, downstream=0)
>> promoter.seqs <- unlist(promoter.seqs)
>> matchPWM(pcm.grhl, promoter.seqs[[1]], "80%")
> Views on a 1000-letter DNAString subject
> subject: AGCCCAGGTCAGGTTAGAGACCACTGGTATTGTT...GGCACTCGGAGGCGCGCACGTCGTTCCCCGCCCT
> views:
> start end width
> [1] 347 354 8 [AAGCCTCA]
>
>> chromosomal.loc
> GRangesList of length 1:
> $387
> GRanges with 2 ranges and 2 metadata columns:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] chr3 [49396579, 49449428] - | 15517 uc010hku.3
> [2] chr3 [49396579, 49449526] - | 15518 uc003cwu.3
>
> ---
> seqlengths:
> chr1 chr2 ... chrUn_gl000249
> 249250621 243199373 ... 38502
>
> As|||49396579-1000+347=49395926, I use it as an absolute position.|
> However, when I visit the UCSC genome browser, for the region chr3:||||49395926||-||||49395934.||
> I got [GTGCTCCTC], instead of [|||AAGCCTCA] or similar.|
>
> So, could anyone tell me how to calculate the absolute genome position from the output of of function matchPWM ?.
Beware of the evil strand.
This transcript is on the negative strand, so the 1000bp upstream of
the first transcript you've listed starts at its end, namely
`(end+1):(end+999)`
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list