[BioC] xmapcore

Tim Yates tyates at picr.man.ac.uk
Thu Mar 17 09:18:41 CET 2011


I realised last night on my way home that there was a slight issue with this
code for the reverse strand transcripts.

This version should fix that.  (I have updated the code on the github gist
link as well -- I'm just posting it here in case someone finds this email in
a future search)

transcript.to.translatedprobes = function( transcript.ids ) {
  # Get all the exons for this transcript
  exons = transcript.to.exon( transcript.ids, as.vector=F )
  sapply( split( exons[['stable_id']], exons[['IN1']] ), function( exons ) {
    exons = exon.details( exons )
    fwd.strand = all( exons$strand == 1 )
    translated.width = sum( width( exons ) )
    # Go along each exon
    probes = sapply( seq_along( exons$stable_id ), function( eidx ) {
      # Get the details for this exon
      exon = exons[ eidx, ]

      # Find the probes which hit the same region as the exon
      probes = probes.in.range( as.character( space( exon ) ),
                                start( exon ),
                                end( exon ),
                                exon$strand,
                                as.vector=F )

      if( !is.null( probes ) ) {
        # Remove those that hang off the ends of the exon
        probes = probes[ start( probes ) >= start( exon ), ]
        probes = probes[ end( probes ) <= end( exon ), ]
      }

      # Return the probes for this exon
      probes
    } )

    # Then, change these offsets so that they are relative to the start of
the
    # transcript
    exon.offset = 0
    for( idx in seq_along( exons$stable_id ) ) {
      # For the reverse strand, go through the list backwards
      eidx = if( fwd.strand ) idx else length( exons$stable_id ) - idx + 1
      if( !is.null( probes[[ eidx ]] ) ) {
        ranges( probes[[ eidx ]] ) = shift( ranges( probes[[ eidx ]] ),
                                           -start( exons[ eidx, ] ) +
exon.offset )
        # If on the reverse strand, change offsets to 5' from 3'
        if( !fwd.strand ) {
          st = start( probes[[ eidx ]] )
          start( probes[[ eidx ]] ) = translated.width - end( probes[[ eidx
]] )
          end( probes[[ eidx ]] )   = translated.width - st
        }
      }
      exon.offset = exon.offset + width( exons[ eidx, ] )
    }

    # Remove all NULLs and combine
    probes = do.call( 'rbind', probes[ !sapply( probes, is.null ) ] )
  } )
}



On 16/03/2011 15:43, "tim" <tyates at picr.man.ac.uk> wrote:

> Hi again Brian,
> 
> Sorry for the delay in replying, I wanted to make sure I got this code right
> (I think it's right now) ;-) -- Just in case the email gets the code
> garbled, I've also pasted it here:
> https://gist.github.com/89d35075b969af50a3f9
> 
> transcript.to.translatedprobes = function( transcript.ids ) {
>   # Get all the exons for this transcript
>   exons = transcript.to.exon( transcript.ids, as.vector=F )
>   sapply( split( exons[['stable_id']], exons[['IN1']] ), function( exons ) {
>     exons = exon.details( exons )
>     fwd.strand = all( exons$strand == 1 )
>     # Go along each exon
>     probes = sapply( seq_along( exons$stable_id ), function( eidx ) {
>       # Get the details for this exon
>       exon = exons[ eidx, ]
> 
>       # Find the probes which hit the same region as the exon
>       probes = probes.in.range( as.character( space( exon ) ),
>                                 start( exon ),
>                                 end( exon ),
>                                 exon$strand,
>                                 as.vector=F )
> 
>       if( !is.null( probes ) ) {
>         # Remove those that hang off the ends of the exon
>         probes = probes[ start( probes ) >= start( exon ), ]
>         probes = probes[ end( probes ) <= end( exon ), ]
>       }
> 
>       # Return the probes for this exon
>       probes
>     } )
> 
>     # Then, change these offsets so that they are relative to the start of
> the
>     # transcript
>     exon.offset = 0
>     for( idx in seq_along( exons$stable_id ) ) {
>       # For the reverse strand, go through the list backwards
>       eidx = if( fwd.strand ) idx else length( exons$stable_id ) - idx + 1
>       if( !is.null( probes[[ eidx ]] ) ) {
>         ranges( probes[[ eidx ]] ) = shift( ranges( probes[[ eidx ]] ),
>                                            -start( exons[ eidx, ] ) +
> exon.offset )
>       }
>       exon.offset = exon.offset + width( exons[ eidx, ] )
>     }
> 
>     # Remove all NULLs and combine
>     probes = do.call( 'rbind', probes[ !sapply( probes, is.null ) ] )
>   } )
> }
> 
> If you call this function, eg:
> 
> transcript.to.translatedprobes( c( 'ENST00000371953', 'ENST00000462694',
> 'ENST00000329958' ) )
> 
> It will return you a list with an entry per transcript, the value being the
> probes and their distance from the 5' end of the transcript
> 
> I hope this is what you wanted.
> 
> Tim
> 
> On 15/03/2011 15:16, "ekspiulo" <ekspiulo at gmail.com> wrote:
> 
>> 
>> Hi Tim,
>> 
>> Thanks for your insight.  I want to clarify my understanding of the code
>> you've provided here.  That second line will subtract from the start and
>> end values of each probe the value of the start coordinate of the
>> transcript itself?  I think I was probably pretty unclear, I'm looking
>> for the transcribed transcripts' coordinate space, a means to calculate
>> the mRNA relative positions of the probes for the transcript which is
>> shot out in to the cell.  Any thoughts on that?
>> 
>> 
>> Thanks for your help!
>> 
>> -Brian
>> 
>> 
>> On 03/15/2011 05:55 AM, Tim Yates wrote:
>>> Hiya!
>>> 
>>> It depends what sort of output format you want for your data.
>>> 
>>> For example, say you are interested in 'ENST00000413465' (a transcript
>>> of TP53)
>>> 
>>> http://bit.ly/e9rSMg
>>> 
>>> You could do the following:
>>> 
>>>> probe.rd = xmap.range.apply( transcript.details( 'ENST00000413465' ),
>>> function( ch, s, e, st ) probes.in.range( ch, s, e, st, as.vector=F ) )
>>>> ranges( probe.rd ) = shift( ranges( probe.rd ), -start(
>>> transcript.details( 'ENST00000413465' ) ) )
>>>> probe.rd
>>> 
>>> RangedData with 297 rows and 3 value columns across 1 space
>>>           space         ranges   |                  sequence
>>> probe_hit_count    strand
>>> <character> <IRanges>   | <character> <numeric> <integer>
>>> 1            17     [215, 239]   | ATTAGCTGGGCATGGTGGCACGTGC
>>>               2        -1
>>> 2            17     [216, 240]   | AATTAGCTGGGCATGGTGGCACGTG
>>>               2        -1
>>> 3            17     [556, 580]   | CGTAAGCCAAGATCCAAAGAAAATG
>>>               1        -1
>>> 4            17     [597, 621]   | CAGAGCCTGTATGCAGAAGACCACA
>>>               1        -1
>>> 5            17     [758, 782]   | CCCAGGAGGCAGAGATTGCAGTGAG
>>>               2        -1
>>> 6            17     [762, 786]   | TGAACCCAGGAGGCAGAGATTGCAG
>>>               2        -1
>>> 7            17     [769, 793]   | AATCACTTGAACCCAGGAGGCAGAG
>>>               2        -1
>>> 8            17     [825, 849]   | CCAGCTATGGTAGTGTAAGCCTGTA
>>>               1        -1
>>> 9            17     [956, 980]   | GGGCATGGTGGCTCACACCTGTAAT
>>>               2        -1
>>> ...         ...            ... ...                       ...
>>>             ...       ...
>>> 289          17 [23912, 23936]   | TGCCTACCAAGTCACAGACCCTTTT
>>>               1        -1
>>> 290          17 [24128, 24152]   | AGGAGGCGGAACTCGAATTCATTTC
>>>               1        -1
>>> 291          17 [24401, 24425]   | GCAAGCCCGGAGGTATTTTCAAGAA
>>>               1        -1
>>> 292          17 [24624, 24648]   | AGGCCGGTTCCTCTTACTTGGCAGA
>>>               1        -1
>>> 293          17 [24681, 24705]   | GGAGCAGCTCACTATTCACCCGATG
>>>               1        -1
>>> 294          17 [24871, 24895]   | CCGGCTCCGCTAGATGGAGAAAATC
>>>               1        -1
>>> 295          17 [24985, 25009]   | ACTGGGGCTCCATTCCGAAATGATC
>>>               1        -1
>>> 296          17 [25150, 25174]   | CTTGAGCCGGCCTAAAGCGTACTTC
>>>               1        -1
>>> 297          17 [25372, 25396]   | GGGCTCTCGGCTCCGTGTATTTTCA
>>>               1        -1
>>> 
>>> So, that is all 297 probes that hit inside the range of the
>>> transcript, with the start and end of each probe relative to the start
>>> of the transcript
>>> 
>>> Tim
>>> 
>>> 
>>> On 14/03/2011 18:17, "ekspiulo" <ekspiulo at gmail.com> wrote:
>>> 
>>>> Hello all,
>>>> 
>>>> I'm endeavouring to produce transcript relative coordinates for the
>>>> probe sets on an affymetrix exon st array.  I've normalized the data at
>>>> the probe set level using the oligo package, and I want to use the xmap
>>>> projects annotations, but for the life of me I can not figure out how to
>>>> query its data in a fashion that can produce either a single coordinate
>>>> range or even the set of all possible coordinate ranges for a probe set,
>>>> can anyone tell me where to start with this?
>>>> 
>>>> Alternatively, can I get probe level data from the normalized results
>>>> from Oligo, and does it even make sense for an exon array?  I'm doing
>>>> some custom alternative splicing analysis, so ideally I'd be able to get
>>>> the data needed to produce a list of all of the probes/probesets which
>>>> target each transcript in xmap along with either the genomic coordinates
>>>> for those probes/probesets or their transcript relative coordinates.
>>>> 
>>>> e.g.
>>>> 
>>>> transcript-id: [ [probe-id, start, end], [probe-id, start, end], ... ]
>>>> 
>>>> 
>>>> Thanks!
>>>> 
>>>> -Brian
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> ------------------------------------------------------------------------
>>> This email is confidential and intended solely for the...{{dropped:16}}
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> --------------------------------------------------------
> This email is confidential and intended solely for the...{{dropped:22}}



More information about the Bioconductor mailing list