[BioC] xmapcore

Wed Mar 16 16:43:38 CET 2011

Hi again Brian,

Sorry for the delay in replying, I wanted to make sure I got this code right
(I think it's right now) ;-) -- Just in case the email gets the code
garbled, I've also pasted it here:
https://gist.github.com/89d35075b969af50a3f9

transcript.to.translatedprobes = function( transcript.ids ) {
  # Get all the exons for this transcript
  exons = transcript.to.exon( transcript.ids, as.vector=F )
  sapply( split( exons[['stable_id']], exons[['IN1']] ), function( exons ) {
    exons = exon.details( exons )
    fwd.strand = all( exons$strand == 1 )
    # Go along each exon
    probes = sapply( seq_along( exons$stable_id ), function( eidx ) {
      # Get the details for this exon
      exon = exons[ eidx, ]

      # Find the probes which hit the same region as the exon
      probes = probes.in.range( as.character( space( exon ) ),
                                start( exon ),
                                end( exon ),
                                exon$strand,
                                as.vector=F )

      if( !is.null( probes ) ) {
        # Remove those that hang off the ends of the exon
        probes = probes[ start( probes ) >= start( exon ), ]
        probes = probes[ end( probes ) <= end( exon ), ]
      }

      # Return the probes for this exon
      probes
    } )

    # Then, change these offsets so that they are relative to the start of
the
    # transcript
    exon.offset = 0
    for( idx in seq_along( exons$stable_id ) ) {
      # For the reverse strand, go through the list backwards
      eidx = if( fwd.strand ) idx else length( exons$stable_id ) - idx + 1
      if( !is.null( probes[[ eidx ]] ) ) {
        ranges( probes[[ eidx ]] ) = shift( ranges( probes[[ eidx ]] ),
                                           -start( exons[ eidx, ] ) +
exon.offset )
      }
      exon.offset = exon.offset + width( exons[ eidx, ] )
    }

    # Remove all NULLs and combine
    probes = do.call( 'rbind', probes[ !sapply( probes, is.null ) ] )
  } )
}

If you call this function, eg:

transcript.to.translatedprobes( c( 'ENST00000371953', 'ENST00000462694',
'ENST00000329958' ) )

It will return you a list with an entry per transcript, the value being the
probes and their distance from the 5' end of the transcript

I hope this is what you wanted.

Tim

On 15/03/2011 15:16, "ekspiulo" <ekspiulo at gmail.com> wrote:

> 
> Hi Tim,
> 
> Thanks for your insight.  I want to clarify my understanding of the code
> you've provided here.  That second line will subtract from the start and
> end values of each probe the value of the start coordinate of the
> transcript itself?  I think I was probably pretty unclear, I'm looking
> for the transcribed transcripts' coordinate space, a means to calculate
> the mRNA relative positions of the probes for the transcript which is
> shot out in to the cell.  Any thoughts on that?
> 
> 
> Thanks for your help!
> 
> -Brian
> 
> 
> On 03/15/2011 05:55 AM, Tim Yates wrote:
>> Hiya!
>> 
>> It depends what sort of output format you want for your data.
>> 
>> For example, say you are interested in 'ENST00000413465' (a transcript
>> of TP53)
>> 
>> http://bit.ly/e9rSMg
>> 
>> You could do the following:
>> 
>>> probe.rd = xmap.range.apply( transcript.details( 'ENST00000413465' ),
>> function( ch, s, e, st ) probes.in.range( ch, s, e, st, as.vector=F ) )
>>> ranges( probe.rd ) = shift( ranges( probe.rd ), -start(
>> transcript.details( 'ENST00000413465' ) ) )
>>> probe.rd
>> 
>> RangedData with 297 rows and 3 value columns across 1 space
>>           space         ranges   |                  sequence
>> probe_hit_count    strand
>> <character> <IRanges>   | <character> <numeric> <integer>
>> 1            17     [215, 239]   | ATTAGCTGGGCATGGTGGCACGTGC
>>               2        -1
>> 2            17     [216, 240]   | AATTAGCTGGGCATGGTGGCACGTG
>>               2        -1
>> 3            17     [556, 580]   | CGTAAGCCAAGATCCAAAGAAAATG
>>               1        -1
>> 4            17     [597, 621]   | CAGAGCCTGTATGCAGAAGACCACA
>>               1        -1
>> 5            17     [758, 782]   | CCCAGGAGGCAGAGATTGCAGTGAG
>>               2        -1
>> 6            17     [762, 786]   | TGAACCCAGGAGGCAGAGATTGCAG
>>               2        -1
>> 7            17     [769, 793]   | AATCACTTGAACCCAGGAGGCAGAG
>>               2        -1
>> 8            17     [825, 849]   | CCAGCTATGGTAGTGTAAGCCTGTA
>>               1        -1
>> 9            17     [956, 980]   | GGGCATGGTGGCTCACACCTGTAAT
>>               2        -1
>> ...         ...            ... ...                       ...
>>             ...       ...
>> 289          17 [23912, 23936]   | TGCCTACCAAGTCACAGACCCTTTT
>>               1        -1
>> 290          17 [24128, 24152]   | AGGAGGCGGAACTCGAATTCATTTC
>>               1        -1
>> 291          17 [24401, 24425]   | GCAAGCCCGGAGGTATTTTCAAGAA
>>               1        -1
>> 292          17 [24624, 24648]   | AGGCCGGTTCCTCTTACTTGGCAGA
>>               1        -1
>> 293          17 [24681, 24705]   | GGAGCAGCTCACTATTCACCCGATG
>>               1        -1
>> 294          17 [24871, 24895]   | CCGGCTCCGCTAGATGGAGAAAATC
>>               1        -1
>> 295          17 [24985, 25009]   | ACTGGGGCTCCATTCCGAAATGATC
>>               1        -1
>> 296          17 [25150, 25174]   | CTTGAGCCGGCCTAAAGCGTACTTC
>>               1        -1
>> 297          17 [25372, 25396]   | GGGCTCTCGGCTCCGTGTATTTTCA
>>               1        -1
>> 
>> So, that is all 297 probes that hit inside the range of the
>> transcript, with the start and end of each probe relative to the start
>> of the transcript
>> 
>> Tim
>> 
>> 
>> On 14/03/2011 18:17, "ekspiulo" <ekspiulo at gmail.com> wrote:
>> 
>>> Hello all,
>>> 
>>> I'm endeavouring to produce transcript relative coordinates for the
>>> probe sets on an affymetrix exon st array.  I've normalized the data at
>>> the probe set level using the oligo package, and I want to use the xmap
>>> projects annotations, but for the life of me I can not figure out how to
>>> query its data in a fashion that can produce either a single coordinate
>>> range or even the set of all possible coordinate ranges for a probe set,
>>> can anyone tell me where to start with this?
>>> 
>>> Alternatively, can I get probe level data from the normalized results
>>> from Oligo, and does it even make sense for an exon array?  I'm doing
>>> some custom alternative splicing analysis, so ideally I'd be able to get
>>> the data needed to produce a list of all of the probes/probesets which
>>> target each transcript in xmap along with either the genomic coordinates
>>> for those probes/probesets or their transcript relative coordinates.
>>> 
>>> e.g.
>>> 
>>> transcript-id: [ [probe-id, start, end], [probe-id, start, end], ... ]
>>> 
>>> 
>>> Thanks!
>>> 
>>> -Brian
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> ------------------------------------------------------------------------
>> This email is confidential and intended solely for the...{{dropped:16}}
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor 
--------------------------------------------------------
This email is confidential and intended solely for the u...{{dropped:12}}