[BioC] xmapcore
Tim Yates
tyates at picr.man.ac.uk
Thu Mar 17 09:18:41 CET 2011
I realised last night on my way home that there was a slight issue with this
code for the reverse strand transcripts.
This version should fix that. (I have updated the code on the github gist
link as well -- I'm just posting it here in case someone finds this email in
a future search)
transcript.to.translatedprobes = function( transcript.ids ) {
# Get all the exons for this transcript
exons = transcript.to.exon( transcript.ids, as.vector=F )
sapply( split( exons[['stable_id']], exons[['IN1']] ), function( exons ) {
exons = exon.details( exons )
fwd.strand = all( exons$strand == 1 )
translated.width = sum( width( exons ) )
# Go along each exon
probes = sapply( seq_along( exons$stable_id ), function( eidx ) {
# Get the details for this exon
exon = exons[ eidx, ]
# Find the probes which hit the same region as the exon
probes = probes.in.range( as.character( space( exon ) ),
start( exon ),
end( exon ),
exon$strand,
as.vector=F )
if( !is.null( probes ) ) {
# Remove those that hang off the ends of the exon
probes = probes[ start( probes ) >= start( exon ), ]
probes = probes[ end( probes ) <= end( exon ), ]
}
# Return the probes for this exon
probes
} )
# Then, change these offsets so that they are relative to the start of
the
# transcript
exon.offset = 0
for( idx in seq_along( exons$stable_id ) ) {
# For the reverse strand, go through the list backwards
eidx = if( fwd.strand ) idx else length( exons$stable_id ) - idx + 1
if( !is.null( probes[[ eidx ]] ) ) {
ranges( probes[[ eidx ]] ) = shift( ranges( probes[[ eidx ]] ),
-start( exons[ eidx, ] ) +
exon.offset )
# If on the reverse strand, change offsets to 5' from 3'
if( !fwd.strand ) {
st = start( probes[[ eidx ]] )
start( probes[[ eidx ]] ) = translated.width - end( probes[[ eidx
]] )
end( probes[[ eidx ]] ) = translated.width - st
}
}
exon.offset = exon.offset + width( exons[ eidx, ] )
}
# Remove all NULLs and combine
probes = do.call( 'rbind', probes[ !sapply( probes, is.null ) ] )
} )
}
On 16/03/2011 15:43, "tim" <tyates at picr.man.ac.uk> wrote:
> Hi again Brian,
>
> Sorry for the delay in replying, I wanted to make sure I got this code right
> (I think it's right now) ;-) -- Just in case the email gets the code
> garbled, I've also pasted it here:
> https://gist.github.com/89d35075b969af50a3f9
>
> transcript.to.translatedprobes = function( transcript.ids ) {
> # Get all the exons for this transcript
> exons = transcript.to.exon( transcript.ids, as.vector=F )
> sapply( split( exons[['stable_id']], exons[['IN1']] ), function( exons ) {
> exons = exon.details( exons )
> fwd.strand = all( exons$strand == 1 )
> # Go along each exon
> probes = sapply( seq_along( exons$stable_id ), function( eidx ) {
> # Get the details for this exon
> exon = exons[ eidx, ]
>
> # Find the probes which hit the same region as the exon
> probes = probes.in.range( as.character( space( exon ) ),
> start( exon ),
> end( exon ),
> exon$strand,
> as.vector=F )
>
> if( !is.null( probes ) ) {
> # Remove those that hang off the ends of the exon
> probes = probes[ start( probes ) >= start( exon ), ]
> probes = probes[ end( probes ) <= end( exon ), ]
> }
>
> # Return the probes for this exon
> probes
> } )
>
> # Then, change these offsets so that they are relative to the start of
> the
> # transcript
> exon.offset = 0
> for( idx in seq_along( exons$stable_id ) ) {
> # For the reverse strand, go through the list backwards
> eidx = if( fwd.strand ) idx else length( exons$stable_id ) - idx + 1
> if( !is.null( probes[[ eidx ]] ) ) {
> ranges( probes[[ eidx ]] ) = shift( ranges( probes[[ eidx ]] ),
> -start( exons[ eidx, ] ) +
> exon.offset )
> }
> exon.offset = exon.offset + width( exons[ eidx, ] )
> }
>
> # Remove all NULLs and combine
> probes = do.call( 'rbind', probes[ !sapply( probes, is.null ) ] )
> } )
> }
>
> If you call this function, eg:
>
> transcript.to.translatedprobes( c( 'ENST00000371953', 'ENST00000462694',
> 'ENST00000329958' ) )
>
> It will return you a list with an entry per transcript, the value being the
> probes and their distance from the 5' end of the transcript
>
> I hope this is what you wanted.
>
> Tim
>
> On 15/03/2011 15:16, "ekspiulo" <ekspiulo at gmail.com> wrote:
>
>>
>> Hi Tim,
>>
>> Thanks for your insight. I want to clarify my understanding of the code
>> you've provided here. That second line will subtract from the start and
>> end values of each probe the value of the start coordinate of the
>> transcript itself? I think I was probably pretty unclear, I'm looking
>> for the transcribed transcripts' coordinate space, a means to calculate
>> the mRNA relative positions of the probes for the transcript which is
>> shot out in to the cell. Any thoughts on that?
>>
>>
>> Thanks for your help!
>>
>> -Brian
>>
>>
>> On 03/15/2011 05:55 AM, Tim Yates wrote:
>>> Hiya!
>>>
>>> It depends what sort of output format you want for your data.
>>>
>>> For example, say you are interested in 'ENST00000413465' (a transcript
>>> of TP53)
>>>
>>> http://bit.ly/e9rSMg
>>>
>>> You could do the following:
>>>
>>>> probe.rd = xmap.range.apply( transcript.details( 'ENST00000413465' ),
>>> function( ch, s, e, st ) probes.in.range( ch, s, e, st, as.vector=F ) )
>>>> ranges( probe.rd ) = shift( ranges( probe.rd ), -start(
>>> transcript.details( 'ENST00000413465' ) ) )
>>>> probe.rd
>>>
>>> RangedData with 297 rows and 3 value columns across 1 space
>>> space ranges | sequence
>>> probe_hit_count strand
>>> <character> <IRanges> | <character> <numeric> <integer>
>>> 1 17 [215, 239] | ATTAGCTGGGCATGGTGGCACGTGC
>>> 2 -1
>>> 2 17 [216, 240] | AATTAGCTGGGCATGGTGGCACGTG
>>> 2 -1
>>> 3 17 [556, 580] | CGTAAGCCAAGATCCAAAGAAAATG
>>> 1 -1
>>> 4 17 [597, 621] | CAGAGCCTGTATGCAGAAGACCACA
>>> 1 -1
>>> 5 17 [758, 782] | CCCAGGAGGCAGAGATTGCAGTGAG
>>> 2 -1
>>> 6 17 [762, 786] | TGAACCCAGGAGGCAGAGATTGCAG
>>> 2 -1
>>> 7 17 [769, 793] | AATCACTTGAACCCAGGAGGCAGAG
>>> 2 -1
>>> 8 17 [825, 849] | CCAGCTATGGTAGTGTAAGCCTGTA
>>> 1 -1
>>> 9 17 [956, 980] | GGGCATGGTGGCTCACACCTGTAAT
>>> 2 -1
>>> ... ... ... ... ...
>>> ... ...
>>> 289 17 [23912, 23936] | TGCCTACCAAGTCACAGACCCTTTT
>>> 1 -1
>>> 290 17 [24128, 24152] | AGGAGGCGGAACTCGAATTCATTTC
>>> 1 -1
>>> 291 17 [24401, 24425] | GCAAGCCCGGAGGTATTTTCAAGAA
>>> 1 -1
>>> 292 17 [24624, 24648] | AGGCCGGTTCCTCTTACTTGGCAGA
>>> 1 -1
>>> 293 17 [24681, 24705] | GGAGCAGCTCACTATTCACCCGATG
>>> 1 -1
>>> 294 17 [24871, 24895] | CCGGCTCCGCTAGATGGAGAAAATC
>>> 1 -1
>>> 295 17 [24985, 25009] | ACTGGGGCTCCATTCCGAAATGATC
>>> 1 -1
>>> 296 17 [25150, 25174] | CTTGAGCCGGCCTAAAGCGTACTTC
>>> 1 -1
>>> 297 17 [25372, 25396] | GGGCTCTCGGCTCCGTGTATTTTCA
>>> 1 -1
>>>
>>> So, that is all 297 probes that hit inside the range of the
>>> transcript, with the start and end of each probe relative to the start
>>> of the transcript
>>>
>>> Tim
>>>
>>>
>>> On 14/03/2011 18:17, "ekspiulo" <ekspiulo at gmail.com> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm endeavouring to produce transcript relative coordinates for the
>>>> probe sets on an affymetrix exon st array. I've normalized the data at
>>>> the probe set level using the oligo package, and I want to use the xmap
>>>> projects annotations, but for the life of me I can not figure out how to
>>>> query its data in a fashion that can produce either a single coordinate
>>>> range or even the set of all possible coordinate ranges for a probe set,
>>>> can anyone tell me where to start with this?
>>>>
>>>> Alternatively, can I get probe level data from the normalized results
>>>> from Oligo, and does it even make sense for an exon array? I'm doing
>>>> some custom alternative splicing analysis, so ideally I'd be able to get
>>>> the data needed to produce a list of all of the probes/probesets which
>>>> target each transcript in xmap along with either the genomic coordinates
>>>> for those probes/probesets or their transcript relative coordinates.
>>>>
>>>> e.g.
>>>>
>>>> transcript-id: [ [probe-id, start, end], [probe-id, start, end], ... ]
>>>>
>>>>
>>>> Thanks!
>>>>
>>>> -Brian
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> ------------------------------------------------------------------------
>>> This email is confidential and intended solely for the...{{dropped:16}}
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> --------------------------------------------------------
> This email is confidential and intended solely for the...{{dropped:22}}
More information about the Bioconductor
mailing list