[BioC] xmapcore
Tim Yates
tyates at picr.man.ac.uk
Wed Mar 16 16:43:38 CET 2011
Hi again Brian,
Sorry for the delay in replying, I wanted to make sure I got this code right
(I think it's right now) ;-) -- Just in case the email gets the code
garbled, I've also pasted it here:
https://gist.github.com/89d35075b969af50a3f9
transcript.to.translatedprobes = function( transcript.ids ) {
# Get all the exons for this transcript
exons = transcript.to.exon( transcript.ids, as.vector=F )
sapply( split( exons[['stable_id']], exons[['IN1']] ), function( exons ) {
exons = exon.details( exons )
fwd.strand = all( exons$strand == 1 )
# Go along each exon
probes = sapply( seq_along( exons$stable_id ), function( eidx ) {
# Get the details for this exon
exon = exons[ eidx, ]
# Find the probes which hit the same region as the exon
probes = probes.in.range( as.character( space( exon ) ),
start( exon ),
end( exon ),
exon$strand,
as.vector=F )
if( !is.null( probes ) ) {
# Remove those that hang off the ends of the exon
probes = probes[ start( probes ) >= start( exon ), ]
probes = probes[ end( probes ) <= end( exon ), ]
}
# Return the probes for this exon
probes
} )
# Then, change these offsets so that they are relative to the start of
the
# transcript
exon.offset = 0
for( idx in seq_along( exons$stable_id ) ) {
# For the reverse strand, go through the list backwards
eidx = if( fwd.strand ) idx else length( exons$stable_id ) - idx + 1
if( !is.null( probes[[ eidx ]] ) ) {
ranges( probes[[ eidx ]] ) = shift( ranges( probes[[ eidx ]] ),
-start( exons[ eidx, ] ) +
exon.offset )
}
exon.offset = exon.offset + width( exons[ eidx, ] )
}
# Remove all NULLs and combine
probes = do.call( 'rbind', probes[ !sapply( probes, is.null ) ] )
} )
}
If you call this function, eg:
transcript.to.translatedprobes( c( 'ENST00000371953', 'ENST00000462694',
'ENST00000329958' ) )
It will return you a list with an entry per transcript, the value being the
probes and their distance from the 5' end of the transcript
I hope this is what you wanted.
Tim
On 15/03/2011 15:16, "ekspiulo" <ekspiulo at gmail.com> wrote:
>
> Hi Tim,
>
> Thanks for your insight. I want to clarify my understanding of the code
> you've provided here. That second line will subtract from the start and
> end values of each probe the value of the start coordinate of the
> transcript itself? I think I was probably pretty unclear, I'm looking
> for the transcribed transcripts' coordinate space, a means to calculate
> the mRNA relative positions of the probes for the transcript which is
> shot out in to the cell. Any thoughts on that?
>
>
> Thanks for your help!
>
> -Brian
>
>
> On 03/15/2011 05:55 AM, Tim Yates wrote:
>> Hiya!
>>
>> It depends what sort of output format you want for your data.
>>
>> For example, say you are interested in 'ENST00000413465' (a transcript
>> of TP53)
>>
>> http://bit.ly/e9rSMg
>>
>> You could do the following:
>>
>>> probe.rd = xmap.range.apply( transcript.details( 'ENST00000413465' ),
>> function( ch, s, e, st ) probes.in.range( ch, s, e, st, as.vector=F ) )
>>> ranges( probe.rd ) = shift( ranges( probe.rd ), -start(
>> transcript.details( 'ENST00000413465' ) ) )
>>> probe.rd
>>
>> RangedData with 297 rows and 3 value columns across 1 space
>> space ranges | sequence
>> probe_hit_count strand
>> <character> <IRanges> | <character> <numeric> <integer>
>> 1 17 [215, 239] | ATTAGCTGGGCATGGTGGCACGTGC
>> 2 -1
>> 2 17 [216, 240] | AATTAGCTGGGCATGGTGGCACGTG
>> 2 -1
>> 3 17 [556, 580] | CGTAAGCCAAGATCCAAAGAAAATG
>> 1 -1
>> 4 17 [597, 621] | CAGAGCCTGTATGCAGAAGACCACA
>> 1 -1
>> 5 17 [758, 782] | CCCAGGAGGCAGAGATTGCAGTGAG
>> 2 -1
>> 6 17 [762, 786] | TGAACCCAGGAGGCAGAGATTGCAG
>> 2 -1
>> 7 17 [769, 793] | AATCACTTGAACCCAGGAGGCAGAG
>> 2 -1
>> 8 17 [825, 849] | CCAGCTATGGTAGTGTAAGCCTGTA
>> 1 -1
>> 9 17 [956, 980] | GGGCATGGTGGCTCACACCTGTAAT
>> 2 -1
>> ... ... ... ... ...
>> ... ...
>> 289 17 [23912, 23936] | TGCCTACCAAGTCACAGACCCTTTT
>> 1 -1
>> 290 17 [24128, 24152] | AGGAGGCGGAACTCGAATTCATTTC
>> 1 -1
>> 291 17 [24401, 24425] | GCAAGCCCGGAGGTATTTTCAAGAA
>> 1 -1
>> 292 17 [24624, 24648] | AGGCCGGTTCCTCTTACTTGGCAGA
>> 1 -1
>> 293 17 [24681, 24705] | GGAGCAGCTCACTATTCACCCGATG
>> 1 -1
>> 294 17 [24871, 24895] | CCGGCTCCGCTAGATGGAGAAAATC
>> 1 -1
>> 295 17 [24985, 25009] | ACTGGGGCTCCATTCCGAAATGATC
>> 1 -1
>> 296 17 [25150, 25174] | CTTGAGCCGGCCTAAAGCGTACTTC
>> 1 -1
>> 297 17 [25372, 25396] | GGGCTCTCGGCTCCGTGTATTTTCA
>> 1 -1
>>
>> So, that is all 297 probes that hit inside the range of the
>> transcript, with the start and end of each probe relative to the start
>> of the transcript
>>
>> Tim
>>
>>
>> On 14/03/2011 18:17, "ekspiulo" <ekspiulo at gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I'm endeavouring to produce transcript relative coordinates for the
>>> probe sets on an affymetrix exon st array. I've normalized the data at
>>> the probe set level using the oligo package, and I want to use the xmap
>>> projects annotations, but for the life of me I can not figure out how to
>>> query its data in a fashion that can produce either a single coordinate
>>> range or even the set of all possible coordinate ranges for a probe set,
>>> can anyone tell me where to start with this?
>>>
>>> Alternatively, can I get probe level data from the normalized results
>>> from Oligo, and does it even make sense for an exon array? I'm doing
>>> some custom alternative splicing analysis, so ideally I'd be able to get
>>> the data needed to produce a list of all of the probes/probesets which
>>> target each transcript in xmap along with either the genomic coordinates
>>> for those probes/probesets or their transcript relative coordinates.
>>>
>>> e.g.
>>>
>>> transcript-id: [ [probe-id, start, end], [probe-id, start, end], ... ]
>>>
>>>
>>> Thanks!
>>>
>>> -Brian
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> ------------------------------------------------------------------------
>> This email is confidential and intended solely for the...{{dropped:16}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--------------------------------------------------------
This email is confidential and intended solely for the u...{{dropped:12}}
More information about the Bioconductor
mailing list