[BioC] xmapcore get intronic sequence

Thu Jul 29 10:09:58 CEST 2010

Hi Tim,

> There are a couple of options:
>
> 1) Some of your probesets don't hit exons
> 2) Some of your probesets hit the same exons
>
> Not much can be done if it's the first case, but you can detect the second
> by passing as.vector=F to the probeset.to.exon method, ie:
>
>> probesetIds = c( '3081222', '3081223' )
>> probeset.to.exon( probesetIds )
> [1] "ENSE00001149618"
>
>> probeset.to.exon( probesetIds, as.vector=F )
> RangedData with 2 rows and 6 value columns across 1 space
>          space                 ranges |         IN1       stable_id    strand
>    <character>               <IRanges>  |<character>      <character>  <integer>
> 1           7 [155592680, 155596420] |     3081222 ENSE00001149618        -1
> 2           7 [155592680, 155596420] |     3081223 ENSE00001149618        -1
>
> You can see the IN1 column is the probeset name that caused the result, and
> the stable_id column shows that both probesets hit the same exon
>
> Fingers crossed this gets to the bottom of it ;-)

Some of these are evidently not hitting exons despite being valid probeset IDs (I double-checked on the NetAffx site that they are real probesets). So:

> dim(as.data.frame(probeset.to.exon( probesetids, rm.unreliable=F, as.vector=F)))
[1] 1666   10

My input list contained 1771 probeset IDs, so at least 105 of them still return nothing :-(.

It seems the only way round this for me is to loop over the IDs and test whether probeset.to.exon returns anything for each one. How about an rm.notfound=T/F parameter at some point in the future (he asked hopefully! :-))?
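
In the meantime, going by your note that the IN1 column names the probeset that caused each row, I could probably avoid the loop by comparing my input list with that column. A rough sketch, assuming the data frame from as.data.frame() keeps the IN1 and stable_id columns; the `hits` data frame and the ID '9999999' are made up here so that it runs without a database connection:

```r
# Made-up stand-in for:
#   as.data.frame(probeset.to.exon(probesetids, rm.unreliable=F, as.vector=F))
# The third ID is a hypothetical probeset that hits no exon.
probesetids <- c("3081222", "3081223", "9999999")
hits <- data.frame(
  IN1       = c("3081222", "3081223"),
  stable_id = c("ENSE00001149618", "ENSE00001149618"),
  stringsAsFactors = FALSE
)

# Probesets that produced at least one row, and those that produced none.
found    <- unique(hits$IN1)
notfound <- setdiff(probesetids, found)

# Number of distinct exons per input probeset (0 where nothing was hit).
exons.per.probeset <- sapply(probesetids, function(id) {
  length(unique(hits$stable_id[hits$IN1 == id]))
})
```

With the real result in place of `hits`, `notfound` should list the IDs that returned nothing, and `exons.per.probeset` should show whether any probeset contributes more than one row.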

Thanks,

Steve

>
> Cheers,
>
> Tim
>
> On 28/07/2010 15:56, "Stephen Taylor"<stephen.taylor at imm.ox.ac.uk>  wrote:
>
>> Hi Tim,
>>
>>> probeset.to.exon( probesetids, rm.unreliable=F )
>>
>> Unfortunately this is still not the same size:
>>
>>> length(probeset.to.exon( probesetids, rm.unreliable=F ))
>> [1] 1274
>>> length(probesetids)
>> [1] 1771
>>
>> Thanks,
>>
>> Steve
>