[BioC] xmapcore get intronic sequence
Steve Taylor
stephen.taylor at imm.ox.ac.uk
Thu Jul 29 10:09:58 CEST 2010
Hi Tim,
> There are a couple of options:
>
> 1) Some of your probesets don't hit exons
> 2) Some of your probesets hit the same exons
>
> Not much can be done if it's the first case, but you can detect the second
> by passing as.vector=F to the probeset.to.exon method, ie:
>
>> probesetIds = c( '3081222', '3081223' )
>> probeset.to.exon( probesetIds )
> [1] "ENSE00001149618"
>
>> probeset.to.exon( probesetIds, as.vector=F )
> RangedData with 2 rows and 6 value columns across 1 space
>         space                 ranges |         IN1       stable_id    strand
>   <character>              <IRanges> | <character>     <character> <integer>
> 1           7 [155592680, 155596420] |     3081222 ENSE00001149618        -1
> 2           7 [155592680, 155596420] |     3081223 ENSE00001149618        -1
>
> You can see the IN1 column is the probeset name that caused the result, and
> the stable_id column shows that both probesets hit the same exon
>
> Fingers crossed this gets to the bottom of it ;-)
Some of these are evidently not hitting exons, despite being valid probeset IDs (I double-checked on the NetAffx site that they are real probesets). So:
> dim(as.data.frame(probeset.to.exon( probesetids, rm.unreliable=F, as.vector=F)))
[1] 1666 10
My input list contained 1771 probeset IDs, so some are still missing :-(.
It seems the only way round this for me is to write a loop and test whether probeset.to.exon returns anything for each probeset. How about an rm.notfound=T/F parameter at some point in the future (he asked hopefully! :-))?
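A possible alternative to the explicit loop, sketched under the assumption that the IN1 column shown in Tim's output survives as.data.frame(): compare the input IDs against the IDs that produced a row. The helper name find.unmatched and the ID "9999999" are made up for illustration and are not part of xmapcore; toy data stands in for a real result so the sketch runs on its own.

```r
# Hypothetical helper (not part of xmapcore): return the input IDs that
# produced no row in the mapping result.
find.unmatched <- function(ids, hits, id.col = "IN1") {
  # hits is assumed to be a data frame with one row per probeset/exon hit,
  # e.g. as.data.frame(probeset.to.exon(ids, rm.unreliable=F, as.vector=F))
  setdiff(as.character(ids), as.character(hits[[id.col]]))
}

# Toy data standing in for a real result; "9999999" is an invented ID
# that deliberately has no matching row.
ids  <- c("3081222", "3081223", "9999999")
hits <- data.frame(IN1       = c("3081222", "3081223"),
                   stable_id = c("ENSE00001149618", "ENSE00001149618"),
                   stringsAsFactors = FALSE)

unmatched <- find.unmatched(ids, hits)
print(unmatched)
```

Because one probeset can appear in several rows (and several probesets can share an exon), comparing IDs this way is likely more reliable than comparing the row count with length(probesetids).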
Thanks,
Steve
>
> Cheers,
>
> Tim
>
> On 28/07/2010 15:56, "Stephen Taylor"<stephen.taylor at imm.ox.ac.uk> wrote:
>
>> Hi Tim,
>>
>>> probeset.to.exon( probesetids, rm.unreliable=F )
>>
>> Unfortunately this is still not the same size:
>>
>>> length(probeset.to.exon( probesetids, rm.unreliable=F ))
>> [1] 1274
>>> length(probesetids)
>> [1] 1771
>>
>> Thanks,
>>
>> Steve
>