[BioC] feature selection according to sequence information
Christof Winter
winter at biotec.tu-dresden.de
Tue Mar 31 14:35:05 CEST 2009
qinghua xu wrote, On 30.03.2009 10:43:
> Hi everyone,
>
> We are aware that on the Affymetrix U133 Plus 2.0 chip, probe sets
> target different regions of the transcript sequence. Some probe sets
> are located in the coding sequence, and others in the untranslated
> regions. My question is: is there any function or package that
> distinguishes probe sets according to their location on the
> sequence?
>
> Any suggestions and comments are welcome and highly appreciated!
> Thank you!
>
> Best wishes Qinghua
Hi Qinghua,
As a starting point, the hgu133plus2probe package should tell you for
each probe set the locations of its probes on the transcript sequence:

> library(hgu133plus2probe)
> as.data.frame(hgu133plus2probe[1:5,])
                   sequence    x   y Probe.Set.Name
1 CACCCAGCTGGTCCTGTGGATGGGA  718 317      1007_s_at
2 GCCCCACTGGACAACACTGATTCCT 1105 483      1007_s_at
3 TGGACCCCACTGGCTGAGAATCTGG  584 901      1007_s_at
4 AAATGTTTCCTTGTGCCTGCTCCTG  192 205      1007_s_at
5 TCCTTGTGCCTGCTCCTGTACTTGT  844 979      1007_s_at
  Probe.Interrogation.Position Target.Strandedness
1                         3330           Antisense
2                         3443           Antisense
3                         3512           Antisense
4                         3563           Antisense
5                         3570           Antisense

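You might consider using biomaRt to get the UTR annotation for the
transcripts from Ensembl and then check which probes fall into the 3'
UTR and which do not. This could be a bit tricky, because the probe
positions and the UTR coordinates first have to refer to the same
sequence before they can be compared.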
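
A minimal sketch of that last check, in base R. It assumes the 3' UTR
boundaries have already been obtained (for example via biomaRt) and
converted to the same coordinate system as Probe.Interrogation.Position.
The utr table, its utr3_start/utr3_end values, and the classify_probes
helper are all hypothetical and only illustrate the comparison; they are
not real annotation for 1007_s_at.

```r
# Probe positions copied from the hgu133plus2probe output shown above.
probes <- data.frame(
  Probe.Set.Name = rep("1007_s_at", 5),
  Probe.Interrogation.Position = c(3330, 3443, 3512, 3563, 3570),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL 3' UTR boundaries, assumed to be in the same coordinates
# as Probe.Interrogation.Position. Replace with real annotation.
utr <- data.frame(
  Probe.Set.Name = "1007_s_at",
  utr3_start = 3500,
  utr3_end = 3900,
  stringsAsFactors = FALSE
)

# Label each probe as inside or outside the 3' UTR of its probe set.
# Probes whose probe set has no UTR annotation get NA.
classify_probes <- function(probes, utr) {
  m <- merge(probes, utr, by = "Probe.Set.Name", all.x = TRUE)
  pos <- m$Probe.Interrogation.Position
  in_utr <- pos >= m$utr3_start & pos <= m$utr3_end
  m$region <- ifelse(is.na(in_utr), NA_character_,
                     ifelse(in_utr, "3'UTR", "other"))
  m[order(m$Probe.Set.Name, pos), ]
}

result <- classify_probes(probes, utr)
print(result[, c("Probe.Set.Name", "Probe.Interrogation.Position", "region")])

# Number of probes per region for each probe set.
print(table(result$Probe.Set.Name, result$region))
```

With real data, a probe set could then be called a "3' UTR probe set" if
all (or most) of its probes fall into the UTR; where to put that cutoff
is a judgment call.
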
Hope that helps,
cheers,
Christof
--
Christof Winter
Bioinformatics Group
Biotechnologisches Zentrum
Technische Universität Dresden
Tatzberg 47-51
01307 Dresden
Germany