[BioC] feature selection according to sequence information

Tue Mar 31 14:35:05 CEST 2009

qinghua xu wrote, On 30.03.2009 10:43:
> Hi everyone,
> 
> As a matter of fact, we are aware that in affymetrix U133plus2.0
> Chip, probesets are targeted to different regions of sequence. Some
> probesets locate at the coding sequence regions, and others locate at
> untranslated regions. My question is that is there any function or
> package existed to distinguish probesets according to their
> location on the sequences?
> 
> Any suggestions and comments are welcome and highly appreciated!
> Thank you!
> 
> Best wishes Qinghua

Hi Qinghua,

As a starting point, the hgu133plus2probe package should tell you for 
each probe set the locations of its probes on the transcript sequence:

 > library(hgu133plus2probe)
 > as.data.frame(hgu133plus2probe[1:5,])
                    sequence    x   y Probe.Set.Name
1 CACCCAGCTGGTCCTGTGGATGGGA  718 317      1007_s_at
2 GCCCCACTGGACAACACTGATTCCT 1105 483      1007_s_at
3 TGGACCCCACTGGCTGAGAATCTGG  584 901      1007_s_at
4 AAATGTTTCCTTGTGCCTGCTCCTG  192 205      1007_s_at
5 TCCTTGTGCCTGCTCCTGTACTTGT  844 979      1007_s_at
   Probe.Interrogation.Position Target.Strandedness
1                         3330           Antisense
2                         3443           Antisense
3                         3512           Antisense
4                         3563           Antisense
5                         3570           Antisense
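
To get an overview per probe set, the interrogation positions can be 
collapsed into a first/last span. This is only a minimal sketch: the data 
frame below is a hand-typed copy of the five rows printed above, and the 
names probes and span are made up for illustration. With the package 
loaded, as.data.frame(hgu133plus2probe) should supply all probes instead.

```r
# Toy copy of the five rows shown above (assumption: in real use this
# would be as.data.frame(hgu133plus2probe) with all probes).
probes <- data.frame(
  Probe.Set.Name = rep("1007_s_at", 5),
  Probe.Interrogation.Position = c(3330, 3443, 3512, 3563, 3570),
  stringsAsFactors = FALSE
)

# Split the positions by probe set and take min/max for each set.
by_set <- split(probes$Probe.Interrogation.Position, probes$Probe.Set.Name)
span <- do.call(rbind, lapply(by_set, range))
colnames(span) <- c("first", "last")

print(span)
```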

You might consider using biomaRt to get the UTR annotation for the 
transcripts from Ensembl and then check which probes fall into the 3' 
UTR and which do not, although this could be a bit tricky.
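
The final check could look roughly like the sketch below. It assumes you 
already have the 3' UTR boundaries in the same coordinate system as 
Probe.Interrogation.Position, which is exactly the tricky part, since 
the Affymetrix target sequence and the Ensembl transcript need not agree. 
The values of utr3_start and utr3_end are invented for illustration and 
are not real annotation for 1007_s_at.

```r
# Hypothetical 3' UTR bounds (NOT real annotation); in practice these
# would have to be derived from Ensembl, e.g. via biomaRt, and mapped
# onto the sequence the probe positions refer to.
utr3_start <- 3400
utr3_end   <- 3900

# Interrogation positions of the five probes printed above.
probe_pos <- c(3330, 3443, 3512, 3563, 3570)

# TRUE if the interrogation position lies inside the assumed 3' UTR.
# Note this tests a single position per probe, not the full 25-mer.
in_utr3 <- probe_pos >= utr3_start & probe_pos <= utr3_end

region <- ifelse(in_utr3, "3' UTR", "other")
print(data.frame(probe_pos, region))
```

A probe set could then be labelled by the region most of its probes fall 
into, but whether that is a sensible rule depends on your analysis.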

Hope that helps,
cheers,

Christof

-- 
Christof Winter
Bioinformatics Group
Biotechnologisches Zentrum
Technische Universität Dresden
Tatzberg 47-51
01307 Dresden
Germany