[BioC] selecting/filtering probesets from exprSet object prior to diff. exp. anal.

James W. MacDonald jmacdon at med.umich.edu
Wed Nov 23 20:01:46 CET 2011


Hi Mark,

On 11/23/2011 1:00 PM, Mark Baumeister wrote:
> Hi all,
>
> I am new to this list and have a question (below) related to -
> selecting/filtering probesets from exprSet object prior to diff. exp. anal.
>
> I'm also new to Bioconductor and am currently learning preprocessing of
> microarray data (i.e. raw CEL files from the Affymetrix HG-U133A array) and
> then working with the normalized exprSet object to detect differential gene
> expression of tumor (ovarian) samples compared with normal samples. I am
> currently working with a set of ~33 tumor samples and ~7 normal samples.
>
> Because my machine is 32 bit and cannot handle that much memory allocation,
> for the preprocessing I am using a program called RMAExpress to produce the
> normalized exprSet object. With the exprSet object (which I am calling "eset")
> I am then using Bioconductor for the differential gene expression analysis.
>
> To start I have been creating a design matrix (as below), which I name
> "design", for the linear modeling steps from the limma package.
>
>     Normal Tumor
> T1       0     1
> T2       0     1
> T3       0     1
> T5       0     1
> T7       0     1
> N1       1     0
> T8       0     1
> T9       0     1
> T10      0     1
> T11      0     1
> N2       1     0
> T12      0     1
> T13      0     1
> T14      0     1
> T15      0     1
> N3       1     0
>
>
>
> and then I am using the following code to produce a linear model, a
> contrast matrix,
> and a list of differentially expressed genes.
>
>
> fit <- lmFit(eset, design)
> cont.matrix <- makeContrasts(NormalvsTumor=Tumor-Normal, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, number=100, adjust="BH") # use BH method
>
> My question is this:
> Is there a way to select or exclude certain probesets that I want or don't
> want to be included in the linear model before I produce the list (topTable)
> of differentially expressed genes?

There are ways to do this, but note that the eBayes() step above is 
estimating a prior for the probeset variance that uses all probesets on 
the array. If you selectively remove some probesets (say, all the 
low-variance probesets), you will be biasing the prior, which may have 
unintended effects.
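
To make the point about the prior concrete, here is a minimal sketch on
simulated data. It assumes limma is installed; the matrix, the probe names,
the 4 vs 4 design and the run_fit() helper are all made up for illustration
and are not from Mark's data. It fits the same model once on all probesets
and once after dropping the low-variance half, then prints the prior
variance that eBayes() estimated in each case.

```r
library(limma)

set.seed(1)
n_probes <- 1000  # hypothetical array size, for illustration only

# Simulated log2 expression: each probeset gets its own true SD and there is
# no real differential expression. 4 Normal and 4 Tumor samples.
probe_sd <- runif(n_probes, min = 0.2, max = 1.5)
expr <- matrix(rnorm(n_probes * 8, mean = 8, sd = probe_sd), nrow = n_probes)
rownames(expr) <- paste0("probe_", seq_len(n_probes))

group <- factor(rep(c("Normal", "Tumor"), each = 4))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)

# Helper (not part of limma): run the fit -> contrast -> eBayes steps
run_fit <- function(x) eBayes(contrasts.fit(lmFit(x, design), cont.matrix))

fit_all <- run_fit(expr)

# Drop the low-variance half of the probesets before fitting
probe_var <- apply(expr, 1, var)
keep <- probe_var > median(probe_var)
fit_filtered <- run_fit(expr[keep, ])

# Prior variance estimated from all probesets vs. from the filtered set
c(all = fit_all$s2.prior, filtered = fit_filtered$s2.prior)
```

The two values differ because the filtered fit only ever sees the
high-variance probesets, so the moderated t-statistics for the same probeset
are not the same in the two fits.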

That said, both ExpressionSets and MArrayLM objects (the output from 
eBayes()) can be subset using the conventional square-bracket functions 
in R. So for example, you could remove the first ten probesets from your 
fit2 object thusly:

fit2 <- fit2[-c(1:10),]

or you could create an indicator of TRUE/FALSE, based on some metric

ind <- fit2$p.value[, "NormalvsTumor"] < 0.25

fit2 <- fit2[ind,]
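
If you want to drop specific probesets by ID rather than by position, the
same bracket subsetting works with a logical vector built from the row
names. A rough sketch, again on simulated data with made-up probe names
(the "unwanted" IDs are hypothetical), where the model is fit on everything
first and the subsetting happens only after eBayes():

```r
library(limma)

set.seed(2)
# Simulated log2 expression: 200 hypothetical probesets x 6 samples
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("probe_", 1:200), paste0("S", 1:6)))

group <- factor(rep(c("Normal", "Tumor"), each = 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)

# Fit on all probesets so the eBayes() prior uses the whole array
fit2 <- eBayes(contrasts.fit(lmFit(expr, design), cont.matrix))

# Hypothetical probeset IDs to exclude from the reported table
unwanted <- c("probe_5", "probe_17")
keep <- !rownames(fit2$coefficients) %in% unwanted

fit2_sub <- fit2[keep, ]
tab <- topTable(fit2_sub, number = 10, adjust.method = "BH")
tab
```

Note that the BH adjustment in topTable() is computed over whatever
probesets are left in the object you hand it.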

The same thing can be done to the ExpressionSet object as well.
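
For the ExpressionSet case, something along these lines should work. This is
only a sketch: it assumes Biobase is installed and builds a toy
ExpressionSet from random numbers, and the probe names and the cutoff are
invented for the example.

```r
library(Biobase)

set.seed(1)
# Toy ExpressionSet: 20 hypothetical probesets x 6 samples
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("probe_", 1:20), paste0("S", 1:6)))
eset <- ExpressionSet(assayData = expr)

# By position: drop the first ten probesets
eset_drop <- eset[-(1:10), ]

# By name: drop specific (hypothetical) probeset IDs
unwanted <- c("probe_3", "probe_7")
eset_named <- eset[!featureNames(eset) %in% unwanted, ]

# By a TRUE/FALSE indicator computed from the data
keep <- rowMeans(exprs(eset)) > 0
eset_keep <- eset[keep, ]

dim(eset_keep)
```

Any of these subsetted objects can be passed to lmFit() in place of the full
eset, keeping in mind the caveat above about the eBayes() prior.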

Best,

Jim


>
> I have looked at the genefilter function but have not found specific
> examples of how to do what I want.
>
>
> Thanks in advance,
> -M
>
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
