[BioC] selecting/filtering probesets from exprSet object prior to diff. exp. anal.
James W. MacDonald
jmacdon at med.umich.edu
Wed Nov 23 20:45:09 CET 2011
Hi Mark,
On 11/23/2011 2:28 PM, Mark Baumeister wrote:
> Thanks a lot, James for your help.
> That seems pretty straightforward.
> "That said, both ExpressionSets and MArrayLM objects (the output from
> eBayes()) can be subset using the conventional square-bracket
> functions in R. So for example, you could remove the first ten
> probesets from your fit2 object thusly:
> fit2 <- fit2[-c(1:10),]
> or you could create an indicator of TRUE/FALSE, based on some metric
> ind <- fit2$p.value < 0.25
> fit2 <- fit2[ind,]
> The same thing can be done to the ExpressionSet object as well."
> If I know the probe IDs of the probes I want to select or exclude
> from the MArrayLM object (i.e. fit2) before producing the topTable() list,
> can I also use those probe IDs to select or exclude from the
> MArrayLM object?
Sure. Note that you can extract the probeset IDs from the ExpressionSet
object using the featureNames() extractor, and then use either
which() or %in% to create an index for subsetting.
Say you have a character vector called 'probes' containing the probeset
IDs you want to exclude.
ind <- featureNames(eset) %in% probes
fit2[!ind,]
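
Here is a minimal sketch of both directions (selecting and excluding) on simulated data, assuming limma is installed. The expression matrix, sample labels, and probe IDs are made up for illustration. The index is built from the row names of the fit itself rather than from featureNames(eset), so it stays aligned with fit2 even if fit2 has already been subset.

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 100 hypothetical probesets x 8 samples
exprs_mat <- matrix(rnorm(100 * 8), nrow = 100,
                    dimnames = list(paste0("probe_", 1:100),
                                    c(paste0("T", 1:5), paste0("N", 1:3))))
group <- factor(c(rep("Tumor", 5), rep("Normal", 3)),
                levels = c("Normal", "Tumor"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)   # columns: Normal, Tumor

fit <- lmFit(exprs_mat, design)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Hypothetical probeset IDs of interest
probes <- c("probe_3", "probe_17", "probe_42")

# Logical index built from the fit's own row names
ind <- rownames(fit2$coefficients) %in% probes

fit2_keep <- fit2[ind, ]    # only the listed probesets
fit2_drop <- fit2[!ind, ]   # everything except the listed probesets

topTable(fit2_drop, number = 10, adjust.method = "BH")
```

Subsetting after eBayes() keeps the moderated statistics that were computed with all probesets, but topTable() applies the BH adjustment only to the rows that remain in the object.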
Best,
Jim
> Mark
> On Wed, Nov 23, 2011 at 11:01 AM, James W. MacDonald
> <jmacdon at med.umich.edu> wrote:
>
> Hi Mark,
>
>
> On 11/23/2011 1:00 PM, Mark Baumeister wrote:
>
> Hi all,
>
> I am new to this list and have a question (below) about
> selecting/filtering probesets from an exprSet object prior to
> differential expression analysis.
>
> I'm also new to Bioconductor and am currently learning preprocessing
> of microarray data (i.e. raw CEL files from the Affymetrix HG-U133A
> array) and then working with the normalized exprSet object to detect
> differential gene expression of tumor (ovarian) samples compared with
> normal samples. I am currently working with a set of ~33 tumor
> samples and ~7 normal samples.
>
> Because my machine is 32-bit and cannot handle that much memory
> allocation, I am using a program called RMAExpress for the
> preprocessing to produce the normalized exprSet object. With the
> exprSet object (which I am calling "eset") I am then using
> Bioconductor for the differential gene expression analysis.
>
> To start, I have been creating a design matrix (as below),
> which I name "design", for the linear modeling steps from
> the limma package.
>
>       Normal  Tumor
> T1       0      1
> T2       0      1
> T3       0      1
> T5       0      1
> T7       0      1
> N1       1      0
> T8       0      1
> T9       0      1
> T10      0      1
> T11      0      1
> N2       1      0
> T12      0      1
> T13      0      1
> T14      0      1
> T15      0      1
> N3       1      0
>
>
>
> and then I am using the following code to produce a linear model,
> a contrast matrix, and a list of differentially expressed genes.
>
>
> fit <- lmFit(eset, design)
> cont.matrix <- makeContrasts(NormalvsTumor=Tumor-Normal, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, number=100, adjust="BH") # use BH method
>
> My question is this:
> Is there a way to select or exclude certain probesets that I want or
> don't want to be included in the linear model before I produce the
> list (topTable) of differentially expressed genes?
>
>
> There are ways to do this, but note that the eBayes() step above
> is estimating a prior for the probeset variance that uses all
> probesets on the array. If you selectively remove some probesets
> (say, all the low-variance probesets), you will be biasing the
> prior, which may have unintended effects.
>
> That said, both ExpressionSets and MArrayLM objects (the output
> from eBayes()) can be subset using the conventional square-bracket
> functions in R. So for example, you could remove the first ten
> probesets from your fit2 object thusly:
>
> fit2 <- fit2[-c(1:10),]
>
> or you could create an indicator of TRUE/FALSE, based on some metric
>
> ind <- fit2$p.value < 0.25
>
> fit2 <- fit2[ind,]
>
> The same thing can be done to the ExpressionSet object as well.
>
> Best,
>
> Jim
>
>
>
>
> I have looked at the genefilter function but have not found specific
> examples of how to do what I want.
>
>
> Thanks in advance,
> -M
>
> --
> Mark Baumeister
>
> http://sites.google.com/site/lfmmab/
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826