[BioC] Deciding on a cut off after QC

Ankit Pal pal_ankit2000 at yahoo.com
Mon May 16 14:42:07 CEST 2005


Dear Dr Smyth,
I guess there has been a mistake in communication.
There are 39000 spots on the array but the number of
probes are lesser as there are replicates(not a fixed
number) on the array.
The output from toptable() gives me a total of 39000
rows, one for each spot.
I need to give an output of the spots most significant
in this list.
What is the cut off value(p or B or any other) I need
to use above which I can consider a set of
differentially expressed genes significant enough for
further use (eg :function based clustering of all
significantly overexpressed genes).
I cannot use all the genes in the topTable() output.
Where do I put the cut off?
Is there a way I can calculate it?
sincerely,
-Ankit

--- Gordon K Smyth <smyth at wehi.EDU.AU> wrote:
> On Mon, May 16, 2005 9:49 pm, Ankit Pal said:
> > Dear Dr Smyth,
> > I'm sorry for not having specified which result
> file.
> > It is the final result summary we get after we
> give
> > the command
> > Resultfile <- topTable(fit,n=200, adjust="fdr")
> > A sample result file has been attached.
> > The code I used for my analysis is
> >
> >> targets <- readTargets("target.txt")
> >
> > #The QC filter
> >> myfun <- function(x,threshold=55){
> > + okred <- abs(x[,"% > B635+2SD"]) > threshold
> > + okgreen <- abs(x[,"% > B532+2SD"]) > threshold
> > + okflag <- abs(x[,"Flags"]) > 0
> > + okRGN <- abs(x[,"Rgn R²"]) > 0.6
> > + as.numeric(okgreen || okred || okflag || okRGN)
> > + }
> > #end of QC filter
> >
> >> RG_7 <- read.maimages(targets$FileName,
> > source="genepix",wt.fun=myfun)
> >> RG_7$genes <- readGAL()
> 
> As I said last week, this command is not needed with
> GenePix data.  You should omit it.
> 
> >> RG_7$printer <- getLayout(RG_7$genes)
> >> MA_7 <-
> normalizeWithinArrays(RG_7,method="loess")
> >> MA_7 <- normalizeBetweenArrays(MA_7)
> >> fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1))
> >> fit_7 <- eBayes(fit_7)
> >> options(digits=3)
> >> Resultfile_7 <- topTable(fit_7, n=39000,
> > adjust="fdr")
> >> Resdat_7 <-data.frame(Resultfile_7)
> 
> Resultfile_7 is already a data.frame.
> 
> >> write.table(Resdat_,file='Result.csv',quote =
> FALSE,
> > sep = "\t")
> >
> > I understand that the spots that do not qualify
> the QC
> > filter are given a weight of "0" by limma and are
> not
> > considered for normalization and will not affect
> the
> > analysis.
> > The result file I get contains all the spots
> (38000)
> > in my case.
> 
> Mmm, this doesn't make sense.  You have 4 arrays
> with 39000 spots on each array.  Hence you have
> 4*39000 spots, not 39000.  The output from
> topTable() gives a summarized log-ratio for each
> probe,
> not a result for each spot.
> 
> Gordon
> 
> > Didn't the spots that were bad get removed from
> the
> > final result?
> > If not what is the cut off value (B, p etc) that I
> > need to use to get a set of reliable spots(I cant
> use
> > all the 38000) from my result file for my
> analysis.
> > Is there a fixed formula to derive the same as the
> > values vary with the analysis.
> > Waiting for your reply,
> > Thank you,
> > -Ankit
> 
> 
>



More information about the Bioconductor mailing list