[BioC] limma questions (normalization)

Thu Sep 20 02:57:05 CEST 2007

Dear Brooke,

>Date: Tue, 18 Sep 2007 17:28:23 -0400 (EDT)
>From: Brooke LaFlamme <bal44 at cornell.edu>
>Subject: [BioC] limma questions (normalization)
>To: bioconductor at stat.math.ethz.ch
>
>Hi, I am very new to bioconductor and microarray analysis in 
>general. I am using Windows XP and R version 2.5.1.
>
>I have a few questions about the limma package that I am using for 
>cDNA array analysis.
>
>First: I would like to flag out all spots with fluorescence 
>intensity lower than that of the mean background intensity of the array.

Personally, I think this sort of spot filtering is almost always a 
bad idea. It can make sense to remove probes from the downstream 
differential expression analysis which have very low intensity on 
every array, but removing them before the normalization step is 
usually counter productive. Filtering out individual spots just seems 
bad from every point of view.

You are implying that a low intensity spot is "low quality", but this 
is not necessarily so. Usually it is telling you that the probe is 
not expressed, which is information you should preserve rather than throw out.

If you're new to microarray analysis, it might be a good idea to 
follow one of the case study examples closely, which use recommended 
methods, instead of trying to put together your own ad hoc analysis.

>  The weight function I am using to do this is (using Genepix output 
> file (*.gpr) column names):
>
>myfun<-function(x){
>                 okred<-mean(x[,"B635"])-x[,"F635 Mean"]<=0
>                 okgreen<-mean(x[,"B532"])-x[,"F532 Mean"]<=0
>                 as.numeric(okgreen & okred)
>                 }
>
>Does this actually do what I want? I?m not certain of how to 
>calculate ?mean background intensity? for the whole array.

I assume you're trying to use the wt.fun argument of read.maimages(). 
You cannot use this approach to compute the mean background intensity 
of the array. You need to read in the whole array before you can do that.

>Also, when spots are given a weight of zero, are only spots that 
>have weight >0 across all replicate arrays included in the analysis?

No. If you give an individual spot a weight of zero, then that 
individual spot is removed from the analysis. The corresponding probe 
will remain in the analysis if it has any spots with weight>0.

>  How do I find the number of spots included in the analysis? I know 
> how to do this for individual arrays but if arrays 1-4 (out of 16) 
> are in group 1, how do I find how many spots are included in the 
> analysis for group 1?

colSums(RG$weights==0)

>Second: Two of my array groups consistently have all significant 
>spots being ?downregulated? relative to the control. I feel that 
>there must be an error in my normalization step or background 
>correction step. This is the code I am using:
>
>#Background correction and normalization
>         RG2<-backgroundCorrect(RG, method="normexp", offset=50)
>         w<-modifyWeights(RG2$weights, RG2$genes$Status, 
> c("Actin","Buffer"), c(2,2))
>         MA<-normalizeWithinArrays(RG2, method="loess", weights=w)
>         MA.Aq<-normalizeBetweenArrays(MA)
>#look at array weights
>         arrayw<-arrayWeights(MA.Aq)
>#Create the design matrix for linear models
>         design<-modelMatrix(targets, ref="control")
>#fit a linear model to each gene
>         fit<-lmFit(MA.Aq, design=design, weights=arrayw)
>#create a contrast matrix to compare each treatment to the control case
>         contrast.matrix<-makeContrasts(Acp26Aa, Acp29Ab, Acp36DE, 
> Acp62F, levels=design)
>#expand linear model and compute empirical Bayes statistics
>         fit2<-contrasts.fit(fit, contrast.matrix)
>         fit2<-eBayes(fit2)

The best way to judge if your background correction and normalization 
has been successful is to look at your data, before and after 
normalization, using diagnostic plots such as plotMA() which are 
described in the limma User's Guide.

>Is normexp the best way to do background correction?

Usually it is, see 
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btm412 
. But I would not use any method on microarray data without 
scrutinising the results, as described above.

>Also, this array (DGRC-1) is printed with 48 pins, so I?m pretty 
>sure I can?t use printtiploess. Is this true?

Why not? We have regularly use printtiploess with 48 pins, although 
there are other possibilities, like robustspline.

>Finally, what is the underlying linear model that lmFit is using? I 
>can find no information on this.

You don't mention having looked at any of the documentation.

>I assume it treats treatment and dye as fixed effects and array as a 
>random effect, but I just want to confirm this.

I think that have you read literature about a different method. It 
would help for you to say what you have read.

The limma function lmscFit() would treat arrays as random, but you 
are using lmFit(), which treats everything as fixed, unless you use 
the block argument.

In your case, you are analysing log-ratios, so any array effect has 
already been removed from the data. So there is no array effect in the model.

There also is no dye effect in your model, because you didn't add 
one. See section 8.1.2 of the limma User's Guide. A dye effect would 
only make sense anyway if you have dye swaps in your experimental design.

Best wishes
Gordon

>Thank you so much for any help demystifying this. Sorry this was so long.
>
>Brooke L.