[BioC] lmfit zero genes fit model

Tue Apr 3 21:41:13 CEST 2007

Hi Urska,

Urska Cvek wrote:
> Hello BioC,
> 
> I am using the limma package and the lmfit function, as described in
> section 8.7 here:
> http://bioconductor.org/packages/1.9/bioc/vignettes/limma/inst/doc/usersguide.pdf.
> I have used this same package before, on a different data set, and it
> worked nice, so I am thinking there's something up with my data that I
> don't understand completely.
> 
> I have 8 Affy chips, for a factorial design with Deletion and Strain
> as the two factors:
> Filename    Deletion    Strain
> Ht-2002.cel    h            t2
> Ht-2003.cel    h            t2
> Hn2-2002.cel  h           n2
> Hn2-2003.cel  h           n2
> Lt-2002.cel     l            t2
> Lt-2003.cel     l            t2
> Ln2-2002.cel   l           n2
> Ln2-2003.cel   l           n2
> 
> readAffy function is used to read in the AffyBatch object "a" using
> the above PhenoData.
> Since these experiments were done as replicate pairs, I first
> normalize the data by pairs.

There are very few instances where this would be either necessary or a 
good thing to do. IMO you are much better off just running rma() on all 
chips together.

> 
> normalize(a[,1:2]), and repeate for the other three
> 
> normalized.a = merge(tmp1,tmp2)
> normalized.a = merge(normalized.a, tmp3)
> normalized.a = merge(normalized.a, tmp4)
> 
> x=rma(normalized.a, normalize=FALSE)
> 
> I look at the boxplots for the raw intensities, and expression set "x"
> intensities, and the second set looks much better.
> 
> TS <- paste(pd$Deletion, pd$Strain, sep=".")    # extract all
> combinations in one vector
> TS
> 
> # fit a model with a coefficient for each of the four factor combinations
> # and then extract the comparisons of interest as contrasts
> TS <- factor(TS, levels=c("h.t2", "h.n2", "l.t2", "l.n2"))
> design <- model.matrix(~0+TS)
> 
> 
>>design
> 
>   h.t2 h.n2 low.t2 low.n2
> 1         1       0        0      0
> 2         1       0        0      0
> 3         0       1        0      0
> 4         0       1        0      0
> 5         0       0        1      0
> 6         0       0        1      0
> 7         0       0        0      1
> 8         0       0        0      1
> 
> colnames(design) <- levels(TS)
> fit <- lmFit(x, design)
> 
> cont.matrix <- makeContrasts(
>   H.n2vst2=h.t2-h.n2, L.n2vst2=l.t2-l.n2,
>   Diff=(h.n2-h.t2)-(l.n2-l.t2),
>   levels=design)
> 
> 
>>cont.matrix
> 
>             Contrasts
> Levels      H.n2vsi2 L.n2vsi2 Diff
>   h.t2              1        0       -1
>   h.n2             -1       0        1
>   low.t2           0        1        1
>   low.n2          0       -1       -1
> 
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> 
> # display the counts
> results <- decideTests(fit2)
> vennDiagram(results)
> 
> This vennDiagram is empty (has zeros for counts). There are no genes
> that would be differentially expressed in other of the two linear
> models, which I find very surprising. I know that it's possible that
> not many genes would be differentially expressed, but I don't
> understand why there are no genes that would fit the model at all.
> I have tried just using the rma on the whole set of 8 chips as well,
> without the separate normalization step, but this does not yield much
> luck either. I also added the fourth replicate to the experiment,
> making it a total of 12 chips, but that chip had such different
> distribution that I removed it from the analysis.
> 
> What should I do to be able to answer the above questions? Is the
> caused by my data and I cannot find such genes?

I would start by looking at the output of topTable() for each of your 
coefficients (or even just topTable(fit2), which will output overall 
F-stats for your contrasts).

It may well be that you don't have much evidence for differential 
expression. Depending on the true differences between your sample types 
(actual difference in mRNA amounts as well as the number of 
differentially expressed genes), you may not have enough replication to 
detect anything.

Note that by default you are using the original BH version of FDR to 
adjust for multiple comparisons. The number of 'significant' probesets 
you get will be determined by the range of p-values as well as the 
number of probesets you are testing. You can increase your power to 
detect differences by first filtering out those probesets that aren't 
considered interesting by some criterion (such as low variance, being 
called absent by the mas5 algorithm, small expression values, etc.), and 
then fitting your model to the remaining probesets.

You might also consider using unadjusted p-values, and using a more 
stringent p-value cutoff. Something like 0.005. There is nothing really 
magical about using FDR or any other p-value adjustment - they are just 
useful for cutting long lists of genes down to a manageable length and 
giving you something you can say globally about the set of 'significant' 
probesets. The ordering of your gene list will be the same regardless. 
You may just have fewer significant probesets than you would expect to 
see by chance.

Best,

Jim

> 
> Thank you in advance, your help is appreciated.
> 
> U.C.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.