[BioC] question about the error from function intraspotCorrelation

Fri Aug 6 08:52:25 CEST 2004

Dear Ren,

Thanks for sending me your data. You haven't made a mistake. It turns out 
that intraspotCorrelation() returns an error for one of your genes, 
although it runs for all the others. intraspotCorrelation() depends on an 
interative REML algorithm - when I get a chance I will look at the 
algorithm and try to trap floating point errors so they don't cause the 
function to fail in this way.

This is your targets setup:

targets <- readTargets()
 > targets
       FileName     Cy3     Cy5
1436 1436.spot young26   old20
1437 1437.spot   old20 young26
1438 1438.spot young27   old21
1439 1439.spot   old21 young27
1440 1440.spot young28   old22
1441 1441.spot   old22 young28
1442 1442.spot young29   old23
1443 1443.spot   old23 young29
1444 1444.spot young30   old25
1445 1445.spot   old25 young30

I understand that your reason for doing a single-channel analysis is so 
that you can handle the technical replication by fitting an individual 
effect for each mouse. Here are my results for your data:

targets.sc <- targetsA2C(targets)
#  I include an effect for each mouse plus a dye effect
design.sc <- 
model.matrix(~0+factor(targets.sc$Target)+factor(targets.sc$channel))
#  intraspotCorrelation() fails for gene 86, so I fit to all other genes
corfit <- intraspotCorrelation(MA[-86,],design.sc)
#> corfit$consensus
#[1] 0.8627374  # intra-spot correlations are typically 0.8-0.95.
fit <- lmscFit(MA,design.sc,correlation=corfit$consensus)
# The contrast of interest is the average difference between old and young mice
cont.matrix <- cbind(OldvsYoung=c(1,1,1,1,1,-1,-1,-1,-1,-1,0)/5)
fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)
tab <- topTable(fit2,adjust="fdr")
tab[,c("ID","Symbol","Status","M","A","t","P.Value","B")]
            ID        Symbol 
Status          M         A          t      P.Value         B
314  AK020608 9530053H05Rik  oligo  0.7875154  9.655528  16.224403 
2.055509e-07 12.971211
150 NM_024197       Ndufa10  oligo -0.5067656 10.951221 -16.197897 
2.055509e-07 12.951719
462  BC009097 1110003E01Rik  oligo -0.6551344 10.555844 -15.821665 
2.055509e-07 12.671381
244 NM_008458     Serpina3c  oligo -0.6652694 13.860417 -14.852744 
3.034365e-07 11.916406
28  NM_008147         Gp49b  oligo  0.5659843  8.584803  14.668451 
3.034365e-07 11.767129
164  AK015780 4930513F16Rik  oligo  0.5359480  8.959335  14.255934 
3.544077e-07 11.426037
91   BC017687        Uqcrc1  oligo -0.5388312 10.798586 -13.204708 
7.481306e-07 10.510645
113 NM_007977            F8  oligo  0.5478736  8.822306  11.668914 
2.752735e-06  9.039561
337 NM_007763         Crip1  oligo  0.4535503  9.020447  11.008615 
4.769757e-06  8.351686
195 NM_023065         Ifi30  oligo  0.6385894  7.613664   9.759321 
1.669688e-05  6.945506

Actually you could also analyse this data using log-ratios and instead 
handle the blocking structure using a random effects model. Here is how 
that would run in limma:

design <- cbind(Dye=1,OldvsYoung=c(1,-1,1,-1,1,-1,1,-1,1,-1))
pair <- c(1,1,2,2,3,3,4,4,5,5)
corfit <- duplicateCorrelation(MA,design,block=pair)
#> corfit$consensus
#[1] -0.3769877  There is a reasonable within-block correlation. The 
correlation is negative because of the dye-swapping.
fit <- lmFit(MA,design,block=pair,correlation=corfit$consensus)
fit <- eBayes(fit)
tab <- topTable(fit,coef="OldvsYoung",adjust="fdr")
tab[,c("ID","Symbol","Status","M","A","t","P.Value","B")]
            ID        Symbol 
Status          M         A         t      P.Value         B
314  AK020608 9530053H05Rik  oligo  0.7829186  9.655528 10.280217 
3.135044e-05 8.5213432
150 NM_024197       Ndufa10  oligo -0.5093503 10.951221 -6.963341 
1.602865e-03 4.1907406
462  BC009097 1110003E01Rik  oligo -0.6629056 10.555844 -6.323564 
3.047635e-03 3.1488941
91   BC017687        Uqcrc1  oligo -0.5281134 10.798586 -6.111562 
3.275146e-03 2.8236443
333  BC004749          Hagh  oligo -0.4918728 10.774405 -5.102169 
1.514349e-02 1.1027534
497  AK011827 2610108D09Rik  oligo  0.3766103 11.110562  4.994212 
1.514349e-02 0.8217170
164  AK015780 4930513F16Rik  oligo  0.5131960  8.959335  4.872980 
1.514349e-02 0.7170266
57   AK013971        Tex261  oligo -0.3951064 10.430905 -4.888604 
1.514349e-02 0.6787478
244 NM_008458     Serpina3c  oligo -0.6595177 13.860417 -4.691885 
1.894933e-02 0.2713318
113 NM_007977            F8  oligo  0.5285670  8.822306  4.600728 
2.028759e-02 0.1105870

Notice that the log-ratio analysis results are very similar to the 
single-channel results but are somewhat less significant. The M-values from 
the two analysis are almost the same - but not exactly identical because 
you have quantitative spot weights set. My guess is that the significance 
levels from the log-ratio analysis are more reliable than those from the 
single-channel analysis, but I can't give you a firm reason.

Gordon

At 06:59 AM 3/08/2004, Ren Na wrote:
>Gordon Smyth wrote:
>
>>At 04:50 AM 31/07/2004, Ren Na wrote:
>>
>>>I tried to do individual channel analysis of two-colour microarray data.
>>>I sort of followed the example in function lmscFit. When I ran to the
>>>the following step, I got the error,
>>>corfit <- intraspotCorrelation(MA, design)
>>>Error in if (dev < devold - 1e-15) break :
>>>         missing value where TRUE/FALSE needed
>>>
>>>I checked M and A, there is no missing value. Does anyone know what
>>>cause this error?
>>
>>
>>The error message is from the function remlscore() in the statmod 
>>package, and indicates a failure of the iterative algorithm used to fit a 
>>REML model. remlscore() is usually very reliable. One possible 
>>explanation is that your data is degenerate in some way so that a model 
>>can't be sensibly fit. If you provide a reproducible example of the 
>>error, I will have a look at it.
>>
>>Gordon
>>
>>>Thank you very much.
>>
>Dr. Gordon Smyth,
>Thank you very much for replying my message. I really appreciate it.
>My mouse microarray data is five pairs of dye swaps,
>young26(Cy3) vs old20(Cy5); old20(Cy3)-young26(Cy5)
>young27(Cy3) vs old21(Cy5); old21(Cy3)-young27(Cy5)
>young28(Cy3) vs old22(Cy5); old22(Cy3)-young28(Cy5)
>young29(Cy3) vs old23(Cy5); old23(Cy3)-young29(Cy5)
>young30(Cy3) vs old25(Cy5); old25(Cy3)-young30(Cy5)
>I tried to create the topTable, find significantly expressed genes
>between  young  mice and  old  mice.
>Here are the steps I did:
> > library(limma)
> > targets <- readTargets("Targets.txt")
> > RG<-read.maimages(targets$FileName, source="spot",wt.fun=wtarea(100))
> > RG$genes<- readGAL("Mouse24052004.txt")
> > RG$printer<-getLayout(RG$genes)
> > spottypes<-readSpotTypes("spottypes.txt")
> > RG$genes$Status<- controlStatus(spottypes, RG$genes)
> > RG<-backgroundCorrect(RG,method="minimum")
> > MA<-normalizeWithinArrays(RG)
> > MA<-normalizeBetweenArrays(MA, method="Aq")
> > targets.sc <- targetsA2C(targets)
> > targets.sc$Target <- factor(targets.sc$Target, levels=c("young26", 
> "old20", "young29", "old23", "young30", "old25", "young28", "old22", 
> "young27", "old21"))
> > design <- model.matrix(~ -1+Target, data=targets.sc)
> > corfit <- intraspotCorrelation(MA, design)
>Error in if (dev < devold - 1e-15) break :
>        missing value where TRUE/FALSE needed
>In addition: Warning message:
>reml: Max iterations exceeded in: remlscore(y, X, Z)
>For making the files "MA" and "design" small, I randomly choose 500 genes 
>like the following:
>i <- sample(1: nrow(MA), 500)
>MA <- MA[i,]
>Then I use the function dput to save this "MA" object and  the object
>"design"  in  the  files  "MA"  and  "design" ,  so  you  can use
>these  files  to  run  the  function  intraspotCorrelation. Please see the 
>attachment for these two files.
>
>I am also not very clear about the contrast matrix. For my data,  what
>is  the right  way  to  create  the  contrast  matrix,  and  could  you
>tell  me  where  I  can  find  some  material  about  how  to  make
>design  matrix  and  contrast  matrix?
>
>Thanks again, I am looking  forward  to  hearing  from  you.
>
>Sincerely,
>
>Ren
>
>
>[dput material deleted]