[BioC] Limma: Error with dupcor.series/duplicateCorrelatoin function
Valtteri Wirta
valtteri at biotech.kth.se
Tue Mar 16 22:50:33 MET 2004
Hello Limma developers and users,
(sorry for the long posting)
Today I encountered a similar problem as the one that was reported by Jason
Skelton.
I'm using Limma version 1.5.7 and R 1.8.1 on Windows XP platform.
I have 16 slides divided into four comparisons (four in each).
Each array contains 30 000 elements, representing 15 000 probes printed in
duplicate (two identical fields => spacing=15000).
The M-values are stored in a matrix called Ms
Data for unreliable features is removed through filtration which causes
some probes to have no valid measurements.
The modelMatrix (previously known as designMatrix) looks like this:
> design
HF HM LF LM
1 1 0 0 0
2 0 0 0 1
3 0 0 1 0
4 0 1 0 0
5 0 1 0 0
6 1 0 0 0
7 0 0 0 1
8 0 0 1 0
9 0 1 0 0
10 1 0 0 0
11 0 0 0 1
12 0 -1 0 0
13 -1 0 0 0
14 0 0 0 -1
15 0 0 1 0
16 0 0 -1 0
When estimating the within-slide correlation using dupcor.series (or
duplicateCorrelation)
> cor <- dupcor.series(Ms, design=design, ndups=2, spacing=15000)
the following error message is displayed:
Error in if (rho[i] < -1) rho[i] <- -1 : missing value where TRUE/FALSE needed
Ok. I replaced all NAs with 1 and now it works nicely.
It seems therefore that this has something to do with the NAs.
As the purpose of this step is to estimate the within-slide duplicate spot
correlation, I thought that changing the design object into a new one
considering all slides belonging to the same comparison should be ok.
> cor <- dupcor.series(Ms, design=rep(1,16), ndups=2, spacing=15000)
And this works nicely. (Only the warning message: Max iterations exceeded
in: glmgam.fit(dx, dy, start = c(mean(dy), 0)) is displayed, but that
should not matter).
Do you think it is ok to do like this?
Now, back to the similarity with Jason's example:
I tried to further investigate the problem. Now I'm back to the old design
with four comparisons (shown above) and for each comparison I change the
M-value for all genes with no valid measurement on any of the four slides to 1.
> cor <- dupcor.series(Ms.new, design=design, ndups=2, spacing=15000)
This works nicely.
This brings up the question of "how many valid measurements are needed?"
For genes with no valid measurement on any of the four slides I replaced 1
of the measurements (always the first one) with 1.
Then
> cor <- dupcor.series(Ms.new2, design=design, ndups=2, spacing=15000)
gives the following error:
Error in if (rho[i] < -1) rho[i] <- -1 : missing value where TRUE/FALSE
needed
When I change two of the four NAs to 1, this error is displayed:
Error in chol(XVX + lambda * I) : the leading minor of order 2 is not
positive definite
Same goes for three of four, and four of four into 1 works nicely, as expected.
To make this even more difficult to understand... If I break out the four
slides corresponding to the first comparison (HF in the design matrix
above) the dupcor.series function works nicely also on the matrix containg
the original NA values:
Ms.new3 <- Ms[,c(1,6,10,13)]
cor.3 <- dupcor.series(Ms.new3, design=c(1,1,1,-1), ndups=2, spacing=15000)
To conclude, it seems like there is a problem when a modelMatrix
(designMatrix) is used in the dupcor.series/dupliceCorrelation function and
that this is related to the number of valid measurements. I'd appreciate if
someone could comment on this and perhaps explain what is wrong or what I
am doing wrong
Finally, I have a small suggestions. Would it be possibly to have LIMMA
display the version number when the package is loaded?
regards,
Valtteri
Contact information:
Valtteri Wirta
Department of Biotechnology, KTH
AlbaNova University Center
S - 10691 Stockholm, Sweden
Visiting address: Roslagstullsbacken 21, B3
Phone: +46 8 5537 8344(office)
Phone: +46 733 386 341 (gsm)
Fax: +46 8 5537 8481
Email: valtteri at biotech.kth.se
Web: http://biobase.biotech.kth.se/~valtteri/R
More information about the Bioconductor
mailing list