[BioC] ComBat error message, thanks

Guan Wang Guan.Wang at glasgow.ac.uk
Fri May 2 14:15:04 CEST 2014


Hi All,

I understood from the previous post "[BioC] ComBat_ Error in solve.default(t(design) %*% design): Lapack routine dgesv: system is exactly singular: U[4, 4] = 0" that this error is caused by batch being confounded with covariate status. I get the same ComBat error when running surrogate variable analysis (SVA), and I have several related questions. I would be grateful if you could have a look. Many thanks for any opinions or suggestions.

Data set: 24 samples from 6 subjects (4 time points per subject: 2 baseline samples collected on different days, 1 during drug treatment and 1 after drug treatment). miRNA expression was profiled on Affymetrix GeneChip miRNA 3.0 arrays.

Initial data analysis: "oligo" is used to read the Affymetrix CEL files and "rma()" to normalize the data. After normalization, PC1 on the PCA plot still appears to correlate with some batch effect whose source I don't know (i.e., it does not come from different scan dates). I then used the "sva" package to estimate surrogate variables, followed by "ComBat()".

The ComBat error appears when I specify the contrasts as (Base2 - Base1, During4 - Base1, Post7 - Base1). The pheno input is shown below:

                           sample  batch  Status
GW2miRNA1_(miRNA-3_0).CEL       1      1  Base1
GW2miRNA2_(miRNA-3_0).CEL       1      1  Post7
GW2miRNA3_(miRNA-3_0).CEL       2      1  Base1
GW2miRNA4_(miRNA-3_0).CEL       2      1  Post7
GW2miRNA5_(miRNA-3_0).CEL       3      1  Base1
GW2miRNA6_(miRNA-3_0).CEL       3      1  Post7
GW2miRNA7_(miRNA-3_0).CEL       4      1  Base1
GW2miRNA8_(miRNA-3_0).CEL       4      1  Post7
GW2miRNA9_(miRNA-3_0).CEL       5      1  Base1
GW2miRNA10_(miRNA-3_0).CEL      5      1  Post7
GW2miRNA11_(miRNA-3_0).CEL      6      1  Base1
GW2miRNA12_(miRNA-3_0).CEL      6      1  Post7
GW1miRNA13_(miRNA-3_0).CEL      6      2  Base2
GW1miRNA14_(miRNA-3_0).CEL      6      2  During4
GW1miRNA15_(miRNA-3_0).CEL      4      2  Base2
GW1miRNA16_(miRNA-3_0).CEL      1      2  During4
GW1miRNA17_(miRNA-3_0).CEL      5      2  Base2
GW1miRNA18_(miRNA-3_0).CEL      5      2  During4
GW1miRNA19_(miRNA-3_0).CEL      4      2  During4
GW1miRNA20_(miRNA-3_0).CEL      3      2  Base2
GW1miRNA21_(miRNA-3_0).CEL      3      2  During4
GW1miRNA22_(miRNA-3_0).CEL      1      2  Base2
GW1miRNA23_(miRNA-3_0).CEL      2      3  During4
GW1miRNA24_(miRNA-3_0).CEL      2      3  Base2

I understand that batch is confounded with Status, as the phenotype table above shows. However, the two baseline samples come from the same subjects. They were simply collected on different days before the drug was given. So I'm wondering whether it makes sense to pool "Base1 + Base2" into a single "Base" level, define the contrasts "During - Base" and "Post - Base", keep the other columns of the pheno file unchanged, and re-run "sva". Would it instead be more appropriate to run two separate "sva" analyses? One would be "Post7 - Base1" on the first 12 samples, which were hybridized and scanned at the same time. The other would be "During4 - Base2" on the last 12 samples, which were processed as one batch but scanned at two different times (which is why they are labelled batch 2 and 3 in the "batch" column).
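
One way to compare the options is to check the rank of the design matrix each one implies. The Python sketch below is only an illustration, not sva or limma code. It assumes the model ComBat solves contains one indicator column per batch plus treatment-coded Status columns. The intercept is left out because it equals the sum of the batch columns. `design_matrix`, `exact_rank` and `pool_baselines` are hypothetical helper names. The rank is computed exactly with fractions, so no floating-point tolerance is needed. A rank below the number of columns means t(design) %*% design is singular.

```python
from fractions import Fraction

# (subject, batch, Status) for the 24 arrays, in the order of the pheno table
# (the "sample" column of the table holds the subject ID)
PHENO = [
    (1, 1, "Base1"), (1, 1, "Post7"), (2, 1, "Base1"), (2, 1, "Post7"),
    (3, 1, "Base1"), (3, 1, "Post7"), (4, 1, "Base1"), (4, 1, "Post7"),
    (5, 1, "Base1"), (5, 1, "Post7"), (6, 1, "Base1"), (6, 1, "Post7"),
    (6, 2, "Base2"), (6, 2, "During4"), (4, 2, "Base2"), (1, 2, "During4"),
    (5, 2, "Base2"), (5, 2, "During4"), (4, 2, "During4"), (3, 2, "Base2"),
    (3, 2, "During4"), (1, 2, "Base2"), (2, 3, "During4"), (2, 3, "Base2"),
]

def design_matrix(rows, reference):
    """One 0/1 column per batch, plus one 0/1 column per Status level other than `reference`.

    Hypothetical helper. The intercept is omitted because it equals the sum of the batch columns.
    """
    batches = sorted({b for _, b, _ in rows})
    levels = sorted({s for _, _, s in rows} - {reference})
    matrix = [[int(b == k) for k in batches] + [int(s == k) for k in levels]
              for _, b, s in rows]
    names = [f"batch{k}" for k in batches] + levels
    return matrix, names

def exact_rank(matrix):
    """Rank by Gauss-Jordan elimination on exact fractions (no rounding)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank

def pool_baselines(rows):
    """Relabel Base1 and Base2 as a single 'Base' level (the first proposal)."""
    return [(subj, b, "Base" if s in ("Base1", "Base2") else s) for subj, b, s in rows]

designs = {
    "original, Base1 reference": design_matrix(PHENO, "Base1"),
    "pooled Base": design_matrix(pool_baselines(PHENO), "Base"),
    "last 12 arrays only, Base2 reference": design_matrix(PHENO[12:], "Base2"),
}
# Expected ranks: 5 of 6 (singular), 5 of 5, 3 of 3
for label, (X, names) in designs.items():
    r = exact_rank(X)
    verdict = "full rank" if r == len(names) else "singular"
    print(f"{label}: rank {r} of {len(names)} columns {names} -> {verdict}")
```

In the original design, the column sum batch2 + batch3 equals Base2 + During4. Every array in batch 2 or 3 is Base2 or During4, and vice versa, so the batch 2/3 effect cannot be separated from the Status effects. Pooling the baselines removes this dependency because Base arrays then appear in every batch. The trade-off is that any genuine difference between the two baseline days gets absorbed into the batch term. A full rank only means the system can be solved. It does not show that Base1 and Base2 are interchangeable. In the split option, the first 12 arrays come from a single batch, so there is no batch term to adjust for there. In the second half, batch 3 contains only 2 arrays.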
 
I hope this is clear. Any suggestions or opinions would be much appreciated.

Regards
Guan