Chol-hee, 

Notice the simple example:

> x=as.factor(c(1,0,1,0));y=as.factor(c(1,2,1,2));z=rnorm(4)

Notice that x and y are the same covariate. Now:

> design=model.matrix(z~x+y)
> design
  (Intercept) x1 y2
1           1  1  0
2           1  0  1
3           1  1  0
4           1  0  1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"

attr(,"contrasts")$y
[1] "contr.treatment"

> solve(t(design)%*%design)
Error in solve.default(t(design) %*% design) : 
  Lapack routine dgesv: system is exactly singular

You get the singularity error because your covariates are exactly the same (or not linearly independent). Now if you concatenate the variables like you did:

> xy=as.factor(paste(x,y,sep='.'))
> xy
[1] 1.1 0.2 1.1 0.2
Levels: 0.2 1.1

Which is clearly different than the original x+y. Now:

> design=model.matrix(z~xy)
> design
  (Intercept) xy1.1
1           1     1
2           1     0
3           1     1
4           1     0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$xy
[1] "contr.treatment"

> solve(t(design)%*%design)
            (Intercept) xy1.1
(Intercept)         0.5  -0.5
xy1.1              -0.5   1.0

Which now works. However, note that I wouldn't use the latter. You need to find out which of your six covariates are linearly dependent with each other and remove one or more so they are NOT linearly dependent. This will be different from your second attempt but will be equivalent to what you were trying to accomplish in your first attempt

Let me know if this doesn't work!

Thanks!

Evan

--
W. Evan Johnson
Assistant Professor
Division of Computational Biomedicine
Boston University School of Medicine
72 East Concord St., E-645
Boston, MA 02118
Phone: (617) 638-2541



On Oct 25, 2012, at 6:00 AM, bioconductor-request@r-project.org wrote:

> ------------------------------
> 
> Message: 8
> Date: Wed, 24 Oct 2012 13:40:33 -0400
> From: Achilleas Pitsillides <anp4r@virginia.edu>
> To: Cholhee Jung <jung.cholhee@gmail.com>, Bioconductor mailing list
> 	<bioconductor@stat.math.ethz.ch>
> Subject: Re: [BioC] Fwd: covariate information
> Message-ID:
> 	<CABDy-6=dP5-Wu+Zqo4-UpMBPX51zeCRPQCwdSV5yGYHg2Z821A@mail.gmail.com>
> Content-Type: text/plain
> 
> Dear Chol-hee,
> 
> The short answer is that the two model matrices are different and they have
> different dimensions; you can verify this by using the dim(mod_mat) to see
> the dimension of the model matrix.
> 
> Here is my understanding: If you have a factor  f1 with 3 levels and a
> factor f2 with 2 levels (where all the possible level combinations exist),
> then f1:f2 is all the combinations of f1 and f2 (an equivalent  factor
> with 6 levels) and  the  model matrix ~f1:f2 would have six columns (i.e.
> fit a model with 6 coefficients).
> However, the model matrix ~f1+f2 will have 4 columns ( i.e. fit 4
> coefficients: constant, two for f1 and one for f2).
> The model ~f1*f2 will fit 6 coefficients and have a model matrix with the
> same column space as the model matrix for ~f1:f2.
> 
> 
> I hope this helps,
> 
> cheers,
> Achilleas
> 
> 
> On Tue, Oct 23, 2012 at 10:10 PM, Cholhee Jung <jung.cholhee@gmail.com>wrote:
> 
>> 
>> 
>> Dear users,
>> 
>> Below is the question I posted originally in the 'ComBat user forum' of
>> Google group.
>> But, as I was suggested to forward my question to Bioconductor mailing
>> list, I'm doing it now.
>> 
>> Please find my question, below.
>> 
>> 
>> Regards,
>> Chol-hee
>> 
>> On Tuesday, October 23, 2012 10:13:26 AM UTC+11, Cholhee Jung wrote:
>>> 
>>> 
>>> 
>>> Dear users,
>>> 
>>> I was trying ComBat on ~1,000 samples.
>>> Samples are spread over 12 batches and each batch contains 4 technical
>>> replicates that are identical across all batches.
>>> The number of covariates is 5, and I was using the ComBat implemented in
>>> the 'sva' package.
>>> 
>>> I tried ComBat with two model matrix built from the same covariate
>>> information.
>>> 
>>> First model matrix was constructed as below:
>>>> mod_mat = model.matrix(~as.factor(cov1) + as.factor(cov2) +
>>> as.factor(cov3) + as.factor(cov4) + as.factor(cov5), data=pheno_data )
>>> 
>>> Second one was built as below:
>>>> mod_mat = model.matrix(~as.factor(paste(pheno_data$cov1,
>>> pheno_data$cov2, pheno_data$cov3, pheno_data$cov4, pheno_data$cov5,
>>> sep=":")))
>>> 
>>> Basically, covariates were concatenated into one string for the the
>>> second model matrix.
>>> 
>>> ComBat with the first model matrix raised the 'singular' error like
>> below:
>>> 
>>>  Error in solve.default(t(design) %*% design) :
>>>   Lapack routine dgesv: system is exactly singular
>>> 
>>> But, ComBat run without error with the second model matrix.
>>> 
>>> 
>>> Now I wonder if the two different model matrices are same?
>>> 
>>> Regards,
>>> Chol-hee
>>> 
>>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
> 
> 	[[alternative HTML version deleted]]


	[[alternative HTML version deleted]]

