[BioC] Error when applying ComBat in SCAN.UPC

Peter Langfelder peter.langfelder at gmail.com
Mon Jun 9 18:14:15 CEST 2014


Hi Jim,

with all due respect, it also helps to read the documentation (in this
case for ComBat) which states

 mod: Model matrix for outcome of interest and other covariates
          besides batch

The key point is "besides batch". I also read the source code of the
function and it adds the batch to mod, so if Joel has no covariates of
interest, model.matrix(~1,...) is the correct model matrix to use to
correct for batch without any covariates.

So to not confuse Joel even more, the argument mod = model.matrix(~1,
data = data.frame(batch)) is correct and I understand pretty well what
it does and why it is the right argument to use.

Best,

Peter


On Mon, Jun 9, 2014 at 7:59 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Joel,
>
> It always helps to look at things to make sure you are doing what you
> expect. As an example, I think you are expecting that this does something
> different than it actually does:
>
>
> mod = model.matrix(~1, data = data.frame(batch))
>
> This appears to be your attempt to follow the sva vignette. But note that in
> the sva vignette, the analogous code is used to create the NULL model, not
> the full model. And also note that the data argument in this case is only
> useful in that it tells model.matrix() how many 1s to put in the design
> matrix. In other words
>
> model.matrix(~1, data = somedataframe)
>
> is telling R to give you a design matrix with just an intercept, and use
> somedataframe to determine the number of rows for the design matrix. It
> appears you might think that this is telling R to use the first column of
> your data.frame, which is incorrect. Instead, you have to pass the column
> name of the data.frame that you want to use.
>
> Best,
>
> Jim
>
>
>
> On 6/9/2014 8:06 AM, Joel Ma wrote:
>>
>> Hi Natasha
>>
>> Thanks for pointing it out. It worked but another error message appeared.
>> It is the same one I encountered before.
>>
>> Found 2 batches
>> Found 0  categorical covariate(s)
>> Standardizing Data across genes
>> Fitting L/S model and finding priors
>> Finding parametric adjustments
>> Error in while (change > conv) { : missing value where TRUE/FALSE needed
>>
>> I found 4 probesets with zero values for all samples. I am guessing those
>> are what Peter mentioned as zero variances. I will try to remove them and
>> see what I get.
>>
>> Thanks again.
>>
>> Cheers
>>
>> Joel
>> ________________________________________
>> From: Natasha Sahgal [n.sahgal at qmul.ac.uk]
>> Sent: Monday, June 09, 2014 9:48 PM
>> To: Joel Ma; Peter Langfelder
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] Error when applying ComBat in SCAN.UPC
>>
>> Hi Joel,
>>
>> I think you have a typo in your code:
>>
>> batch = phenoData$batch should be batch = phenoData$Batch.
>>
>>
>>
>> HTH,
>> Natasha
>>
>> On 09/06/2014 12:10, "Joel Ma" <jzma at unimelb.edu.au> wrote:
>>
>>> Hi Peter
>>>
>>> Thanks for the suggestions. Apologies for my coding. As you can already
>>> tell, I have no expertise in this area of work. I have tried your
>>> suggestions and would like some advice.
>>>
>>> This is what I did. Hopefully, my coding has improved abit since the last
>>> email.
>>>
>>>> normData <- SCAN("*.CEL", convThreshold = 0.01, verbose = TRUE) ##
>>>> expressionset
>>>> edata <- exprs(normData)
>>>> phenoData <- read.table("batchtargets1.txt", header = T) ## batch file
>>>> batch = phenoData$batch
>>>> mod = model.matrix(~1, data = data.frame(batch))
>>>> combat_edata = ComBat(dat=edata, batch=batch, mod=mod, numCovs=NULL,
>>>> par.prior=TRUE, prior.plots=TRUE)
>>>
>>>
>>> Found 0 batches
>>> Found 0  categorical covariate(s)
>>> Standardizing Data across genes
>>> Error in solve.default(t(design) %*% design) :
>>>   Lapack routine dgesv: system is exactly singular: U[1,1] = 0
>>>
>>> I checked the forums for this error and found a reply: 'Capitalize the
>>> "b" in the "Batch" header, and let me know if this works. Alternatively,
>>> use the ComBat from the sva package of bioconductor. This will be less
>>> finicky.'
>>>
>>> My Batch is capitalized and I used the sva package.
>>>
>>>
>>> I tried to find out why it showed 0 batches when my >phenoData shows the
>>> following table of 48 datasets (24 from each batch).
>>>
>>>         FileName Batch
>>> 1  b1.CEL     1
>>> 2  b2.CEL     1
>>> 3  b3.CEL     1
>>> 4     N1.CEL     1
>>> 5     N2.CEL     1
>>> 6     N3.CEL     1
>>> 7    c1.CEL     1
>>> 8    c2.CEL     1
>>> 9    c3.CEL     1
>>> 10   e1.CEL     1
>>> 11   e2.CEL     1
>>> 12   e3.CEL     1
>>> 13 d1.CEL     1
>>> 14 d2.CEL     1
>>> 15 d3.CEL     1
>>> 16    L1.CEL     2
>>> 17    L2.CEL     2
>>>
>>> When I typed,
>>>>
>>>> batch
>>>
>>> NULL
>>>
>>> I am not sure why batch = phenoData$batch gave me a NULL. I feel I have
>>> done something wrong here, but can't identify the issue.
>>>
>>> Cheers
>>>
>>> Joel Z Ma, PhD
>>>
>>> Dept. of Microbiology and Immunology
>>> The Peter Doherty Institute for Infection and Immunity
>>> University of Melbourne
>>> 792 Elizabeth Street
>>> Parkville
>>> Victoria, 3000
>>>
>>> Ph: +61 3 83440775
>>> E-mail: jzma at unimelb.edu.au
>>>
>>> ________________________________________
>>> From: Peter Langfelder [peter.langfelder at gmail.com]
>>> Sent: Monday, June 09, 2014 6:55 AM
>>> To: Joel Ma
>>> Cc: W. Evan Johnson; bioconductor at r-project.org
>>> Subject: Re: [BioC] Error when applying ComBat in SCAN.UPC
>>>
>>> On Sun, Jun 8, 2014 at 10:24 AM, Joel Ma <jzma at unimelb.edu.au> wrote:
>>>>
>>>> Hi Peter
>>>>
>>>> Thanks for the reply.
>>>>
>>>> What if I used ComBat separate after SCAN.UPC? I have tried that but I
>>>> couldn't get ComBat right.
>>>> This is what I have typed in after going through forums on ComBat.
>>>>
>>>>> Sdata <- SCAN("*.CEL", convThreshold = 0.01, verbose = TRUE)
>>>>> Bdata = ComBat(dat=Sdata, batch=batch, mod=batchtargets.txt, numCovs =
>>>>> 3, par.prior = TRUE, prior.plots = FALSE)
>>>>
>>>> Error in cbind(mod, batch) : object 'batchtargets.txt' not found
>>>>>
>>>>> Bdata = ComBat(dat=Sdata, batch=batchtargets.txt, numCovs = 3,
>>>>> par.prior = TRUE, prior.plots = FALSE)
>>>>
>>>> Error in cbind(mod, batch) : argument "mod" is missing, with no default
>>>>>
>>>>> Bdata <- ComBat(Sdata, sample$Batch,
>>>>>
>>>>> c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
>>>>> 2,2,2,1,1,1,1,1,1,1,1,1))
>>>>
>>>> Error in sample$Batch : object of type 'closure' is not subsettable
>>>
>>>
>>> Don't take this the wrong way, but your code simply doesn't make
>>> sense, so much so that I don't even know what is it you are trying to
>>> achieve, so I cannot suggest corrections.
>>>
>>> Here are a few suggestions:
>>>
>>> 1. Read the help file for SCAN (type help("SCAN") in R). It returns an
>>> object of type ExpressionSet; to get the actual expression data from
>>> it, use the function (method) 'exprs' on it. Call the resulting object
>>> say 'normData'.
>>>
>>> 2. The object 'normData' can be fed to ComBat; read the help file and
>>> maybe try to find some examples of use for ComBat.
>>>
>>> 3. For ComBat, you need the batch variable (call it, for example,
>>> 'batch'); this is a vector with one element for each sample and you
>>> can code the batches say 1 and 2. Make sure the order of the vector
>>> 'batch' corresponds to the order of the samples in 'normData'.
>>>
>>> 4. You also need a model matrix that specifies covariates you want
>>> ComBat to take into account. If you have none, use the argument mod =
>>> model.matrix(~1, data = data.frame(batch)).
>>>
>>> 5. Run ComBat and see if you get any errors; if you do, copy and paste
>>> the relevant part of the error into your favorite search engine and
>>> read the suggestions on how to solve it.
>>>
>>> 6. If you still can't get it to work, email the list again.
>>>
>>>
>>> Hope this helps,
>>>
>>> Peter
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>> Natasha Sahgal | Postdoctoral Research Assistant
>> Centre for Molecular Oncology
>> Barts Cancer Institute - a Cancer Research UK Centre of Excellence
>> Queen Mary, University of London
>> John Vane Science Centre, Charterhouse Square, London EC1M 6BQ
>> Tel: +44 (0)20 7882 3560 | Fax: +44 (0)20 7882 3884 |
>> www.bci.qmul.ac.uk/research/centre-profiles/molecular-oncology.html
>>
>>
>>
>>
>> This email may contain information that is privileged, confidential or
>> otherwise protected from disclosure.
>> It must not be used by, or its contents copied or disclosed to, persons
>> other than the addressee.
>> If you have received this email in error please notify the sender
>> immediately and delete the email. This message has been scanned for viruses.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list