[BioC] Error when applying ComBat in SCAN.UPC

James W. MacDonald jmacdon at uw.edu
Mon Jun 9 16:59:57 CEST 2014


Hi Joel,

It always helps to look at things to make sure you are doing what you 
expect. As an example, I think you are expecting that this does 
something different than it actually does:

mod = model.matrix(~1, data = data.frame(batch))

This appears to be your attempt to follow the sva vignette. But note 
that in the sva vignette, the analogous code is used to create the NULL 
model, not the full model. And also note that the data argument in this 
case is only useful in that it tells model.matrix() how many 1s to put 
in the design matrix. In other words

model.matrix(~1, data = somedataframe)

is telling R to give you a design matrix with just an intercept, and use 
somedataframe to determine the number of rows for the design matrix. It 
appears you might think that this is telling R to use the first column 
of your data.frame, which is incorrect. Instead, you have to pass the 
column name of the data.frame that you want to use.

Best,

Jim


On 6/9/2014 8:06 AM, Joel Ma wrote:
> Hi Natasha
>
> Thanks for pointing it out. It worked but another error message appeared. It is the same one I encountered before.
>
> Found 2 batches
> Found 0  categorical covariate(s)
> Standardizing Data across genes
> Fitting L/S model and finding priors
> Finding parametric adjustments
> Error in while (change > conv) { : missing value where TRUE/FALSE needed
>
> I found 4 probesets with zero values for all samples. I am guessing those are what Peter mentioned as zero variances. I will try to remove them and see what I get.
>
> Thanks again.
>
> Cheers
>
> Joel
> ________________________________________
> From: Natasha Sahgal [n.sahgal at qmul.ac.uk]
> Sent: Monday, June 09, 2014 9:48 PM
> To: Joel Ma; Peter Langfelder
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Error when applying ComBat in SCAN.UPC
>
> Hi Joel,
>
> I think you have a typo in your code:
>
> batch = phenoData$batch should be batch = phenoData$Batch.
>
>
>
> HTH,
> Natasha
>
> On 09/06/2014 12:10, "Joel Ma" <jzma at unimelb.edu.au> wrote:
>
>> Hi Peter
>>
>> Thanks for the suggestions. Apologies for my coding. As you can already
>> tell, I have no expertise in this area of work. I have tried your
>> suggestions and would like some advice.
>>
>> This is what I did. Hopefully, my coding has improved abit since the last
>> email.
>>
>>> normData <- SCAN("*.CEL", convThreshold = 0.01, verbose = TRUE) ##
>>> expressionset
>>> edata <- exprs(normData)
>>> phenoData <- read.table("batchtargets1.txt", header = T) ## batch file
>>> batch = phenoData$batch
>>> mod = model.matrix(~1, data = data.frame(batch))
>>> combat_edata = ComBat(dat=edata, batch=batch, mod=mod, numCovs=NULL,
>>> par.prior=TRUE, prior.plots=TRUE)
>>
>> Found 0 batches
>> Found 0  categorical covariate(s)
>> Standardizing Data across genes
>> Error in solve.default(t(design) %*% design) :
>>   Lapack routine dgesv: system is exactly singular: U[1,1] = 0
>>
>> I checked the forums for this error and found a reply: 'Capitalize the
>> "b" in the "Batch" header, and let me know if this works. Alternatively,
>> use the ComBat from the sva package of bioconductor. This will be less
>> finicky.'
>>
>> My Batch is capitalized and I used the sva package.
>>
>>
>> I tried to find out why it showed 0 batches when my >phenoData shows the
>> following table of 48 datasets (24 from each batch).
>>
>>         FileName Batch
>> 1  b1.CEL     1
>> 2  b2.CEL     1
>> 3  b3.CEL     1
>> 4     N1.CEL     1
>> 5     N2.CEL     1
>> 6     N3.CEL     1
>> 7    c1.CEL     1
>> 8    c2.CEL     1
>> 9    c3.CEL     1
>> 10   e1.CEL     1
>> 11   e2.CEL     1
>> 12   e3.CEL     1
>> 13 d1.CEL     1
>> 14 d2.CEL     1
>> 15 d3.CEL     1
>> 16    L1.CEL     2
>> 17    L2.CEL     2
>>
>> When I typed,
>>> batch
>> NULL
>>
>> I am not sure why batch = phenoData$batch gave me a NULL. I feel I have
>> done something wrong here, but can't identify the issue.
>>
>> Cheers
>>
>> Joel Z Ma, PhD
>>
>> Dept. of Microbiology and Immunology
>> The Peter Doherty Institute for Infection and Immunity
>> University of Melbourne
>> 792 Elizabeth Street
>> Parkville
>> Victoria, 3000
>>
>> Ph: +61 3 83440775
>> E-mail: jzma at unimelb.edu.au
>>
>> ________________________________________
>> From: Peter Langfelder [peter.langfelder at gmail.com]
>> Sent: Monday, June 09, 2014 6:55 AM
>> To: Joel Ma
>> Cc: W. Evan Johnson; bioconductor at r-project.org
>> Subject: Re: [BioC] Error when applying ComBat in SCAN.UPC
>>
>> On Sun, Jun 8, 2014 at 10:24 AM, Joel Ma <jzma at unimelb.edu.au> wrote:
>>> Hi Peter
>>>
>>> Thanks for the reply.
>>>
>>> What if I used ComBat separate after SCAN.UPC? I have tried that but I
>>> couldn't get ComBat right.
>>> This is what I have typed in after going through forums on ComBat.
>>>
>>>> Sdata <- SCAN("*.CEL", convThreshold = 0.01, verbose = TRUE)
>>>> Bdata = ComBat(dat=Sdata, batch=batch, mod=batchtargets.txt, numCovs =
>>>> 3, par.prior = TRUE, prior.plots = FALSE)
>>> Error in cbind(mod, batch) : object 'batchtargets.txt' not found
>>>> Bdata = ComBat(dat=Sdata, batch=batchtargets.txt, numCovs = 3,
>>>> par.prior = TRUE, prior.plots = FALSE)
>>> Error in cbind(mod, batch) : argument "mod" is missing, with no default
>>>> Bdata <- ComBat(Sdata, sample$Batch,
>>>> c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
>>>> 2,2,2,1,1,1,1,1,1,1,1,1))
>>> Error in sample$Batch : object of type 'closure' is not subsettable
>>
>> Don't take this the wrong way, but your code simply doesn't make
>> sense, so much so that I don't even know what is it you are trying to
>> achieve, so I cannot suggest corrections.
>>
>> Here are a few suggestions:
>>
>> 1. Read the help file for SCAN (type help("SCAN") in R). It returns an
>> object of type ExpressionSet; to get the actual expression data from
>> it, use the function (method) 'exprs' on it. Call the resulting object
>> say 'normData'.
>>
>> 2. The object 'normData' can be fed to ComBat; read the help file and
>> maybe try to find some examples of use for ComBat.
>>
>> 3. For ComBat, you need the batch variable (call it, for example,
>> 'batch'); this is a vector with one element for each sample and you
>> can code the batches say 1 and 2. Make sure the order of the vector
>> 'batch' corresponds to the order of the samples in 'normData'.
>>
>> 4. You also need a model matrix that specifies covariates you want
>> ComBat to take into account. If you have none, use the argument mod =
>> model.matrix(~1, data = data.frame(batch)).
>>
>> 5. Run ComBat and see if you get any errors; if you do, copy and paste
>> the relevant part of the error into your favorite search engine and
>> read the suggestions on how to solve it.
>>
>> 6. If you still can't get it to work, email the list again.
>>
>>
>> Hope this helps,
>>
>> Peter
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> Natasha Sahgal | Postdoctoral Research Assistant
> Centre for Molecular Oncology
> Barts Cancer Institute - a Cancer Research UK Centre of Excellence
> Queen Mary, University of London
> John Vane Science Centre, Charterhouse Square, London EC1M 6BQ
> Tel: +44 (0)20 7882 3560 | Fax: +44 (0)20 7882 3884 | www.bci.qmul.ac.uk/research/centre-profiles/molecular-oncology.html
>
>
>
>
> This email may contain information that is privileged, confidential or otherwise protected from disclosure.
> It must not be used by, or its contents copied or disclosed to, persons other than the addressee.
> If you have received this email in error please notify the sender immediately and delete the email. This message has been scanned for viruses.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list