[BioC] Problems with robustPca in pcaMethods

Henrik Bengtsson hb at biostat.ucsf.edu
Mon Mar 3 20:34:22 CET 2014


[cc:ing the maintainer of pcaMethods]

Author of matrixStats::weightedMedian() here.  I won't solve OPs
problem but I'll add some clues to the non-informative error message
from weightedMedian().

On Mon, Mar 3, 2014 at 9:35 AM, Julian Gehring <julian.gehring at embl.de> wrote:
> Hi,
>
> You may check if you have missing values 'NA's in your data.
>
> Best wishes
> Julian
>
>
> On 27/02/14 21:53, J Brown [guest] wrote:
>>
>> Hi all;
>>
>> I'm new to R and trying to use the pcaMethods package to analyze a qPCR dataset. My dataset contains many missing values and I think the module I want to use is robustPca, but when I try to apply it to my dataset I keep getting the error described below. Using nipalsPca on my dataset works without errors, so I don't think it's a data-format issue. Using robustPca on the pcaMethods sample dataset "metaboliteData", which has missing values, also works fine (although it warns about missing values), so it isn't a general problem with my install of R and the relevant packages.
>>
>> The traceback results seems to say that the error is caused by a weighted-median calculation that is part of the robustPca command, but I have no idea why this only comes up using my dataset: could it be because my dataset is already median-normalized (before importing to R)? Troubleshooting this is beyond my abilities at this point; I'd be grateful for any insight anyone can offer.
>>
>>> pca_results <- pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)
>> Error in if (!all(tmp)) { : missing value where TRUE/FALSE needed
>> In addition: Warning message:
>> In robustPca(prepres$data, nPcs = nPcs, ...) :
>>   Data is incomplete, it is not recommended to use robustPca for missing value estimation
>>
>>> traceback()
>> 7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
>> 6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
>> 5: FUN(newX[, i], ...)
>> 4: apply(x, 1, L1RegCoef, bk)
>> 3: robustSvd(Matrix)
>> 2: robustPca(prepres$data, nPcs = nPcs, ...)
>> 1: pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)

That error in weightedMedian() occurs because there are missing values
in either in x[keep]/a or in the weights abs(a) and argument 'na.rm'
defaults to NA(*).  It's better if robustSvd() would call
weightedMedian(x[keep]/a, abs(a), na.rm=TRUE, interpolate=FALSE), or
possibly na.rm=FALSE.

(*) With weightedMedian(..., na.rm=NA) one tells that function to
trust the data (including the weights) to have no missing values.
This option exists for efficiency reasons.  If there are missing
values, the na.rm=TRUE should be used.  If there could be missing
value, na.rm=FALSE should be used (in case NA is returned if there are
missing values).  That the default is NA is unconventional and I may
consider changing matrixStats to use the more commonly used
na.rm=FALSE (no promises though).

/Henrik

>>
>>  -- output of sessionInfo():
>>
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] pcaMethods_1.52.1  Rcpp_0.11.0        matrixStats_0.8.14 Biobase_2.22.0     BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] R.methodsS3_1.6.1 tools_3.0.2
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list