[Bioc-devel] python function in Basilisk never finishes
Elzbieta Gralinska
gralinska at molgen.mpg.de
Thu Apr 14 17:10:41 CEST 2022
Dear Quang,
thanks a lot for your reply and the suggestion about torch_svd(). We
tried it out in the past, but unfortunately it wasn't the best
solution for us, and in the end we had to use PyTorch together with the
'reticulate' package. This solution works very well for us. In the
meantime, however, it was suggested that we use 'basilisk', and we now run
into problems when applying basilisk to bigger data sets. It currently
works only for small data sets, and we don't know why. Any further help
would be very much appreciated.
Thank you again for the suggestion to use torch_svd(); we will keep it in
mind.
Best wishes,
Ela
On 4/14/22 16:17, Quang Nguyen wrote:
> This doesn't really address Basilisk, and I apologize if you already knew
> about this, but there are torch bindings in R in which the torch_svd function
> is available (https://torch.mlverse.org/docs/reference/torch_svd.html).
> Depending on the torch R package instead of PyTorch might simplify
> dependency management for users, especially if you're mostly relying on one
> function from the library.
>
> Hope this helps,
>
> Quang
>
> On Thu, Apr 14, 2022 at 7:36 AM Clemens Kohl <kohl using molgen.mpg.de> wrote:
>
>> Hello everyone,
>>
>> I am one of the developers of the package APL
>> (https://github.com/VingronLab/APL) for the visualization and analysis
>> of single cell transcriptomics data.
>>
>> To speed up the singular value decomposition of large matrices, we use
>> torch.svd from PyTorch and are trying to simplify the installation of
>> the Python dependencies through the Bioconductor package basilisk
>> (http://bioconductor.org/packages/release/bioc/html/basilisk.html).
>>
>> Our current implementation with basilisk can be found here:
>> https://github.com/VingronLab/APL/tree/dev
>>
>> However, we have run into a problem that we have so far been unable to
>> figure out, and I hope that someone with more experience using basilisk
>> can point us in the right direction.
>>
>> When running run_cacomp(matrix, python = TRUE) (the function that
>> ultimately calls torch.svd within basiliskRun), it finishes for very
>> small matrices (an example of such a small dataset can be found in
>> /tests/testthat/testdata/countries.rda), but for larger datasets it
>> keeps running indefinitely without any errors while continuously using
>> CPU resources. For example, running run_cacomp on the darmanis dataset
>> from our vignette never finishes. When we previously used only
>> reticulate, we did not observe this behaviour.
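One generic way to turn an indefinite hang like this into a reportable result is to run the Python step in a separate interpreter with a wall-clock limit. The sketch below is a minimal, hypothetical diagnostic helper (`run_with_timeout` is not part of APL, basilisk or reticulate). It assumes the conda environment's Python can be called directly, so its interpreter path is left as a parameter. It can also set OMP_NUM_THREADS and MKL_NUM_THREADS in the child, on the unverified assumption that thread settings may be worth comparing between the basilisk and plain reticulate setups.

```python
import os
import subprocess
import sys


def run_with_timeout(code, timeout_s, threads=None, python=sys.executable):
    """Run a Python snippet in a separate interpreter, giving up after timeout_s seconds.

    Hypothetical diagnostic helper: `python` can point at the interpreter of the
    conda environment under test; `threads`, if given, is exported to the child as
    OMP_NUM_THREADS and MKL_NUM_THREADS so runs with different thread counts can
    be compared.
    """
    env = dict(os.environ)
    if threads is not None:
        env["OMP_NUM_THREADS"] = str(threads)
        env["MKL_NUM_THREADS"] = str(threads)
    try:
        result = subprocess.run(
            [python, "-c", code],
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left spinning.
        return "timed out", None
    if result.returncode != 0:
        # The snippet raised or exited with an error; return its stderr for inspection.
        return "failed", result.stderr
    return "finished", result.stdout
```

As a usage example, a snippet such as `"import torch; x = torch.rand(4000, 2000); print(torch.svd(x).S[:3])"` (matrix size chosen arbitrarily) could be run with `timeout_s=600` under different `threads` values and with `python` pointing at the environment's interpreter, to check whether the SVD itself stalls outside R.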
>>
>> I assume this is due to some behaviour of basilisk that I do not
>> understand yet. Is there a parameter I can set that would prevent this
>> from happening?
>>
>> A less urgent problem is that programs running within the basilisk
>> conda environment sometimes seem to use shared system libraries instead
>> of the libraries installed in the conda environment. On one of our test
>> machines, this caused a failure because an older gcc library from outside
>> the conda environment was used. If we explicitly tell it to prefer the
>> conda libraries by setting
>>
>> export LD_LIBRARY_PATH=/path/to/temp/basilisk/conda/environment/
>>
>> we can make it use the correct libraries, but this is not a suitable
>> solution for most users.
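The LD_LIBRARY_PATH workaround relies on the Linux dynamic loader searching those directories before its cache and the default system directories (libraries carrying an RPATH can still take precedence). A minimal Python sketch of the same idea, assuming a Linux/glibc system: the helper names are hypothetical and not part of basilisk. The loader reads LD_LIBRARY_PATH when a process starts, so the change is applied to a newly launched child rather than to the running process.

```python
import os
import subprocess
import sys


def env_with_lib_first(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir placed first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated; earlier directories are searched first.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env


def child_library_path(lib_dir, base_env=None):
    """Start a new Python process with the modified environment and report what it sees.

    Only processes launched after the change pick it up; modifying os.environ in an
    already-running process does not change the libraries it has loaded.
    """
    code = "import os; print(os.environ['LD_LIBRARY_PATH'])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env_with_lib_first(lib_dir, base_env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For instance, `child_library_path("/path/to/conda/env/lib")` (a placeholder directory) returns a search path that starts with that directory, followed by whatever LD_LIBRARY_PATH the parent already had.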
>>
>>
>> Many thanks in advance!
>>
>>
>> Best regards,
>>
>> Clemens Kohl
>>
>>
>> _______________________________________________
>> Bioc-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
--
Elzbieta Gralinska,
Department of Computational Molecular Biology
Max Planck Institute for Molecular Genetics
Ihnestr. 63 - 73,
14195 Berlin, Germany