[Bioc-devel] Support for Linux ARM64

Vincent Carey stvjc using channing.harvard.edu
Fri Jan 6 03:52:41 CET 2023


On Thu, Jan 5, 2023 at 7:08 PM Vincent Carey <stvjc using channing.harvard.edu>
wrote:

>
>
> On Thu, Jan 5, 2023 at 1:44 PM Hervé Pagès <hpages.on.github using gmail.com>
> wrote:
>
>> Hi Martin,
>>
>> Linux runs on many architectures, ARM64 is just one of them.
>>
>> Our daily builds have traditionally focused on 3 platforms: Intel-based
>> Linux (Ubuntu 22.04), Windows, and Intel-based Mac. Note that we
>> recently added ARM64-based Mac to our daily builds.
>>
>> One big difference between Linux and the other platforms is that we only
>> produce binary packages for the latter. More precisely:
>>
>> - on the Linux builders: the daily builds only run 'R CMD INSTALL', 'R
>> CMD build', and 'R CMD check', on each Bioconductor package,
>>
>> - on the Windows and Mac builders: the daily builds run all the above
>> plus an additional step that we call the BUILD BIN step that produces a
>> binary for each Bioconductor package.
>>
>> This means that on Linux, as well as on any other Unix-like OS that is
>> not macOS (e.g. FreeBSD, OpenBSD, Solaris, HP-UX, etc...), users will
>> install all their packages (Bioconductor and CRAN) **from source**. This
>> should work as long as they are on a platform where R is supported and
>> have the required compilers (C, C++, and Fortran).
>>
>> Note that if officially supporting a given platform means running the
>> daily builds on that particular platform, then there's no way for us to
>> do that because platform == OS + architecture, and the list of
>> combinations of Unix-like OS's (Linux, FreeBSD, Solaris, etc...) +
>> architectures (Intel, ARM64, Sparc, powerpc) is endless. Even if we
>> narrow this list to Intel-based Linux, there are hundreds of Linux
>> distributions around that use different kernels, compilers, package
>> managers, etc...
>>
>> All this to say that, as far as the daily builds are concerned, we had
>> to make choices, and those choices are based on the most commonly used
>> platforms. Since all Bioconductor packages are tested daily on
>> Intel-based Linux (Ubuntu 22.04), Windows, Intel-based Mac, and
>> ARM64-based Mac, we have some reasonable confidence that they will work
>> properly on these 4 platforms (still not a 100% guarantee of course,
>> there's nothing like that).
>>
>> My understanding is that ARM64-based Linux is still a marginally used
>> platform so probably not worth for us to allocate resources on adding it
>> to our daily builds at the moment. If it ever becomes more mainstream in
>> the future, then we will certainly reconsider. That does not mean that
>> you can't use Bioconductor on an ARM64-based Linux machine **now**. I see
>> no reason a priori why you couldn't install (from source) Bioconductor
>> packages on this platform, and use them, as long as:
>>
>>
> Thanks Hervé for a good overview of the issues.  I think there are a couple
> of reasons to keep this dialogue going (and there is now a community slack
> channel for further discussion: #arm-linux at community-bioc.slack.com.)
>
> The first reason is Martin's offer of resources to accomplish the support
> aim.  What exactly that support aim is remains to be made precise.  As you
> note, a properly configured system with R can use BiocManager::install to
> build from source, but there are a few additional things that can be done
> to produce binaries, and perhaps some of our software in BBS or some of the
> binary repo generation tools could be useful for Martin's group to make a
> relevant binary repo.  The package-management oriented process of Dirk
> Eddelbuettel's r2u <https://github.com/eddelbuettel/r2u> also seems
> potentially relevant.  We also have tooling to build all the CRAN
> dependencies that Bioc packages declare.  This is all in the open and it
> would be interesting to see how much work is needed to get solutions for
> ARM64 linux.  It could lead to some robustification of the existing build
> machinery.  I am not offering to do it, but the fact that all the tooling
> is out in the open may not be fully clear and I am just mentioning this.
>
> The second reason to stay engaged is the nature of the ARM platform, which
> is said to require lower power consumption for equivalent throughput.  It
> may be environmentally beneficial to be ahead of the curve in being able to
> work with this platform.  Earlier I linked to a github issue indicating
> that rocker now has a dual platform container image including arm64 support
> but I don't know if that really addresses the issue at hand. Maybe I need
> to go onto a graviton machine to find out.
>

So I did this, and here are some notes:

1) It is easy to get such a machine in AWS (a1.2xlarge):
    Linux 10a568f32a1c 4.14.296-222.539.amzn2.aarch64 #1 SMP Wed Oct 26 20:36:51 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
2) Using the rocker/rstudio:latest-daily image I could get DESeq2 installed
in about 20 minutes of compilation of dependent packages.
3) To get a checkable version of DESeq2 I needed to enhance the rocker
environment:
    apt-get install libxml2-dev
    apt install libpng-dev
    apt install libgit2-dev
    apt install -y libmagick++-dev
    apt install -y libharfbuzz-dev libfribidi-dev
4) The DESeq2 check in the release version (1.38.2) failed (but it passes on
Intel Linux):

Running examples in ‘DESeq2-Ex.R’ failed
The error most likely occurred in:

> ### Name: unmix
> ### Title: Unmix samples using loss in a variance stabilized space
> ### Aliases: unmix
>
> ### ** Examples
>
>
> # some artificial data
> cts <- matrix(c(80,50,1,100,
+                 1,1,60,100,
+                 0,50,60,100), ncol=4, byrow=TRUE)
> # make a DESeqDataSet
> dds <- DESeqDataSetFromMatrix(cts,
+   data.frame(row.names=seq_len(ncol(cts))), ~1)
converting counts to integer mode
> colnames(dds) <- paste0("sample",1:4)
>
> # note! here you would instead use
> # estimateSizeFactors() to do actual normalization
> sizeFactors(dds) <- rep(1, ncol(dds))
>
> norm.cts <- counts(dds, normalized=TRUE)
>
> # 'pure' should also have normalized counts...
> pure <- matrix(c(10,0,0,
+                  0,0,10,
+                  0,10,0), ncol=3, byrow=TRUE)
> colnames(pure) <- letters[1:3]
>
> # for real data, you need to find alpha after fitting estimateDispersions()
> mix <- unmix(norm.cts, pure, alpha=0.01)
Warning in sqrt(alpha * q) : NaNs produced
Error in optim(par = rep(1, ncol(pure)), fn = sumLossVST, gr = NULL, i,  :
  L-BFGS-B needs finite values of 'fn'
Calls: unmix -> lapply -> lapply -> FUN -> optim

Is there buggy or nonportable code somewhere in the stack underlying this
example?  That could take some time to figure out.
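
For anyone repeating step 3, the system libraries can be requested in one
pass.  This is a minimal sketch, assuming a Debian/Ubuntu-based rocker image
and a root shell; the PKGS, CMD, and DRY_RUN names are illustrative and not
part of any Bioconductor tooling.  By default it only prints the command.

```shell
#!/bin/sh
# System libraries listed in step 3 (needed to make DESeq2 checkable).
PKGS="libxml2-dev libpng-dev libgit2-dev libmagick++-dev libharfbuzz-dev libfribidi-dev"

# Assumption: DRY_RUN is a hypothetical switch for this sketch.
# DRY_RUN=1 (the default) prints the command; DRY_RUN=0 runs it as root.
DRY_RUN="${DRY_RUN:-1}"

CMD="apt-get install -y $PKGS"

if [ "$DRY_RUN" = "1" ]; then
    # Show what would be executed without touching the system.
    echo "$CMD"
else
    # Refresh the package index first, then install everything at once.
    apt-get update && $CMD
fi
```

Run it with DRY_RUN=0 inside the container once the printed command looks
right.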

I conclude that the mechanics of working with ARM64 and R to process
Bioconductor packages are very tractable, but the work needed to get the
whole ecosystem to a favorable state, as usable as it is for Intel Linux,
Mac, or Windows, may be laborious.




> In any case it is not so often that we get a request for enhancements that
> includes an offer of VMs and person power so I want to be sure we don't
> lose the thread prematurely.
>
>> - R is supported on your ARM64-based Linux machine
>>
>> - you have compilers that are supported by R
>>
>> - you have the external libraries that are required by some CRAN and/or
>> Bioconductor packages.
>>
>> Hope this helps,
>>
>> H.
>>
>> On 05/01/2023 02:01, Martin Grigorov wrote:
>> > Dear community,
>> >
>> > Happy and successful new year!
>> >
>> > Apologies if this has been discussed before but
>> > https://stat.ethz.ch/pipermail/bioc-devel/ does not provide search
>> > facilities and my googling didn't help much!
>> >
>> > I'd like to ask whether Linux ARM64 is officially supported?
>> > I know that Mac ARM64 is supported since 3.16 [1] [2].
>> > I cannot find such test results for Linux ARM64 and the site search [3]
>> > also mentions "arm64" only in context of "macOS".
>> > In addition the Docker images are also single-platform [4] (linux/amd64).
>> >
>> > How can we help to add support for Linux ARM64?
>> > My employer is willing to donate VMs and man power if the community is
>> > interested in adding support for Linux ARM64!
>> >
>> >
>> > Regards,
>> > Martin
>> >
>> > 1. https://bioconductor.org/news/bioc_3_16_release/
>> > 2. https://bioconductor.org/checkResults/3.17/bioc-mac-arm64-LATEST/
>> > 3. https://bioconductor.org/help/search/index.html?q=arm64/
>> > 4. https://hub.docker.com/r/bioconductor/bioconductor_docker/tags
>> >
>> > _______________________________________________
>> > Bioc-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> --
>> Hervé Pagès
>>
>> Bioconductor Core Team
>> hpages.on.github using gmail.com
>>
>
