[Bioc-devel] Support for Linux ARM64

Martin Grigorov m@rt|n@gr|gorov @end|ng |rom gm@||@com
Tue Jan 10 11:25:40 CET 2023


On Sun, Jan 8, 2023 at 12:24 AM Hervé Pagès <hpages.on.github using gmail.com>
wrote:

> On 05/01/2023 18:52, Vincent Carey wrote:
>
>
>
> On Thu, Jan 5, 2023 at 7:08 PM Vincent Carey <stvjc using channing.harvard.edu>
> wrote:
>
>>
>>
>> On Thu, Jan 5, 2023 at 1:44 PM Hervé Pagès <hpages.on.github using gmail.com>
>> wrote:
>>
>>> Hi Martin,
>>>
>>> Linux runs on many architectures, ARM64 is just one of them.
>>>
>>> Our daily builds have traditionally focused on 3 platforms: Intel-based
>>> Linux (Ubuntu 22.04), Windows, and Intel-based Mac. Note that we
>>> recently added ARM64-based Mac to our daily builds.
>>>
>>> One big difference between Linux and the other platforms is that we only
>>> produce binary packages for the latter. More precisely:
>>>
>>> - on the Linux builders: the daily builds only run 'R CMD INSTALL', 'R
>>> CMD build', and 'R CMD check', on each Bioconductor package,
>>>
>>> - on the Windows and Mac builders: the daily builds run all the above
>>> plus an additional step that we call the BUILD BIN step that produces a
>>> binary for each Bioconductor package.
>>>
>>> This means that on Linux, as well as on any other Unix-like OS that is
>>> not macOS (e.g. FreeBSD, OpenBSD, Solaris, HP-UX, etc...), users will
>>> install all their packages (Bioconductor and CRAN) **from source**. This
>>> should work as long as they are on a platform where R is supported and
>>> have the required compilers (C, C++, and Fortran).
>>>
>>> Note that if officially supporting a given platform means running the
>>> daily builds on that particular platform, then there's no way for us to
>>> do that because platform == OS + architecture, and the list of
>>> combinations of Unix-like OS's (Linux, FreeBSD, Solaris, etc...) +
>>> architectures (Intel, ARM64, Sparc, powerpc) is endless. Even if we
>>> narrow this list to Intel-based Linux, there are hundreds of Linux
>>> distributions around that use different kernels, compilers, package
>>> managers, etc...
>>>
>>> All this to say that, as far as the daily builds are concerned, we had
>>> to make choices, and those choices are based on the most commonly used
>>> platforms. Since all Bioconductor packages are tested daily on
>>> Intel-based Linux (Ubuntu 22.04), Windows, Intel-based Mac, and
>>> ARM64-based Mac, we have some reasonable confidence that they will work
>>> properly on these 4 platforms (still not a 100% guarantee of course,
>>> there's nothing like that).
>>>
>>> My understanding is that ARM64-based Linux is still a marginally used
>>> platform, so it is probably not worth allocating resources to add it
>>> to our daily builds at the moment. If it ever becomes more mainstream in
>>> the future, then we will certainly reconsider. That does not mean that
>>> you can't use Bioconductor on an ARM64-based Linux machine **now**. I see
>>> no reason a priori why you couldn't install (from source) Bioconductor
>>> packages on this platform, and use them, as long as:
>>>
>>>
>> Thanks Hervé for a good overview of the issues.  I think there are a couple
>> of reasons to keep this dialogue going (and there is now a community Slack
>> channel for further discussion: #arm-linux at community-bioc.slack.com.)
>>
>> The first reason is Martin's offer of resources to accomplish the support
>> aim.  What exactly that support aim is remains to be made precise.  As you
>> note, a properly configured system with R can use BiocManager::install to
>> build from source, but there are a few additional things that can be done
>> to produce binaries, and perhaps some of our software in BBS or some of
>> the binary repo generation tools could be useful for Martin's group to
>> make a relevant binary repo.  The package-management oriented process of
>> Dirk Eddelbuettel's r2u <https://github.com/eddelbuettel/r2u> also seems
>> potentially relevant.  We also have tooling to build all the CRAN
>> dependencies that Bioc packages declare.  This is all in the open and it
>> would be interesting to see how much work is needed to get solutions for
>> ARM64 Linux.  It could lead to some robustification of the existing build
>> machinery.  I am not offering to do it, but the fact that all the tooling
>> is out in the open may not be fully clear and I am just mentioning this.
>>
>> The second reason to stay engaged is the nature of the ARM platform, which
>> is said to require lower power consumption for equivalent throughput.  It
>> may be environmentally beneficial to be ahead of the curve in being able to
>> work with this platform.  Earlier I linked to a GitHub issue indicating
>> that rocker now has a dual-platform container image including arm64
>> support, but I don't know if that really addresses the issue at hand.
>> Maybe I need to go onto a Graviton machine to find out.
>>
>
> So I did this, and here are some notes:
>
> 1) it is easy to get such a machine in AWS, a1.2xlarge
> Linux 10a568f32a1c 4.14.296-222.539.amzn2.aarch64 #1 SMP Wed Oct 26
> 20:36:51 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
> 2) using the rocker/rstudio:latest-daily image I could get DESeq2 installed
> in about 20 minutes of compilation of dependent packages
> 3) to get a checkable version of DESeq2 I needed to enhance the rocker
> environment:
>     apt-get install libxml2-dev
>     apt install libpng-dev
>     apt install libgit2-dev
>     apt install -y libmagick++-dev
>     apt install -y libharfbuzz-dev libfribidi-dev
> 4) DESeq2 check in release version (1.38.2) failed (but it passes on Intel
> Linux):
>
> Running examples in ‘DESeq2-Ex.R’ failed
> The error most likely occurred in:
>
> > ### Name: unmix
> > ### Title: Unmix samples using loss in a variance stabilized space
> > ### Aliases: unmix
> >
> > ### ** Examples
> >
> >
> > # some artificial data
> > cts <- matrix(c(80,50,1,100,
> +                 1,1,60,100,
> +                 0,50,60,100), ncol=4, byrow=TRUE)
> > # make a DESeqDataSet
> > dds <- DESeqDataSetFromMatrix(cts,
> +   data.frame(row.names=seq_len(ncol(cts))), ~1)
> converting counts to integer mode
> > colnames(dds) <- paste0("sample",1:4)
> >
> > # note! here you would instead use
> > # estimateSizeFactors() to do actual normalization
> > sizeFactors(dds) <- rep(1, ncol(dds))
> >
> > norm.cts <- counts(dds, normalized=TRUE)
> >
> > # 'pure' should also have normalized counts...
> > pure <- matrix(c(10,0,0,
> +                  0,0,10,
> +                  0,10,0), ncol=3, byrow=TRUE)
> > colnames(pure) <- letters[1:3]
> >
> > # for real data, you need to find alpha after fitting estimateDispersions()
> > mix <- unmix(norm.cts, pure, alpha=0.01)
> Warning in sqrt(alpha * q) : NaNs produced
> Error in optim(par = rep(1, ncol(pure)), fn = sumLossVST, gr = NULL, i,  :
>   L-BFGS-B needs finite values of 'fn'
> Calls: unmix -> lapply -> lapply -> FUN -> optim
>
> Hmm.. this ain't good :-(
>
>
> Is there bugged/nonportable code somewhere in the stack underlying this
> example?
>
> Probably.
>
> That could take some time to figure out.
>
> I conclude that the mechanics of working with ARM64 and R to process
> Bioconductor packages are very tractable, but the work needed to get the
> whole ecosystem to a favorable state, as usable as it is for Intel Linux or
> Mac or Windows, may be laborious.
>
> OK so maybe a good start would be to try to set up the daily builds (BBS)
> on one of those AWS a1.2xlarge or a1.4xlarge instances, or, even better, on
> one of the VMs that Martin is offering? If we use ARM64 Ubuntu on it,
> setting up the builds there should be very similar to what we do for our
> current Intel Ubuntu build machines, which is easy and well-documented.
>
I've sent you the SSH details for an Ubuntu 22.04 ARM64 VM privately!
Please let me know if I can help in any way with the setup and testing!
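
For anyone reproducing the setup on such a VM, the notes quoted above can be
condensed into a small shell sketch. It only reports the platform and prints
(rather than runs) the apt command for the development libraries from the
quoted list. The `arch_label` helper and its label strings are illustrative
assumptions, not part of BBS or any Bioconductor tooling.

```shell
#!/bin/sh
# Sketch: report the platform and print (not run) the apt command for the
# development libraries listed in the DESeq2 notes quoted above.

# Map a `uname -m` machine name to a Docker-style platform label.
# Hypothetical helper; the label strings are only a naming convention.
arch_label() {
    case "$1" in
        aarch64|arm64) echo "linux/arm64" ;;
        x86_64|amd64)  echo "linux/amd64" ;;
        *)             echo "unknown" ;;
    esac
}

# System libraries taken from the quoted notes; other packages may need more.
DEV_LIBS="libxml2-dev libpng-dev libgit2-dev libmagick++-dev libharfbuzz-dev libfribidi-dev"

echo "platform: $(arch_label "$(uname -m)")"
echo "apt-get install -y $DEV_LIBS"
```

Printing the command instead of executing it keeps the sketch safe to run
without root; the printed line can be reviewed and then run by hand.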

Kind regards,
Martin

> H.
>
>> In any case it is not so often that we get a request for enhancements that
>> includes an offer of VMs and person power so I want to be sure we don't
>> lose the thread prematurely.
>>
>>> - R is supported on your ARM64-based Linux machine
>>>
>>> - you have compilers that are supported by R
>>>
>>> - you have the external libraries that are required by some CRAN and/or
>>> Bioconductor packages.
>>>
>>> Hope this helps,
>>>
>>> H.
>>>
>>> On 05/01/2023 02:01, Martin Grigorov wrote:
>>> > Dear community,
>>> >
>>> > Happy and successful new year!
>>> >
>>> > Apologies if this has been discussed before but
>>> > https://stat.ethz.ch/pipermail/bioc-devel/ does not provide search
>>> > facilities and my googling didn't help much!
>>> >
>>> > I'd like to ask whether Linux ARM64 is officially supported?
>>> > I know that Mac ARM64 is supported since 3.16 [1] [2].
>>> > I cannot find such test results for Linux ARM64 and the site search [3]
>>> > also mentions "arm64" only in context of "macOS".
>>> > In addition the Docker images are also single-platform [4]
>>> > (linux/amd64).
>>> >
>>> > How can we help to add support for Linux ARM64?
>>> > My employer is willing to donate VMs and manpower if the community is
>>> > interested in adding support for Linux ARM64!
>>> >
>>> >
>>> > Regards,
>>> > Martin
>>> >
>>> > 1. https://bioconductor.org/news/bioc_3_16_release/
>>> > 2. https://bioconductor.org/checkResults/3.17/bioc-mac-arm64-LATEST/
>>> > 3. https://bioconductor.org/help/search/index.html?q=arm64/
>>> > 4. https://hub.docker.com/r/bioconductor/bioconductor_docker/tags
>>> >
>>> > _______________________________________________
>>> > Bioc-devel using r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Bioconductor Core Team
>>> hpages.on.github using gmail.com
>>>
>>


