[Bioc-devel] New ExperimentHub resource and some related questions

Martin Morgan mtmorg@n@bioc @ending from gm@il@com
Sun Dec 23 17:06:37 CET 2018


This email is enough to start the conversation, but the person who will help is on holiday until approximately January 3 so a response will be delayed.

Martin

On 12/21/18, 7:26 PM, "Bioc-devel on behalf of Lu, Dongyi (Lambda)" <bioc-devel-bounces using r-project.org on behalf of dlu2 using caltech.edu> wrote:

    I don’t intend to name the package “SingleCell”; I was referring to the biocViews term. Also, the BUS format is quite different from the 10x molecule info file. CellRanger aligns reads to the genome with STAR, whereas the BUS file is generated by pseudoalignment to a transcriptome index and records the set of transcripts a read is compatible with rather than the gene a read aligns to.
    
    The ExperimentHub vignette on creating a new ExperimentHub package says we should contact a Bioconductor team member to upload the data. Does that mean I should email one of the core team members directly?
    
    Lambda
    
    On 12/21/18, 3:02 AM, "Bioc-devel on behalf of bioc-devel-request using r-project.org" <bioc-devel-bounces using r-project.org on behalf of bioc-devel-request using r-project.org> wrote:
    
        Today's Topics:
        
           1. Re:  New ExperimentHub resource and some related questions
              (Aaron Lun)
           2. Re:  New ExperimentHub resource and some related questions
              (Shepherd, Lori)
           3. Re: Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0
              (Martin Morgan)
           4. Re: Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0
              (Tierney, Luke)
           5. Re: Compilation flags, CHECK errors and BiocNeighbors
              (Obenchain, Valerie)
        
        ----------------------------------------------------------------------
        
        Message: 1
        Date: Thu, 20 Dec 2018 12:00:20 +0000
        From: Aaron Lun <infinite.monkeys.with.keyboards using gmail.com>
        To: bioc-devel <bioc-devel using r-project.org>
        Subject: Re: [Bioc-devel]  New ExperimentHub resource and some related
        	questions
        Message-ID: <9BF95433-AF04-431B-B71D-62425195DEBE using gmail.com>
        Content-Type: text/plain; charset="utf-8"
        
        I presume your package is not actually called “SingleCell” (in point 1). This would be pretty confusing when compared to the simpleSingleCell package, the SingleCellExperiment package, and the SingleCell biocViews term itself. It would probably make more sense to call it BUStoolsR or some other appropriate pun (e.g., RBUS, which is funniest when it gets to version 3.8.0).
        
        Also, at first glance, the BUS format seems pretty similar to 10X’s molecule information file, for which the DropletUtils package has a series of reader functions. You may find some of the code there useful for your package. I might also add a readBUS() function to DropletUtils if this turns out to be a popular format for droplet data, though TBH the sparse matrix is a much more common starting point.
        
        -A
        
        > On 20 Dec 2018, at 01:42, Lu, Dongyi (Lambda) <dlu2 using caltech.edu> wrote:
        > 
        > Hi everyone,
        > 
        > I’m writing a package (biocViews SingleCell) that converts files of the BUS format (standing for Barcode, UMI, Set, see https://www.biorxiv.org/content/early/2018/11/21/472571) into a sparse matrix in R that can be used in Seurat and SingleCellExperiment. In order to write the examples and the vignette, I’m also putting the data itself into a package for ExperimentHub. The data used here are some mixed human and mouse cells from 10x. Here are my questions:
        > 
        > 
        >  1.  In the documentation for `ExperimentHubData::makeExperimentHubMetadata`, the fields `RDataClass` and `DispatchClass` are required. However, this accompanying dataset package is meant to download text files (generated by command line tools outside R) to disk rather than into the R session, and it’s the job of the SingleCell package to convert the text files into a sparse matrix. There is a website documenting how the command line tools were used to generate the text files. So is this dataset still appropriate for ExperimentHub?
        >  2.  If it is appropriate, then what shall I put in `RDataClass` and `DispatchClass`?
        > 
        > Thanks,
        > Lambda
        > 
        > _______________________________________________
        > Bioc-devel using r-project.org mailing list
        > https://stat.ethz.ch/mailman/listinfo/bioc-devel
        
        
        
        
        ------------------------------
        
        Message: 2
        Date: Thu, 20 Dec 2018 12:05:57 +0000
        From: "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org>
        To: "Lu, Dongyi (Lambda)" <dlu2 using caltech.edu>,
        	"bioc-devel using r-project.org" <bioc-devel using r-project.org>
        Subject: Re: [Bioc-devel]  New ExperimentHub resource and some related
        	questions
        Message-ID:
        	<MW2PR12MB23645E21836B066C9E38F9DDF9BF0 using MW2PR12MB2364.namprd12.prod.outlook.com>
        	
        Content-Type: text/plain; charset="utf-8"
        
        There is a DispatchClass, FilePath, that will download the file and give you the path to the file in the cache location rather than loading it into the R session. You can then use the file path in whatever read/load/etc. method you see fit.
        
        For RDataClass, I would say either character or matrix, knowing that there will be instructions on how to load the data somewhere in your package.
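As a rough illustration of the suggestion above, the following sketch builds a single metadata row using `DispatchClass = "FilePath"` and `RDataClass = "character"`. The title, description, and path are hypothetical placeholders, and only a subset of the columns that `ExperimentHubData::makeExperimentHubMetadata` expects is shown; its documentation lists the full required set.

```r
# Hypothetical metadata row for a text file that should be downloaded
# to the cache rather than loaded into the R session.
# Only DispatchClass and RDataClass come from the advice above;
# every other value is a made-up placeholder.
meta <- data.frame(
  Title         = "Example BUS output (placeholder)",
  Description   = "Text file generated by command line tools outside R (placeholder)",
  RDataClass    = "character",
  DispatchClass = "FilePath",
  RDataPath     = "MyBUSData/output.sorted.txt",
  stringsAsFactors = FALSE
)

# Write the row to a CSV file; a temporary directory is used here
# so the sketch does not touch any real package directory.
out <- file.path(tempdir(), "metadata.csv")
write.csv(meta, out, row.names = FALSE)
```

With this dispatch class, retrieving the resource should yield a file path in the local cache, which the companion package can then pass to its own reader.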
        
        
        
        Lori Shepherd
        
        Bioconductor Core Team
        
        Roswell Park Cancer Institute
        
        Department of Biostatistics & Bioinformatics
        
        Elm & Carlton Streets
        
        Buffalo, New York 14263
        
        
        This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
        
        
        
        
        ------------------------------
        
        Message: 3
        Date: Thu, 20 Dec 2018 14:17:04 +0000
        From: Martin Morgan <mtmorgan.bioc using gmail.com>
        To: "Tierney, Luke" <luke-tierney using uiowa.edu>, "Shepherd, Lori"
        	<Lori.Shepherd using RoswellPark.org>
        Cc: bioc-devel <bioc-devel using r-project.org>
        Subject: Re: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck()
        	version 1.18.0
        Message-ID:
        	<MWHPR05MB3582C1F459721640BDE93C55F9BF0 using MWHPR05MB3582.namprd05.prod.outlook.com>
        	
        Content-Type: text/plain; charset="utf-8"
        
        this comes from `findGlobals()`
        
        > foo <- `[`
        > findGlobals(foo)
        Error in makeUsageCollector(fun, ...) : only works for closures
        > traceback()
        4: stop("only works for closures")
        3: makeUsageCollector(fun, ...)
        2: collectUsage(fun, enterGlobal = enter)
        1: findGlobals(foo)
        
        In the bigger context, the error arises in code that looks for poor 'coding practice', in this case the use of T / F rather than TRUE / FALSE. The logic is to parse each function for its use of global variables, and then to search for T / F among those.
        
        The full traceback when run on the package at https://github.com/mtmorgan/PkgA/tree/BiocCheck-sbs is:
        
        * Checking coding practice...
        Error in makeUsageCollector(fun, ...) : only works for closures
        > traceback()
        9: stop("only works for closures")
        8: makeUsageCollector(fun, ...)
        7: collectUsage(fun, enterGlobal = enter)
        6: findGlobals(value)
        5: FUN(X[[i]], ...)
        4: lapply(objs, FUN = function(obj) {
               value = env[[obj]]
               if (is.function(value)) 
                   findGlobals(value)
               else character(0)
           })
        3: findLogicalRdir(pkgname, c("T", "F"))
        2: checkCodingPractice(package_dir, parsedCode, package_name)
        1: BiocCheck::BiocCheck(".")
        
        Martin
        
        On 12/19/18, 8:32 AM, "Bioc-devel on behalf of Tierney, Luke" <bioc-devel-bounces using r-project.org on behalf of luke-tierney using uiowa.edu> wrote:
        
            codetools already checks only closures in checkUsageEnv and hence
            checkUsagePackage, so this is an issue on the Bioc side.
            
            Best,
            
            luke
            
            On Tue, 18 Dec 2018, Tierney, Luke wrote:
            
            > Codetools should probably be ignoring those. Will have a look
            >
            > Sent from my iPhone
            >
            >> On Dec 18, 2018, at 6:54 AM, Shepherd, Lori <Lori.Shepherd using RoswellPark.org> wrote:
            >>
            >> Can you please open an issue for this so we don't lose track of it -
            >>
            >> https://github.com/Bioconductor/BiocCheck/issues
            >>
            >> ________________________________
            >> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Shian Su <su.s using wehi.edu.au>
            >> Sent: Monday, December 17, 2018 8:34:10 PM
            >> To: bioc-devel
            >> Subject: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck() version 1.18.0
            >>
            >> Hi all,
            >>
            >> If you put
            >>
            >> foo <- `[`
            >>
            >> Somewhere in a package, it will trigger
            >>
            >> Error in makeUsageCollector(fun, ...) : only works for closures
            >>
            >> In BiocCheck::BiocCheck() (version 1.18.0). This comes from
            >>
            >> if (typeof(fun) != "closure")
            >>        stop("only works for closures")
            >>
            >> In codetools::makeUsageCollector(), but
            >>
            >>> typeof(`[`)
            >> ## "special"
            >>
            >> Not that it matters for my use-case because I had discovered magrittr's extract alias, but it might be an edge case worth covering, especially since the error message is so cryptic.
            >>
            >> Kind regards,
            >> Shian Su
            >>
            
            -- 
            Luke Tierney
            Ralph E. Wareham Professor of Mathematical Sciences
            University of Iowa                  Phone:             319-335-3386
            Department of Statistics and        Fax:               319-335-3017
                Actuarial Science
            241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
            Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
            
        
        
        ------------------------------
        
        Message: 4
        Date: Thu, 20 Dec 2018 14:31:47 +0000
        From: "Tierney, Luke" <luke-tierney using uiowa.edu>
        To: Martin Morgan <mtmorgan.bioc using gmail.com>
        Cc: "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org>, bioc-devel
        	<bioc-devel using r-project.org>
        Subject: Re: [Bioc-devel] Aliasing `]` breaks BiocCheck::BiocCheck()
        	version 1.18.0
        Message-ID: <alpine.DEB.2.21.1812200829080.3478 using luke-Latitude-7480>
        Content-Type: text/plain; charset="utf-8"
        
        That's where the error is signaled, but the issue is in
        
        > 4: lapply(objs, FUN = function(obj) {
        >       value = env[[obj]]
        >       if (is.function(value))
        >           findGlobals(value)
        >       else character(0)
        >   })
        > 3: findLogicalRdir(pkgname, c("T", "F"))
        
        Change is.function(value) to typeof(value) == "closure" and you should be OK.
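A minimal sketch of the suggested change, assuming a toy environment in place of a real package namespace. The names `env`, `foo`, `bar`, and `globals_in` are hypothetical; only `findGlobals()` and the `typeof(value) == "closure"` test come from the thread.

```r
library(codetools)  # recommended package that ships with R

# Hypothetical stand-in for a package namespace
env <- new.env()
env$foo <- `[`                                     # typeof() is "special"
env$bar <- function(x) if (x) T else missing_var   # an ordinary closure
env$n   <- 1L                                      # not a function at all

globals_in <- function(env) {
  objs <- ls(env)
  res <- lapply(objs, function(obj) {
    value <- env[[obj]]
    # Only closures are passed on; is.function(value) would also
    # let primitives such as `[` through and trigger the error.
    if (typeof(value) == "closure")
      findGlobals(value)
    else
      character(0)
  })
  names(res) <- objs
  res
}

res <- globals_in(env)
```

Here the aliased primitive is skipped, while the closure is still scanned, so a stray `T` inside it remains detectable.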
        
        Best,
        
        luke
        
        
        ------------------------------
        
        Message: 5
        Date: Thu, 20 Dec 2018 19:52:08 +0000
        From: "Obenchain, Valerie" <Valerie.Obenchain using RoswellPark.org>
        To: Aaron Lun <infinite.monkeys.with.keyboards using gmail.com>,
        	"bioc-devel using r-project.org" <bioc-devel using r-project.org>
        Subject: Re: [Bioc-devel] Compilation flags, CHECK errors and
        	BiocNeighbors
        Message-ID:
        	<MWHPR1201MB02547C0566B9DAF16450CF7BFFBF0 using MWHPR1201MB0254.namprd12.prod.outlook.com>
        	
        Content-Type: text/plain; charset="utf-8"
        
        The problem is that during the nightly builds, one of the Bioconductor 
        packages writes out a .R/Makevars.win in biocbuild's HOME during R CMD 
        build.
        
        Yesterday I removed the .R/ directory before the builds started and, as 
        expected, today's NodeInfo on tokay2 and the packages using C++11 show 
        the correct flags.
        
        If this .R/Makevars.win is not removed, it will (and did in the past) 
        pollute the next build cycle such that the NodeInfo and all packages 
        using C++11 would report/use the wrong flags.
        
        I think I've narrowed down which package is doing this and will contact 
        the maintainer. We'll also implement some sanitation code in the BBS to 
        prevent this from happening again.
        
        The reason HOME is writable is that many applications need to create 
        files (often hidden) such as lock files, cache, config files etc. If 
        they can't, they'll break and they will sometimes break in a subtle way 
        that is not immediately obvious.
        
        One last follow up is to explain why the previous iteration of the 
        NodeInfo on the build report reported the incorrect C++11 flags. The 
        problem there was that previously we were only picking up CXX1XFLAGS 
        instead of the individual CXX11FLAGS, CXX14FLAGS etc.
        
        Thanks for being persistent on this issue and for bringing the 
        conversation to bioc-devel.
        
        Val
        
        
        
        On 12/18/18 8:39 AM, Obenchain, Valerie wrote:
        > The devel build report hasn't posted yet but I took a look at the new
        > compiler flag output Herve implemented. The results show tokay2 is
        > indeed using
        > 
        > CXX11FLAGS: -O3 -march=native -mtune=native
        > 
        > This is inconsistent with what we have in the R/etc/<arch>/Makeconf for
        > both architectures on both tokay1 and tokay2. The Makeconf looks like this:
        > 
        > CXX11 = $(BINPREF)g++ $(M_ARCH)
        > CXX11FLAGS = -O2 -Wall $(DEBUGFLAG) -mtune=generic
        > CXX11PICFLAGS =
        > CXX11STD = -std=gnu++11
        > 
        > I don't know why the Makeconf is not being respected on tokay2. I can
        > confirm the inconsistency in an R session -
        > 
        > tokay2:
        > 
        > PS C:\Users\biocbuild\bbs-3.9-bioc\R> ./bin/R CMD config CXX11FLAGS
        > -O3 -march=native -mtune=native
        > 
        > tokay1:
        > 
        > PS C:\Users\biocbuild\bbs-3.8-bioc\R> ./bin/R CMD config CXX11FLAGS
        > -O2 -Wall -mtune=generic
        > 
        > I'll work with Herve to resolve this.
        > 
        > Val
        > 
        > 
        > 
        > On 12/17/18 5:05 PM, Aaron Lun wrote:
        >> Thanks Val. I don’t think it’s a BiocNeighbors thing, as it doesn’t try
        >> to customize the compilation flags or have its own Makevars. Moreover,
        >> the “-O3 -mtune=native -mtune=generic” flags seem to show up on all of
        >> my packages containing C++11 code. Some cursory checks of other packages
        >> suggest that the correct flags (“-O2 -mtune=generic”) are used for C++98
        >> code.
        >>
        >> -A
        >>
        >>> On 17 Dec 2018, at 17:47, Obenchain, Valerie <Valerie.Obenchain using RoswellPark.org> wrote:
        >>>
        >>> Hi Aaron,
        >>>
        >>> The only compilation flags that are different for tokay1 (release) and
        >>> tokay2 (devel) are C++14 flags. BiocNeighbors is not using C++14 but
        >>> C++11 so I think the changes we discussed previously actually don't
        >>> apply to your case.
        >>>
        >>> All compilation flags we use are listed at the top of the build report,
        >>> e.g., for tokay2:
        >>>
        >>> https://www.bioconductor.org/checkResults/devel/bioc-LATEST/tokay2-NodeInfo.html
        >>>
        >>> I can look into this further but right now I'm not sure where the '-O3
        >>> -march=native -mtune=native' is coming from in the check output for
        >>> BiocNeighbors. We don't use 'native' on the builders for build/check or
        >>> for creating binaries.
        >>>
        >>> Herve might have more insight on this.
        >>>
        >>> Val
        >>>
        >>> On 12/15/18 10:56 PM, Aaron Lun wrote:
        >>>> Sometime between 6-18 November, BiocNeighbors’ BioC-devel builds began failing on Windows 64-bit, and have continued to fail since:
        >>>>
        >>>> http://bioconductor.org/checkResults/devel/bioc-LATEST/BiocNeighbors/
        >>>>
        >>>> The most interesting part is the nature of the failures. They are not segmentation faults but rather “incorrect” output in the unit tests:
        >>>>
        >>>> - BiocNeighbors uses the Annoy algorithm for approximate nearest neighbor search, which is provided as a header-only C++ library in the RcppAnnoy package.
        >>>>
        >>>> - I have compiled the BiocNeighbors C++ code with an “#include” for these libraries to use the Annoy routines. For testing, I compared the output of my C++ code to the output of the code in the RcppAnnoy package.
        >>>>
        >>>> - It is these tests that are failing (i.e., the output does not match up) during CHECK on Windows 64-bit only, despite the fact that the same library is being “#include”d in both the BiocNeighbors and RcppAnnoy sources!
        >>>>
        >>>> What makes this particularly intriguing is that the differences between BiocNeighbors and RcppAnnoy are very minor. Less than 1% of the neighbor identities differ, and only for some of the scenarios, so it’s not an obvious bug that would be changing the output en masse. Now, the package also uses/tests Annoy in BioC-release but builds fine on tokay1:
        >>>>
        >>>> http://bioconductor.org/checkResults/release/bioc-LATEST/BiocNeighbors/
        >>>>
        >>>> The major difference between the Bioc-release/devel builds is the compilation flags, which have changed from “-O2 -mtune=generic” to “-O3 -march=native -mtune=native” in tokay2. I am told (thanks Val) that the timing of this change is consistent with the start of the BiocNeighbors build failures on tokay2. I would guess that RcppAnnoy is also compiled with “-O2 -mtune=generic” on the CRAN build systems, introducing differences in optimization levels between the BiocNeighbors and RcppAnnoy binaries. These could be responsible for the discrepancies in the search results.
        >>>>
        >>>> I was able to reproduce this on my Unix cluster (gcc 6.5.0) where setting “-march=native” with either “-O3” or “-O2” caused a difference in the calculations. After much trial and error, I eventually narrowed this down to the “-mfma” flag, which seems to change the precision of multiply-and-add operations and thus the search results. This occurs even when AVX support is turned off; I guess the compiler tries to be smart if it detects you are doing some kind of simultaneous multiply and addition, which is a pretty common thing to do when computing Euclidean distances.
        >>>>
        >>>> In summary: can we not use “-march=native” on tokay2? (Val, I know we discussed this, but whatever changes you made to the compilation flags don’t seem to have propagated to the build machines.) As the case study with BiocNeighbors shows, this leads to inconsistencies between the CRAN and BioC-devel binaries for the same code, which unnecessarily complicates downstream usage and unit tests. I also wonder how binaries specialized for tokay2’s architecture would behave on other CPUs with different instruction sets, if they would run at all.
        >>>>
        >>>> Cheers,
        >>>>
        >>>> Aaron
        
        
        
        
        ------------------------------
        
        
        End of Bioc-devel Digest, Vol 177, Issue 17
        *******************************************
    