[Bioc-devel] build errors: "Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed"

Paul Shannon p@u|@thurmond@@h@nnon @end|ng |rom gm@||@com
Fri May 10 21:06:28 CEST 2019


Belated thanks, Herve, for getting this fixed for the release.

I think the same problem has popped up again, as seen in the latest trena build report:

 o ERROR for 'R CMD check' on malbec2. See the details here:
     https://master.bioconductor.org/checkResults/3.9/bioc-LATEST/trena/malbec2-checksrc.html

Warning in .seqlengths_TwoBitFile(x) :
  mustOpen: Can't open /home/biocbuild/bbs-3.9-bioc/R/library/BSgenome.Hsapiens.UCSC.hg38/extdata/single_sequences.2bit to read: No such file or directory
Timing stopped at: 6.522 0 6.522
Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
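
The warning says the .2bit file was not on disk at the moment the test
ran. A minimal sketch, in base R only, of how a test could confirm that
before touching the genome. The helper names twobit_path and
twobit_available are hypothetical and not part of trena or BSgenome; the
package and file names are taken from the warning message.

```r
# Hypothetical helper: locate a file under extdata/ of an installed package.
# system.file() returns "" when the package or the file is not present.
twobit_path <- function(pkg = "BSgenome.Hsapiens.UCSC.hg38",
                        file = "single_sequences.2bit") {
  system.file("extdata", file, package = pkg)
}

# TRUE only if the file can be found on disk right now.
twobit_available <- function(...) {
  path <- twobit_path(...)
  nzchar(path) && file.exists(path)
}

if (!twobit_available()) {
  message("single_sequences.2bit not found: package absent or mid-reinstall?")
}
```

This only reports the state at the instant of the call, so a reinstall
that starts a moment later would still go unnoticed.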

 - Paul

> On Apr 24, 2019, at 10:39 PM, Pages, Herve <hpages using fredhutch.org> wrote:
> 
> Hi Paul,
> 
> Something/someone is definitely re-installing the 
> BSgenome.Hsapiens.UCSC.hg38 while 'R CMD check trena' is running on the 
> build machines. This has happened consistently for several consecutive 
> nights on malbec2 (BioC 3.9 builds) and malbec1 (BioC 3.10 builds) where 
> I've been monitoring this.
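
The thread does not say how the monitoring was done. One possible way to
detect a reinstall around a test run is sketched below, on the
assumption that installing a package rewrites its DESCRIPTION file and
so changes that file's modification time. install_stamp and
was_reinstalled are hypothetical names.

```r
# Modification time of an installed package's DESCRIPTION file.
# Returns NA if the package directory is absent (e.g. mid-reinstall).
install_stamp <- function(pkg, lib = .libPaths()[1]) {
  file.info(file.path(lib, pkg, "DESCRIPTION"))$mtime
}

# TRUE if the stamp differs between the two snapshots, which includes
# the package appearing or disappearing in between.
was_reinstalled <- function(before, after) {
  !identical(before, after)
}

pkg <- "BSgenome.Hsapiens.UCSC.hg38"
before <- install_stamp(pkg)
# ... run the unit tests here ...
after <- install_stamp(pkg)
if (was_reinstalled(before, after)) {
  message(pkg, " changed on disk during the run")
}
```

Logging the two stamps alongside the test output would show whether the
failures coincide with a change to the installed package.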
> 
> The builds are parallelized at the "top level" i.e. several 'R CMD 
> check' instances are running concurrently on different packages at any 
> given time (e.g. 15 concurrent instances on malbec1 & malbec2). So we 
> cannot exclude the possibility that another package could be pulling the 
> rug from under trena's feet. However, the exact set of packages that is 
> being checked at the time that BSgenome.Hsapiens.UCSC.hg38 gets 
> re-installed will typically change from one build to the next and also 
> across build machines. This makes it unlikely that the culprit is 
> another package.
> 
> Anyway, just to make sure, I've identified the 15 packages that were 
> running at the time BSgenome.Hsapiens.UCSC.hg38 got re-installed last 
> night on malbec1 (BioC 3.10 builds) and manually 'R CMD check'ed them 
> (including trena which is one of them). None of them re-installed 
> BSgenome.Hsapiens.UCSC.hg38. All this to say that I've not been able to 
> reproduce this problem so far in an interactive session on the build 
> machines.
> 
> Puzzling! (and frustrating) I'll keep investigating...
> 
> Note that trena is currently at version 1.5.14 in git but the last 
> version of the source package that propagated is 1.5.8. Version 1.5.9 
> (from Dec 6, 2018) and successive versions never seem to have propagated 
> which suggests that the package has been erroring on malbec2 since Dec 
> 2018. This makes it hard to know since when trena has been having the 
> "UCSC library operation failed" problem on the build machines.
> 
> Finally, another intriguing thing is that, according to the latest 3.8 
> build result, trena's unit tests also seemed to have a problem accessing 
> a file that belongs to another package:
> 
> https://bioconductor.org/checkResults/3.8/bioc-20190416/trena/merida1-checksrc.html
> 
> Not the same problem but similar (and this time on Mac and not on 
> Linux). Very puzzling!
> 
> H.
> 
> 
> On 4/23/19 11:29, Paul Shannon wrote:
>> Hi Herve,
>> 
>> Thanks for your reply!
>> 
>>> Is there a possibility that trena's code is having one worker
>>> downloading/re-installing BSgenome.Hsapiens.UCSC.hg38 while at the same
>>> time another worker is trying to access it?
>> I don’t think any download or reinstalling happens.  Several genome packages (hg38, hg19, mm10) are imported by trena as specified in the DESCRIPTION file, and so I assume they must be present after trena is built and installed.  Thus - and here’s where I may be confused - there should be nothing to trigger download or re-install as the tests, examples and vignettes are run.
>> 
>> In the constructor of the MotifMatcher class, this assignment is made
>> 
>>     if(genomeName == "hg38"){
>>        reference.genome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
>>        }
>> 
>> And used later like this:
>> 
>>     seqs <- as.character(BSgenome::getSeq(obj@reference.genome, gr.regions))
>> 
>> Hence my suggestion that no download or install takes place at run time.
>> 
>> 
>> In the current design of the unit tests for MotifMatcher, I call the constructor in each test:
>> 
>>    jaspar.human.pfms <- as.list(query(query(MotifDb, "sapiens"), "jaspar2016"))
>>    motifMatcher <- MotifMatcher(genomeName="hg38", pfms=jaspar.human.pfms, quiet=TRUE)
>> 
>> For what it’s worth, this code is unchanged in the last year, has run fine on the build system until recently, and passes R CMD check under R3.6.0beta on ubuntu for me.  There is no parallelization in this class - but maybe the build system introduces some at a higher level?
>> 
>> I can condition these failing tests on hostname in order to pass the build tests if that is not too much of a dodge.
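
A rough sketch of what conditioning on hostname could look like, using
only base R. The host names come from this thread; on_build_host and
skip_on_build_host are hypothetical helpers, and the assumption that the
build machines report a node name starting with "malbec1" or "malbec2"
is untested.

```r
# Hosts named in this thread where the tests have been failing.
build_hosts <- c("malbec1", "malbec2")

# TRUE when the short hostname (domain stripped) is one of the build hosts.
on_build_host <- function(node = Sys.info()[["nodename"]],
                          hosts = build_hosts) {
  tolower(sub("\\..*$", "", node)) %in% hosts
}

# Wrap a test function so that it becomes a no-op on the listed hosts.
skip_on_build_host <- function(test_fn, node = Sys.info()[["nodename"]]) {
  function(...) {
    if (on_build_host(node)) {
      message("skipping test on build host")
      return(invisible(NULL))
    }
    test_fn(...)
  }
}
```

A test would then be wrapped as, for example,
test_getSequence <- skip_on_build_host(test_getSequence). This hides the
failure on those machines rather than fixing it.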
>> 
>>  - Paul
>> 
>> 
>>> On Apr 23, 2019, at 12:19 AM, Pages, Herve <hpages using fredhutch.org> wrote:
>>> 
>>> Hi Paul,
>>> 
>>> Is there a possibility that trena's code is having one worker
>>> downloading/re-installing BSgenome.Hsapiens.UCSC.hg38 while at the same
>>> time another worker is trying to access it?
>>> 
>>> The reason I suspect something like this is that it seems that
>>> BSgenome.Hsapiens.UCSC.hg38 gets reinstalled every night on the builders
>>> and that this happens at the time the build system is running 'R CMD
>>> check' on trena.
>>> 
>>> Package vignettes, examples, and unit tests should avoid re-installing
>>> packages.
>>> 
>>> H.
>>> 
>>> On 4/22/19 15:01, Paul Shannon wrote:
>>>> I cannot reproduce daily build failures found in the trena package by the build system.  The build report shows:
>>>> 
>>>> trena RUnit Tests - 86 test functions, 7 errors, 0 failures
>>>> 
>>>> ERROR in test_.injectSnp: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> ERROR in test_bugInStartEndOfMinusStrandHits: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> ERROR in test_findMatchesByChromosomalRegion: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> ERROR in test_findMatchesByChromosomalRegion.twoAlternateAlleles: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> ERROR in test_findMatchesByMultipleChromosomalRegions: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> ERROR in test_getSequence: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> ERROR in test_noMatch: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>> 
>>>> This seems similar to a bioc support exchange from two years ago, which may suggest that the build system's BSgenome.Hsapiens.UCSC.hg38 is the locus of the problem.   I offer this suggestion very tentatively.
>>>> 
>>>>    support https://support.bioconductor.org/p/95963/
>>>> 
>>>> Any suggestions?
>>>> 
>>>>  - Paul
>>>> 
>>>> sessionInfo()  # from my clean R CMD check
>>>> R version 3.6.0 beta (2019-04-11 r76379)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>> Running under: Ubuntu 16.04.5 LTS
>>>> 
>>>> Matrix products: default
>>>> BLAS:   /local/users/pshannon/src/R-beta/lib/libRblas.so
>>>> LAPACK: /local/users/pshannon/src/R-beta/lib/libRlapack.so
>>>> 
>>>> locale:
>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> 
>>>> attached base packages:
>>>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>>>> [8] methods   base
>>>> 
>>>> other attached packages:
>>>>  [1] RPostgreSQL_0.6-2   DBI_1.0.0           RUnit_0.4.32
>>>>  [4] trena_1.5.14        MotifDb_1.22.0      Biostrings_2.48.0
>>>>  [7] XVector_0.20.0      IRanges_2.14.12     S4Vectors_0.18.3
>>>> [10] BiocGenerics_0.26.0 glmnet_2.0-16       foreach_1.4.4
>>>> [13] Matrix_1.2-17
>>>> 
>>>> loaded via a namespace (and not attached):
>>>>  [1] SummarizedExperiment_1.10.1       lassopv_0.2.0
>>>>  [3] progress_1.2.0                    lattice_0.20-38
>>>>  [5] rtracklayer_1.40.6                blob_1.1.1
>>>>  [7] XML_3.98-1.19                     rlang_0.3.4
>>>>  [9] flare_1.6.0                       BiocParallel_1.14.2
>>>> [11] bit64_0.9-7                       splitstackshape_1.4.8
>>>> [13] matrixStats_0.54.0                GenomeInfoDbData_1.1.0
>>>> [15] stringr_1.4.0                     zlibbioc_1.26.0
>>>> [17] codetools_0.2-16                  memoise_1.1.0
>>>> [19] Biobase_2.40.0                    biomaRt_2.36.1
>>>> [21] GenomeInfoDb_1.16.0               curl_3.3
>>>> [23] AnnotationDbi_1.42.1              lars_1.2
>>>> [25] Rcpp_1.0.1                        BSgenome_1.48.0
>>>> [27] DelayedArray_0.6.6                org.Hs.eg.db_3.6.0
>>>> [29] bit_1.1-14                        Rsamtools_1.32.3
>>>> [31] BSgenome.Hsapiens.UCSC.hg38_1.4.1 RMySQL_0.10.17
>>>> [33] hms_0.4.2                         digest_0.6.18
>>>> [35] stringi_1.4.3                     GenomicRanges_1.32.7
>>>> [37] grid_3.6.0                        tools_3.6.0
>>>> [39] bitops_1.0-6                      magrittr_1.5
>>>> [41] RCurl_1.95-4.12                   RSQLite_2.1.1
>>>> [43] randomForest_4.6-14               crayon_1.3.4
>>>> [45] vbsr_0.0.5                        pkgconfig_2.0.2
>>>> [47] MASS_7.3-51.4                     data.table_1.12.2
>>>> [49] prettyunits_1.0.2                 httr_1.4.0
>>>> [51] assertthat_0.2.1                  iterators_1.0.10
>>>> [53] R6_2.4.0                          GenomicAlignments_1.16.0
>>>> [55] igraph_1.2.4.1                    compiler_3.6.0
>>>> _______________________________________________
>>>> Bioc-devel using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>> -- 
>>> Hervé Pagès
>>> 
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>> 
>>> E-mail: hpages using fredhutch.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>> 


