[Bioc-devel] build errors: "Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed"

Pages, Herve hp@ge@ @end|ng |rom |redhutch@org
Thu Apr 25 07:39:41 CEST 2019


Hi Paul,

Something or someone is definitely re-installing
BSgenome.Hsapiens.UCSC.hg38 while 'R CMD check trena' is running on the
build machines. This has happened consistently for several consecutive
nights on malbec2 (BioC 3.9 builds) and malbec1 (BioC 3.10 builds),
where I've been monitoring it.
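
One simple way to detect such a re-install is to read the `Built` field of the installed package's DESCRIPTION file before and after a check run. This is a sketch under assumptions, not the build system's actual tooling. For a package installed from source, R writes the R version, platform and a timestamp into that field at install time, so a changed value means the package was rebuilt and re-installed in between. The helper name `install_stamp()` is hypothetical.

```r
# Hypothetical helper: return the "Built" stamp of an installed package,
# or NA if the package is not found on .libPaths().
install_stamp <- function(pkg) {
  pkg_dir <- system.file(package = pkg)   # "" when the package is not installed
  if (!nzchar(pkg_dir)) return(NA_character_)
  # read.dcf() returns a one-row matrix; the "Built" field holds the R version,
  # platform and timestamp recorded when the package was built/installed.
  unname(read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Built")[1, "Built"])
}

# Usage: snapshot, run the workload under suspicion, snapshot again, compare.
pkg <- "BSgenome.Hsapiens.UCSC.hg38"
before <- install_stamp(pkg)
# ... e.g. run 'R CMD check' on trena from another shell at this point ...
after <- install_stamp(pkg)
if (!identical(before, after)) {
  message(pkg, " was re-installed: '", before, "' -> '", after, "'")
}
```

A binary install keeps the timestamp of the binary build, so on platforms that install binaries, comparing the modification time of the package directory may be a better signal.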

The builds are parallelized at the "top level", i.e. several 'R CMD
check' instances run concurrently on different packages at any given
time (e.g. 15 concurrent instances on malbec1 and malbec2). So we
cannot exclude the possibility that another package is pulling the
rug from under trena's feet. However, the exact set of packages being
checked at the moment BSgenome.Hsapiens.UCSC.hg38 gets re-installed
typically changes from one build to the next, and also across build
machines. This makes it unlikely that the culprit is another package.

Anyway, just to make sure, I've identified the 15 packages that were
running at the time BSgenome.Hsapiens.UCSC.hg38 got re-installed last
night on malbec1 (BioC 3.10 builds) and manually 'R CMD check'ed them
(including trena, which is one of them). None of them re-installed
BSgenome.Hsapiens.UCSC.hg38. All this to say that so far I haven't been
able to reproduce the problem in an interactive session on the build
machines.

Puzzling! (and frustrating) I'll keep investigating...

Note that trena is currently at version 1.5.14 in git, but the last
version of the source package that propagated is 1.5.8. Version 1.5.9
(from Dec 6, 2018) and all later versions seem never to have propagated,
which suggests that the package has been erroring on malbec2 since Dec
2018. This makes it hard to tell how long trena has been hitting the
"UCSC library operation failed" problem on the build machines.

Finally, another intriguing thing is that, according to the latest 3.8
build results, trena's unit tests also seem to have had a problem
accessing a file that belongs to another package:

https://bioconductor.org/checkResults/3.8/bioc-20190416/trena/merida1-checksrc.html

Not the same problem, but a similar one (and this time on Mac rather
than Linux). Very puzzling!

H.


On 4/23/19 11:29, Paul Shannon wrote:
> Hi Herve,
>
> Thanks for your reply!
>
>> Is there a possibility that trena's code is having one worker
>> downloading/re-installing BSgenome.Hsapiens.UCSC.hg38 while at the same
>> time another worker is trying to access it?
> I don’t think any download or re-installing happens. Several genome packages (hg38, hg19, mm10) are imported by trena, as specified in the DESCRIPTION file, so I assume they must be present once trena is built and installed. Thus (and here’s where I may be confused) there should be nothing to trigger a download or re-install when the tests, examples and vignettes are run.
>
> In the constructor of the MotifMatcher class, this assignment is made:
>
>      if(genomeName == "hg38"){
>         reference.genome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
>      }
>
> And it is used later like this:
>
>      seqs <- as.character(BSgenome::getSeq(obj@reference.genome, gr.regions))
>
> Hence my suggestion that no download or install takes place at run time.
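
Since tests and examples should never (re)install packages, one possible shape for the constructor-side lookup is a small table from short genome names to BSgenome packages plus a fail-fast check. This is a minimal sketch, not trena's API: `genome_packages` and `load_reference_genome()` are hypothetical names. `requireNamespace()` only loads an already-installed package, and `getExportedValue(pkg, pkg)` is the base-R call that `pkg::pkg` expands to.

```r
# Hypothetical mapping from trena-style short genome names to BSgenome packages.
genome_packages <- c(
  hg38 = "BSgenome.Hsapiens.UCSC.hg38",
  hg19 = "BSgenome.Hsapiens.UCSC.hg19",
  mm10 = "BSgenome.Mmusculus.UCSC.mm10"
)

# Return the BSgenome object for 'genomeName' without ever installing anything.
load_reference_genome <- function(genomeName) {
  if (!genomeName %in% names(genome_packages))
    stop("unsupported genomeName: '", genomeName, "'")
  pkg <- genome_packages[[genomeName]]
  # requireNamespace() loads the package if it is installed; it never installs.
  if (!requireNamespace(pkg, quietly = TRUE))
    stop("package '", pkg, "' is not installed")
  # Same object as e.g. BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
  getExportedValue(pkg, pkg)
}
```

This does not by itself explain the `.seqlengths_TwoBitFile()` error; it only makes a missing genome package surface as a clear error rather than as a side effect.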
>
>
> In the current design of the unit tests for MotifMatcher, I call the constructor in each test:
>
>     jaspar.human.pfms <- as.list(query(query(MotifDb, "sapiens"), "jaspar2016"))
>     motifMatcher <- MotifMatcher(genomeName="hg38", pfms=jaspar.human.pfms, quiet=TRUE)
>
> For what it’s worth, this code has not changed in the last year, ran fine on the build system until recently, and passes R CMD check under R 3.6.0 beta on Ubuntu for me. There is no parallelization in this class, but maybe the build system introduces some at a higher level?
>
> I can condition these failing tests on hostname in order to pass the build tests if that is not too much of a dodge.
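
If the hostname workaround were adopted, a minimal sketch might look like the following. The host names `malbec1` and `malbec2` come from this thread; `on_build_machine()` and `test_getSequence_guarded()` are hypothetical, and `Sys.info()[["nodename"]]` is base R.

```r
# Build machines named in this thread (assumption: adjust as needed).
build_hosts <- c("malbec1", "malbec2")

# TRUE when running on one of 'hosts'. The node name may be fully qualified
# (e.g. "malbec1.example.org"), so only the part before the first dot is compared.
on_build_machine <- function(hosts = build_hosts) {
  node <- sub("\\..*$", "", Sys.info()[["nodename"]])
  tolower(node) %in% tolower(hosts)
}

# Hypothetical RUnit-style test that returns early on the build machines.
test_getSequence_guarded <- function() {
  if (on_build_machine()) {
    message("skipping genome-dependent test on ", Sys.info()[["nodename"]])
    return(invisible(TRUE))
  }
  # the original test body (constructing a MotifMatcher and checking the
  # returned sequences) would run here
  invisible(TRUE)
}
```

Skipping on named hosts keeps the nightly report clean, but it also stops the genome-dependent code from being exercised there, so it only hides the symptom while the re-install is tracked down.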
>
>   - Paul
>
>
>> On Apr 23, 2019, at 12:19 AM, Pages, Herve <hpages using fredhutch.org> wrote:
>>
>> Hi Paul,
>>
>> Is there a possibility that trena's code is having one worker
>> downloading/re-installing BSgenome.Hsapiens.UCSC.hg38 while at the same
>> time another worker is trying to access it?
>>
>> The reason I suspect something like this is that it seems that
>> BSgenome.Hsapiens.UCSC.hg38 gets reinstalled every night on the builders
>> and that this happens at the time the build system is running 'R CMD
>> check' on trena.
>>
>> Package vignettes, examples, and unit tests should avoid re-installing
>> packages.
>>
>> H.
>>
>> On 4/22/19 15:01, Paul Shannon wrote:
>>> I cannot reproduce the daily build failures that the build system reports for the trena package. The build report shows:
>>>
>>> trena RUnit Tests - 86 test functions, 7 errors, 0 failures
>>>
>>> ERROR in test_.injectSnp: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>> ERROR in test_bugInStartEndOfMinusStrandHits: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>> ERROR in test_findMatchesByChromosomalRegion: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>> ERROR in test_findMatchesByChromosomalRegion.twoAlternateAlleles: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>> ERROR in test_findMatchesByMultipleChromosomalRegions: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>> ERROR in test_getSequence: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>> ERROR in test_noMatch: Error in .seqlengths_TwoBitFile(x) : UCSC library operation failed
>>>
>>> This seems similar to a Bioconductor support exchange from two years ago, which may suggest that the build system's BSgenome.Hsapiens.UCSC.hg38 is the locus of the problem. I offer this suggestion very tentatively.
>>>
>>>     support https://support.bioconductor.org/p/95963/
>>>
>>> Any suggestions?
>>>
>>>   - Paul
>>>
>>> sessionInfo()  # from my clean R CMD check
>>> R version 3.6.0 beta (2019-04-11 r76379)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 16.04.5 LTS
>>>
>>> Matrix products: default
>>> BLAS:   /local/users/pshannon/src/R-beta/lib/libRblas.so
>>> LAPACK: /local/users/pshannon/src/R-beta/lib/libRlapack.so
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>>   [1] RPostgreSQL_0.6-2   DBI_1.0.0           RUnit_0.4.32
>>>   [4] trena_1.5.14        MotifDb_1.22.0      Biostrings_2.48.0
>>>   [7] XVector_0.20.0      IRanges_2.14.12     S4Vectors_0.18.3
>>> [10] BiocGenerics_0.26.0 glmnet_2.0-16       foreach_1.4.4
>>> [13] Matrix_1.2-17
>>>
>>> loaded via a namespace (and not attached):
>>>   [1] SummarizedExperiment_1.10.1       lassopv_0.2.0
>>>   [3] progress_1.2.0                    lattice_0.20-38
>>>   [5] rtracklayer_1.40.6                blob_1.1.1
>>>   [7] XML_3.98-1.19                     rlang_0.3.4
>>>   [9] flare_1.6.0                       BiocParallel_1.14.2
>>> [11] bit64_0.9-7                       splitstackshape_1.4.8
>>> [13] matrixStats_0.54.0                GenomeInfoDbData_1.1.0
>>> [15] stringr_1.4.0                     zlibbioc_1.26.0
>>> [17] codetools_0.2-16                  memoise_1.1.0
>>> [19] Biobase_2.40.0                    biomaRt_2.36.1
>>> [21] GenomeInfoDb_1.16.0               curl_3.3
>>> [23] AnnotationDbi_1.42.1              lars_1.2
>>> [25] Rcpp_1.0.1                        BSgenome_1.48.0
>>> [27] DelayedArray_0.6.6                org.Hs.eg.db_3.6.0
>>> [29] bit_1.1-14                        Rsamtools_1.32.3
>>> [31] BSgenome.Hsapiens.UCSC.hg38_1.4.1 RMySQL_0.10.17
>>> [33] hms_0.4.2                         digest_0.6.18
>>> [35] stringi_1.4.3                     GenomicRanges_1.32.7
>>> [37] grid_3.6.0                        tools_3.6.0
>>> [39] bitops_1.0-6                      magrittr_1.5
>>> [41] RCurl_1.95-4.12                   RSQLite_2.1.1
>>> [43] randomForest_4.6-14               crayon_1.3.4
>>> [45] vbsr_0.0.5                        pkgconfig_2.0.2
>>> [47] MASS_7.3-51.4                     data.table_1.12.2
>>> [49] prettyunits_1.0.2                 httr_1.4.0
>>> [51] assertthat_0.2.1                  iterators_1.0.10
>>> [53] R6_2.4.0                          GenomicAlignments_1.16.0
>>> [55] igraph_1.2.4.1                    compiler_3.6.0
>> -- 
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages using fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>