[Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'

Robert Castelo
Thu Apr 4 20:40:03 CEST 2024


hi,

I'm getting duplicated entries when loading **offline** previously
cached ExperimentHub resources. The following steps reproduce the problem:

1. Starting from a fresh, empty ExperimentHub cache, I download 9 resources
through the gDNAinRNAseqData package:

library(gDNAinRNAseqData)

bamfiles <- LiYu22subsetBAMfiles()
length(bamfiles)
[1] 9

2. I then load them again from the local cache, either by going offline or
by using the 'offline=TRUE' argument of the loader function, which sets
'localHub=TRUE' in the call to 'ExperimentHub()':

bamfiles <- LiYu22subsetBAMfiles(offline=TRUE)
Using 'localHub=TRUE'
   If offline, please also see BiocManager vignette section on offline use
snapshotDate(): 2024-04-02
see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for documentation
loading from cache
[...]

length(bamfiles)
[1] 18

3. If I examine the resources offline directly with 'ExperimentHub()', I
see every record duplicated, with the second copy of each ID getting a
'.1' suffix:

library(ExperimentHub)

eh <- ExperimentHub(localHub=TRUE)
Using 'localHub=TRUE'
   If offline, please also see BiocManager vignette section on offline use
snapshotDate(): 2024-04-02
length(eh)
[1] 18
eh
ExperimentHub with 18 records
# snapshotDate(): 2024-04-02
# $dataprovider: NGDC
# $species: Homo sapiens
# $rdataclass: BamFile
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH8079"]]'


   EH8079   |
   EH8079.1 |
   EH8080   |
   EH8080.1 |
   EH8081   |
   ...
   EH8085.1 |
   EH8086   |
   EH8086.1 |
   EH8087   |
   EH8087.1 |
title
   EH8079   RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
   EH8079.1 RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
   EH8080   RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
   EH8080.1 RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
   EH8081   RNA-seq data BAM file subset of HRR589634 contaminated with 0% gDNA
   ... ...
   EH8085.1 RNA-seq data BAM file subset of HRR589623 contaminated with 10% ...
   EH8086   RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
   EH8086.1 RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
   EH8087   RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
   EH8087.1 RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
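
The '.1' suffix matches the pattern that base R's make.unique() produces
when a name is repeated, which suggests (only as an unverified guess) that
each record ID ends up twice in the local metadata. A minimal sketch on a
hypothetical ID vector, not read from the actual ExperimentHub cache, shows
the pattern and how the original IDs can be recovered:

```r
# Hypothetical record IDs, each present twice (illustration only,
# NOT taken from the real ExperimentHub local metadata)
ids <- rep(c("EH8079", "EH8080", "EH8081"), each = 2)

# make.unique() leaves the first occurrence as-is and appends ".1"
# to the second, giving "EH8079", "EH8079.1", "EH8080", ...
uniq_ids <- make.unique(ids)
print(uniq_ids)

# Stripping a trailing ".<digits>" suffix recovers the original IDs,
# so the distinct records can be counted
base_ids <- sub("\\.[0-9]+$", "", uniq_ids)
length(unique(base_ids))
```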

Does anybody have an idea of what might be going on with
'ExperimentHub(localHub=TRUE)'?

Thanks!

robert.


