[Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'

Kern, Lori Lori.Shepherd at RoswellPark.org
Fri Apr 5 15:02:30 CEST 2024


I found the bug.  Testing and pushing up a fix.

Cheers,


Lori Shepherd - Kern
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263

________________________________
From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Kern, Lori via Bioc-devel <bioc-devel using r-project.org>
Sent: Friday, April 5, 2024 8:15 AM
To: Robert Castelo <robert.castelo using upf.edu>; bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'

I will have to look at how going offline changes the way the files are loaded. That is odd and unexpected behavior.

They aren't actually duplicate files. What is happening is that, when offline, the hub displays the BAM file (.bam) and its index file (.bai) as separate entries instead of associating them as a single entry.
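A minimal R sketch of the distinction described above. It assumes a hypothetical table of cached files in which one BamFile resource is stored as two files (the .bam and its .bai index) sharing one resource ID; the column names and file names are illustrative only, not ExperimentHub's actual cache schema. Counting files instead of resources doubles the total (consistent with 18 records for 9 resources), while grouping files by ID gives one entry per resource.

```r
# Hypothetical cache listing: one row per cached file. Each BamFile
# resource contributes two files (BAM + index) under the same ID.
cached <- data.frame(
    id   = c("EH8079", "EH8079", "EH8080", "EH8080"),
    path = c("HRR589632.bam", "HRR589632.bam.bai",
             "HRR589633.bam", "HRR589633.bam.bai"),
    stringsAsFactors = FALSE
)

# Treating every file as its own record: 4 records for 2 resources.
nrow(cached)

# Grouping the files by resource ID: one record per resource, each
# carrying both the BAM file and its index.
records <- split(cached$path, cached$id)
length(records)
records[["EH8079"]]
```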

I'll investigate more.



________________________________
From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Robert Castelo <robert.castelo using upf.edu>
Sent: Thursday, April 4, 2024 2:40 PM
To: bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: [Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'

hi,

I'm getting duplicated entries when loading previously cached
ExperimentHub resources **offline**. This code reproduces the problem:

1. Starting from a fresh, empty ExperimentHub cache, I download 9
resources through the gDNAinRNAseqData package:

library(gDNAinRNAseqData)

bamfiles <- LiYu22subsetBAMfiles()
length(bamfiles)
[1] 9

2. I then load them again from the local cache, either by going offline
or by passing 'offline=TRUE' to the loader function, which sets
'localHub=TRUE' in the call to 'ExperimentHub()':

bamfiles <- LiYu22subsetBAMfiles(offline=TRUE)
Using 'localHub=TRUE'
   If offline, please also see BiocManager vignette section on offline use
snapshotDate(): 2024-04-02
see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
documentation
loading from cache
[...]

length(bamfiles)
[1] 18

3. If I examine the resources offline directly with 'ExperimentHub()', I
see them duplicated, with some IDs getting a '.1' suffix:

library(ExperimentHub)

eh <- ExperimentHub(localHub=TRUE)
Using 'localHub=TRUE'
   If offline, please also see BiocManager vignette section on offline use
snapshotDate(): 2024-04-02
length(eh)
[1] 18
eh
ExperimentHub with 18 records
# snapshotDate(): 2024-04-02
# $dataprovider: NGDC
# $species: Homo sapiens
# $rdataclass: BamFile
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH8079"]]'


            title
   EH8079   | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
   EH8079.1 | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
   EH8080   | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
   EH8080.1 | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
   EH8081   | RNA-seq data BAM file subset of HRR589634 contaminated with 0% gDNA
   ...        ...
   EH8085.1 | RNA-seq data BAM file subset of HRR589623 contaminated with 10% ...
   EH8086   | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
   EH8086.1 | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
   EH8087   | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
   EH8087.1 | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
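The '.1' suffix in the listing is consistent with how base R disambiguates repeated names, for example with make.unique(); whether ExperimentHub actually calls make.unique() here is an assumption, not something confirmed in this thread. This short R sketch, using IDs from the listing, shows the effect and how keeping only the first occurrence of each ID recovers one entry per resource.

```r
# IDs as they would look if each resource contributed one entry per file
# (.bam and .bai); this pairing is an assumption based on the thread.
ids <- c("EH8079", "EH8079", "EH8080", "EH8080")

# make.unique() appends ".1", ".2", ... to repeated values.
shown <- make.unique(ids)
shown
# [1] "EH8079"   "EH8079.1" "EH8080"   "EH8080.1"

# Keeping only the first occurrence of each ID: one entry per resource.
unique_ids <- ids[!duplicated(ids)]
unique_ids
# [1] "EH8079" "EH8080"
```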

Does anybody have an idea what might be going on with
'ExperimentHub(localHub=TRUE)'?

Thanks!

robert.

_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.