[Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'

Robert Castelo robert.castelo at upf.edu
Fri Apr 5 15:22:25 CEST 2024


That's great, thank you so much, Lori!


robert.


On 4/5/24 15:02, Kern, Lori wrote:
> I found the bug.  Testing and pushing up a fix.
>
> Cheers,
>
> Lori Shepherd - Kern
> Bioconductor Core Team
> Roswell Park Comprehensive Cancer Center
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
>
> ------------------------------------------------------------------------
> *From:* Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> Kern, Lori via Bioc-devel <bioc-devel at r-project.org>
> *Sent:* Friday, April 5, 2024 8:15 AM
> *To:* Robert Castelo <robert.castelo at upf.edu>;
> bioc-devel at r-project.org <bioc-devel at r-project.org>
> *Subject:* Re: [Bioc-devel] duplicated entries with 
> 'ExperimentHub(localHub=TRUE)'
> I will have to look at how offline mode changes the loading of the files.
> That is odd and unexpected behavior.
>
> They aren't actually duplicate files. What is happening is that, when
> offline, the entry for the BAM file (.bam) and the entry for its index
> file (.bai) are displayed separately instead of being associated as one
> entry.
>
> I'll investigate more.
>
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> Robert Castelo <robert.castelo at upf.edu>
> Sent: Thursday, April 4, 2024 2:40 PM
> To: bioc-devel at r-project.org <bioc-devel at r-project.org>
> Subject: [Bioc-devel] duplicated entries with 
> 'ExperimentHub(localHub=TRUE)'
>
> hi,
>
> I'm getting duplicated entries when loading **offline** previously
> cached ExperimentHub resources. This code reproduces the problem:
>
> 1. Starting from a fresh, empty ExperimentHub cache, I download 9 resources
> through the gDNAinRNAseqData package:
>
> library(gDNAinRNAseqData)
>
> bamfiles <- LiYu22subsetBAMfiles()
> length(bamfiles)
> [1] 9
>
> 2. I then load them again from the local cache, either by going offline or
> by passing 'offline=TRUE' to the loader function, which sets
> 'localHub=TRUE' in the call to 'ExperimentHub()':
>
> bamfiles <- LiYu22subsetBAMfiles(offline=TRUE)
> Using 'localHub=TRUE'
>    If offline, please also see BiocManager vignette section on offline use
> snapshotDate(): 2024-04-02
> see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
> documentation
> loading from cache
> [...]
>
> length(bamfiles)
> [1] 18
>
> 3. If I examine the resources offline directly with 'ExperimentHub()', I
> see them duplicated, with the duplicate IDs getting a '.1' suffix:
>
> library(ExperimentHub)
>
> eh <- ExperimentHub(localHub=TRUE)
> Using 'localHub=TRUE'
>    If offline, please also see BiocManager vignette section on offline use
> snapshotDate(): 2024-04-02
> length(eh)
> [1] 18
> eh
> ExperimentHub with 18 records
> # snapshotDate(): 2024-04-02
> # $dataprovider: NGDC
> # $species: Homo sapiens
> # $rdataclass: BamFile
> # additional mcols(): taxonomyid, genome, description,
> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
> #   rdatapath, sourceurl, sourcetype
> # retrieve records with, e.g., 'object[["EH8079"]]'
>
>
>            title
>   EH8079   | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>   EH8079.1 | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>   EH8080   | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
>   EH8080.1 | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
>   EH8081   | RNA-seq data BAM file subset of HRR589634 contaminated with 0% gDNA
>   ...        ...
>   EH8085.1 | RNA-seq data BAM file subset of HRR589623 contaminated with 10% ...
>   EH8086   | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
>   EH8086.1 | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
>   EH8087   | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
>   EH8087.1 | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
>
> Does anybody have an idea what might be going on with
> 'ExperimentHub(localHub=TRUE)'?
>
> Thanks!
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> This email message may contain legally privileged and/or confidential 
> information.  If you are not the intended recipient(s), or the 
> employee or agent responsible for the delivery of this message to the 
> intended recipient(s), you are hereby notified that any disclosure, 
> copying, distribution, or use of this email message is prohibited. If 
> you have received this message in error, please notify the sender 
> immediately by e-mail and delete this email message from your 
> computer. Thank you.

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Medicine and Life Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
tel: +34.933.160.514
