[Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'
Robert Castelo
robert.castelo using upf.edu
Fri Apr 5 15:22:25 CEST 2024
that's great, thank you so much Lori!
robert.
On 4/5/24 15:02, Kern, Lori wrote:
> I found the bug. Testing and pushing up a fix.
>
> Cheers,
>
> Lori Shepherd - Kern
>
> Bioconductor Core Team
>
> Roswell Park Comprehensive Cancer Center
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ------------------------------------------------------------------------
> *From:* Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of
> Kern, Lori via Bioc-devel <bioc-devel using r-project.org>
> *Sent:* Friday, April 5, 2024 8:15 AM
> *To:* Robert Castelo <robert.castelo using upf.edu>;
> bioc-devel using r-project.org <bioc-devel using r-project.org>
> *Subject:* Re: [Bioc-devel] duplicated entries with
> 'ExperimentHub(localHub=TRUE)'
> I will have to look at how offline mode changes the loading of the files.
> That is odd and unexpected behavior.
>
> They aren't actually duplicate files. When offline, it displays the
> entry for the BAM file (.bam) and the index file (.bai) as separate
> entries instead of associating them as one entry.
>
> I'll investigate more.
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of
> Robert Castelo <robert.castelo using upf.edu>
> Sent: Thursday, April 4, 2024 2:40 PM
> To: bioc-devel using r-project.org <bioc-devel using r-project.org>
> Subject: [Bioc-devel] duplicated entries with
> 'ExperimentHub(localHub=TRUE)'
>
> hi,
>
> I'm getting duplicated entries when loading **offline** previously
> cached ExperimentHub resources. This code reproduces the problem:
>
> 1. If in a fresh empty cache of ExperimentHub I download 9 resources
> through the gDNAinRNAseqData package:
>
> library(gDNAinRNAseqData)
>
> bamfiles <- LiYu22subsetBAMfiles()
> length(bamfiles)
> [1] 9
>
> 2. Try to load them again from the local cache either going offline or
> using the 'offline=TRUE' argument to the loader function, which sets
> 'localHub=TRUE' in the call to 'ExperimentHub()':
>
> bamfiles <- LiYu22subsetBAMfiles(offline=TRUE)
> Using 'localHub=TRUE'
> If offline, please also see BiocManager vignette section on offline use
> snapshotDate(): 2024-04-02
> see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
> documentation
> loading from cache
> [...]
>
> length(bamfiles)
> [1] 18
>
> 3. If I examine the resources offline directly with 'ExperimentHub()' I
> see them duplicated with some IDs getting a '.1' suffix:
>
> library(ExperimentHub)
>
> eh <- ExperimentHub(localHub=TRUE)
> Using 'localHub=TRUE'
> If offline, please also see BiocManager vignette section on offline use
> snapshotDate(): 2024-04-02
> length(eh)
> [1] 18
> eh
> ExperimentHub with 18 records
> # snapshotDate(): 2024-04-02
> # $dataprovider: NGDC
> # $species: Homo sapiens
> # $rdataclass: BamFile
> # additional mcols(): taxonomyid, genome, description,
> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
> # rdatapath, sourceurl, sourcetype
> # retrieve records with, e.g., 'object[["EH8079"]]'
>
>
>              title
>   EH8079   | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>   EH8079.1 | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>   EH8080   | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
>   EH8080.1 | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
>   EH8081   | RNA-seq data BAM file subset of HRR589634 contaminated with 0% gDNA
>   ...        ...
>   EH8085.1 | RNA-seq data BAM file subset of HRR589623 contaminated with 10% ...
>   EH8086   | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
>   EH8086.1 | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
>   EH8087   | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
>   EH8087.1 | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
>
> Does anybody have an idea what might be going on with
> 'ExperimentHub(localHub=TRUE)'?
>
> Thanks!
>
> robert.
>
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> This email message may contain legally privileged and/or confidential
> information. If you are not the intended recipient(s), or the
> employee or agent responsible for the delivery of this message to the
> intended recipient(s), you are hereby notified that any disclosure,
> copying, distribution, or use of this email message is prohibited. If
> you have received this message in error, please notify the sender
> immediately by e-mail and delete this email message from your
> computer. Thank you.
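A minimal base-R sketch of the behaviour described in the quoted messages, where a BAM file and its index show up as separate records and the repeated IDs pick up a '.1' suffix. The data frame, its column names and the file names are hypothetical, and using make.unique() to produce the suffixes is an assumption for illustration only, not a description of ExperimentHub internals or of the actual fix:

```r
# Hypothetical cache listing: one row per cached file, two files per resource.
# The IDs are taken from the thread; the file names are invented.
cached <- data.frame(
  id   = rep(c("EH8079", "EH8080", "EH8081"), each = 2),
  file = c("s1.bam", "s1.bam.bai",
           "s2.bam", "s2.bam.bai",
           "s3.bam", "s3.bam.bai"),
  stringsAsFactors = FALSE
)

# Treating every file as its own record repeats each ID. De-duplicating the
# labels with make.unique() then appends '.1' to the second occurrence.
split_ids <- make.unique(cached$id)

# Associating each index with its BAM file instead: keep one record per ID
# and carry the index path alongside the BAM path.
is_index <- grepl("\\.bai$", cached$file)
merged <- merge(
  cached[!is_index, ], cached[is_index, ],
  by = "id", suffixes = c("_bam", "_index")
)

print(split_ids)
print(merged)
```

Here split_ids holds six labels, two per resource, while merged has one row per resource with the BAM and index paths side by side.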
--
Robert Castelo, PhD
Associate Professor
Dept. of Medicine and Life Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514