[Bioc-devel] AnnotationHubData Error: Access denied: 530
Arora, Sonali
sarora at fredhutch.org
Thu Apr 16 23:49:40 CEST 2015
Hi Johannes,
I have updated the GRanges for all the GTF files.
> library(AnnotationHub)
> hubCache(ah)
[1] "/home/sarora/.AnnotationHub" ## delete everything from this folder
> ah = AnnotationHub()
retrieving 1 resources
|======================================================================| 100%
> rat <- query(ah, c("gtf", "69", "rattus"))
> rat[1]
AnnotationHub with 1 record
# snapshotDate(): 2015-03-26
# names(): AH7522
# $dataprovider: Ensembl
# $species: Rattus norvegicus
# $rdataclass: GRanges
# $title: Rattus_norvegicus.RGSC3.4.69.gtf
# $description: Gene Annotation for Rattus norvegicus
# $taxonomyid: 10116
# $genome: RGSC3.4
# $sourcetype: GTF
# $sourceurl: ftp://ftp.ensembl.org/pub/release-69/gtf/rattus_norvegicus/Rat...
# $sourcelastmodifieddate: 2012-10-19
# $sourcesize: 8485113
# $tags: GTF, ensembl, Gene, Transcript, Annotation
# retrieve record with 'object[["AH7522"]]'
> ra <- rat[[1]]
require(“GenomicRanges”)
retrieving 1 resources
|======================================================================| 100%
> seqinfo(ra)
Seqinfo object with 22 sequences (1 circular) from RGSC3 genome:
  seqnames seqlengths isCircular genome
  1         267910886      FALSE RGSC3
  2         258207540      FALSE RGSC3
  3         171063335      FALSE RGSC3
  4         187126005      FALSE RGSC3
  5         173096209      FALSE RGSC3
  ...             ...        ...    ...
  18         87265094      FALSE RGSC3
  19         59218465      FALSE RGSC3
  20         55268282      FALSE RGSC3
  X         160699376      FALSE RGSC3
  MT            16313       TRUE RGSC3
> seqlevels(ra)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "X"  "MT"
> genome(ra)
      1       2       3       4       5       6       7       8       9      10
"RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3"
     11      12      13      14      15      16      17      18      19      20
"RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3"
      X      MT
"RGSC3" "RGSC3"
Hope that helps!
Sonali.
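
For GRanges objects that do not yet carry this metadata, the same information can be attached by hand. The following is only a minimal sketch, not the code used to update the hub: the toy GRanges and the chrom_lengths vector are made up for illustration, with lengths copied from a subset of the seqinfo() output above.

```r
library(GenomicRanges)  # also loads GenomeInfoDb (seqlevels, seqlengths, ...)

# Toy GRanges standing in for a hub object with unsorted seqlevels and
# no seqinfo; the ranges themselves are invented for illustration.
gr <- GRanges(seqnames = c("X", "2", "MT", "1"),
              ranges = IRanges(start = c(100, 200, 300, 400), width = 50))

# Put the seqlevels into their natural order.
gr <- sortSeqlevels(gr)

# Chromosome lengths copied from the seqinfo() output above (subset only).
chrom_lengths <- c("1" = 267910886L, "2" = 258207540L,
                   "X" = 160699376L, "MT" = 16313L)

# Index by seqlevels(gr) so that names and order match the object exactly.
seqlengths(gr) <- chrom_lengths[seqlevels(gr)]
isCircular(gr) <- setNames(seqlevels(gr) == "MT", seqlevels(gr))
genome(gr) <- "RGSC3"

seqinfo(gr)
```

Indexing chrom_lengths by seqlevels(gr) keeps the sketch independent of the exact order that sortSeqlevels() produces.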
On 4/13/2015 10:54 AM, Marc Carlson wrote:
> Hi Johannes,
>
> We are already planning to upgrade those objects to have that
> information when they are downloaded... Sonali is actually working on
> that right now. She will probably have updated that information by the
> end of the week or so. It's a lot of files to update, but this is
> already in progress.
>
> So if you are willing to wait a few days you can probably save yourself
> some headaches...
>
>
> Marc
>
>
>
>
> On 04/11/2015 12:13 PM, Rainer Johannes wrote:
>> Hi Marc,
>>
>> you're right. I'll start with option 1. For that it would however be
>> really nice to have the seqinfo available in the GRanges as mentioned
>> in my previous mail. In the meantime I'll try to fetch the chrom
>> lengths myself but would be nice to have all that ready in the GRanges
>> at some point.
>>
>> cheers, jo
>>
>>> On 11 Apr 2015, at 00:54, Marc Carlson <mcarlson at fredhutch.org
>>> <mailto:mcarlson at fredhutch.org>> wrote:
>>>
>>> On 04/10/2015 12:18 PM, Rainer Johannes wrote:
>>>> dear Sonali, Herve,
>>>>
>>>> On 10 Apr 2015, at 19:59, Hervé Pagès <hpages at fredhutch.org
>>>> <mailto:hpages at fredhutch.org>> wrote:
>>>>
>>>> Hi Johannes, Sonali,
>>>>
>>>> On 04/10/2015 09:40 AM, Arora, Sonali wrote:
>>>> Hi Rainer,
>>>>
>>>> Just to be clear - what do you want to be available from AnnotationHub()
>>>> in the end?
>>>>
>>>> Currently the GTF files from Ensembl are already present inside the
>>>> AnnotationHub
>>>>
>>>> library(AnnotationHub)
>>>> ah = AnnotationHub()
>>>> gtf <- query(ah, "GTF")
>>>> gtf <- query(gtf, "Ensembl")
>>>> gtf[1]
>>>> gtf[[1]] # returned to you as GenomicRanges object.
>>>>
>>>> - why not get the GTF files directly from AnnotationHub instead of
>>>> getting them from the ftp site? Then you can make your EnsDb classes
>>>> from these GRanges.
>>>> It will also make your recipe faster because you will not have to
>>>> download the file and parse the object.
>>>>
>>>> A GRanges object is not the same as a GTF file and I guess Johannes
>>>> wants access to the GTF file. Are these GTF files available on
>>>> AnnotationHub?
>>>>
>>>>
>>>> yes, you're right. I wanted access to the GTF file and most likely
>>>> understood the AnnotationHub idea wrong... my idea was to build a
>>>> recipe that takes as input the GTF file (as the
>>>> makeEnsemblGtfToGRanges) and generates from that the EnsDb SQLite
>>>> database file. I thought that these SQLite files would be generated
>>>> on the fly on the user's computer, but I guess that stuff is
>>>> processed once and stored on your servers, right?
>>> Hi Johannes,
>>>
>>> So you have several options actually. We sometimes store the files in
>>> S3 and then send them down/cache them as requested and other times the
>>> hub can just point to an existing ftp site (and files get
>>> transformed/cached on the fly when users ask for them). So you have
>>> three choices here:
>>>
>>> 1) You could just write a function that takes in one of the processed
>>> GRanges objects and transforms it into an EnsDb object. This should be
>>> straightforward and is probably your easiest option since you won't have
>>> to write a recipe OR have any code included into the AnnotationHub. You
>>> can basically just take advantage of the fact that these data are
>>> already there in the hub waiting to be used.
>>> 2) You could write R code that transforms a GTF file into a sqlite file
>>> and ALSO a recipe to call that (and create metadata) for all the GTF
>>> files. This will be more work than #1 since you will have to write both
>>> a recipe and port any code that you have for generating the DB files.
>>> But when you are done you would be able to have your resources come
>>> right out of the AnnotationHub.
>>> 3) You could write R code to process a GRanges object into an EnsDb
>>> object and then also write a recipe so that your data resources can be
>>> served up directly from the AnnotationHub, but still take advantage of
>>> what is already there (GRanges). No new data would need to be added to
>>> the hub since new metadata records could allow users to transform the
>>> data into EnsDb objects on the fly. This is an elegant solution, but it
>>> will still take more effort than option #1.
>>>
>>> If I were you, I would start with option #1. That way if (after I got
>>> that working) I still wanted things to be more elegant, I could
>>> then add a recipe (thus evolving the strategy into option #3).
>>>
>>>
>>> Marc
>>>
>>>
>>>
>>>> @Johannes - Here is one alternative: You could take a different approach
>>>> and implement some equivalent of makeTxDbFromGRanges() for EnsDb
>>>> objects. So people could just do:
>>>>
>>>> library(ensembldb)
>>>> ensdb <- makeEnsDbFromGRanges(gtf[[1]])
>>>>
>>>> like they can do right now with makeTxDbFromGRanges():
>>>>
>>>> library(GenomicFeatures)
>>>> txdb <- makeTxDbFromGRanges(gtf[[1]])
>>>>
>>>> That way you don't need a recipe or try to add things to
>>>> AnnotationHub at all.
>>>>
>>>>
>>>> that's a good idea, I will implement that too. just want to make
>>>> sure that I can get all data I'll need (also the genome build
>>>> version, Ensembl version etc from the GRanges, most likely I have to
>>>> guess that from the file name of the RData file).
>>>>
>>>> @Sonali - These GRanges objects I get from AnnotationHub have no genome
>>>> information and their seqlevels are not sorted:
>>>>
>>>> > seqinfo(gtf[[1]])
>>>> Seqinfo object with 22 sequences from an unspecified genome; no seqlengths:
>>>>   seqnames seqlengths isCircular genome
>>>>   X              <NA>       <NA>   <NA>
>>>>   9              <NA>       <NA>   <NA>
>>>>   8              <NA>       <NA>   <NA>
>>>>   7              <NA>       <NA>   <NA>
>>>>   6              <NA>       <NA>   <NA>
>>>>   ...             ...        ...    ...
>>>>   12             <NA>       <NA>   <NA>
>>>>   11             <NA>       <NA>   <NA>
>>>>   10             <NA>       <NA>   <NA>
>>>>   1              <NA>       <NA>   <NA>
>>>>   MT             <NA>       <NA>   <NA>
>>>>
>>>> I know it's easy enough to sort the seqlevels with sortSeqlevels() but
>>>> what about having these things done by the recipe instead?
>>>>
>>>>
>>>> I also have a suggestion there: what if you used also the
>>>> fetchChromLengthsFromEnsembl from the GenomicFeatures package? the
>>>> GTF files are anyway from Ensembl, so getting the seqinfo from there
>>>> would make sense... and I wouldn't have to fetch it separately to
>>>> build the EnsDb.
>>>>
>>>> thanks!
>>>> jo
>>>>
>>>> Thanks,
>>>> H.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Sonali.
>>>>
>>>>
>>>> On 4/9/2015 11:14 PM, Rainer Johannes wrote:
>>>> dear all,
>>>>
>>>> I have added a recipe to the AnnotationHubData to provide EnsDb
>>>> classes (from my ensembldb package) based on GTF files from Ensembl.
>>>> Now, after adding the recipe to the AnnotationHubData package and
>>>> installing it (following the vignettes from the AnnotationHub and
>>>> AnnotationHubData) I called
>>>>
>>>> updateResources(AnnotationHubRoot=getwd(), BiocVersion=biocVersion(),
>>>> preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
>>>> metadataOnly=TRUE)
>>>>
>>>> and got the output:
>>>>
>>>> Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
>>>> Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
>>>> Anolis_carolinensis.AnoCar2.0.78.gtf.gz
>>>> Astyanax_mexicanus.AstMex102.78.gtf.gz
>>>> Bos_taurus.UMD3.1.78.gtf.gz
>>>> Caenorhabditis_elegans.WBcel235.78.gtf.gz
>>>> Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
>>>> Canis_familiaris.CanFam3.1.78.gtf.gz
>>>> Cavia_porcellus.cavPor3.78.gtf.gz
>>>> Chlorocebus_sabaeus.ChlSab1.1.78.gtf.gz
>>>> Choloepus_hoffmanni.choHof1.78.gtf.gz
>>>> Ciona_intestinalis.KH.78.gtf.gz
>>>> Ciona_savignyi.CSAV2.0.78.gtf.gz
>>>> Danio_rerio.Zv9.78.gtf.gz
>>>> Dasypus_novemcinctus.Dasnov3.0.78.gtf.gz
>>>> Dipodomys_ordii.dipOrd1.78.gtf.gz
>>>> Drosophila_melanogaster.BDGP5.78.gtf.gz
>>>> Error in function (type, msg, asError = TRUE) : Access denied: 530
>>>>
>>>> I guess that must be related to the Ensembl ftp? Is anybody else
>>>> experiencing this error?
>>>>
>>>> cheers, jo
>>>>
>>>>
>>>> my session info:
>>>>
>>>> sessionInfo()
>>>> R Under development (unstable) (2015-03-04 r67940)
>>>> Platform: x86_64-apple-darwin14.3.0/x86_64 (64-bit)
>>>> Running under: OS X 10.10.3 (Yosemite)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel stats4 stats graphics grDevices utils datasets
>>>> [8] methods base
>>>>
>>>> other attached packages:
>>>> [1] AnnotationHubData_0.0.205 futile.logger_1.4
>>>> [3] AnnotationHub_1.99.81 GenomicRanges_1.19.52
>>>> [5] GenomeInfoDb_1.3.16 IRanges_2.1.43
>>>> [7] S4Vectors_0.5.22 BiocGenerics_0.13.11
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Rcpp_0.11.5 BiocInstaller_1.17.7
>>>> [3] XVector_0.7.4 futile.options_1.0.0
>>>> [5] GenomicFeatures_1.19.37 bitops_1.0-6
>>>> [7] tools_3.2.0 zlibbioc_1.13.3
>>>> [9] biomaRt_2.23.5 digest_0.6.8
>>>> [11] BSgenome_1.35.20 jsonlite_0.9.15
>>>> [13] RSQLite_1.0.0 shiny_0.11.1
>>>> [15] DBI_0.3.1 rtracklayer_1.27.11
>>>> [17] httr_0.6.1 stringr_0.6.2
>>>> [19] Biostrings_2.35.12 Biobase_2.27.3
>>>> [21] R6_2.0.1 AnnotationDbi_1.29.21
>>>> [23] XML_3.98-1.1 BiocParallel_1.1.24
>>>> [25] RJSONIO_1.3-0 ensembldb_0.99.15
>>>> [27] lambda.r_1.1.7 Rsamtools_1.19.50
>>>> [29] htmltools_0.2.6 GenomicAlignments_1.3.34
>>>> [31] AnnotationForge_1.9.7 mime_0.3
>>>> [33] interactiveDisplayBase_1.5.6 xtable_1.7-4
>>>> [35] httpuv_1.3.2 RCurl_1.95-4.5
>>>> [37] VariantAnnotation_1.13.47
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org
>>>> <mailto:Bioc-devel at r-project.org><mailto:Bioc-devel at r-project.org>
>>>> mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fredhutch.org<mailto:hpages at fredhutch.org>
>>>> Phone: (206) 667-5791
>>>> Fax: (206) 667-1319