[Bioc-devel] AnnotationHubData Error: Access denied: 530

Rainer Johannes Johannes.Rainer at eurac.edu
Fri Apr 10 21:18:23 CEST 2015


Dear Sonali, Hervé,

On 10 Apr 2015, at 19:59, Hervé Pagès <hpages at fredhutch.org> wrote:

Hi Johannes, Sonali,

On 04/10/2015 09:40 AM, Arora, Sonali wrote:
Hi Rainer,

Just to be clear - what do you want to be available from AnnotationHub()
in the end?

Currently the GTF files from Ensembl are already present inside the
AnnotationHub

library(AnnotationHub)
ah = AnnotationHub()
gtf <- query(ah, "GTF")
gtf <- query(gtf, "Ensembl")
gtf[1]
gtf[[1]] # returned to you as a GRanges object.

- why not get the GTF files directly from AnnotationHub instead of
getting them from the ftp site? Then you can make your EnsDb classes
from these GRanges.
It will also make your recipe faster because you will not have to
download the file and parse the object.

A GRanges object is not the same as a GTF file and I guess Johannes
wants access to the GTF file. Are these GTF files available on
AnnotationHub?


Yes, you're right. I wanted access to the GTF file, and I most likely misunderstood the AnnotationHub idea. My plan was to build a recipe that takes the GTF file as input (as makeEnsemblGtfToGRanges does) and generates the EnsDb SQLite database file from it. I thought these SQLite files would be generated on the fly on the user's computer, but I guess everything is processed once and stored on your servers, right?


@Johannes - Here is one alternative: You could take a different approach
and implement some equivalent of makeTxDbFromGRanges() for EnsDb
objects. So people could just do:

 library(ensembldb)
 ensdb <- makeEnsDbFromGRanges(gtf[[1]])

like they can do right now with makeTxDbFromGRanges():

 library(GenomicFeatures)
 txdb <- makeTxDbFromGRanges(gtf[[1]])

That way you don't need a recipe or try to add things to AnnotationHub at all.


That's a good idea, I will implement that too. I just want to make sure that I can get all the data I need from the GRanges (including the genome build and the Ensembl version); most likely I will have to guess those from the file name of the RData file.
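
A minimal sketch of guessing this metadata from a file name, assuming the name follows the <organism>.<genome build>.<Ensembl release>.gtf.gz pattern of the Ensembl GTF files listed further down in this thread. The helper parseEnsemblGtfName() is hypothetical and not part of ensembldb or AnnotationHub; the RData files may be named differently.

```r
# Hypothetical helper: split an Ensembl GTF file name into its parts.
# Assumes names like "Bos_taurus.UMD3.1.78.gtf.gz", i.e.
# <organism>.<genome build>.<Ensembl release>.gtf(.gz)
parseEnsemblGtfName <- function(fileName) {
    fileName <- basename(fileName)
    # organism: everything up to the first dot
    # genome build: everything in between (may itself contain dots)
    # release: the last all-digit field before ".gtf"
    pattern <- "^([^.]+)\\.(.+)\\.([0-9]+)\\.gtf(\\.gz)?$"
    m <- regmatches(fileName, regexec(pattern, fileName))[[1]]
    if (length(m) == 0)
        stop("unexpected file name: ", fileName)
    list(organism = m[2], genomeVersion = m[3], ensemblVersion = m[4])
}

parseEnsemblGtfName("Bos_taurus.UMD3.1.78.gtf.gz")
parseEnsemblGtfName("Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz")
```

Because the genome build can contain dots itself (UMD3.1, C_jacchus3.2.1), the pattern anchors on the trailing release number rather than splitting on every dot.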

@Sonali - These GRanges objects I get from AnnotationHub have no genome
information and their seqlevels are not sorted:

 > seqinfo(gtf[[1]])
 Seqinfo object with 22 sequences from an unspecified genome; no seqlengths:
   seqnames seqlengths isCircular genome
   X              <NA>       <NA>   <NA>
   9              <NA>       <NA>   <NA>
   8              <NA>       <NA>   <NA>
   7              <NA>       <NA>   <NA>
   6              <NA>       <NA>   <NA>
   ...             ...        ...    ...
   12             <NA>       <NA>   <NA>
   11             <NA>       <NA>   <NA>
   10             <NA>       <NA>   <NA>
   1              <NA>       <NA>   <NA>
   MT             <NA>       <NA>   <NA>

I know it's easy enough to sort the seqlevels with sortSeqlevels() but
what about having these things done by the recipe instead?
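
For reference, the manual fix on the user side might look roughly like the sketch below. It uses a small toy GRanges in place of gtf[[1]], and the genome name "GRCh38" is only an assumed example value; the real build would have to come from the resource's metadata.

```r
library(GenomicRanges)  # attaches GenomeInfoDb, which provides sortSeqlevels()

# Toy object standing in for gtf[[1]]: seqlevels are unsorted and the
# genome is unspecified.
gr <- GRanges(seqnames = c("X", "9", "1", "MT"),
              ranges = IRanges(start = c(100, 200, 300, 400), width = 50))
seqinfo(gr)

# Put the seqlevels into their "natural" order.
gr <- sortSeqlevels(gr)

# Record the genome build ("GRCh38" is an assumption for this example).
genome(gr) <- "GRCh38"
seqinfo(gr)
```

Doing these two steps once in the recipe would save every user from repeating them after each download.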


I also have a suggestion there: what if the recipe also used fetchChromLengthsFromEnsembl from the GenomicFeatures package? The GTF files are from Ensembl anyway, so getting the seqinfo from there would make sense, and I wouldn't have to fetch it separately to build the EnsDb.

thanks!
jo

Thanks,
H.



Thanks,
Sonali.


On 4/9/2015 11:14 PM, Rainer Johannes wrote:
dear all,

I have added a recipe to the AnnotationHubData to provide EnsDb
classes (from my ensembldb package) based on GTF files from Ensembl.
Now, after adding the recipe to the AnnotationHubData package and
installing it (following the vignettes from the AnnotationHub and
AnnotationHubData) I called

updateResources(AnnotationHubRoot=getwd(), BiocVersion=biocVersion(),
preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
metadataOnly=TRUE)

and got the output:

Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
Anolis_carolinensis.AnoCar2.0.78.gtf.gz
Astyanax_mexicanus.AstMex102.78.gtf.gz
Bos_taurus.UMD3.1.78.gtf.gz
Caenorhabditis_elegans.WBcel235.78.gtf.gz
Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
Canis_familiaris.CanFam3.1.78.gtf.gz
Cavia_porcellus.cavPor3.78.gtf.gz
Chlorocebus_sabaeus.ChlSab1.1.78.gtf.gz
Choloepus_hoffmanni.choHof1.78.gtf.gz
Ciona_intestinalis.KH.78.gtf.gz
Ciona_savignyi.CSAV2.0.78.gtf.gz
Danio_rerio.Zv9.78.gtf.gz
Dasypus_novemcinctus.Dasnov3.0.78.gtf.gz
Dipodomys_ordii.dipOrd1.78.gtf.gz
Drosophila_melanogaster.BDGP5.78.gtf.gz
Error in function (type, msg, asError = TRUE)  : Access denied: 530

I guess that must be related to the Ensembl ftp? Is anybody else
experiencing this error?

cheers, jo


my session info:

sessionInfo()
R Under development (unstable) (2015-03-04 r67940)
Platform: x86_64-apple-darwin14.3.0/x86_64 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] AnnotationHubData_0.0.205 futile.logger_1.4
[3] AnnotationHub_1.99.81     GenomicRanges_1.19.52
[5] GenomeInfoDb_1.3.16       IRanges_2.1.43
[7] S4Vectors_0.5.22          BiocGenerics_0.13.11

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.5                  BiocInstaller_1.17.7
 [3] XVector_0.7.4                futile.options_1.0.0
 [5] GenomicFeatures_1.19.37      bitops_1.0-6
 [7] tools_3.2.0                  zlibbioc_1.13.3
 [9] biomaRt_2.23.5               digest_0.6.8
[11] BSgenome_1.35.20             jsonlite_0.9.15
[13] RSQLite_1.0.0                shiny_0.11.1
[15] DBI_0.3.1                    rtracklayer_1.27.11
[17] httr_0.6.1                   stringr_0.6.2
[19] Biostrings_2.35.12           Biobase_2.27.3
[21] R6_2.0.1                     AnnotationDbi_1.29.21
[23] XML_3.98-1.1                 BiocParallel_1.1.24
[25] RJSONIO_1.3-0                ensembldb_0.99.15
[27] lambda.r_1.1.7               Rsamtools_1.19.50
[29] htmltools_0.2.6              GenomicAlignments_1.3.34
[31] AnnotationForge_1.9.7        mime_0.3
[33] interactiveDisplayBase_1.5.6 xtable_1.7-4
[35] httpuv_1.3.2                 RCurl_1.95-4.5
[37] VariantAnnotation_1.13.47
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
