[Bioc-devel] AnnotationHubData Error: Access denied: 530

Mon Apr 13 19:54:26 CEST 2015

Hi Johannes,

We are already planning to upgrade those objects to have that 
information when they are downloaded...  Sonali is actually working on 
that right now.  She will probably have updated that information by the 
end of the week or so.  It's a lot of files to update, but this is 
already in progress.

So if you are willing to wait a few days you can probably save yourself 
some headaches...

  Marc

On 04/11/2015 12:13 PM, Rainer Johannes wrote:
> Hi Marc,
>
> you're right. I'll start with option 1. For that it would however be 
> really nice to have the seqinfo available in the GRanges as mentioned 
> in my previous mail. In the meantime I'll try to fetch the chrom 
> lengths myself but would be nice to have all that ready in the GRanges 
> at some point.
>
> cheers, jo
>
>> On 11 Apr 2015, at 00:54, Marc Carlson <mcarlson at fredhutch.org> wrote:
>>
>> On 04/10/2015 12:18 PM, Rainer Johannes wrote:
>>> dear Sonali, Herve,
>>>
>>> On 10 Apr 2015, at 19:59, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>
>>> Hi Johannes, Sonali,
>>>
>>> On 04/10/2015 09:40 AM, Arora, Sonali wrote:
>>> Hi Rainer,
>>>
>>> Just to be clear - what do you want to be available from AnnotationHub()
>>> in the end?
>>>
>>> Currently the GTF files from Ensembl are already present inside the
>>> AnnotationHub
>>>
>>> library(AnnotationHub)
>>> ah = AnnotationHub()
>>> gtf <- query(ah, "GTF")
>>> gtf <- query(gtf, "Ensembl")
>>> gtf[1]
>>> gtf[[1]] # returned to you as a GRanges object.
>>>
>>> - why not get the GTF files directly from AnnotationHub instead of
>>> getting them from the ftp site? Then you can make your EnsDb classes
>>> from these GRanges.
>>> It will also make your recipe faster because you will not have to
>>> download the file and parse the object.
>>>
>>> A GRanges object is not the same as a GTF file and I guess Johannes
>>> wants access to the GTF file. Are these GTF files available on
>>> AnnotationHub?
>>>
>>>
>>> yes, you're right. I wanted access to the GTF file and most likely 
>>> misunderstood the AnnotationHub idea... my idea was to build a 
>>> recipe that takes the GTF file as input (as 
>>> makeEnsemblGtfToGRanges does) and generates the EnsDb SQLite 
>>> database file from it. I thought that these SQLite files would be 
>>> generated on the fly on the user's computer, but I guess that stuff 
>>> is processed once and stored on your servers, right?
>> Hi Johannes,
>>
>> So you have several options actually.  We sometimes store the files in
>> S3 and then send them down/cache them as requested and other times the
>> hub can just point to an existing ftp site (and files get
>> transformed/cached on the fly when users ask for them).  So you have
>> three choices here:
>>
>> 1) You could just write a function that takes in one of the processed
>> GRanges objects and transforms it into an EnsDb object. This should be
>> straightforward and is probably your easiest option since you won't have
>> to write a recipe OR have any code included into the AnnotationHub.  You
>> can basically just take advantage of the fact that these data are
>> already there in the hub waiting to be used.
>> 2) You could write R code that transforms a GTF file into a sqlite file
>> and ALSO a recipe to call that (and create metadata) for all the GTF
>> files.  This will be more work than #1 since you will have to write both
>> a recipe and port any code that you have for generating the DB files.
>> But when you are done you would be able to have your resources come
>> right out of the AnnotationHub.
>> 3) You could write R code to process a GRanges object into an EnsDb
>> object and then also write a recipe so that your data resources can be
>> served up directly from the AnnotationHub, but still take advantage of
>> what is already there (GRanges).  No new data would need to be added to
>> the hub since new metadata records could allow users to transform the
>> data into EnsDb objects on the fly.  This is an elegant solution, but it
>> will still take more effort than option #1.
>>
>> If I were you, I would start with option #1.  That way if (after I got
>> that working) I still wanted things to be more elegant, I could then
>> add a recipe (thus evolving the strategy into option #3).
>>
>>
>>  Marc
>>
>>
>>
>>>
>>> @Johannes - Here is one alternative: You could take a different approach
>>> and implement some equivalent of makeTxDbFromGRanges() for EnsDb
>>> objects. So people could just do:
>>>
>>>  library(ensembldb)
>>>  ensdb <- makeEnsDbFromGRanges(gtf[[1]])
>>>
>>> like they can do right now with makeTxDbFromGRanges():
>>>
>>>  library(GenomicFeatures)
>>>  txdb <- makeTxDbFromGRanges(gtf[[1]])
>>>
>>> That way you don't need a recipe or try to add things to 
>>> AnnotationHub at all.
>>>
>>>
>>> that's a good idea, I will implement that too. just want to make 
>>> sure that I can get all data I'll need (also the genome build 
>>> version, Ensembl version etc from the GRanges, most likely I have to 
>>> guess that from the file name of the RData file).
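A minimal base-R sketch of guessing these values from a file name. It assumes the names follow the pattern <Organism>.<build>.<release>.gtf[.gz], as in the listing further down this thread (for example Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz). The function name parse_ensembl_gtf_name and the names of the returned fields are hypothetical, chosen only for illustration, and not part of ensembldb or AnnotationHub.

```r
# Split an Ensembl GTF file name into organism, genome build and release.
# Assumption: <Organism>.<build>.<release>.gtf[.gz]; the build may itself
# contain dots (e.g. "C_jacchus3.2.1"), the organism part may not.
parse_ensembl_gtf_name <- function(file_name) {
  base <- basename(file_name)
  pattern <- "^([^.]+)\\.(.+)\\.([0-9]+)\\.gtf(\\.gz)?$"
  hit <- regmatches(base, regexec(pattern, base))[[1]]
  if (length(hit) == 0) {
    stop("unexpected file name: ", file_name)
  }
  list(
    organism        = gsub("_", " ", hit[2], fixed = TRUE),
    genome_build    = hit[3],
    ensembl_version = as.integer(hit[4])
  )
}

info <- parse_ensembl_gtf_name("Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz")
str(info)
```

The release is taken as the last all-digit field before ".gtf", so dots inside the build name are left alone. A name that does not fit the assumed pattern raises an error instead of returning a guess.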
>>>
>>> @Sonali - These GRanges objects I get from AnnotationHub have no genome
>>> information and their seqlevels are not sorted:
>>>
>>>> seqinfo(gtf[[1]])
>>>  Seqinfo object with 22 sequences from an unspecified genome; no seqlengths:
>>>    seqnames seqlengths isCircular genome
>>>    X              <NA>       <NA>   <NA>
>>>    9              <NA>       <NA>   <NA>
>>>    8              <NA>       <NA>   <NA>
>>>    7              <NA>       <NA>   <NA>
>>>    6              <NA>       <NA>   <NA>
>>>    ...             ...        ...    ...
>>>    12             <NA>       <NA>   <NA>
>>>    11             <NA>       <NA>   <NA>
>>>    10             <NA>       <NA>   <NA>
>>>    1              <NA>       <NA>   <NA>
>>>    MT             <NA>       <NA>   <NA>
>>>
>>> I know it's easy enough to sort the seqlevels with sortSeqlevels() but
>>> what about having these things done by the recipe instead?
>>>
>>>
>>> I also have a suggestion there: what if you used also the 
>>> fetchChromLengthsFromEnsembl from the GenomicFeatures package? the 
>>> GTF files are anyway from Ensembl, so getting the seqinfo from there 
>>> would make sense... and I wouldn't have to fetch it separately to 
>>> build the EnsDb.
>>>
>>> thanks!
>>> jo
>>>
>>> Thanks,
>>> H.
>>>
>>>
>>>
>>> Thanks,
>>> Sonali.
>>>
>>>
>>> On 4/9/2015 11:14 PM, Rainer Johannes wrote:
>>> dear all,
>>>
>>> I have added a recipe to the AnnotationHubData to provide EnsDb
>>> classes (from my ensembldb package) based on GTF files from Ensembl.
>>> Now, after adding the recipe to the AnnotationHubData package and
>>> installing it (following the vignettes from the AnnotationHub and
>>> AnnotationHubData) I called
>>>
>>> updateResources(AnnotationHubRoot=getwd(), BiocVersion=biocVersion(),
>>> preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
>>> metadataOnly=TRUE)
>>>
>>> and got the output:
>>>
>>> Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
>>> Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
>>> Anolis_carolinensis.AnoCar2.0.78.gtf.gz
>>> Astyanax_mexicanus.AstMex102.78.gtf.gz
>>> Bos_taurus.UMD3.1.78.gtf.gz
>>> Caenorhabditis_elegans.WBcel235.78.gtf.gz
>>> Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
>>> Canis_familiaris.CanFam3.1.78.gtf.gz
>>> Cavia_porcellus.cavPor3.78.gtf.gz
>>> Chlorocebus_sabaeus.ChlSab1.1.78.gtf.gz
>>> Choloepus_hoffmanni.choHof1.78.gtf.gz
>>> Ciona_intestinalis.KH.78.gtf.gz
>>> Ciona_savignyi.CSAV2.0.78.gtf.gz
>>> Danio_rerio.Zv9.78.gtf.gz
>>> Dasypus_novemcinctus.Dasnov3.0.78.gtf.gz
>>> Dipodomys_ordii.dipOrd1.78.gtf.gz
>>> Drosophila_melanogaster.BDGP5.78.gtf.gz
>>> Error in function (type, msg, asError = TRUE)  : Access denied: 530
>>>
>>> I guess that must be related to the Ensembl ftp? Is anybody else
>>> experiencing this error?
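FTP reply code 530 means the server refused the login ("Not logged in"). Whether that was a temporary condition on the Ensembl FTP server is not established in this thread. If it is assumed to be transient, one possible workaround is to wrap the failing download step in a retry with a growing pause. The sketch below is base R only. with_retry and flaky are hypothetical names, and flaky merely stands in for the real download call; neither is part of AnnotationHubData.

```r
# Call `fetch()` up to `max_tries` times, pausing between failed attempts.
# `fetch` is any zero-argument function; the pause doubles after each failure.
with_retry <- function(fetch, max_tries = 3, wait_seconds = 1) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds)
      wait_seconds <- wait_seconds * 2
    }
  }
  stop(result)  # re-signal the last error once all attempts are used up
}

# Stand-in for the real download: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Access denied: 530")
  "listing"
}
out <- with_retry(flaky, max_tries = 5, wait_seconds = 0)
```

If the server rejects every attempt, the last error is re-raised unchanged, so a persistent 530 still surfaces to the caller.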
>>>
>>> cheers, jo
>>>
>>>
>>> my session info:
>>>
>>> sessionInfo()
>>> R Under development (unstable) (2015-03-04 r67940)
>>> Platform: x86_64-apple-darwin14.3.0/x86_64 (64-bit)
>>> Running under: OS X 10.10.3 (Yosemite)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>> [1] AnnotationHubData_0.0.205 futile.logger_1.4
>>> [3] AnnotationHub_1.99.81     GenomicRanges_1.19.52
>>> [5] GenomeInfoDb_1.3.16       IRanges_2.1.43
>>> [7] S4Vectors_0.5.22          BiocGenerics_0.13.11
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] Rcpp_0.11.5                  BiocInstaller_1.17.7
>>>  [3] XVector_0.7.4                futile.options_1.0.0
>>>  [5] GenomicFeatures_1.19.37      bitops_1.0-6
>>>  [7] tools_3.2.0                  zlibbioc_1.13.3
>>>  [9] biomaRt_2.23.5               digest_0.6.8
>>> [11] BSgenome_1.35.20             jsonlite_0.9.15
>>> [13] RSQLite_1.0.0                shiny_0.11.1
>>> [15] DBI_0.3.1                    rtracklayer_1.27.11
>>> [17] httr_0.6.1                   stringr_0.6.2
>>> [19] Biostrings_2.35.12           Biobase_2.27.3
>>> [21] R6_2.0.1                     AnnotationDbi_1.29.21
>>> [23] XML_3.98-1.1                 BiocParallel_1.1.24
>>> [25] RJSONIO_1.3-0                ensembldb_0.99.15
>>> [27] lambda.r_1.1.7               Rsamtools_1.19.50
>>> [29] htmltools_0.2.6              GenomicAlignments_1.3.34
>>> [31] AnnotationForge_1.9.7        mime_0.3
>>> [33] interactiveDisplayBase_1.5.6 xtable_1.7-4
>>> [35] httpuv_1.3.2                 RCurl_1.95-4.5
>>> [37] VariantAnnotation_1.13.47
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fredhutch.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319