[Bioc-devel] AnnotationHubData Error: Access denied: 530

Arora, Sonali sarora at fredhutch.org
Thu Apr 16 23:49:40 CEST 2015


Hi Johannes,

I have updated the GRanges for all the GTF files.


 > library(AnnotationHub)
 > hubCache(ah)
[1] "/home/sarora/.AnnotationHub"   ## delete everything from this folder
 > ah = AnnotationHub()
retrieving 1 resources
|======================================================================| 100%
 > rat <- query(ah, c("gtf", "69", "rattus"))
 > rat[1]
AnnotationHub with 1 record
# snapshotDate(): 2015-03-26
# names(): AH7522
# $dataprovider: Ensembl
# $species: Rattus norvegicus
# $rdataclass: GRanges
# $title: Rattus_norvegicus.RGSC3.4.69.gtf
# $description: Gene Annotation for Rattus norvegicus
# $taxonomyid: 10116
# $genome: RGSC3.4
# $sourcetype: GTF
# $sourceurl: ftp://ftp.ensembl.org/pub/release-69/gtf/rattus_norvegicus/Rat...
# $sourcelastmodifieddate: 2012-10-19
# $sourcesize: 8485113
# $tags: GTF, ensembl, Gene, Transcript, Annotation
# retrieve record with 'object[["AH7522"]]'
 > ra <- rat[[1]]
require("GenomicRanges")
retrieving 1 resources
|======================================================================| 100%
 > seqinfo(ra)
Seqinfo object with 22 sequences (1 circular) from RGSC3 genome:
   seqnames seqlengths isCircular genome
   1         267910886      FALSE  RGSC3
   2         258207540      FALSE  RGSC3
   3         171063335      FALSE  RGSC3
   4         187126005      FALSE  RGSC3
   5         173096209      FALSE  RGSC3
   ...             ...        ...    ...
   18         87265094      FALSE  RGSC3
   19         59218465      FALSE  RGSC3
   20         55268282      FALSE  RGSC3
   X         160699376      FALSE  RGSC3
   MT            16313       TRUE  RGSC3
 > seqlevels(ra)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "X"  "MT"
 > genome(ra)
      1       2       3       4       5       6       7       8       9      10
"RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3"
     11      12      13      14      15      16      17      18      19      20
"RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3" "RGSC3"
      X      MT
"RGSC3" "RGSC3"

Hope that helps!
Sonali.
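
One way to confirm that a GRanges from the hub carries the sequence lengths, circularity flags and genome name shown in the session above is to test its seqinfo for missing values. What follows is only a minimal sketch: has_complete_seqinfo() is a hypothetical helper, not a function from AnnotationHub or GenomicRanges, and it is run on a small hand-made GRanges rather than on rat[[1]] so that no download is needed.

```r
library(GenomicRanges)  # attaches the packages providing IRanges() and Seqinfo()

# Hypothetical helper: TRUE only if sequence lengths, circularity flags
# and genome are filled in for every sequence of the object.
has_complete_seqinfo <- function(gr) {
    si <- seqinfo(gr)
    !anyNA(seqlengths(si)) && !anyNA(isCircular(si)) && !anyNA(genome(si))
}

# Toy object standing in for rat[[1]]; it starts with empty seqinfo.
toy <- GRanges(c("1", "MT"), IRanges(start = c(100, 10), width = 50))
before <- has_complete_seqinfo(toy)

# Fill in two of the rows printed by seqinfo(ra) above.
seqinfo(toy) <- Seqinfo(seqnames = c("1", "MT"),
                        seqlengths = c(267910886L, 16313L),
                        isCircular = c(FALSE, TRUE),
                        genome = "RGSC3")
after <- has_complete_seqinfo(toy)
```

Calling has_complete_seqinfo(rat[[1]]) on a freshly downloaded record would be the corresponding check against the hub itself.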

On 4/13/2015 10:54 AM, Marc Carlson wrote:
> Hi Johannes,
>
> We are already planning to upgrade those objects to have that
> information when they are downloaded...  Sonali is actually working on
> that right now.  She will probably have updated that information by the
> end of the week or so.  It's a lot of files to update, but this is
> already in progress.
>
> So if you are willing to wait a few days you can probably save yourself
> some headaches...
>
>
>    Marc
>
>
>
>
> On 04/11/2015 12:13 PM, Rainer Johannes wrote:
>> Hi Marc,
>>
>> you're right. I'll start with option 1. For that it would however be
>> really nice to have the seqinfo available in the GRanges as mentioned
>> in my previous mail. In the meantime I'll try to fetch the chrom
>> lengths myself but would be nice to have all that ready in the GRanges
>> at some point.
>>
>> cheers, jo
>>
>>> On 11 Apr 2015, at 00:54, Marc Carlson <mcarlson at fredhutch.org> wrote:
>>>
>>> On 04/10/2015 12:18 PM, Rainer Johannes wrote:
>>>> dear Sonali, Herve,
>>>>
>>>> On 10 Apr 2015, at 19:59, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>>
>>>> Hi Johannes, Sonali,
>>>>
>>>> On 04/10/2015 09:40 AM, Arora, Sonali wrote:
>>>> Hi Rainer,
>>>>
>>>> Just to be clear - what do you want to be available from AnnotationHub()
>>>> in the end?
>>>>
>>>> Currently the GTF files from Ensembl are already present inside the
>>>> AnnotationHub
>>>>
>>>> library(AnnotationHub)
>>>> ah = AnnotationHub()
>>>> gtf <- query(ah, "GTF")
>>>> gtf <- query(gtf, "Ensembl")
>>>> gtf[1]
>>>> gtf[[1]] # returned to you as a GRanges object.
>>>>
>>>> - why not get the GTF files directly from AnnotationHub instead of
>>>> getting them from the ftp site? Then you can make your EnsDb classes
>>>> from these GRanges.
>>>> It will also make your recipe faster because you will not have to
>>>> download the file and parse the object.
>>>>
>>>> A GRanges object is not the same as a GTF file and I guess Johannes
>>>> wants access to the GTF file. Are these GTF files available on
>>>> AnnotationHub?
>>>>
>>>>
>>>> yes, you're right. I wanted access to the GTF file and most likely
>>>> misunderstood the AnnotationHub idea... my idea was to build a
>>>> recipe that takes the GTF file as input (as
>>>> makeEnsemblGtfToGRanges does) and generates the EnsDb SQLite
>>>> database file from it. I thought that these SQLite files would be
>>>> generated on the fly on the user's computer, but I guess that stuff
>>>> is processed once and stored on your servers, right?
>>> Hi Johannes,
>>>
>>> So you have several options actually.  We sometimes store the files in
>>> S3 and then send them down/cache them as requested and other times the
>>> hub can just point to an existing ftp site (and files get
>>> transformed/cached on the fly when users ask for them).  So you have
>>> three choices here:
>>>
>>> 1) You could just write a function that takes in one of the processed
>>> GRanges objects and transforms it into an EnsDb object. This should be
>>> straightforward and is probably your easiest option since you won't have
>>> to write a recipe OR have any code included into the AnnotationHub.  You
>>> can basically just take advantage of the fact that these data are
>>> already there in the hub waiting to be used.
>>> 2) You could write R code that transforms a GTF file into a sqlite file
>>> and ALSO a recipe to call that (and create metadata) for all the GTF
>>> files.  This will be more work than #1 since you will have to write both
>>> a recipe and port any code that you have for generating the DB files.
>>> But when you are done you would be able to have your resources come
>>> right out of the AnnotationHub.
>>> 3) You could write R code to process a GRanges object into an EnsDb
>>> object and then also write a recipe so that your data resources can be
>>> served up directly from the AnnotationHub, but still take advantage of
>>> what is already there (GRanges).  No new data would need to be added to
>>> the hub since new metadata records could allow users to transform the
>>> data into EnsDb objects on the fly.  This is an elegant solution, but it
>>> will still take more effort than option #1.
>>>
>>> If I were you, I would start with option #1.  That way, if I still
>>> wanted things to be more elegant after getting that working, I could
>>> then add a recipe (thus evolving the strategy into option #3)...
>>>
>>>
>>>   Marc
>>>
>>>
>>>
>>>> @Johannes - Here is one alternative: You could take a different approach
>>>> and implement some equivalent of makeTxDbFromGRanges() for EnsDb
>>>> objects. So people could just do:
>>>>
>>>>   library(ensembldb)
>>>>   ensdb <- makeEnsDbFromGRanges(gtf[[1]])
>>>>
>>>> like they can do right now with makeTxDbFromGRanges():
>>>>
>>>>   library(GenomicFeatures)
>>>>   txdb <- makeTxDbFromGRanges(gtf[[1]])
>>>>
>>>> That way you don't need a recipe or try to add things to
>>>> AnnotationHub at all.
>>>>
>>>>
>>>> that's a good idea, I will implement that too. I just want to make
>>>> sure that I can get all the data I'll need from the GRanges
>>>> (including the genome build version, Ensembl version etc.); most
>>>> likely I will have to guess that from the file name of the RData file.
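
Guessing the genome build and Ensembl version from a file name could look roughly like the sketch below. It is only an illustration: parse_ensembl_gtf_name() is a hypothetical helper, and it assumes names of the form <Organism>.<genome build>.<Ensembl version>.gtf[.gz], as in the titles and listings quoted in this thread. Whether the RData file names follow the same pattern is an assumption.

```r
# Hypothetical helper: split an Ensembl GTF file name of the form
# <Organism>.<genome build>.<Ensembl version>.gtf[.gz] into its parts.
parse_ensembl_gtf_name <- function(file_name) {
    pattern <- "^([^.]+)\\.(.+)\\.([0-9]+)\\.gtf(\\.gz)?$"
    m <- regmatches(file_name, regexec(pattern, file_name))[[1]]
    if (length(m) == 0L)
        stop("unexpected file name: ", file_name)
    # m[1] is the full match; the capture groups start at m[2].
    list(organism = m[2], genome = m[3], version = as.integer(m[4]))
}

rat <- parse_ensembl_gtf_name("Rattus_norvegicus.RGSC3.4.69.gtf")
marmoset <- parse_ensembl_gtf_name("Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz")
```

The genome build may itself contain dots (RGSC3.4, C_jacchus3.2.1), which is why the pattern pins the version to the last all-digit field before ".gtf" instead of splitting on every dot.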
>>>>
>>>> @Sonali - These GRanges objects I get from AnnotationHub have no genome
>>>> information and their seqlevels are not sorted:
>>>>
>>>>> seqinfo(gtf[[1]])
>>>>   Seqinfo object with 22 sequences from an unspecified genome; no
>>>> seqlengths:
>>>>     seqnames seqlengths isCircular genome
>>>>     X              <NA>       <NA>   <NA>
>>>>     9              <NA>       <NA>   <NA>
>>>>     8              <NA>       <NA>   <NA>
>>>>     7              <NA>       <NA>   <NA>
>>>>     6              <NA>       <NA>   <NA>
>>>>     ...             ...        ...    ...
>>>>     12             <NA>       <NA>   <NA>
>>>>     11             <NA>       <NA>   <NA>
>>>>     10             <NA>       <NA>   <NA>
>>>>     1              <NA>       <NA>   <NA>
>>>>     MT             <NA>       <NA>   <NA>
>>>>
>>>> I know it's easy enough to sort the seqlevels with sortSeqlevels() but
>>>> what about having these things done by the recipe instead?
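
For reference, the sorting step mentioned above can be tried on a toy object. This is a sketch only: the GRanges below is hand-made, with its seqlevels deliberately placed in an unsorted order, and doing the same inside the recipe is the suggestion under discussion, not something the thread confirms was implemented this way.

```r
library(GenomicRanges)
library(GenomeInfoDb)  # provides sortSeqlevels()

# Toy GRanges whose seqlevels are deliberately in an unsorted order.
lv <- c("X", "9", "10", "1", "MT")
gr <- GRanges(factor(lv, levels = lv), IRanges(start = 1:5, width = 10))
unsorted <- seqlevels(gr)

# sortSeqlevels() reorders the levels only; the ranges themselves and
# their order in the object are left untouched.
sorted <- sortSeqlevels(gr)
seqlevels(sorted)
```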
>>>>
>>>>
>>>> I also have a suggestion there: what if you also used
>>>> fetchChromLengthsFromEnsembl from the GenomicFeatures package? The
>>>> GTF files are from Ensembl anyway, so getting the seqinfo from there
>>>> would make sense... and I wouldn't have to fetch it separately to
>>>> build the EnsDb.
>>>>
>>>> thanks!
>>>> jo
>>>>
>>>> Thanks,
>>>> H.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Sonali.
>>>>
>>>>
>>>> On 4/9/2015 11:14 PM, Rainer Johannes wrote:
>>>> dear all,
>>>>
>>>> I have added a recipe to the AnnotationHubData to provide EnsDb
>>>> classes (from my ensembldb package) based on GTF files from Ensembl.
>>>> Now, after adding the recipe to the AnnotationHubData package and
>>>> installing it (following the vignettes from the AnnotationHub and
>>>> AnnotationHubData) I called
>>>>
>>>> updateResources(AnnotationHubRoot=getwd(), BiocVersion=biocVersion(),
>>>> preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
>>>> metadataOnly=TRUE)
>>>>
>>>> and got the output:
>>>>
>>>> Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
>>>> Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
>>>> Anolis_carolinensis.AnoCar2.0.78.gtf.gz
>>>> Astyanax_mexicanus.AstMex102.78.gtf.gz
>>>> Bos_taurus.UMD3.1.78.gtf.gz
>>>> Caenorhabditis_elegans.WBcel235.78.gtf.gz
>>>> Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
>>>> Canis_familiaris.CanFam3.1.78.gtf.gz
>>>> Cavia_porcellus.cavPor3.78.gtf.gz
>>>> Chlorocebus_sabaeus.ChlSab1.1.78.gtf.gz
>>>> Choloepus_hoffmanni.choHof1.78.gtf.gz
>>>> Ciona_intestinalis.KH.78.gtf.gz
>>>> Ciona_savignyi.CSAV2.0.78.gtf.gz
>>>> Danio_rerio.Zv9.78.gtf.gz
>>>> Dasypus_novemcinctus.Dasnov3.0.78.gtf.gz
>>>> Dipodomys_ordii.dipOrd1.78.gtf.gz
>>>> Drosophila_melanogaster.BDGP5.78.gtf.gz
>>>> Error in function (type, msg, asError = TRUE)  : Access denied: 530
>>>>
>>>> I guess that must be related to the Ensembl ftp? Is anybody else
>>>> experiencing this error?
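
FTP reply code 530 means the server refused the login ("Not logged in"). On a public server such a refusal can be temporary, although the thread does not establish the cause here. If it is temporary, wrapping the failing call in a retry loop is one possible workaround. The sketch below uses base R only: with_retry() and its arguments are hypothetical names, and flaky_listing() merely simulates a call that fails twice before succeeding.

```r
# Hypothetical helper: call fetch() up to max_tries times, pausing a bit
# longer after each failure, and re-raise the last error if all fail.
with_retry <- function(fetch, max_tries = 3, wait = 1) {
    for (i in seq_len(max_tries)) {
        result <- tryCatch(fetch(), error = function(e) e)
        if (!inherits(result, "error"))
            return(result)
        if (i < max_tries)
            Sys.sleep(wait * i)
    }
    stop("all ", max_tries, " attempts failed; last error: ",
         conditionMessage(result))
}

# Simulated FTP listing that is refused twice and then succeeds.
calls <- 0
flaky_listing <- function() {
    calls <<- calls + 1
    if (calls < 3)
        stop("Access denied: 530")
    "Drosophila_melanogaster.BDGP5.78.gtf.gz"
}

listing <- with_retry(flaky_listing, max_tries = 5, wait = 0)
```

In real use wait would be a second or more, so that repeated attempts do not add to the load on the server.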
>>>>
>>>> cheers, jo
>>>>
>>>>
>>>> my session info:
>>>>
>>>> sessionInfo()
>>>> R Under development (unstable) (2015-03-04 r67940)
>>>> Platform: x86_64-apple-darwin14.3.0/x86_64 (64-bit)
>>>> Running under: OS X 10.10.3 (Yosemite)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>>>> [8] methods   base
>>>>
>>>> other attached packages:
>>>> [1] AnnotationHubData_0.0.205 futile.logger_1.4
>>>> [3] AnnotationHub_1.99.81     GenomicRanges_1.19.52
>>>> [5] GenomeInfoDb_1.3.16       IRanges_2.1.43
>>>> [7] S4Vectors_0.5.22          BiocGenerics_0.13.11
>>>>
>>>> loaded via a namespace (and not attached):
>>>>   [1] Rcpp_0.11.5                  BiocInstaller_1.17.7
>>>>   [3] XVector_0.7.4                futile.options_1.0.0
>>>>   [5] GenomicFeatures_1.19.37      bitops_1.0-6
>>>>   [7] tools_3.2.0                  zlibbioc_1.13.3
>>>>   [9] biomaRt_2.23.5               digest_0.6.8
>>>> [11] BSgenome_1.35.20             jsonlite_0.9.15
>>>> [13] RSQLite_1.0.0                shiny_0.11.1
>>>> [15] DBI_0.3.1                    rtracklayer_1.27.11
>>>> [17] httr_0.6.1                   stringr_0.6.2
>>>> [19] Biostrings_2.35.12           Biobase_2.27.3
>>>> [21] R6_2.0.1                     AnnotationDbi_1.29.21
>>>> [23] XML_3.98-1.1                 BiocParallel_1.1.24
>>>> [25] RJSONIO_1.3-0                ensembldb_0.99.15
>>>> [27] lambda.r_1.1.7               Rsamtools_1.19.50
>>>> [29] htmltools_0.2.6              GenomicAlignments_1.3.34
>>>> [31] AnnotationForge_1.9.7        mime_0.3
>>>> [33] interactiveDisplayBase_1.5.6 xtable_1.7-4
>>>> [35] httpuv_1.3.2                 RCurl_1.95-4.5
>>>> [37] VariantAnnotation_1.13.47
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fredhutch.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>>