[Bioc-devel] AnnotationHubData Error: Access denied: 530

Martin Morgan mtmorgan at fredhutch.org
Fri Apr 17 15:00:51 CEST 2015


On 04/13/2015 02:48 AM, Thomas Maurel wrote:
> Dear Martin,
>
> I have investigated with our Web team and we believe that the command
> attempts to open a number of concurrent sessions in order to download all of
> the files. If that is the case then the problem is that our ftp server is
> configured to limit the number of concurrent sessions per user in order to
> prevent people using scripts to monopolise the server resources (and in some
> cases accidentally DoS attack the server).

Hi Thomas -- thank you for trouble-shooting this.

The code called getURL(url, ...) without specifying a curl= argument, so each
call to getURL() constructed a new CURLHandle. These handles are closed only
when the garbage collector runs, which apparently happens too infrequently, and
running the collector explicitly is expensive.

I updated the code to include the argument

   curl=httr::handle_find(url)$handle

which re-uses httr's pool of url-specific handles, hence limiting the number of
simultaneous open connections. This seems to have been effective.
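
For anyone hitting the same limit in their own scripts, here is a minimal
sketch of the same idea using a single explicitly created RCurl handle rather
than httr's pool. It is not the AnnotationHubData code; the names
list_ftp_dirs, fetch and handle are made up for illustration, and it assumes
all urls live on the same FTP host so that one handle is enough.

```r
# Sketch only: list several FTP directories through ONE shared curl handle.
# `list_ftp_dirs`, `fetch` and `handle` are illustrative names and are not
# part of AnnotationHubData, RCurl or httr.
list_ftp_dirs <- function(urls,
                          fetch = RCurl::getURL,
                          handle = RCurl::getCurlHandle()) {
    listings <- lapply(urls, function(url) {
        # Passing curl= sends every request through the same handle instead
        # of letting each call construct a new one.
        txt <- fetch(url, dirlistonly = TRUE, curl = handle)
        # One directory entry per line of the listing.
        strsplit(txt, "\n", fixed = TRUE)[[1]]
    })
    names(listings) <- urls
    listings
}
```

Calling list_ftp_dirs(urls) with the Ensembl urls from this thread should then
issue the requests one after another over the same handle. The fetch argument
is only there so the function can be exercised without network access.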

Thanks again,

Martin


>
> Hope this helps,
> Regards,
> Thomas
>
>> On 10 Apr 2015, at 13:40, Thomas Maurel <maurel at ebi.ac.uk> wrote:
>>
>> Hi Martin,
>>
>>> On 10 Apr 2015, at 13:23, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>>>
>>> On 04/10/2015 04:34 AM, Rainer Johannes wrote:
>>>> hi Martin,
>>>>
>>>> but if that's true, then I will never have a way to test whether the
>>>> recipe actually works, right?
>>>
>>> I guess I don't really know what I'm talking about, and that insert=FALSE
>>> is intended to not actually do the insertion so that the (immediate)
>>> problem is not with AnnotationHubData.
>>>
>>> From the traceback below it seems like the error occurs in calls like the
>>> following
>>>
>>> library(RCurl)
>>> getURL("ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/",
>>>        dirlistonly=TRUE)
>>>
>>> This seems to sometimes work and sometimes not
>>>
>>>> urls[1]
>>> [1] "ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/"
>>>> getURL(urls[1], dirlistonly=TRUE)
>>> [1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
>>>> getURL(urls[1], dirlistonly=TRUE)
>>> [1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
>>>> getURL(urls[1], dirlistonly=TRUE)
>>> Error in function (type, msg, asError = TRUE)  : Access denied: 530
>> You are right, I've noticed the same thing. I will investigate and see if
>> there is something wrong with our FTP site machine.
>>
>> Regards, Thomas
>>>
>>>
>>>>
>>>> that's the full traceback:
>>>>
>>>>> updateResources(AnnotationHubRoot=getWd(), BiocVersion=biocVersion(),
>>>>     preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
>>>>     metadataOnly=TRUE)
>>>> INFO [2015-04-10 13:32:18] Preparer Class: EnsemblGtfToEnsDbPreparer
>>>> Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
>>>> Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
>>>> Anolis_carolinensis.AnoCar2.0.78.gtf.gz
>>>> Astyanax_mexicanus.AstMex102.78.gtf.gz
>>>> Bos_taurus.UMD3.1.78.gtf.gz
>>>> Caenorhabditis_elegans.WBcel235.78.gtf.gz
>>>> Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
>>>> Error in function (type, msg, asError = TRUE)  : Access denied: 530
>>>>> traceback()
>>>> 17: fun(structure(list(message = msg, call = sys.call()), class =
>>>>     c(typeName, "GenericCurlError", "error", "condition")))
>>>> 16: function (type, msg, asError = TRUE) {
>>>>       if (!is.character(type)) {
>>>>         i = match(type, CURLcodeValues)
>>>>         typeName = if (is.na(i)) character() else names(CURLcodeValues)[i]
>>>>       }
>>>>       typeName = gsub("^CURLE_", "", typeName)
>>>>       fun = (if (asError) stop else warning)
>>>>       fun(structure(list(message = msg, call = sys.call()), class =
>>>>         c(typeName, "GenericCurlError", "error", "condition")))
>>>>     }(67L, "Access denied: 530", TRUE)
>>>> 15: .Call("R_curl_easy_perform", curl, .opts, isProtected, .encoding,
>>>>     PACKAGE = "RCurl")
>>>> 14: curlPerform(curl = curl, .opts = opts, .encoding = .encoding)
>>>> 13: getURL(url, dirlistonly = TRUE)
>>>> 12: strsplit(getURL(url, dirlistonly = TRUE), "\n")
>>>> 11: (function (url, filename, tag, verbose = TRUE) {
>>>>       df2 <- strsplit(getURL(url, dirlistonly = TRUE), "\n")[[1]]
>>>>       df2 <- df2[grep(paste0(filename, "$"), df2)]
>>>>       drop <- grepl("latest", df2) | grepl("00-", df2)
>>>>       df2 <- df2[!drop]
>>>>       df2 <- paste0(url, df2)
>>>>       result <- lapply(df2, function(x) {
>>>>         if (verbose) message(basename(x))
>>>>         tryCatch({
>>>>           h = suppressWarnings(GET(x, config = config(nobody = TRUE,
>>>>             filetime = TRUE)))
>>>>           nams <- names(headers(h))
>>>>           if ("last-modified" %in% nams)
>>>>             headers(h)[c("last-modified", "content-length")]
>>>>           else c(`last-modified` = NA, `content-length` = NA)
>>>>         }, error = function(err) {
>>>>           warning(basename(x), ": ", conditionMessage(err))
>>>>           list(`last-modified` = character(), `content-length` = character())
>>>>         })
>>>>       })
>>>>       size <- as.numeric(sapply(result, "[[", "content-length"))
>>>>       date <- strptime(sapply(result, "[[", "last-modified"),
>>>>         "%a, %d %b %Y %H:%M:%S", tz = "GMT")
>>>>       data.frame(fileurl = url, date, size, genome = tag,
>>>>         stringsAsFactors = FALSE)
>>>>     })(dots[[1L]][[8L]], filename = dots[[2L]][[1L]], tag = dots[[3L]][[8L]])
>>>> 10: mapply(FUN = f, ..., SIMPLIFY = FALSE)
>>>> 9: Map(.ftpFileInfo, urls, filename = "gtf.gz", tag = basename(urls))
>>>> 8: do.call(rbind, Map(.ftpFileInfo, urls, filename = "gtf.gz", tag =
>>>>     basename(urls)))
>>>> 7: .ensemblGtfSourceUrls(.ensemblBaseUrl, justRunUnitTest)
>>>> 6: makeAnnotationHubMetadataFunction(currentMetadata, justRunUnitTest =
>>>>     justRunUnitTest, ...)
>>>> 5: .generalNewResources(importPreparer, currentMetadata,
>>>>     makeAnnotationHubMetadataFunction, justRunUnitTest, ...)
>>>> 4: .local(importPreparer, currentMetadata, ...)
>>>> 3: newResources(preparerInstance, listOfExistingResources, justRunUnitTest
>>>>     = justRunUnitTest)
>>>> 2: newResources(preparerInstance, listOfExistingResources, justRunUnitTest
>>>>     = justRunUnitTest)
>>>> 1: updateResources(AnnotationHubRoot = getWd(), BiocVersion =
>>>>     biocVersion(), preparerClasses = "EnsemblGtfToEnsDbPreparer", insert =
>>>>     FALSE, metadataOnly = TRUE)
>>>>>
>>>>
>>>>
>>>>> On 10 Apr 2015, at 13:09, Martin Morgan <mtmorgan at fredhutch.org>
>>>>> wrote:
>>>>>
>>>>> traceback()
>>>>
>>>
>>>
>>> -- Computational Biology / Fred Hutchinson Cancer Research Center 1100
>>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861 Phone: (206) 667-2793
>>
>> -- Thomas Maurel Bioinformatician - Ensembl Production Team European
>> Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD United Kingdom
>>
>
> -- Thomas Maurel Bioinformatician - Ensembl Production Team European
> Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory
> Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD United Kingdom
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


