[BioC] GEOquery error
James W. MacDonald
jmacdon at uw.edu
Fri May 2 20:00:51 CEST 2014
After some further testing, this doesn't appear to be an FTP problem as
such; it comes down to the getURL() step in getDirListing():
> GEOquery:::getDirListing("ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE) : couldn't connect to host
But this works with other FTP sites, such as in R's internet test file:
> GEOquery:::getDirListing("ftp://ftp.stats.ox.ac.uk/pub/datasets/csb/")
ftp://ftp.stats.ox.ac.uk/pub/datasets/csb/
 [1] "HEADER.html" "ch10.dat" "ch10.sas" "ch10.txt" "ch11a.dat" "ch11a.sas" "ch11a.txt" "ch11b.dat"
 [9] "ch11b.sas" "ch11b.txt" "ch12.dat.gz" "ch12.sas" "ch12.txt" "ch13.dat.gz" "ch13.sas" "ch13.txt"
[17] "ch14.dat" "ch14.sas" "ch14.txt" "ch15.dat.gz" "ch15.sas" "ch15.txt" "ch16a.dat" "ch16a.sas"
[25] "ch16a.txt" "ch16b.dat" "ch16b.sas" "ch16b.txt" "ch17.dat" "ch17.sas" "ch17.txt" "ch18a.dat"
[33] "ch18a.sas" "ch18a.txt" "ch18b.dat.gz" "ch18b.sas" "ch18b.txt" "ch19.sas" "ch19.txt" "ch19a.dat.gz"
[41] "ch19b.dat.gz" "ch19c.dat.gz" "ch19d.dat.gz" "ch19e.dat.gz" "ch19f.dat.gz" "ch19g.dat.gz" "ch1a.dat" "ch1a.sas"
[49] "ch1a.txt" "ch1b.dat" "ch1b.sas" "ch1b.txt" "ch2.dat" "ch2.sas" "ch2.txt" "ch20.dat.gz"
[57] "ch20.sas" "ch20.txt" "ch21a.dat.gz" "ch21a.sas" "ch21a.txt" "ch21b.dat.gz" "ch21b.sas" "ch21b.txt"
[65] "ch3a.dat" "ch3a.sas" "ch3a.txt" "ch3b.dat" "ch3b.sas" "ch3b.txt" "ch4a.dat" "ch4a.sas"
[73] "ch4a.txt" "ch4b.dat" "ch4b.sas" "ch4b.txt" "ch5.dat.gz" "ch5.sas" "ch5.txt" "ch6.dat"
[81] "ch6.sas" "ch6.txt" "ch7.dat.gz" "ch7.sas" "ch7.txt" "ch8.dat" "ch8.sas" "ch8.txt"
[89] "ch9.dat.gz" "ch9.sas" "ch9.txt" "index.html"
or Ensembl:
> GEOquery:::getDirListing("ftp://ftp.ensembl.org")
ftp://ftp.ensembl.org
[1] "ls-lR.gz" "ls-lR.Z" "pub" "quota.group" "quota.user"
[6] "update-sym-links" "update-sym-links_orig"
or other random US government ftp sites:
> GEOquery:::getDirListing("ftp://ftp.wcc.nrcs.usda.gov")
ftp://ftp.wcc.nrcs.usda.gov
[1] "BB_Test" "data" "downloads" "fieldops" "gis" "images" "pub" "publications"
[9] "snowschool" "states" "support" "tmp" "watershed" "wcs_info" "welcome.msg" "wntsc"
So I wonder if it is a change at NCBI?
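For anyone who wants to repeat the host-by-host comparison above, it could be scripted roughly as follows. This is only a sketch under stated assumptions: probe_ftp() and its lister argument are hypothetical names, not part of GEOquery, and the default lister calls RCurl::getURL() with the dirlistonly curl option, which may not match exactly what getDirListing() does internally.

```r
# Probe several FTP directory URLs and record which ones can be listed.
# `lister` is any function(url) that returns the raw listing as one string.
# The default (an assumption, not GEOquery's own code) uses RCurl::getURL();
# it is only evaluated when no lister is supplied.
probe_ftp <- function(urls,
                      lister = function(u) RCurl::getURL(u, dirlistonly = TRUE)) {
  results <- lapply(urls, function(u) {
    tryCatch(
      {
        txt <- lister(u)
        files <- strsplit(txt, "\r?\n")[[1]]
        data.frame(url = u, ok = TRUE, n_entries = sum(nzchar(files)),
                   message = "", stringsAsFactors = FALSE)
      },
      error = function(e) {
        # Keep the error text so failures on different hosts can be compared
        data.frame(url = u, ok = FALSE, n_entries = NA_integer_,
                   message = conditionMessage(e), stringsAsFactors = FALSE)
      }
    )
  })
  do.call(rbind, results)
}

# Not run here (needs network access and RCurl):
# probe_ftp(c("ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/",
#             "ftp://ftp.ensembl.org"))
```

Because the listing function is passed in, a second run with a lister that sets other curl options (for example ftp.use.epsv = FALSE) would show whether the failure depends on the FTP mode. Whether that is the cause here is not established in this thread.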
Best,
Jim
On 5/2/2014 1:15 PM, James W. MacDonald wrote:
> Hi Sean,
>
> This all works on Linux, and obviously on MacOS for you, but on
> Windows 7, not so much:
>
> > gpl <- getGEO("GPL90")
> File stored at:
> C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GPL90.soft
> Warning message:
> In download.file(myurl, destfile, mode = mode, quiet = TRUE, method =
> getOption("download.file.method.GEOquery")) :
> downloaded length 9476281 != reported length 200
>
> But the gpl object looks OK, so I guess the reported length is wrong.
>
> > geoq <- getGEO("GSE9514", GSEMatrix = FALSE)
> File stored at:
> C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GSE9514.soft.gz
> Parsing....
> Found 9 entities...
> GPL90 (1 of 9 entities)
> GSM241146 (2 of 9 entities)
> GSM241147 (3 of 9 entities)
> GSM241148 (4 of 9 entities)
> GSM241149 (5 of 9 entities)
> GSM241150 (6 of 9 entities)
> GSM241151 (7 of 9 entities)
> GSM241152 (8 of 9 entities)
> GSM241153 (9 of 9 entities)
> There were 50 or more warnings (use warnings() to see the first 50)
>
> > geoq <- getGEO("GSE9514")
> ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
> Error in function (type, msg, asError = TRUE) : couldn't connect to host
>
> > setInternet2(use=FALSE)
> > geoq <- getGEO("GSE9514")
> ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
> Error in function (type, msg, asError = TRUE) :
> Server denied you to change to the given directory
>
> Any suggestions? I can't find anything on the list archives that
> helps. I am thinking it has something to do with Windows Firewall, as
> I can get to
>
> http://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
>
> using a browser, but not
>
> ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
>
> but setting up a specific rule under Windows Firewall to allow R.exe
> ftp access doesn't seem to help.
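Since the HTTP address is reachable from a browser, one possible workaround is to fetch the series matrix file over HTTP and parse the local copy. The sketch below only builds the URL. series_matrix_url() is a hypothetical helper, and the rule that the directory name replaces the last three digits of the accession with "nnn" is an assumption generalized from the single URL in this thread. It also assumes one matrix file per series, named as in the listing quoted further down.

```r
# Build the HTTP URL of a series matrix file from a GSE accession.
# Assumption: the "GSE9nnn"-style directory is the accession with its
# last (up to) three digits replaced by "nnn".
series_matrix_url <- function(gse,
                              base = "http://ftp.ncbi.nlm.nih.gov/geo/series") {
  stopifnot(grepl("^GSE[0-9]+$", gse))
  stub <- sub("[0-9]{1,3}$", "nnn", gse)
  sprintf("%s/%s/%s/matrix/%s_series_matrix.txt.gz", base, stub, gse, gse)
}

url <- series_matrix_url("GSE9514")

# Not run here (needs network access and GEOquery):
# dest <- file.path(tempdir(), basename(url))
# download.file(url, dest, mode = "wb")
# gse <- GEOquery::getGEO(filename = dest)
```

getGEO() accepts a local file through its filename argument, so the download step can be separated from the parsing step when the FTP listing is the part that fails.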
>
> Best,
>
> Jim
>
>
>
>
> On 5/2/2014 12:20 PM, Sean Davis wrote:
>> Hi, again, James.
>>
>> NCBI is still checking into the issue (may have been a storm-related
>> issue), but your (simplified) example now works for me.
>>
>>> gpl = getGEO('GPL90')
>> File stored at:
>> /var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpQXZfrr/GPL90.soft
>>> sessionInfo()
>> R version 3.0.2 Patched (2014-01-22 r64855)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] GEOquery_2.28.0 Biobase_2.21.7 BiocGenerics_0.7.5
>> [4] BiocInstaller_1.12.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.95-4.1 XML_3.95-0.2
>>
>>
>> Sean
>>
>> On Thu, May 1, 2014 at 1:11 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>> Hi, James.
>>>
>>> Thanks for the report. This is due to a change at NCBI. I am
>>> checking with them to see if the change is meant to be permanent or is
>>> simply a transient issue. I'll let everyone know as soon as I hear
>>> back from NCBI.
>>>
>>> Sean
>>>
>>>
>>> On Thu, May 1, 2014 at 9:19 AM, James W. MacDonald <jmacdon at uw.edu>
>>> wrote:
>>>> Hi Sean,
>>>>
>>>>> geoq <- getGEO("GSE9514")
>>>> ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
>>>> Found 1 file(s)
>>>> GSE9514_series_matrix.txt.gz
>>>> % Total % Received % Xferd Average Speed Time Time Time Current
>>>> Dload Upload Total Spent Left Speed
>>>> 100 378k 100 378k 0 0 204k 0 0:00:01 0:00:01 --:--:-- 204k
>>>> File stored at:
>>>> /data3/tmp/RtmpkDXZzR/GPL90.soft
>>>> Error in xj[i] : only 0's may be mixed with negative subscripts
>>>>
>>>> And the error appears to come from this section in parseGPL():
>>>>
>>>> if (hasDataTable) {
>>>>     nLinesToRead <- NULL
>>>>     if (!is.null(n)) {
>>>>         nLinesToRead <- n - length(txt)
>>>>     }
>>>>     dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
>>>>     geoDataTable <- new("GEODataTable", columns = cols,
>>>>         table = dat3[1:(nrow(dat3) - 1), ])
>>>> }
>>>>
>>>> Where there is no error trapping for the case that fastTabRead returns
>>>> a zero-row data.frame:
>>>>
>>>> debug: dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
>>>> Browse[3]> dim(dat3)
>>>> [1] 0 17
>>>> Browse[3]> dat3
>>>> [1] ID ORF
>>>> [3] SPOT_ID Species Scientific Name
>>>> [5] Annotation Date Sequence Type
>>>> [7] Sequence Source Target Description
>>>> [9] Representative Public ID Gene Title
>>>> [11] Gene Symbol ENTREZ_GENE_ID
>>>> [13] RefSeq Transcript ID SGD accession number
>>>> [15] Gene Ontology Biological Process Gene Ontology Cellular Component
>>>> [17] Gene Ontology Molecular Function
>>>> <0 rows> (or 0-length row.names)
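The failure in the quoted parseGPL() indexing can be reproduced without GEOquery. With zero rows, 1:(nrow(dat3) - 1) becomes 1:-1, that is c(1, 0, -1), which mixes positive and negative subscripts. The sketch below shows the failure and one possible guard. drop_last_row() is a hypothetical helper for illustration, not the fix adopted in the package.

```r
# Drop the last row of a data frame, tolerating zero-row input.
drop_last_row <- function(df) {
  keep <- seq_len(max(nrow(df) - 1L, 0L))
  df[keep, , drop = FALSE]
}

# Stand-in for the zero-row result of fastTabRead()
empty <- data.frame(ID = character(0), ORF = character(0))

# The idiom from the quoted code: 1:(0 - 1) expands to 1, 0, -1
idx <- 1:(nrow(empty) - 1)
failed <- inherits(try(empty[idx, ], silent = TRUE), "try-error")

# The guarded version returns a zero-row data frame with both columns
safe <- drop_last_row(empty)

# On non-empty input it behaves like the original indexing
three <- drop_last_row(data.frame(ID = c("a", "b", "c")))
```

seq_len() never counts downwards, so an empty table yields an empty index rather than an error. A parser might still prefer to stop with an explicit message when the data table is unexpectedly empty.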
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>