[BioC] GEOquery error
James W. MacDonald
jmacdon at uw.edu
Fri May 2 19:15:36 CEST 2014
Hi Sean,
This all works on Linux, and obviously on MacOS for you, but on Windows
7, not so much:
> gpl <- getGEO("GPL90")
File stored at:
C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GPL90.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method =
getOption("download.file.method.GEOquery")) :
downloaded length 9476281 != reported length 200
But the gpl object looks OK, so I guess the reported length is wrong.
> geoq <- getGEO("GSE9514", GSEMatrix = FALSE)
File stored at:
C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GSE9514.soft.gz
Parsing....
Found 9 entities...
GPL90 (1 of 9 entities)
GSM241146 (2 of 9 entities)
GSM241147 (3 of 9 entities)
GSM241148 (4 of 9 entities)
GSM241149 (5 of 9 entities)
GSM241150 (6 of 9 entities)
GSM241151 (7 of 9 entities)
GSM241152 (8 of 9 entities)
GSM241153 (9 of 9 entities)
There were 50 or more warnings (use warnings() to see the first 50)
> geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE) : couldn't connect to host
> setInternet2(use=FALSE)
> geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE) :
Server denied you to change to the given directory
Any suggestions? I can't find anything in the list archives that helps.
I suspect it has something to do with Windows Firewall, as I can get to
http://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
using a browser, but not to
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
However, setting up a specific rule under Windows Firewall to allow R.exe
FTP access doesn't seem to help.
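
For anyone who wants to repeat the HTTP-versus-FTP comparison from inside R
rather than a browser, here is a rough sketch. The helpers ftp_to_http() and
can_open() are names made up for illustration and are not part of GEOquery.
Note the assumption: this probes base R's own url() connections, which may
not behave the same as the RCurl calls that getGEO() appears to be making
above, so treat the result as a hint only.

```r
# Hypothetical helper: rewrite an ftp:// URL to the same path over http://
ftp_to_http <- function(u) sub("^ftp://", "http://", u)

# Hypothetical helper: TRUE if base R can open a connection to the URL,
# FALSE if opening it raises an error or a warning.
can_open <- function(u) {
  con <- url(u)
  on.exit(try(close(con), silent = TRUE))
  tryCatch({
    open(con, "rb")
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
}

ftp_url  <- "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/"
http_url <- ftp_to_http(ftp_url)
print(http_url)

# Needs network access, so left commented out:
# sapply(c(ftp = ftp_url, http = http_url), can_open)
```

If the http entry comes back TRUE and the ftp entry FALSE, that would be
consistent with something on the machine or network blocking FTP only.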
Best,
Jim
On 5/2/2014 12:20 PM, Sean Davis wrote:
> Hi, again, James.
>
> NCBI is still checking into the issue (may have been a storm-related
> issue), but your (simplified) example now works for me.
>
>> gpl = getGEO('GPL90')
> File stored at:
> /var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpQXZfrr/GPL90.soft
>> sessionInfo()
> R version 3.0.2 Patched (2014-01-22 r64855)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GEOquery_2.28.0 Biobase_2.21.7 BiocGenerics_0.7.5
> [4] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.95-4.1 XML_3.95-0.2
>
>
> Sean
>
> On Thu, May 1, 2014 at 1:11 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> Hi, James.
>>
>> Thanks for the report. This is due to a change at NCBI. I am
>> checking with them to see if the change is meant to be permanent or is
>> simply a transient issue. I'll let everyone know as soon as I hear
>> back from NCBI.
>>
>> Sean
>>
>>
>> On Thu, May 1, 2014 at 9:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>> Hi Sean,
>>>
>>>> geoq <- getGEO("GSE9514")
>>> ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
>>> Found 1 file(s)
>>> GSE9514_series_matrix.txt.gz
>>> % Total % Received % Xferd Average Speed Time Time Time Current
>>> Dload Upload Total Spent Left Speed
>>> 100 378k 100 378k 0 0 204k 0 0:00:01 0:00:01 --:--:-- 204k
>>> File stored at:
>>> /data3/tmp/RtmpkDXZzR/GPL90.soft
>>> Error in xj[i] : only 0's may be mixed with negative subscripts
>>>
>>> And the error appears to come from this section in parseGPL():
>>>
>>> if (hasDataTable) {
>>>     nLinesToRead <- NULL
>>>     if (!is.null(n)) {
>>>         nLinesToRead <- n - length(txt)
>>>     }
>>>     dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
>>>     geoDataTable <- new("GEODataTable", columns = cols,
>>>         table = dat3[1:(nrow(dat3) - 1), ])
>>> }
>>>
>>> There is no error trapping for the case where fastTabRead returns a
>>> zero-row data.frame, so 1:(nrow(dat3) - 1) becomes 1:-1:
>>>
>>> debug: dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
>>> Browse[3]> dim(dat3)
>>> [1] 0 17
>>> Browse[3]> dat3
>>> [1] ID ORF
>>> [3] SPOT_ID Species Scientific Name
>>> [5] Annotation Date Sequence Type
>>> [7] Sequence Source Target Description
>>> [9] Representative Public ID Gene Title
>>> [11] Gene Symbol ENTREZ_GENE_ID
>>> [13] RefSeq Transcript ID SGD accession number
>>> [15] Gene Ontology Biological Process Gene Ontology Cellular Component
>>> [17] Gene Ontology Molecular Function
>>> <0 rows> (or 0-length row.names)
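
To make the failure concrete, here is a minimal sketch in plain R that needs
neither GEOquery nor a download. The two-column dat3 below is only a stand-in
for the zero-row result shown above, and drop_last_row() is a hypothetical
guard written for illustration; it is not the package's code or its fix.

```r
# Stand-in for the fastTabRead() result above: columns present, zero rows.
dat3 <- data.frame(ID = character(0), ORF = character(0))

# With zero rows, 1:(nrow(dat3) - 1) is 1:-1, which counts down: 1, 0, -1.
# That mixes a positive and a negative subscript.
idx <- 1:(nrow(dat3) - 1)
print(idx)

# Subsetting with that index fails; capture the message instead of stopping.
res <- tryCatch(dat3[idx, ], error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: seq_len() gives integer(0) when there is nothing to
# keep, so the "drop the last row" step is safe for empty tables too.
drop_last_row <- function(df) {
  df[seq_len(max(nrow(df) - 1L, 0L)), , drop = FALSE]
}
print(nrow(drop_last_row(dat3)))
```

For a table that does have rows, drop_last_row() still removes only the
final row, which is what the original indexing does when nrow(dat3) > 1.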
>>>
>>> Best,
>>>
>>> Jim
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099