[BioC] GEOquery error

James W. MacDonald jmacdon at uw.edu
Fri May 2 19:15:36 CEST 2014


Hi Sean,

This all works on Linux, and obviously on MacOS for you, but on Windows 
7, not so much:

 > gpl <- getGEO("GPL90")
File stored at:
C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GPL90.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = 
getOption("download.file.method.GEOquery")) :
   downloaded length 9476281 != reported length 200

But the gpl object looks OK, so I guess the reported length is wrong.
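
One way to back up "looks OK" is to check that the downloaded SOFT file is not truncated. The sketch below assumes the GPL file brackets its data table with `!platform_table_begin` and `!platform_table_end` lines; `soft_table_complete` is a hypothetical helper, not a GEOquery function.

```r
# Hypothetical helper (not part of GEOquery): returns TRUE when a GPL SOFT
# file contains exactly one table-begin marker, exactly one table-end marker,
# and at least one line (the column header) between them.
soft_table_complete <- function(path) {
  lines <- readLines(path, warn = FALSE)
  begin <- grep("^!platform_table_begin", lines)
  end   <- grep("^!platform_table_end", lines)
  length(begin) == 1L && length(end) == 1L && end > begin + 1L
}
```

Calling it on the path printed after "File stored at:" should return TRUE if the whole table arrived, whatever length the server reported.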

 > geoq <- getGEO("GSE9514", GSEMatrix = FALSE)
File stored at:
C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GSE9514.soft.gz
Parsing....
Found 9 entities...
GPL90 (1 of 9 entities)
GSM241146 (2 of 9 entities)
GSM241147 (3 of 9 entities)
GSM241148 (4 of 9 entities)
GSM241149 (5 of 9 entities)
GSM241150 (6 of 9 entities)
GSM241151 (7 of 9 entities)
GSM241152 (8 of 9 entities)
GSM241153 (9 of 9 entities)
There were 50 or more warnings (use warnings() to see the first 50)

 > geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE)  : couldn't connect to host

 > setInternet2(use=FALSE)
 > geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE)  :
   Server denied you to change to the given directory

Any suggestions? I can't find anything on the list archives that helps. 
I am thinking it has something to do with Windows Firewall, as I can get to

http://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/

using a browser, but not

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/

Setting up a specific Windows Firewall rule to allow R.exe FTP access
doesn't seem to help, though.
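
To take the browser out of the comparison, the same check can be run from inside R. This is only a rough sketch using base R connections; `can_read` is a hypothetical helper, and whether the FTP scheme works will depend on how R was built and on the download method in use.

```r
# Hypothetical helper: TRUE if at least an attempt to open the URL and read
# one line succeeds, FALSE if opening or reading raises an error.
can_read <- function(u) {
  con <- url(u)                       # created unopened; no traffic yet
  on.exit(close(con), add = TRUE)
  suppressWarnings(tryCatch({
    open(con, "r")
    readLines(con, n = 1L, warn = FALSE)
    TRUE
  }, error = function(e) FALSE))
}

# The two URLs compared in this message
urls <- c(
  http = "http://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/",
  ftp  = "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/"
)
```

Running `sapply(urls, can_read)` on the affected machine would show whether R itself sees the same HTTP/FTP difference as the browser does.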

Best,

Jim




On 5/2/2014 12:20 PM, Sean Davis wrote:
> Hi, again, James.
>
> NCBI is still checking into the issue (may have been a storm-related
> issue), but your (simplified) example now works for me.
>
>> gpl = getGEO('GPL90')
> File stored at:
> /var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpQXZfrr/GPL90.soft
>> sessionInfo()
> R version 3.0.2 Patched (2014-01-22 r64855)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GEOquery_2.28.0      Biobase_2.21.7       BiocGenerics_0.7.5
> [4] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.95-4.1 XML_3.95-0.2
>
>
> Sean
>
> On Thu, May 1, 2014 at 1:11 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> Hi, James.
>>
>> Thanks for the report.  This is due to a change at NCBI.  I am
>> checking with them to see if the change is meant to be permanent or is
>> simply a transient issue.  I'll let everyone know as soon as I hear
>> back from NCBI.
>>
>> Sean
>>
>>
>> On Thu, May 1, 2014 at 9:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>> Hi Sean,
>>>
>>>> geoq <- getGEO("GSE9514")
>>> ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
>>> Found 1 file(s)
>>> GSE9514_series_matrix.txt.gz
>>>    % Total    % Received % Xferd  Average Speed   Time    Time Time  Current
>>>                                   Dload  Upload   Total   Spent Left  Speed
>>> 100  378k  100  378k    0     0   204k      0  0:00:01  0:00:01 --:--:--  204k
>>> File stored at:
>>> /data3/tmp/RtmpkDXZzR/GPL90.soft
>>> Error in xj[i] : only 0's may be mixed with negative subscripts
>>>
>>> And the error appears to come from this section in parseGPL():
>>>
>>> if (hasDataTable) {
>>>          nLinesToRead <- NULL
>>>          if (!is.null(n)) {
>>>              nLinesToRead <- n - length(txt)
>>>          }
>>>          dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
>>>          geoDataTable <- new("GEODataTable", columns = cols,
>>>              table = dat3[1:(nrow(dat3) - 1), ])
>>>      }
>>>
>>> Where there is no error trapping for the case that fastTabRead returns a
>>> zero row data.frame:
>>>
>>> debug: dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
>>> Browse[3]> dim(dat3)
>>> [1]  0 17
>>> Browse[3]> dat3
>>>   [1] ID                               ORF
>>>   [3] SPOT_ID                          Species Scientific Name
>>>   [5] Annotation Date                  Sequence Type
>>>   [7] Sequence Source                  Target Description
>>>   [9] Representative Public ID         Gene Title
>>> [11] Gene Symbol                      ENTREZ_GENE_ID
>>> [13] RefSeq Transcript ID             SGD accession number
>>> [15] Gene Ontology Biological Process Gene Ontology Cellular Component
>>> [17] Gene Ontology Molecular Function
>>> <0 rows> (or 0-length row.names)
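
The failure above can be reproduced without GEOquery. With zero rows, `1:(nrow(dat3) - 1)` evaluates to `1:-1`, i.e. `c(1, 0, -1)`, which mixes positive and negative subscripts. A minimal sketch of one possible guard follows; `drop_last_row` is a hypothetical helper, not what GEOquery actually does.

```r
# Hypothetical helper (not part of GEOquery): drop the last row of a
# data.frame, returning a zero-row data.frame instead of failing when
# there is nothing to drop.
drop_last_row <- function(df) {
  keep <- seq_len(max(nrow(df) - 1L, 0L))
  df[keep, , drop = FALSE]
}

empty <- data.frame(ID = character(0), ORF = character(0))
full  <- data.frame(ID = c("a", "b", "c"), ORF = c("x", "y", "z"))

# Index produced by the original expression for a zero-row table
bad_index <- 1:(nrow(empty) - 1)

# The original style of subsetting raises an error on the empty table
original_fails <- inherits(try(empty[bad_index, ], silent = TRUE), "try-error")

# The guarded version handles both cases
guarded_empty <- drop_last_row(empty)
guarded_full  <- drop_last_row(full)
```

`seq_len()` never counts downward, so an empty table stays an empty table and the column names are preserved.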
>>>
>>> Best,
>>>
>>> Jim
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


