[BioC] GEOquery

Sean Davis sdavis2 at mail.nih.gov
Tue Jul 8 18:28:53 CEST 2008


On Tue, Jul 8, 2008 at 11:56 AM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
> Here is the output:
>
>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
> [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"
> \"http://www.w3.org/TR/html4/loose.dtd\">\n<!-- HTML listing generated by
> Squid 2.7.STABLE3 -->\n<!-- Tue, 08 Jul 2008 15:54:09 GMT
> -->\n<HTML><HEAD><TITLE>\nFTP Directory:
> ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/\n</TITLE>\n<STYLE
> type=\"text/css\"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}--></STYLE>\n</HEAD><BODY>\n<H2>\nFTP
> Directory: <A HREF=\"/\">ftp://ftp.ncbi.nih.gov</A>/<A
> HREF=\"/pub/\">pub</A>/<A HREF=\"/pub/geo/\">geo</A>/<A
> HREF=\"/pub/geo/DATA/\">DATA</A>/<A
> HREF=\"/pub/geo/DATA/SeriesMatrix/\">SeriesMatrix</A>/<A
> HREF=\"/pub/geo/DATA/SeriesMatrix/GSE4201/\">GSE4201</A>/</H2>\n<PRE>\n<A
> HREF=\"../\"><IMG border=\"0\"
> SRC=\"http://cachesrv1a.internal.sanger.ac.uk/squid-internal-static/icons/anthony-dirup.gif\"
> ALT=\"[DIRUP]\"></A> <A HREF=\"../\">Parent Directory</A> \n<A
> HREF=\"GSE4201_series_matrix.txt.gz\"><IMG border=\"0\"
> SRC=\"http://cachesrv1a.internal.sanger.ac.uk/squid-internal-static/icons/anthony-text.gif\"
> ALT=\"[FILE]\"></A> <A
> HREF=\"GSE4201_series_matrix.txt.gz\">GSE4201_series_matrix.txt.gz</A> . .
> Apr 13 05:32    909K\n</PRE>\n<HR noshade
> size=\"1px\">\n<ADDRESS>\nGenerated Tue, 08 Jul 2008 15:54:09 GMT by
> cachesrv1a.internal.sanger.ac.uk
> (squid/2.7.STABLE3)\n</ADDRESS></BODY></HTML>\n"
> Warning messages:
> 1: In if (nchar(val) == nchar(x)) return(NA) :
>  the condition has length > 1 and only the first element will be used
> 2: In if (nchar(val) == nchar(x)) return(NA) :
>  the condition has length > 1 and only the first element will be used

So, this appears to be the problem.  It looks like your proxy (Squid,
judging by the output above) is intercepting the FTP directory listing
and converting it to HTML.  I do not know how to solve this, as it
appears to be a proxy configuration issue at your institution, though I
can't say for sure.  The output of the getURL() command should look
like:

> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
[1] "-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz\n"

Notice how yours is much longer and is HTML, not plain text.
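
To make the difference concrete, here is a minimal base-R sketch of how
one might check which kind of listing came back.  The helper names
is_html_listing() and listing_files() are hypothetical (they are not
part of GEOquery or RCurl), the "proxied" string is an abbreviated
stand-in for the HTML shown above, and taking the last
whitespace-separated field as the file name is an assumption about the
plain-text listing format.

```r
# Hypothetical helper (not part of GEOquery or RCurl): guess whether a
# directory listing returned as a single string is HTML rather than
# plain FTP text.
is_html_listing <- function(listing) {
  grepl("<!DOCTYPE|<html", listing, ignore.case = TRUE)
}

# Hypothetical helper: pull file names out of a plain-text FTP listing,
# assuming the name is the last whitespace-separated field on each line.
listing_files <- function(listing) {
  lines <- strsplit(listing, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]
  vapply(strsplit(trimws(lines), "[[:space:]]+"),
         function(fields) fields[length(fields)],
         character(1))
}

# The plain-text listing quoted above, on one line.
plain <- "-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz\n"

# Abbreviated stand-in for the HTML page produced by the proxy.
proxied <- "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<HTML>...</HTML>\n"

is_html_listing(plain)    # FALSE
is_html_listing(proxied)  # TRUE
listing_files(plain)      # "GSE4201_series_matrix.txt.gz"
```

In a live session one would pass the result of getURL() to
is_html_listing() instead of the fixed strings used here.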

Sean

> Sean Davis wrote:
>>
>> On Tue, Jul 8, 2008 at 11:46 AM, Harpreet Saini <hs1 at sanger.ac.uk> wrote:
>>
>>>
>>> Hi Sean,
>>>
>>> There is one more thing. In my .Rprofile file, the download.file.method
>>> option is 'wget' and we are behind a firewall.
>>>
>>> But when I use the GSEMatrix=FALSE option, it works.
>>>
>>
>> Harpreet, could you do me another favor and send the output of:
>>
>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
>>
>> Sean
>>
>>
>>
>>>>
>>>>  g<-getGEO("GSE4201",GSEMatrix=F)
>>>>
>>>
>>> --16:41:59--
>>>  ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/by_series/GSE4201/GSE4201_family.soft.gz
>>>         => `/tmp/Rtmpv8lgf9/GSE4201.soft.gz'
>>> Resolving wwwcache.sanger.ac.uk... 172.18.24.2, 172.18.24.1
>>> Connecting to wwwcache.sanger.ac.uk[172.18.24.2]:3128... connected.
>>> Proxy request sent, awaiting response... 200 OK
>>> Length: 4,305,926 [text/plain]
>>>
>>> 100%[====================================>] 4,305,926     10.77M/s
>>>
>>> 16:41:59 (10.76 MB/s) - `/tmp/Rtmpv8lgf9/GSE4201.soft.gz' saved [4305926/4305926 ]
>>>
>>> File stored at:
>>> /tmp/Rtmpv8lgf9/GSE4201.soft
>>> Parsing....
>>> ^PLATFORM = GPL1319
>>>
>>> Harpreet
>>>
>>>
>>> Sean Davis wrote:
>>>
>>>>
>>>> On Tue, Jul 8, 2008 at 11:09 AM, Harpreet Saini <hs1 at sanger.ac.uk>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> Hi Sean,
>>>>>
>>>>> Sorry to bother you again.
>>>>> But, I tried again many times, and still I am getting the same error:
>>>>>
>>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>>>  line 1 did not have 8 elements
>>>>> In addition: Warning messages:
>>>>> 1: In if (nchar(val) == nchar(x)) return(NA) :
>>>>>  the condition has length > 1 and only the first element will be used
>>>>> 2: In if (nchar(val) == nchar(x)) return(NA) :
>>>>>  the condition has length > 1 and only the first element will be used
>>>>>
>>>>> I use the following commands:
>>>>>
>>>>>
>>>>>>
>>>>>> library(GEOquery)
>>>>>> gse4 <- getGEO("GSE4201", GSEMatrix = TRUE)
>>>>>>
>>>>>>
>>>>
>>>> Thanks, Harpreet.  Could you send me the complete input and output?
>>>> Without it, I cannot tell whether the problem is with the download
>>>> or with something else.
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>>>
>>>>> Sean Davis wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Jul 8, 2008 at 12:00 AM, Harpreet Saini <hs1 at sanger.ac.uk>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Hi Sean,
>>>>>>>
>>>>>>> Here is the output of sessionInfo()
>>>>>>>
>>>>>>> R version 2.7.0 (2008-04-22)
>>>>>>> i686-pc-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>> C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] tools     stats     graphics  grDevices datasets  utils     methods
>>>>>>> [8] base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] GEOquery_2.4.0 RCurl_0.9-3    Biobase_2.0.1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Thanks, Harpreet.  That looks fine.  Was the download interrupted?  If
>>>>>> you could try it again and include the entire session (input and
>>>>>> output) if it fails, that might be helpful.
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> "Sean Davis" <sdavis2 at mail.nih.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 7, 2008 at 11:34 PM, Harpreet Saini <hs1 at sanger.ac.uk>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am trying to obtain GSE matrix files as expression sets by
>>>>>>>>> setting GSEMatrix to TRUE, as follows:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> gse<-getGEO("GSE2553", GSEMatrix = TRUE)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am getting the following error:
>>>>>>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>>>>>>>  line 1 did not have 8 elements
>>>>>>>>> In addition: Warning messages:
>>>>>>>>> 1: In if (nchar(val) == nchar(x)) return(NA) :
>>>>>>>>>  the condition has length > 1 and only the first element will be used
>>>>>>>>> 2: In if (nchar(val) == nchar(x)) return(NA) :
>>>>>>>>>  the condition has length > 1 and only the first element will be used
>>>>>>>>>
>>>>>>>>> Any help?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, Harpreet, for the report.  Could you send the output of
>>>>>>>> sessionInfo()?  On R-devel, I am not able to reproduce the error, so
>>>>>>>> it would help to have the further detail.
>>>>>>>>
>>>>>>>> Sean
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Harpreet Kaur Saini
>>>>> Team 101, Room No. D313
>>>>> Wellcome Trust Sanger Institute
>>>>> Wellcome Trust Genome Campus
>>>>> Hinxton, Cambridge, CB10 1SA
>>>>> United Kingdom
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>> Limited,
>>>>> a charity registered in England with number 1021457 and a company
>>>>> registered
>>>>> in England with number 2742969, whose registered office is 215 Euston
>>>>> Road,
>>>>> London, NW1 2BE.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>


