[BioC] Importing data from GEOquery

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Wed Jun 25 22:21:11 CEST 2008


On Wed, 25 Jun 2008, Sean Davis wrote:

> >
> > On Wed, 25 Jun 2008, Kini, Aditya M wrote:
> >
> >> Hi,
> >>
> >> I am repeatedly getting this error message when I try to import a file from GEO. Here is the code:
> >>
> >> > gsm.1 <- getGEO("GSM3612")
> >> trying URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GSM3612&form=text&view=full'
> >> Content type 'geo/text' length unknown
> >> opened URL
> >> downloaded 1.5 Mb
> >>
> >> File stored at:
> >> C:\Users\Aditya\AppData\Local\Temp\Rtmp5ZePB5/GSM3612.soft
> >> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
> >>   scan() expected 'a real', got '!sample_table_end'
> >> > gsm.1
> >> Error: object "gsm.1" not found
> >>
> >> Please let me know what the problem is.
>
> Thanks, Vince, for checking into the problem.  It looks like those
> files have a multibyte character (that was supposed to be a degree
> symbol, from the looks of it) that is problematic in at least some
> locales.  I don't know of an easy way to fix the problem, as the files
> at NCBI are supposed to all be in the same character encoding (UTF-8).
>  If anyone knows of a solution, let me know.
>
>

I just checked it on a more recent version of R, and we get more information:

downloaded 1.5 Mb

File stored at:
/tmp/RtmpjQn5oh/GSM3612.soft
Error in make.names(col.names, unique = TRUE) :
  invalid multibyte string 29
In addition: Warning messages:
1: In grep("!\\w+_table_begin", txt[i], perl = TRUE) :
  input string 1 is invalid in this locale
2: In grep("!\\w+_table_begin", txt[i], perl = TRUE) :
  input string 1 is invalid in this locale
3: In grep("^#", txt, perl = TRUE) :
  input string 42 is invalid in this locale
4: In grep("^#", txt, perl = TRUE) :
  input string 67 is invalid in this locale
5: In grep("!\\w*_", txt, perl = TRUE, value = TRUE) :
  input string 42 is invalid in this locale
6: In grep("!\\w*_", txt, perl = TRUE, value = TRUE) :
  input string 67 is invalid in this locale
7: In grep(leader, txt) : input string 42 is invalid in this locale
8: In grep(leader, txt) : input string 67 is invalid in this locale
> sessionInfo()
R version 2.8.0 Under development (unstable) (--)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.5.0 RCurl_0.9-3    Biobase_2.1.7

While Sean has diagnosed the problem, I report the sessionInfo() output because the
locale may also contribute to problems of this kind.  People reporting data read
problems should include all of this information in future reports.
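
For anyone who wants to check a downloaded SOFT file for this kind of problem, here is a minimal sketch, not a tested fix for GEOquery itself. It assumes a version of R that provides validUTF8() (R >= 3.3.0, so later than the versions in this thread), and it assumes the stray bytes are Latin-1, which is only a guess based on the degree symbol Sean mentions. The helper names find_invalid_utf8() and repair_lines() are hypothetical, and the demo file is made up for illustration rather than taken from GSM3612.

```r
# Hypothetical helper: indices of lines in a file that are not valid UTF-8.
find_invalid_utf8 <- function(path) {
  txt <- readLines(path, warn = FALSE)  # reads the bytes without re-encoding
  which(!validUTF8(txt))
}

# Hypothetical helper: re-encode only the invalid lines.
# ASSUMPTION: the offending bytes are Latin-1 (e.g. 0xB0 for a degree sign).
repair_lines <- function(txt, from = "latin1") {
  bad <- !validUTF8(txt)
  txt[bad] <- iconv(txt[bad], from = from, to = "UTF-8")
  txt
}

# Demo on a made-up two-line file whose second line holds a Latin-1 degree sign.
tmp <- tempfile(fileext = ".soft")
con <- file(tmp, "wb")
writeBin(c(charToRaw("!Sample_title = demo\n"),
           charToRaw("temp 37"), as.raw(0xB0), charToRaw("C\n")), con)
close(con)

bad_idx <- find_invalid_utf8(tmp)                     # lines that need attention
fixed   <- repair_lines(readLines(tmp, warn = FALSE)) # re-encoded copy of the text

# Locale details worth including in a report, alongside sessionInfo():
locale_ctype <- Sys.getlocale("LC_CTYPE")
locale_info  <- l10n_info()
```

If the repaired lines look right, one could write them back out with writeLines() and try parsing the local copy (getGEO() accepts a filename argument for local files), though whether that is enough for a given sample is untested here.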
