[R] Reading gz compressed csv file - 'incomplete line found'

Paolo Innocenti innocenti.paolo at gmail.com
Fri Jan 21 01:38:50 CET 2011


Hi all,

I am trying to download, decompress and read a csv file. My code:

myurl <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
myfile <- "GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
download.file(myurl, destfile=myfile, mode="w")
mycon <- gzcon(gzfile(myfile, open="r"))
mydata <- read.csv(textConnection(readLines(mycon)))
close(mycon)

This works under my Linux distribution, but under Windows I get the
following warning:

 > myurl <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
 > myfile <- "GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
 > download.file(myurl, destfile=myfile, mode="w")
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz'
ftp data connection made, file length 535641 bytes
opened URL
downloaded 523 Kb

 > mycon <- gzcon(gzfile(myfile, open="r"))
 > mydata <- read.csv(textConnection(readLines(mycon)))
Warning message:
In readLines(mycon) :
   incomplete final line found on 
'gzcon(GSE24729_MitoNuclear_suppl_male_stats.csv.gz)'
 > close(mycon)
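
A quick check that might narrow this down, assuming the file is being damaged on its way to disk rather than during decompression: compare the size on disk with the 535641 bytes reported by the server, and look at the first two bytes, which should be 1f 8b for a gzip file. The helper name check_gz_download is made up for illustration, and the small demo file is written locally so the sketch runs without a network connection.

```r
# Hypothetical helper (not part of base R): report the size on disk and
# whether the file starts with the gzip magic bytes 1f 8b.
check_gz_download <- function(path, expected_size = NA) {
  size  <- file.info(path)$size
  magic <- readBin(path, what = "raw", n = 2L)
  list(
    size       = size,
    size_ok    = is.na(expected_size) || size == expected_size,
    gzip_magic = identical(magic, as.raw(c(0x1f, 0x8b)))
  )
}

# Self-contained demonstration on a small gzip file written locally.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, open = "wt")
writeLines(c("id,value", "1,10", "2,20"), con)
close(con)
res <- check_gz_download(tmp)
```

For the file in question the call would be check_gz_download(myfile, expected_size = 535641); a size that differs from the server's figure would point at the download step rather than at gzcon/gzfile.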

I can read only the first 30 lines; after that, reading stops. Does anyone
have a suggestion? I suspect that gzcon/gzfile is not decompressing the
file properly, or that there is some other problem with end-of-line or
end-of-file handling, but the help files are a bit above my level of
understanding.
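
One thing I might try, on the unconfirmed assumption that the download mode is the culprit: download.file() documents mode="wb" for binary files, and on Windows mode="w" writes in text mode, which can alter the bytes of a compressed file. The sketch below downloads in binary mode and reads the CSV straight from a gzfile() connection, without the extra gzcon() and readLines() steps. The function names read_gz_csv and read_local_gz_csv are hypothetical, and the demonstration uses a small file written locally rather than the GEO file.

```r
# Hypothetical helper: read a gzip-compressed CSV from disk.
read_local_gz_csv <- function(path, ...) {
  # read.csv() opens the gzfile() connection itself and closes it when done
  read.csv(gzfile(path), ...)
}

# Hypothetical wrapper: download in binary mode, then read the CSV.
read_gz_csv <- function(url, destfile, ...) {
  # mode = "wb" writes the bytes unchanged; this matters on Windows
  download.file(url, destfile = destfile, mode = "wb")
  read_local_gz_csv(destfile, ...)
}

# Demonstration without network access: write a small gzip CSV, read it back.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, open = "wt")
writeLines(c("id,value", "1,10", "2,20"), con)
close(con)
demo <- read_local_gz_csv(tmp)
```

With the objects from my code this would be mydata <- read_gz_csv(myurl, myfile).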

Thanks,
paolo

 > sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] lattice_0.19-13      drosophila2.db_2.4.5 org.Dm.eg.db_2.4.6
  [4] GOstats_2.16.0       RSQLite_0.9-4        DBI_0.2-5
  [7] graph_1.28.0         Category_2.16.0      AnnotationDbi_1.12.0
[10] xtable_1.5-6         GEOquery_2.16.3      ellipse_0.3-5
[13] RColorBrewer_1.0-2   hopach_2.10.0        cluster_1.13.2
[16] limma_3.6.9          genefilter_1.32.0    vsn_3.18.0
[19] affy_1.28.0          Biobase_2.10.0

loaded via a namespace (and not attached):
  [1] affyio_1.18.0         annotate_1.28.0       GO.db_2.4.5
  [4] GSEABase_1.12.2       preprocessCore_1.12.0 RBGL_1.26.0
  [7] RCurl_1.5-0.1         splines_2.12.1        survival_2.36-2
[10] tools_2.12.1          XML_3.2-0.2


