[R] Efficient way of loading files in R

Fri Sep 7 12:14:58 CEST 2018

Ask on the Bioconductor support site https://support.bioconductor.org

Provide (on the support site) the output of the R commands

   library(GEOquery)
   sessionInfo()
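Here is a minimal sketch for collecting this output, and the output of the command that fails, into one plain-text file for pasting. `report_output()` is a hypothetical helper written for this example. It is not a GEOquery or base R function, and it uses only base R (`sink()`, `withCallingHandlers()`, `tryCatch()`).

```r
## Hypothetical helper, not part of GEOquery or base R: evaluate one command,
## write what it prints while running (plus any messages, warnings or error),
## then sessionInfo(), to a plain-text file that can be pasted into a post.
report_output <- function(expr, path = "report.txt") {
    cmd <- deparse(substitute(expr))       # the command as typed
    con <- file(path, open = "wt")
    sink(con)                              # divert printed output
    sink(con, type = "message")            # divert messages, warnings, errors
    on.exit({                              # always restore the console
        sink(type = "message")
        sink()
        close(con)
    })
    cat(paste0("> ", cmd), sep = "\n")
    withCallingHandlers(
        tryCatch(expr, error = function(e) {
            message("Error: ", conditionMessage(e))  # record, don't abort
            invisible(NULL)
        }),
        warning = function(w) {
            message("Warning: ", conditionMessage(w)) # record immediately
            invokeRestart("muffleWarning")
        }
    )
    cat("\n")
    print(sessionInfo())
    invisible(path)
}
```

For example, `report_output(gseEset2 <- getGEO('GSE76896')[[1]], path = "getGEO-report.txt")` runs the command with the assignment happening in the calling environment. You can then paste the contents of getGEO-report.txt.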

Also include (copy and paste) the output of the command that fails. For comparison, the command works for me:

 > gseEset2 <- getGEO('GSE76896')[[1]]
Found 1 file(s)
GSE76896_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
==================================================
downloaded 38.7 MB

Parsed with column specification:
cols(
   .default = col_double(),
   ID_REF = col_character()
)
See spec(...) for full column specifications.
|=================================================================| 100%   84 MB
File stored at:
/tmp/Rtmpe4NWji/GPL570.soft
|=================================================================| 100%   75 MB
 > sessionInfo()
R version 3.5.1 Patched (2018-08-22 r75177)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /home/mtmorgan/bin/R-3-5-branch/lib/libRblas.so
LAPACK: /home/mtmorgan/bin/R-3-5-branch/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] bindrcpp_0.2.2      GEOquery_2.49.1     Biobase_2.41.2
[4] BiocGenerics_0.27.1 BiocManager_1.30.2

loaded via a namespace (and not attached):
  [1] Rcpp_0.12.18     tidyr_0.8.1      crayon_1.3.4     dplyr_0.7.6
  [5] assertthat_0.2.0 R6_2.2.2         magrittr_1.5     pillar_1.3.0
  [9] stringi_1.2.4    rlang_0.2.2      curl_3.2         limma_3.37.4
[13] xml2_1.2.0       tools_3.5.1      readr_1.1.1      glue_1.3.0
[17] purrr_0.2.5      hms_0.4.2        compiler_3.5.1   pkgconfig_2.0.2
[21] tidyselect_0.2.4 bindr_0.1.1      tibble_1.4.2
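This does not diagnose the failure, but the question is about loading larger files, so here is one possible approach. The sketch assumes the `destdir` and `getGPL` arguments of `getGEO()`; check `?getGEO` in your GEOquery version. By default the files go to `tempdir()` (note the /tmp/Rtmp... path in the transcript), so every new R session downloads them again. A persistent `destdir` should let a later call reuse the downloaded series matrix. `getGPL = FALSE` skips the GPL570 platform annotation (GPL570.soft above) when the feature annotation will come from elsewhere. The directory `~/geo_cache` is a hypothetical choice.

```r
## Sketch only: the cache location below is a hypothetical choice.
library(GEOquery)

## Persistent download location instead of the default tempdir()
cache_dir <- path.expand("~/geo_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

## getGPL = FALSE: do not download/parse the GPL570 platform annotation;
## use TRUE if you need the featureData that GEO provides.
gseEset2 <- getGEO("GSE76896", destdir = cache_dir, getGPL = FALSE)[[1]]

## Quick checks on the result (features x samples)
class(gseEset2)
dim(gseEset2)

## The series matrix file should now be in cache_dir; a later getGEO() call
## with the same destdir should find it there instead of downloading again
## (the messages getGEO() prints will show whether that happened).
list.files(cache_dir)
```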

On 09/07/2018 06:08 AM, Deepa wrote:
> Hello,
> 
> I am using a Bioconductor package in R.
> The command that I use reads the contents of a file downloaded from a
> database and creates an expression object.
> 
> The code works fine when the input file is about 10 MB, but when the file
> is around 40 MB the object isn't created.
> 
> Is there an efficient way of loading a large input file to create the
> expression object?
> 
> This is my code,
> 
> 
> library(gcrma)
> library(limma)
> library(biomaRt)
> library(GEOquery)
> library(Biobase)
> gseEset1 <- getGEO('GSE53454')[[1]] # file size 10 MB
> gseEset2 <- getGEO('GSE76896')[[1]] # file size 40 MB
> 
> ##gseEset2 doesn't load and isn't created
> 
> Many thanks
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>