[R] reading in data with variable length

Liaw, Andy andy_liaw at merck.com
Tue Dec 6 16:16:03 CET 2005


Using a file() connection in conjunction with readLines() and strsplit()
should do it.  I would try to count the number of lines in the file first,
create a list with that many components, then fill it in.  I believe the
"array of cells" in Matlab is roughly equivalent to a list in R, but that's
beyond my knowledge of Matlab...

Andy
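Here is a minimal sketch of the count-then-fill approach described above. It assumes a single header line and unquoted records of the form Name,Start Month,value,value,... with no embedded commas. The helper names count_lines() and read_ragged(), the chunk size, and the record field names name/start/data are hypothetical. Only file(), readLines(), strsplit() and vector() come from base R. Reading through an open connection in chunks avoids holding the whole file as one character vector, although the resulting list must still fit in memory.

```r
# Sketch of the count-then-fill approach: count the lines, preallocate a
# list, then fill it chunk by chunk with readLines() and strsplit().
# Assumes one header line and unquoted records "Name,StartMonth,v1,v2,...".
# count_lines() and read_ragged() are hypothetical helpers, not base R.

count_lines <- function(path, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk)   # next block of at most `chunk` lines
    if (length(lines) == 0L) break       # end of file
    n <- n + length(lines)
  }
  n
}

read_ragged <- function(path, chunk = 10000L) {
  n <- count_lines(path, chunk) - 1L     # records = lines minus the header
  records <- vector("list", n)           # one preallocated slot per record
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1L)                 # discard the header line
  i <- 0L
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    for (f in strsplit(lines, ",", fixed = TRUE)) {
      i <- i + 1L
      records[[i]] <- list(name  = f[1L],                 # "Name"
                           start = as.integer(f[2L]),     # "Start Month"
                           data  = as.numeric(f[-(1:2)])) # variable part
    }
  }
  records
}

# Usage on the two sample records from foo.csv, written to a temp file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Start Month,Data",
             "Foo,10,-0.5615,2.3065,0.1589,-0.3649,1.5955",
             paste0("Bar,21,0.0880,0.5733,0.0081,2.0253,-0.7602,0.7765,",
                    "0.2810,1.8546,0.2696,0.3316,0.1565,-0.4847,-0.1325,",
                    "0.0454,-1.2114")), tmp)
ta <- read_ragged(tmp, chunk = 1L)       # tiny chunk only to exercise the loop
ta[[2]]$data
```

The extra counting pass over the file follows the suggestion to count first. It makes the list the right size up front, so the fill loop never has to grow it.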

From: John McHenry
> 
> I have very large csv files (up to 1GB each of ASCII text). 
> I'd like to be able to read them directly into R. The 
> problem I am having is with the variable length of the data 
> in each record.
>    
>   Here's a (simplified) example:
>    
>   $ cat foo.csv
> Name,Start Month,Data
> Foo,10,-0.5615,2.3065,0.1589,-0.3649,1.5955
> Bar,21,0.0880,0.5733,0.0081,2.0253,-0.7602,0.7765,0.2810,1.8546,0.2696,0.3316,0.1565,-0.4847,-0.1325,0.0454,-1.2114
>    
>   The records consist of rows with a fixed set of comma-separated 
> fields (e.g. the "Name" and "Start Month" fields above), 
> followed by the data as a variable-length list of 
> comma-separated values that runs to the end of the line.
>    
>   Now I can use e.g.
>    
>   fileName <- "foo.csv"
> ta <- read.csv(fileName, header = FALSE, skip = 1, sep = ",", dec = ".", fill = TRUE)
>    
>   which does the job nicely:
>    
>    V1 V2      V3     V4     V5      V6      V7     V8    V9    V10    V11    V12    V13     V14     V15    V16     V17
> 1 Foo 10 -0.5615 2.3065 0.1589 -0.3649  1.5955     NA    NA     NA     NA     NA     NA      NA      NA     NA      NA
> 2 Bar 21  0.0880 0.5733 0.0081  2.0253 -0.7602 0.7765 0.281 1.8546 0.2696 0.3316 0.1565 -0.4847 -0.1325 0.0454 -1.2114
> 
>    
>   but the problem is that with files on the order of 1GB this 
> either crunches forever or runs out of memory trying, 
> plus having all those NAs isn't too pretty to look at. 
>    
>   (I have a MATLAB version that can read this stuff into an 
> array of cells in about 3 minutes).
>    
>   I really want a fast way to read the data part into a list; 
> that way I can access data in the array of lists containing 
> the records by doing something like ta[[i]]$data.
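The layout described here can be sketched as a plain list of lists, filled with the two sample records from foo.csv. The element names name, start and data are assumptions, chosen so that ta[[i]]$data works as written. Naming the outer list by the "Name" field is an optional extra that also allows lookup by record name.

```r
# Sketch of the target layout: one list element per record, each a list
# holding the fixed fields and a numeric vector of data. The field names
# name, start and data are assumptions matching ta[[i]]$data.
ta <- list(
  list(name = "Foo", start = 10L,
       data = c(-0.5615, 2.3065, 0.1589, -0.3649, 1.5955)),
  list(name = "Bar", start = 21L,
       data = c(0.0880, 0.5733, 0.0081, 2.0253, -0.7602, 0.7765, 0.2810,
                1.8546, 0.2696, 0.3316, 0.1565, -0.4847, -0.1325, 0.0454,
                -1.2114))
)

ta[[2]]$data                                    # data vector of record 2
n_values <- vapply(ta, function(r) length(r$data), integer(1))  # 5, 15

# Optional: name the outer list so records can also be looked up by name.
names(ta) <- vapply(ta, function(r) r$name, character(1))
ta[["Foo"]]$start
```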
>    
>   Ideas?
>    
>   Thanks,
>    
>   Jack.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>



