[R] Row limit for read.table
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Wed Jan 17 17:39:48 CET 2007
Frank McCown wrote:
> I have been trying to read in a large data set using read.table, but
> I've only been able to grab the first 50,871 rows of the total 122,269 rows.
>
> > f <-
> read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
> header=TRUE, nrows=123000, comment.char="", sep="\t")
> > length(f$change_rate)
> [1] 50871
>
> From searching the email archives, I believe this is due to size limits
> of a data frame. So...
>
I think you believe wrongly...
> 1) Why doesn't read.table give a proper warning when it doesn't place
> every read item into a data frame?
>
That isn't the problem; it is a somewhat obscure interaction between
quote= and sep= that is doing you in. Remove the sep="\t" and/or add
quote="" and your life should be easier.
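The interaction can be reproduced with a small sketch (hypothetical data, not
the original file): read.table's default quote= includes the single quote, and
a quote character at the start of a field opens a quoted string that runs
across newlines, silently swallowing the intervening rows.

```r
# Tab-separated toy data: the quote opened in row "x" is not closed
# until row "z", so the rows in between are absorbed into one field.
tf <- tempfile()
writeLines(c("a\tb",
             "x\t'mid",
             "y\t2",
             "z\tend'",
             "w\t4"), tf)

# Default quote = "\"'" : the apostrophes pair up across rows
nrow(read.table(tf, header = TRUE, sep = "\t"))              # 2 rows

# quote = "" disables quoting, so every line becomes a row
nrow(read.table(tf, header = TRUE, sep = "\t", quote = ""))  # 4 rows
```

This is why the original read stopped at 50,871 rows with no error: the rows
were not lost, they were concatenated into quoted fields.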
> 2) Why isn't there a parameter to read.table that allows the user to
> specify which columns s/he is interested in? This functionality would
> allow extraneous columns to be ignored which would improve memory usage.
>
>
There is! Check out colClasses:
> cc <- rep("NULL",5)
> cc[4:5] <- NA
> f <-
read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
header=TRUE, sep="\t", quote="", colClasses=cc)
> str(f)
'data.frame': 122271 obs. of 2 variables:
$ recovered : Factor w/ 5 levels "changed","identical",..: 5 3 3 3 2 2 2 2 1 2 ...
$ change_rate: num 1 0 0 1 0 0 0 0 0 0 ...
> I've already made a work-around by loading the table into mysql and
> doing a select on the 2 columns I need. I just wonder why the above 2
> points aren't implemented. Maybe they are and I'm totally missing it.
>
> Thanks,
> Frank
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907