[R] How to import specific column(s) using "read.table"?
Thomas Lumley
tlumley at u.washington.edu
Mon Aug 9 22:52:01 CEST 2004
On Mon, 9 Aug 2004, F Duan wrote:
> Dear R people,
>
> I have a very big tab-delimited txt file with a header, and I only want to
> import several columns into R. I checked the options for "read.table" and only
> found "nrows", which lets you specify the maximum number of rows to read in.
> Although I can use a text editor (e.g., WordPad) to edit the txt file first
> before running R, I feel it's not very convenient. The reason for me to do this
> is that if I import the whole file into R, it will eat up too much of my
> system's memory. Even after I remove it later, I still can't release the memory.
>
You can't avoid reading the whole file, but you can avoid having it in
memory.
I'll assume you know how many lines are in the file, call it N (this
isn't necessary, but it is tidier), and that you are interested in columns
10 and 110, both numeric.
If you do something like
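# N = number of data lines in the file, not counting the header
inputfile <- file("inputfile.txt", open = "r")
header <- readLines(inputfile, n = 1)   # consume the header line
result <- data.frame(col10 = numeric(N), col110 = numeric(N))
chunksize <- 1000
nchunks <- ceiling(N / chunksize)
for (i in seq_len(nchunks)) {
    # read.table continues from the current position of the open connection
    chunk <- read.table(inputfile, sep = "\t", nrows = chunksize)
    # the last chunk may have fewer than chunksize rows
    rows <- (i - 1) * chunksize + seq_len(nrow(chunk))
    result[rows, ] <- chunk[, c(10, 110)]
}
close(inputfile)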
you can choose the chunk size so that the memory use is not too bad: only
one chunk of full-width rows is held at a time, plus the two result columns.
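
The remark that knowing N "isn't necessary" can be illustrated with a loop that simply reads until the connection is exhausted. What follows is only a minimal sketch under stated assumptions, not the method from the reply: the demo file, its 12 columns, and the choice of columns 2 and 11 are hypothetical stand-ins for the real file and for columns 10 and 110. It relies on read.table raising an error when no lines are left on the connection.

```r
# Hypothetical demo file: 2500 rows, 12 tab-separated numeric columns, with a header.
demo <- tempfile(fileext = ".txt")
m <- matrix(seq_len(2500 * 12), nrow = 2500)
colnames(m) <- paste0("V", 1:12)
write.table(m, demo, sep = "\t", quote = FALSE, row.names = FALSE)

con <- file(demo, open = "r")
invisible(readLines(con, n = 1))        # consume the header line
chunksize <- 1000
pieces <- list()
repeat {
    # read.table signals an error once no lines are left on the connection;
    # note that this tryCatch also hides any other read error
    chunk <- tryCatch(read.table(con, sep = "\t", nrows = chunksize),
                      error = function(e) NULL)
    if (is.null(chunk)) break
    pieces[[length(pieces) + 1]] <- chunk[, c(2, 11)]   # keep only the wanted columns
    if (nrow(chunk) < chunksize) break  # short chunk: end of file reached
}
close(con)
result <- do.call(rbind, pieces)
```

Collecting the pieces in a list and binding them once at the end avoids growing a data frame row by row; the price compared with pre-allocating from a known N is a temporary second copy of the kept columns during the final rbind.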
There are also more efficient ways that make you do more of the work (e.g.,
read in lines of text with readLines and use regular expressions to
extract the columns you need).
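
One way the readLines route might look is sketched below. It is an illustration under assumptions rather than the approach the reply has in mind: it splits each line on the tab with strsplit (fixed = TRUE) instead of using a regular expression, the helper name read_columns and the demo file are hypothetical, and it assumes every kept field is numeric with no quoting or trailing empty fields.

```r
# Hypothetical demo file: 5 rows, 12 tab-separated numeric columns, with a header.
demo <- tempfile(fileext = ".txt")
m <- matrix(1:60, nrow = 5)
colnames(m) <- paste0("V", 1:12)
write.table(m, demo, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file in chunks of lines and keep only the requested fields.
read_columns <- function(path, cols, chunksize = 1000) {
    con <- file(path, open = "r")
    on.exit(close(con))
    header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
    pieces <- list()
    repeat {
        lines <- readLines(con, n = chunksize)
        if (length(lines) == 0) break              # end of file
        fields <- strsplit(lines, "\t", fixed = TRUE)
        # pick the wanted fields from each line, then shape as lines x columns
        kept <- matrix(as.numeric(unlist(lapply(fields, `[`, cols))),
                       ncol = length(cols), byrow = TRUE)
        pieces[[length(pieces) + 1]] <- kept
    }
    out <- as.data.frame(do.call(rbind, pieces))
    names(out) <- header[cols]
    out
}

result <- read_columns(demo, cols = c(2, 11), chunksize = 2)
```

Only the requested fields are converted to numbers here, which is where the saving over read.table comes from: the other columns are never parsed beyond being split off as text.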
-thomas