[R] How to import specific column(s) using "read.table"?

A.J. Rossini rossini at blindglobe.net
Mon Aug 9 22:52:31 CEST 2004


If you've got access to Unix tools (e.g., Linux or Cygwin), consider
the "cut" command.  It is great for column selection: "cut -f" picks
fields out of delimited text, and tab is its default delimiter.
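
As a minimal sketch (assuming "cut" is on your PATH and the file is
tab-delimited with a header line), you can let cut do the selection and
have R read only its output through pipe().  The helper name
read_cut_columns is made up for illustration:

```r
# Hypothetical helper: read only selected tab-delimited columns by letting
# the Unix "cut" utility drop the others before R parses anything.
read_cut_columns <- function(path, cols, header = TRUE) {
  # cut -f takes a comma-separated field list; tab is its default delimiter
  cmd <- sprintf("cut -f%s %s", paste(cols, collapse = ","), shQuote(path))
  # read.table() opens the pipe itself and closes it when it is done
  read.table(pipe(cmd), header = header, sep = "\t")
}

# Usage, assuming the file name and column numbers from the quoted example:
# dat <- read_cut_columns("inputfile.txt", c(10, 110))
```

Note that cut always writes fields in the order they appear in the file,
whatever order you list them in.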


Thomas Lumley <tlumley at u.washington.edu> writes:

> On Mon, 9 Aug 2004, F Duan wrote:
>
>> Dear R people,
>>
>> I have a very big tab-delim txt file with header and I only want to import
>> several columns into R. I checked the options for "read.table" and only
>> found "nrows" which lets you specify the maximum number of rows to read in.
>> Although I can use some text editors (e.g., wordpad) to edit the txt file first
>> before running R, I feel it’s not very convenient. The reason for me to do this
>> is that if I import the whole file into R, it will eat up too much of my
>> system’s memory. Even after I remove it later, I still can’t release the memory.
>>
>
> You can't avoid reading the whole file, but you can avoid having it in
> memory.
>
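> I'll assume you know how many data lines are in the file (not counting
> the header), call it N (this isn't necessary but it is tidier), and that
> you are interested in columns 10 and 110, both numeric.
>
> If you do something like
>
> # an open connection lets each read.table() call resume where the last stopped
> inputfile <- file("inputfile.txt", open = "r")
> invisible(readLines(inputfile, n = 1))   # skip the header line
> result <- data.frame(col10 = numeric(N), col110 = numeric(N))
> chunksize <- 1000
> nchunks <- ceiling(N / chunksize)
>
> for (i in seq_len(nchunks)) {
> 	chunk <- read.table(inputfile, sep = "\t", nrows = chunksize)
> 	# the last chunk may be shorter than chunksize
> 	rows <- (i - 1) * chunksize + seq_len(nrow(chunk))
> 	result[rows, ] <- chunk[, c(10, 110)]
> }
>
> close(inputfile)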
>
> you can choose the chunk size so that the memory use is not too bad.
>
> There are also more efficient ways that make you do more of the work (eg
> read in lines of text with readLines and use regular expressions to
> extract the columns you need)
>
> 	-thomas
>
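
On Thomas's last point, here is a minimal sketch of the readLines route.
It swaps the regular expressions he mentions for strsplit() on the tab
character, which is simpler for tab-delimited input.  The function name
read_columns_by_lines and its arguments are made up for illustration, and
it assumes a header line, at least one data line, and numeric columns:

```r
# Hypothetical helper: read a tab-delimited file in chunks of lines and keep
# only the requested columns, so the full table is never held in memory.
read_columns_by_lines <- function(path, cols, chunksize = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # the first line holds the column names
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0) break
    fields <- strsplit(lines, "\t", fixed = TRUE)
    # keep the wanted fields of each line; one matrix row per input line
    keep <- matrix(unlist(lapply(fields, function(f) f[cols])),
                   ncol = length(cols), byrow = TRUE)
    pieces[[length(pieces) + 1]] <- keep
  }
  out <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
  names(out) <- header[cols]
  out[] <- lapply(out, as.numeric)  # assumption: selected columns are numeric
  out
}

# Usage, with the file name and columns from the quoted example:
# dat <- read_columns_by_lines("inputfile.txt", c(10, 110))
```

Unlike the read.table loop, this does not need N in advance, at the cost
of growing a list of chunks as it goes.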

-- 
Anthony Rossini			    Research Associate Professor
rossini at u.washington.edu            http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
