[R] How to set a filter during reading tables

Juliet Hannah juliet.hannah at gmail.com
Mon Jun 1 01:37:40 CEST 2009


There are several things you can tell read.table to make it faster.

First, as mentioned, setting colClasses helps. I think telling read.table how
many rows there are (via nrows) and how many columns (one colClasses entry per
column) also helps.
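A minimal sketch combining these arguments on a small, hypothetical tab-separated file (the path, column names and types are made up for illustration). colClasses takes one class per column, with "NULL" dropping a column; nrows gives the number of data rows. The read.table help page also notes that nrows helps memory usage even as a mild over-estimate, and that comment.char = "" is appreciably faster than the default.

```r
# Write a small hypothetical tab-separated file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tyear\tvalue\tnote",
             "A1\t2007\t1.5\tx",
             "A2\t2008\t2.5\ty",
             "A3\t2009\t3.5\tz"), path)

# One class per column; "NULL" skips the 'note' column entirely.
cc <- c("character", "integer", "numeric", "NULL")

dat <- read.table(path, header = TRUE, sep = "\t",
                  colClasses = cc,
                  nrows = 3,          # number of data rows (a mild over-estimate also works)
                  comment.char = "")  # turn off comment scanning

str(dat)
```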

When this was not sufficient, I've had to do the preprocessing outside R,
using Python, Perl, or awk.
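The awk route can also be driven from inside R. A minimal sketch, assuming a Unix-like system with awk on the PATH: read.table() accepts a pipe() connection, so only the lines awk lets through are parsed. The file, its columns and the year >= 2005 condition are hypothetical.

```r
# Assumes a Unix-like system with awk available on the PATH.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tyear\tvalue",
             "S1\t2003\t1.0",
             "S2\t2006\t2.0",
             "S3\t2008\t3.0"), path)

# Keep the header line (NR == 1) plus rows whose 2nd tab-separated field is >= 2005.
cmd <- sprintf("awk -F'\\t' 'NR == 1 || $2 >= 2005' %s", shQuote(path))

# read.table() opens and closes the pipe() connection itself.
filtered <- read.table(pipe(cmd), header = TRUE, sep = "\t")
print(filtered)
```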

If that had not been convenient, I would have tried the sqldf solution that
was mentioned.

That covers all the options I'm familiar with. I'm also curious about other ways
to selectively read in rows in R. Let me know what ends up working.
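One base-R way to read only selected rows is sketched below. It extends the readLines() suggestion quoted further down from one line per call to blocks of lines: each block is parsed with read.table(text = ...) and filtered before the next block is read, so the full table is never held in memory. read_filtered(), its keep predicate, the file layout and the year filter are all hypothetical; every row is still parsed, so this saves memory rather than parsing time.

```r
# Hypothetical helper: stream a delimited file in blocks of lines and keep only
# the rows for which keep(chunk) is TRUE.
read_filtered <- function(path, keep, chunk_size = 10000L, sep = "\t", ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)            # line holding the column names
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)   # next block of raw lines
    if (length(lines) == 0L) break            # end of file
    # Prepend the header so each block parses on its own; pass colClasses
    # through ... so every block gets the same column types.
    chunk <- read.table(text = c(header, lines), header = TRUE, sep = sep,
                        comment.char = "", ...)
    # which() drops NA results from the predicate.
    pieces[[length(pieces) + 1L]] <- chunk[which(keep(chunk)), , drop = FALSE]
  }
  if (length(pieces) == 0L) return(NULL)      # file had no data lines
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage on a small hypothetical file: keep rows with year >= 2005.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tyear\tvalue",
             sprintf("S%d\t%d\t%.1f", 1:10, 2000:2009, (1:10) / 2)), path)

recent <- read_filtered(path, keep = function(d) d$year >= 2005,
                        chunk_size = 3L,
                        colClasses = c("character", "integer", "numeric"))
print(recent)
```

If the wanted rows form one contiguous block, read.table's own skip and nrows arguments can read just that block without any loop.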



On Sun, May 31, 2009 at 2:17 PM,  <guox at ucalgary.ca> wrote:
> Since there are many rows, read.table spends too much time reading in rows
> that we do not want. We are wondering if there is a way to read only the
> rows we are interested in. Thanks,
>
> -james
>> I think you can use readLines(n=1) in a loop to skip unwanted rows.
>>
>> On Mon, Jun 1, 2009 at 12:56 AM,  <guox at ucalgary.ca> wrote:
>>> Thanks, Juliet.
>>> It works for filtering columns.
>>> I am also wondering if there is a way to filter rows.
>>> Thanks again.
>>> -james
>>>
>>>> One can use colClasses to set which columns get read in. For the
>>>> columns you don't want, set the class to "NULL". For example,
>>>>
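>>>> # Skip the first column; read the remaining nine as numeric.
>>>> cc <- c("NULL", rep("numeric", 9))
>>>>
>>>> # numRows: number of data rows in the file (a slight over-estimate is fine).
>>>> myData <- read.table("myFile.txt", header = TRUE, colClasses = cc,
>>>>                      nrows = numRows)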
>>>>
>>>>
>>>> On Wed, May 27, 2009 at 12:27 PM,  <guox at ucalgary.ca> wrote:
>>>>> We are reading big tables, such as,
>>>>>
>>>>> Chemicals <- read.table('ftp://ftp.bls.gov/pub/time.series/wp/wp.data.7.Chemicals',
>>>>>                         header = TRUE, sep = '\t', as.is = TRUE)
>>>>>
>>>>> I was wondering if it is possible to set a filter during loading, so
>>>>> that we load only what we want rather than the whole table each time.
>>>>> Thanks,
>>>>>
>>>>> -james
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
>



