[R] How to exclude lines that match certain regex when using read.table?

Peng Yu pengyu.ut at gmail.com
Fri Dec 4 04:13:43 CET 2009


On Thu, Dec 3, 2009 at 9:09 PM, Sharpie <chuck at sharpsteen.net> wrote:
>
>
> pengyu.ut wrote:
>>
>> I'm thinking of using external program 'grep' and pipe() to do so. But
>> I'm wondering if there is a more efficient way to do so purely in R
>>
>
> I would just suck the whole table in using read.table(), locate the lines
> that I don't want using apply() and grepl() and then reduce the data set:
>
>  dataSet <- read.table( "someData.txt" )
>
>  dataToDrop <- apply( dataSet, 1, function( row ){
>
>    return(
>      any( grepl( "regex", row ) )
>    )
>
>  })
>
>  dataSet <- subset( dataSet, !dataToDrop )
>
> Since this solution executes entirely in R without resorting to system()
> calls, it should be portable between platforms.

This is not acceptable for my case. The original file, which is in .gz
format, is about 100MB, so its uncompressed size should be much bigger.
But I only need about 2% of the data in the original file. With your
method, it takes a long time just to read the whole file in.
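
For what it's worth, one way to stay purely in R might be to filter the
raw lines before they ever reach read.table(). The sketch below is only an
illustration under some assumptions: read_table_excluding() and its
chunk_size argument are hypothetical names made up here, the file is
assumed to have one record per line, and the text= argument of read.table()
is assumed to be available (on older R versions a textConnection() would be
needed instead). It has not been timed against a file of this size.

```r
# Hypothetical helper: read a (possibly gzipped) text file in chunks,
# drop every line that matches `pattern`, and parse only what is left.
read_table_excluding <- function(path, pattern, chunk_size = 10000L, ...) {
  # gzfile() decompresses on the fly; it also reads uncompressed files
  con <- gzfile(path, open = "rt")
  on.exit(close(con))

  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break            # end of file reached
    # keep only the lines that do NOT match the regex
    kept[[length(kept) + 1L]] <- lines[!grepl(pattern, lines)]
  }

  # parse the surviving lines; extra arguments go to read.table()
  read.table(text = as.character(unlist(kept)), ...)
}
```

It would be called as, say, read_table_excluding("someData.txt.gz", "regex").
Only the lines that survive the filter are held in memory and parsed, so the
cost of building a data frame is paid for roughly 2% of the file rather than
all of it. The whole file still has to be decompressed and scanned once.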



