[R] Help searching a matrix for only certain records

Jim Holtman jholtman at gmail.com
Sun Mar 3 17:22:04 CET 2013


There are "more efficient" ways of doing many of these operations, but you probably won't see any difference unless you have very large objects (several hundred thousand entries) or have to do it many times. My background is in computer performance, and for the most part I have found that the easiest, most straightforward ways are fine most of the time.

a more efficient way might be:

testdata <- testdata[!is.na(match(testdata$REC.TYPE, c('SAO  ', 'FL-15'))), ]

you can always use 'system.time' to determine how long actions take.
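
As a rough sketch of how such a timing comparison might look: the data frame below is a made-up stand-in (the VALUE column and the "OTHER" record type are assumptions, not part of the real dataset), and the timings will vary by machine.

```r
# Hypothetical stand-in for the real data: 300,000 rows with a REC.TYPE column.
set.seed(1)
testdata <- data.frame(
  REC.TYPE = sample(c("SAO  ", "FL-15", "OTHER"), 300000, replace = TRUE),
  VALUE    = rnorm(300000),
  stringsAsFactors = FALSE
)

# Time the regular-expression approach.
t_grepl <- system.time(
  by_grepl <- subset(testdata, grepl("(SAO |FL-15)", REC.TYPE))
)

# Time the exact-match approach.
t_match <- system.time(
  by_match <- testdata[!is.na(match(testdata$REC.TYPE, c("SAO  ", "FL-15"))), ]
)

# Each result reports user, system and elapsed seconds.
print(t_grepl)
print(t_match)
```

At this size both versions should finish quickly, which is the point: the difference only starts to matter with much larger objects or many repetitions.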

for multiple comparisons (such as your two acceptable values), use %in% in place of ==
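
A minimal sketch of the %in% form, using a small made-up data frame (the VALUE column and the "OTHER" record type are assumptions for illustration, and the real REC.TYPE values may be padded differently):

```r
# Hypothetical toy data standing in for the real dataset.
testdata <- data.frame(
  REC.TYPE = c("SAO  ", "FL-15", "OTHER", "SAO  ", "FL-15"),
  VALUE    = 1:5,
  stringsAsFactors = FALSE
)

wanted <- c("SAO  ", "FL-15")

# %in% returns one TRUE/FALSE per row, so it can index the rows directly.
keep <- testdata$REC.TYPE %in% wanted
filtered <- testdata[keep, , drop = FALSE]

# The same selection written as two == tests joined with | (vectorised OR).
filtered2 <- testdata[testdata$REC.TYPE == "SAO  " |
                      testdata$REC.TYPE == "FL-15", , drop = FALSE]

print(filtered)
```

Note that %in% does exact string comparison, so trailing spaces in the values have to match what is actually stored in the column.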


On Mar 3, 2013, at 9:22, Matt Borkowski <mathias1979 at yahoo.com> wrote:

> Thank you for your response Jim! I will give this one a try! But a couple followup questions...
> 
> In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?
> 
> Also, I found the following solution, which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...
> 
>> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
> 
> -Matt
> 
> 
> 
> 
> --- On Sun, 3/3/13, jim holtman <jholtman at gmail.com> wrote:
> 
> From: jim holtman <jholtman at gmail.com>
> Subject: Re: [R] Help searching a matrix for only certain records
> To: "Matt Borkowski" <mathias1979 at yahoo.com>
> Cc: r-help at r-project.org
> Date: Sunday, March 3, 2013, 8:00 AM
> 
> Try this:
> 
> dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
> 
> 
> On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <mathias1979 at yahoo.com> wrote:
>> Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)
>> 
>> I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO  " or "FL-15".
>> 
>> My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...
>> 
>>> j <- 1
>>> for (i in 1:nrow(dataset)) {
>>>     if(dataset$REC.TYPE[j] != "SAO  " && dataset$REC.TYPE[j] != "FL-15") {
>>>       dataset <- dataset[-j,]  }
>>>     else {
>>>       j <- j+1  }
>>> }
>> 
>> After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets.
>> 
>> Can anyone point me in the right direction?
>> 
>> Thanks!
>> 
>> Matt
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
> 


