[R] Help searching a matrix for only certain records
Jim Holtman
jholtman at gmail.com
Sun Mar 3 17:22:04 CET 2013
There are many "more efficient" ways of doing these operations, but you probably won't see any difference unless you have very large objects (several hundred thousand entries) or have to do it many times. My background is in computer performance, and for the most part I have found that the easiest, most straightforward ways are fine most of the time.
a more efficient way might be:
testdata <- testdata[!is.na(match(testdata$REC.TYPE, c('SAO ', 'FL-15'))), ]
you can always use 'system.time' to determine how long actions take.
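As a rough sketch of how such a timing comparison could look, the following builds a made-up data frame and times two of the approaches from this thread. The object names `toy` and `keep` and the third record type "XYZ" are purely illustrative and are not from the original data. Timings vary by machine, so none are shown here.

```r
# Hypothetical stand-in for the real data: 300,000 rows with a REC.TYPE column.
set.seed(1)
toy <- data.frame(
  REC.TYPE = sample(c("SAO ", "FL-15", "XYZ"), 300000, replace = TRUE),
  VALUE    = runif(300000),
  stringsAsFactors = FALSE
)
keep <- c("SAO ", "FL-15")

# Time the regular-expression approach with subset().
t_grepl <- system.time(
  res_grepl <- subset(toy, grepl("(SAO |FL-15)", REC.TYPE))
)

# Time a match()-based logical index.
t_match <- system.time(
  res_match <- toy[!is.na(match(toy$REC.TYPE, keep)), ]
)

print(t_grepl)
print(t_match)

# Sanity check: both approaches should pick the same rows.
stopifnot(identical(rownames(res_grepl), rownames(res_match)))
```

The "elapsed" entry of each result is usually the number to compare. For very fast operations it may help to repeat the call several times before drawing conclusions.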
for multiple comparisons use %in%
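A minimal sketch of the %in% form, using a small made-up data frame (the `toy` object and its values are illustrative only, not the original data):

```r
# Tiny illustrative data frame; the values are invented for this example.
toy <- data.frame(
  REC.TYPE = c("SAO ", "XYZ", "FL-15", "SAO ", "ABC"),
  VALUE    = 1:5,
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per row, so it can index rows directly.
wanted <- toy$REC.TYPE %in% c("SAO ", "FL-15")
kept <- toy[wanted, , drop = FALSE]

# match(values, column) returns only the FIRST position of each value,
# so used as a row index it would select at most one row per record type.
first_only <- match(c("SAO ", "FL-15"), toy$REC.TYPE)

print(kept)
print(first_only)
```

Here `kept` holds rows 1, 3 and 4, while `first_only` contains only the positions 1 and 3, which is why the order of the arguments to match() matters when filtering.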
On Mar 3, 2013, at 9:22, Matt Borkowski <mathias1979 at yahoo.com> wrote:
> Thank you for your response Jim! I will give this one a try! But a couple followup questions...
>
> In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?
>
> Also, I found the following solution which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...
>
>> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
>
> -Matt
>
>
>
>
> --- On Sun, 3/3/13, jim holtman <jholtman at gmail.com> wrote:
>
> From: jim holtman <jholtman at gmail.com>
> Subject: Re: [R] Help searching a matrix for only certain records
> To: "Matt Borkowski" <mathias1979 at yahoo.com>
> Cc: r-help at r-project.org
> Date: Sunday, March 3, 2013, 8:00 AM
>
> Try this:
>
> dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
>
>
> On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <mathias1979 at yahoo.com> wrote:
>> Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)
>>
>> I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO " or "FL-15".
>>
>> My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...
>>
>>> j <- 1
>>> for (i in 1:nrow(dataset)) {
>>>   if (dataset$REC.TYPE[j] != "SAO " && dataset$REC.TYPE[j] != "FL-15") {
>>>     dataset <- dataset[-j, ]
>>>   } else {
>>>     j <- j + 1
>>>   }
>>> }
>>
>> After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets.
>>
>> Can anyone point me in the right direction?
>>
>> Thanks!
>>
>> Matt
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>