[R] Tricky filtering

Thu Oct 31 10:23:41 CET 2019

Hi Bert, thanks for your reply, and sorry for not being clearer. Let's try:

> What if the 2 records with largest Mean_power are not the same as the two
> with largest N_records. Do you want to keep all four records?

In the sample data that I used to understand what is going on, this never
happened. If it does happen, I should ignore N_records and use only
Mean_power.

> Or various combinations of this question that would keep 3 records.

No. For now, I just need to keep one record. I may need to filter the raw
data in the future, but right now I just need one record, in ANT01 OR
ANT02, per day.

> And will you always have two records on a date, or could you have just one?

Yes, I can have just one record. It will probably be in the ANT that has
the lower Mean_power.

> And if the 2 records with largest Mean_power always also have the largest
> N_records, then you only need to choose the two with largest Mean_power and
> can ignore the N_records, right?

Right, exactly that!
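
A minimal sketch of the rule as clarified above, assuming dplyr >= 1.0.0 and
the sample data from the original post (where the power column is named
Media_power in the dput). The names ant, station_power, losers and kept are
only illustrative. For each day it finds the ANT station that does not hold
the highest Media_power and drops its rows; other stations such as ETE01 are
left untouched. It is one possible approach, not a tested solution for the
full 11 million line data set.

```r
library(dplyr)

# Sample summary, re-typed from the dput in the original post.
dat <- tibble(
  Date = as.Date(c(rep("2019-03-29", 4), rep("2019-03-30", 5))),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02",
              "ANT01", "ANT01", "ANT02", "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Media_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L)
)

ant <- c("ANT01", "ANT02")  # the two stations that compete on a given day

# Highest Media_power per day and ANT station.
station_power <- dat %>%
  filter(Station %in% ant) %>%
  group_by(Date, Station) %>%
  summarise(power = max(Media_power), .groups = "drop")

# Per day, the ANT station(s) below the daily ANT maximum.
# A day with a single ANT station has no loser; ties keep both stations.
losers <- station_power %>%
  group_by(Date) %>%
  filter(power < max(power)) %>%
  ungroup()

# Drop every row that belongs to a losing Date/Station pair.
kept <- anti_join(dat, losers, by = c("Date", "Station"))
print(kept)
```

On the sample data this keeps the two ANT02 rows for 29/03 and the ANT01 and
ETE01 rows for 30/03, which matches the manually inserted Keep/Remove column.
For the full data, the fish Code column (not shown in the sample, so its name
is an assumption) would have to be added to both group_by() calls and to the
by argument of anti_join().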

Thanks for your attention and help!

Raoni

On Thu, Oct 31, 2019 at 01:17, Bert Gunter <bgunter.4567 using gmail.com>
wrote:

> Thanks for the nice dput example, but your specification confuses me.
> What if the 2 records with largest Mean_power are not the same as the two
> with largest N_records. Do you want to keep all four records? Or various
> combinations of this question that would keep 3 records. And will you
> always have two records on a date, or could you have just one? And if the 2
> records with largest Mean_power always also have the largest N_records,
> then you only need to choose the two with largest Mean_power and can ignore
> the N_records, right?
>
> Once you have answered these questions -- or someone else has a better
> understanding than I -- it should be easy. It will require a loop of one
> form or another, however, and therefore might take a while.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Oct 30, 2019 at 7:55 PM Cacique Samurai <caciquesamurai using gmail.com>
> wrote:
>
>> Hi all,
>>
>> I have a fish telemetry data set with more than 11 million lines. It
>> contains some false records that I have to eliminate. I can solve this
>> using a loop, but I think that dplyr::filter could be faster and more
>> elegant. I just can't figure out how to do it.
>>
>> At this point, I have already summarized the raw data and have something
>> like this (dput at end of e-mail):
>>
>> Date        Station  Antenna  Mean_power  N_records  Action needed (manually inserted)
>> 29/03/2019  ANT01    1               108       1704  Remove
>> 29/03/2019  ANT01    2                94       1219  Remove
>> 29/03/2019  ANT02    1               220       3029  Keep
>> 29/03/2019  ANT02    2               219       2711  Keep
>> 30/03/2019  ANT01    1               204       2289  Keep
>> 30/03/2019  ANT01    2               172       1477  Keep
>> 30/03/2019  ANT02    1                88        913  Remove
>> 30/03/2019  ANT02    2                72       1080  Remove
>> 30/03/2019  ETE01    AH0              87          1  Keep
>>
>> The problem occurs between Stations ANT01 and ANT02. Within the same day,
>> I have to keep the pair of records that has the higher Mean_power and more
>> N_records. In this example, I have to keep the records of Station ANT02 on
>> 29/03 and those of ANT01 and ETE01 on 30/03. If I had nothing other than
>> ANT01 and ANT02 on the same day, it would be a simple question.
>>
>> I have to do this for each marked fish, which is identified by a Code,
>> omitted here for brevity.
>>
>> Thanks in advance,
>>
>> Raoni
>>
>>
>> structure(list(
>>   Date = structure(c(17984, 17984, 17984, 17984, 17985, 17985, 17985,
>>     17985, 17985), class = "Date"),
>>   Station = c("ANT01", "ANT01", "ANT02", "ANT02", "ANT01", "ANT01",
>>     "ANT02", "ANT02", "ETE01"),
>>   Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
>>   Media_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
>>   N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L)),
>>   row.names = c(NA, -9L),
>>   class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
>>   groups = structure(list(
>>     Date = structure(c(17984, 17984, 17985, 17985, 17985), class = "Date"),
>>     Station = c("ANT01", "ANT02", "ANT01", "ANT02", "ETE01"),
>>     .rows = list(1:2, 3:4, 5:6, 7:8, 9L)),
>>     row.names = c(NA, -5L),
>>     class = c("tbl_df", "tbl", "data.frame"),
>>     .drop = TRUE))
>>
>>
>> --
>> Raoni Rosa Rodrigues
>> Research Associate of Fish Transposition Center CTPeixes
>> Universidade Federal de Minas Gerais - UFMG
>> Brasil
>> rodrigues.raoni using gmail.com
>>
>

