[R] Conditional subsetting
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Aug 8 04:25:07 CEST 2007
Read the data in and transform the Date, Time and Quality columns
so that Date and Time are the respective chron classes (see RNews 4/1 Help
Desk article) and Quality is a factor such that lower levels are better.
Then sort the data frame by Date, Quality and time from noon. That would
actually be the default ordering if the shown factor levels are the only ones
since they naturally fall into that order but we have explicitly ordered them
in case your extension to other levels no longer obeys that property.
Finally take the first row from each Date group.
Lines <- "DeployID Date.Time LocationQuality Latitude Longitude
STM05-1 28/02/2005 17:35 Good -35.562 177.158
STM05-1 28/02/2005 19:44 Good -35.487 177.129
STM05-1 28/02/2005 23:01 Unknown -35.399 177.064
STM05-1 01/03/2005 07:28 Unknown -34.978 177.268
STM05-1 01/03/2005 18:06 Poor -34.799 177.027
STM05-1 01/03/2005 18:47 Poor -34.85 177.059
STM05-2 28/02/2005 12:49 Good -35.928 177.328
STM05-2 28/02/2005 21:23 Poor -35.926 177.314
"
library(chron)
# in next line replace textConnection(Lines) with "myfile.dat"
DF <- read.table(textConnection(Lines), skip = 1, as.is = TRUE,
col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))
DF <- transform(DF,
Date = chron(Date, format = "d/m/y"),
Time = times(paste(Time, "00", sep = ":")),
Quality = factor(Quality, levels = c("Good", "Poor", "Unknown")))
o <- order(DF$Date, as.numeric(DF$Quality), abs(DF$Time - times("12:00:00")))
DF <- DF[o,]
DF[tapply(row.names(DF), DF$Date, head, 1), ]
# The last line above could alternately be written like this:
do.call("rbind", by(DF, DF$Date, head, 1))
On 8/7/07, Tim Sippel <tsip002 at ec.auckland.ac.nz> wrote:
> Thanks Ross. Yes, upon your remark it seems that maybe I need to be
> selecting first for the best available location, and then by the time stamp
> nearest to 12pm. It seems like tapply(), or aggregate() or aggregate.ts()
> might be applicable, but I'm working through trying different functions and
> have yet to find what works. Suggestions on what function to start with
> would be appreciated.
>
>
>
> Tim Sippel (MSc)
>
> School of Biological Sciences
>
> Auckland University
>
> Private Bag 92019
>
> Auckland 1020
>
> New Zealand
>
> +64-9-373-7599 ext. 84589 (work)
>
> +64-9-373-7668 (Fax)
>
> +64-21-593-001 (mobile)
>
>
>
> _____
>
> From: Ross Darnell [mailto:r.darnell at uq.edu.au]
> Sent: 08 August 2007 12:43
> To: Tim Sippel; r-help at stat.math.ethz.ch
> Subject: RE: [R] Conditional subsetting
>
>
>
>
>
> Assuming *A* animal can only be in one location at any one time, I don't
> understand how once you have selected the location nearest 12pm how you can
> select based on the type of location?
> Do you mean to select on location before timei.e. from the best location
> visited that day which was closest to 12pm?
>
> Perhaps for condition 2 for a animal which.min(time-1200) (replace 1200
> with properly defined timestamp representing 12:00pm)
>
> Ross Darnell
>
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch on behalf of Tim Sippel
> Sent: Wed 08-Aug-07 9:37 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Conditional subsetting
>
> Hello-
>
> Upon searching the email archives and reading documentation I haven't found
> what I'm looking for. I have a dataset with a time/date stamp on a series
> of geographic locations (lat/lon) associated with the movements of animals.
> On some days there are numerous locations and some days there is only one
> location. Associated with each location is an indication that the quality
> of the location is either "good", "poor", or "unknown". For each animal, I
> want to extract only one location for each day. And I want the location
> extracted to be nearest to 12pm on any given day, and highest quality
> possible.
>
> So the order of priority for my extraction will be: 1. only one location per
> animal each day; 2. the location selected be as close to 12pm as possible;
> 3. the selected location comes from the best locations available (ie. take
> from pool of "good" locations first, or select from "poor" locations if a
> "good" location isn't available, or "unknown" if nothing else is available).
>
>
>
>
> I think aspect of this task that has me particularly stumped is how to
> select only one location for each day, and for that location to be a close
> to 12pm as possible.
>
>
>
> An example of my dataset follows:
>
>
>
>
>
> DeployID
>
>
>
> Date.Time
>
>
>
> LocationQuality
>
> Latitude
>
> Longitude
>
>
> STM05-1
>
>
>
> 28/02/2005 17:35
>
>
>
> Good
>
> -35.562
>
> 177.158
>
>
> STM05-1
>
>
>
> 28/02/2005 19:44
>
>
>
> Good
>
> -35.487
>
> 177.129
>
>
> STM05-1
>
>
>
> 28/02/2005 23:01
>
>
>
> Unknown
>
> -35.399
>
> 177.064
>
>
> STM05-1
>
>
>
> 01/03/2005 07:28
>
>
>
> Unknown
>
> -34.978
>
> 177.268
>
>
> STM05-1
>
>
>
> 01/03/2005 18:06
>
>
>
> Poor
>
> -34.799
>
> 177.027
>
>
> STM05-1
>
>
>
> 01/03/2005 18:47
>
>
>
> Poor
>
> -34.85
>
> 177.059
>
>
> STM05-2
>
>
>
> 28/02/2005 12:49
>
>
>
> Good
>
> -35.928
>
> 177.328
>
>
> STM05-2
>
>
>
> 28/02/2005 21:23
>
>
>
> Poor
>
> -35.926
>
> 177.314
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Many thanks for your input. I'm using R 2.5.1 on Windows XP.
>
>
>
> Cheers,
>
>
>
> Tim Sippel (MSc)
>
> School of Biological Sciences
>
> Auckland University
>
> Private Bag 92019
>
> Auckland 1020
>
> New Zealand
>
> +64-9-373-7599 ext. 84589 (work)
>
> +64-9-373-7668 (Fax)
>
> +64-21-593-001 (mobile)
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list