[R] use subset to trim data but include last per category
William Dunlap
wdunlap at tibco.com
Sun Sep 9 18:23:38 CEST 2012
> I would like to change the
> subset clause to be iter %% 500 _or_ the record is the last per n
If your data.frame df is sorted by n, you can define a function that
flags the last element of each run of identical values,
  isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
and use it as
  subset(df, iter %% 500 == 0 | isLastInRun(n))
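
To see what this keeps, here is a small sketch on made-up data. The toy data frame and its run lengths are assumptions for illustration only, not the poster's data. It also shows an alternative that I believe gives the same rows without requiring df to be sorted by n, using base R's duplicated() with fromLast = TRUE.

```r
# Hypothetical toy data: two problem sizes, a snapshot every 10 iterations,
# and neither run ends on a multiple of 500.
df <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(103, 57))),
  iter = c(seq(10, 1030, by = 10), seq(10, 570, by = 10))
)

# TRUE where the next element differs from the current one;
# the final element is always flagged.
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Keep every 500th iteration plus the last record of each n (df sorted by n).
trimmed <- subset(df, iter %% 500 == 0 | isLastInRun(n))
print(trimmed)

# Alternative that does not rely on sorting: the last occurrence of each
# value of n is the one that is not a duplicate when scanning from the end.
trimmed2 <- subset(df, iter %% 500 == 0 | !duplicated(n, fromLast = TRUE))
```

On this toy data both versions keep iter 500, 1000 and 1030 for n = 1000, and iter 500 and 570 for n = 2000.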
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Giovanni Azua
> Sent: Sunday, September 09, 2012 8:14 AM
> To: r-help at r-project.org
> Subject: [R] use subset to trim data but include last per category
>
> Hello,
>
> I bumped into the following funny use-case. I have too much data for a given plot. I have
> the following data frame df:
>
> > str(df)
> 'data.frame': 5015 obs. of 5 variables:
> $ n : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ iter : int 10 20 30 40 50 60 70 80 90 100 ...
> $ Error : num 1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
> $ Duality_Gap: num 20080 3789 855 443 321 ...
> $ Runtime : num 0.00536 0.01353 0.01462 0.01571 0.01681 ...
>
> But if I plot e.g. Runtime vs log(Duality Gap) I have too many observations due to taking a
> snapshot every 10 iterations rather than say 500 and the plot looks very cluttered. So I
> would like to trim the data frame including only those records for which iter is a multiple of
> 500 and so I do this:
>
> df <- subset(df, iter %% 500 == 0)
>
> This gives me almost exactly what I need except that the last and most important Duality
> Gap observations are of course gone due to the filtering ... I would like to change the
> subset clause to be iter %% 500 _or_ the record is the last per n (n is my problem size and
> category in this case) ... how can I do that?
>
> I thought of adding a new column that flags whether a given row is the last element per
> category as "last" Boolean but this is a bit too complicated .. is there a simpler condition
> construct that can be used with the subset command?
>
> TIA,
> Best regards,
> Giovanni