[R] use subset to trim data but include last per category

William Dunlap wdunlap at tibco.com
Sun Sep 9 18:23:38 CEST 2012


> I would like to change the
> subset clause to be iter %% 500 _or_ the record is the last per n 

If your data.frame df is sorted by n (so that the rows for each value of n are contiguous), you can define the function
   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
and use it as
   subset(df, iter %% 500 == 0 | isLastInRun(n)) 
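
As a rough illustration, here is a small sketch on made-up data. The toy data frame and the names trimmed and trimmed2 are assumptions for the example, not the original poster's data. The second call uses duplicated() with fromLast = TRUE, which should also flag the last row per n and does not depend on df being sorted by n.

```r
# Toy data, assumed for illustration only: two problem sizes,
# with a snapshot taken every 10 iterations.
df <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(120, 73))),
  iter = c(seq(10, 1200, by = 10), seq(10, 730, by = 10))
)

# TRUE wherever the next element differs from the current one,
# and always TRUE for the final element.
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Keep every 500th iteration plus the final row of each run of n.
trimmed <- subset(df, iter %% 500 == 0 | isLastInRun(n))
print(trimmed$iter)  # 500 1000 1200 500 730

# Alternative that does not require df to be sorted by n:
# the last occurrence of each n is the one not duplicated from the end.
trimmed2 <- subset(df, iter %% 500 == 0 | !duplicated(n, fromLast = TRUE))
print(identical(trimmed, trimmed2))  # TRUE
```

On sorted data the two forms select the same rows. If the rows for one n are scattered, isLastInRun() marks the end of every run, while the duplicated() form marks only the final occurrence of each n.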

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Giovanni Azua
> Sent: Sunday, September 09, 2012 8:14 AM
> To: r-help at r-project.org
> Subject: [R] use subset to trim data but include last per category
> 
> Hello,
> 
> I bumped into the following funny use-case. I have too much data for a given plot. I have
> the following data frame df:
> 
> > str(df)
> 'data.frame':	5015 obs. of  5 variables:
>  $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ iter       : int  10 20 30 40 50 60 70 80 90 100 ...
>  $ Error      : num  1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
>  $ Duality_Gap: num  20080 3789 855 443 321 ...
>  $ Runtime    : num  0.00536 0.01353 0.01462 0.01571 0.01681 ...
> 
> But if I plot e.g. Runtime vs log(Duality Gap) I have too many observations due to taking a
> snapshot every 10 iterations rather than say 500 and the plot looks very cluttered. So I
> would like to trim the data frame to include only those records for which iter is a multiple of
> 500, so I do this:
> 
> df <- subset(df, iter %% 500 == 0)
> 
> This gives me almost exactly what I need except that the last and most important Duality
> Gap observations are of course gone due to the filtering ... I would like to change the
> subset clause to be iter %% 500 _or_ the record is the last per n (n is my problem size and
> category in this case) ... how can I do that?
> 
> I thought of adding a new column that flags whether a given row is the last element per
> category as "last" Boolean but this is a bit too complicated .. is there a simpler condition
> construct that can be used with the subset command?
> 
> TIA,
> Best regards,
> Giovanni



