[R] use subset to trim data but include last per category
Giovanni Azua
bravegag at gmail.com
Sun Sep 9 17:13:50 CEST 2012
Hello,
I bumped into the following funny use-case. I have too much data for a given plot. I have the following data frame df:
> str(df)
'data.frame': 5015 obs. of 5 variables:
$ n : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
$ iter : int 10 20 30 40 50 60 70 80 90 100 ...
$ Error : num 1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
$ Duality_Gap: num 20080 3789 855 443 321 ...
$ Runtime : num 0.00536 0.01353 0.01462 0.01571 0.01681 ...
But if I plot e.g. Runtime vs log(Duality Gap), I get too many observations, because I took a snapshot every 10 iterations rather than, say, every 500, and the plot looks very cluttered. So I would like to trim the data frame to include only those records for which iter is a multiple of 500, and so I do this:
df <- subset(df, iter %% 500 == 0)
This gives me almost exactly what I need, except that the last and most important Duality Gap observations are of course gone due to the filtering. I would like to change the subset clause to be iter %% 500 == 0 _or_ the record is the last per n (n is my problem size, and the category in this case). How can I do that?
I thought of adding a new Boolean column "last" that flags whether a given row is the last element per category, but this seems a bit too complicated. Is there a simpler condition construct that can be used with the subset command?
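For illustration, here is a minimal sketch of what such a combined condition might look like in base R. The toy data frame is made up (two problem sizes with invented iteration counts and placeholder Duality_Gap values), and the sketch assumes that rows are ordered by iter within each n, so that "last per n" means both the last row by position and the row with the largest iter:

```r
# Toy stand-in for df (made-up data): two problem sizes,
# one snapshot every 10 iterations.
df <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(123, 87))),
  iter = c(seq(10, 1230, by = 10), seq(10, 870, by = 10))
)
df$Duality_Gap <- 1 / df$iter  # placeholder values only

# Variant 1: keep multiples of 500, plus the row whose iter equals
# the maximum iter within its n group.
trimmed <- subset(df, iter %% 500 == 0 | iter == ave(iter, n, FUN = max))

# Variant 2: keep multiples of 500, plus the last row (by position)
# of each n; duplicated(..., fromLast = TRUE) is FALSE only there.
trimmed2 <- subset(df, iter %% 500 == 0 | !duplicated(n, fromLast = TRUE))

print(trimmed[, c("n", "iter")])
```

Both variants avoid adding a helper column to the data frame. If the rows were not sorted by iter within each n, the two variants could select different rows, so which one fits depends on what "last" is meant to be.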
TIA,
Best regards,
Giovanni