[R] help incorporating data subset lengths in function with ddply
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Thu Apr 17 03:18:23 CEST 2014
Note that ddply is a heavyweight solution, and as your data gets larger
you may find that using it for small tasks like this hurts performance.
Also, "df" is already a function (the F-distribution density in the stats
package, which R attaches by default) that you might actually want to use
someday, and redefining it this way will confuse anyone reading your code.
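A small illustration of the name clash, assuming a standard R session where stats is attached. When R looks up a name in call position it skips bindings that are not functions, so `df(...)` still reaches `stats::df` after `df` has been reassigned. The same name then means two different things in one script. The data frame here is hypothetical.

```r
# stats::df is the density function of the F distribution.
stopifnot(is.function(stats::df))

# Reuse the name for a data frame, as the code in the question does.
df <- data.frame(storm = c("s1", "s2"))

# Plain `df` now refers to the data frame...
print(is.data.frame(df))

# ...but in call position R skips non-function bindings, so this still
# calls stats::df (the F density at x = 1 with 2 and 3 degrees of freedom).
density_at_1 <- df(1, df1 = 2, df2 = 3)
print(density_at_1)

# Remove the data frame so `df` once again means stats::df.
rm(df)
```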
existingdf <- read.csv( text=
"storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000
", as.is=TRUE )
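Because `as.is=TRUE` leaves `Q_time` as plain character strings, a row count carries no clock information. The question describes the duration as running from zero. If it is meant as elapsed time since the start of each storm, a rough sketch might look like this. It assumes the timestamps are UTC and uses a hypothetical five-row subset, `storms`, of the data above.

```r
# Hypothetical subset of the data above; Q_time is character, as
# read.csv(..., as.is = TRUE) would leave it.
storms <- data.frame(
  storm  = c("s1", "s1", "s1", "s2", "s2"),
  Q_time = c("2008-08-07 21:15:00", "2008-08-07 21:16:00",
             "2008-08-07 21:17:00", "2010-10-05 21:00:00",
             "2010-10-05 21:01:00"),
  Q      = c(0.000, 3.020, 6.041, 0.000, 1.812),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are UTC. Use the real time zone if daylight-saving
# changes could fall inside a storm.
storms$Q_time <- as.POSIXct(storms$Q_time, tz = "UTC")

# Minutes elapsed since the earliest reading of each storm, starting at zero.
storms$elapsed_min <- ave(
  as.numeric(storms$Q_time), storms$storm,
  FUN = function(t) (t - min(t)) / 60
)
print(storms)
```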
library(plyr)
# plyr solution: ddply passes each storm's rows to the function as DF,
# so nrow(DF) is the length of that storm
newdf <- ddply( existingdf
, "storm"
, function( DF ) {
transform( DF
, duration=seq.int( length.out=nrow( DF ) ) )
}
)
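The plyr version can possibly be written more simply, closer to what the original attempt was reaching for. ddply passes any extra arguments on to the function it applies, so transform() can receive `duration = ...` directly and evaluate it inside each storm's subset. The attempt in the question instead handed transform() an argument named FUN. As far as I can tell, transform() treats that as one more column to create rather than as a function to call. The small data frame `storms` and the name `newdf3` are illustrative only.

```r
library(plyr)

# Hypothetical small data frame built from a few rows of the data above.
storms <- data.frame(
  storm = c("s1", "s1", "s1", "s2", "s2"),
  Q     = c(0.000, 3.020, 6.041, 0.000, 1.812),
  stringsAsFactors = FALSE
)

# ddply calls transform(piece, duration = seq_along(storm)) for each storm;
# transform evaluates seq_along(storm) inside that piece, giving 1..n.
newdf3 <- ddply(storms, "storm", transform, duration = seq_along(storm))
print(newdf3)
```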
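# base R solution: ave() applies cumsum to a vector of ones within each
# storm, producing a running row count that restarts for every storm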
newdf2 <- transform( existingdf
, duration=ave( rep( 1, nrow(existingdf) )
, storm
, FUN=cumsum ) )
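The ave() approach has one useful property, shown here on a hypothetical vector where the storms are interleaved rather than sorted. ave() returns its results in the original row order, whereas ddply returns rows grouped by storm. Using seq_along as FUN is an equivalent alternative to cumsum on a vector of ones, and it keeps the result as an integer.

```r
# Hypothetical storm labels in which the storms are interleaved.
storm <- c("s1", "s2", "s1", "s2", "s1")

# ave() splits its first argument by storm, applies FUN to each piece and
# writes the results back into the original positions.
duration <- ave(seq_along(storm), storm, FUN = seq_along)
print(duration)  # 1 1 2 2 3

# The cumsum-of-ones form from the post gives the same counts (as doubles).
stopifnot(all(duration == ave(rep(1, length(storm)), storm, FUN = cumsum)))
```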
On Wed, 16 Apr 2014, Steve E. wrote:
> Dear R Community,
>
> I am having some trouble with a task that I hope you might be able to help
> with. I have a dataset that includes the time and corresponding stream
> discharge from numerous storms (example of structure with simplified data
> below). I would like to produce a field that details the duration of each
> storm, where each storm is a subset of the data and the duration runs from
> zero to end for each unique storm. I have been trying to accomplish this
> with ddply but to no avail as I am unable to provide ddply (e.g., below)
> with the length of the storm (i.e., subset of data). Thank you in advance,
> any help would be appreciated.
>
>
> existing df:
> storm,Q_time,Q
> s1,2008-08-07 21:15:00,0.000
> s1,2008-08-07 21:16:00,3.020
> s1,2008-08-07 21:17:00,6.041
> s1,2008-08-07 21:18:00,9.061
> s1,2008-08-07 21:19:00,12.082
> s1,2008-08-07 21:20:00,15.102
> s1,2008-08-07 21:21:00,18.123
> s1,2008-08-07 21:22:00,11.143
> s1,2008-08-07 21:23:00,0.000
> s2,2010-10-05 21:00:00,0.000
> s2,2010-10-05 21:01:00,1.812
> s2,2010-10-05 21:02:00,3.625
> s2,2010-10-05 21:03:00,5.437
> s2,2010-10-05 21:04:00,7.249
> s2,2010-10-05 21:05:00,9.061
> s2,2010-10-05 21:06:00,0.874
> s2,2010-10-05 21:07:00,0.000
>
> desired df:
> storm,Q_time,Q,duration
> s1,2008-08-07 21:15:00,0.000,1
> s1,2008-08-07 21:16:00,3.020,2
> s1,2008-08-07 21:17:00,6.041,3
> s1,2008-08-07 21:18:00,9.061,4
> s1,2008-08-07 21:19:00,12.082,5
> s1,2008-08-07 21:20:00,15.102,6
> s1,2008-08-07 21:21:00,18.123,7
> s1,2008-08-07 21:22:00,11.143,8
> s1,2008-08-07 21:23:00,0.000,9
> s2,2010-10-05 21:00:00,0.000,1
> s2,2010-10-05 21:01:00,1.812,2
> s2,2010-10-05 21:02:00,3.625,3
> s2,2010-10-05 21:03:00,5.437,4
> s2,2010-10-05 21:04:00,7.249,5
> s2,2010-10-05 21:05:00,9.061,6
> s2,2010-10-05 21:06:00,0.874,7
> s2,2010-10-05 21:07:00,0.000,8
>
> I have been trying variations of the following statement, but I cannot seem
> to get the length of the subset correct as I receive an error of the type
> 'Error: arguments imply differing number of rows: 2401, 0'.
>
> newdf <- ddply(df, "storm", transform, FUN = function(x)
> {duration=seq(from=1, by=1, length.out=nrow(x))})
>
> I would really like to get a handle on ddply in this instance as it will be
> quite helpful for many other similar calculations that I need to do with
> this dataset.
>
> Thanks again,
> Stevan
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)