[R] Corrected - R 3.0.2 How to Split-Apply-Combine using various Columns

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Sat Jan 25 22:38:13 CET 2014


Sorry, messed up the second ddply example:

dta3 <- ddply( dta2, c("slot","classf"), function(DF){
  data.frame( Total=nrow(DF), MeanVelocity=mean( DF$TimeMeanVelocity ) ) } )
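
For anyone who wants to run the corrected pipeline end to end, here is a minimal sketch using the sample rows from the original question. The globalx and lane columns are left out because nothing here uses them, and the formatting of the function bodies is mine. Treat it as an illustration under those assumptions, not a finished solution.

```r
library(plyr)

# Sample rows from the original question (globalx and lane omitted, unused here)
dta <- data.frame(
  vehicle  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  frame    = c(221L, 222L, 223L, 224L, 115L, 116L, 117L, 118L, 119L, 120L, 121L),
  class    = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  velocity = c(23.37, 23.16, 22.94, 22.85, 35, 35.01, 35.03, 34.92, 34.49,
               33.66, 32.5)
)

# Assign each frame to a 300-frame slot and label the vehicle classes
dta$slot   <- cut(dta$frame, seq(22, 9322, 300))
dta$classf <- factor(dta$class, levels = 1:3,
                     labels = c("motorcycle", "car", "truck"))

# Step 1: one row per vehicle per slot, with that vehicle's mean velocity
dta2 <- ddply(dta, c("slot", "classf", "vehicle"), function(DF) {
  data.frame(TimeMeanVelocity = mean(DF$velocity))
})

# Step 2 (corrected): count vehicles and average their means per slot and class
dta3 <- ddply(dta2, c("slot", "classf"), function(DF) {
  data.frame(Total = nrow(DF), MeanVelocity = mean(DF$TimeMeanVelocity))
})

print(dta3)
```

With only these eleven rows, both vehicles are cars in the first slot, so dta3 has a single row with Total equal to 2.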

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>While you seem to be making some progress in communicating your
>problem, the format is still HTML (so it is a mess) and the subject and
>approach of the question are still a poor fit for this list. We are not
>here to DO your work for you, yet you seem to have an overly long list
>of "needs" that suggest you want a complete solution. What you should
>be looking for here are suggestions for how to solve pieces of this
>task so that you can do the work of creating your own solution.
>Some tools that I find useful for this kind of problem are the cut
>function, the plyr package, and the reshape2 package. Others might find
>the aggregate function, the sqldf package, the data.table package, or
>the new dplyr package helpful. Each function and package has
>documentation with examples that you should read before using it
>(e.g. ?cut).
>
>Some example calculations are (with dta as your sample data frame):
>
>library(plyr)
>dta$slot <- cut( dta$frame, seq(22,9322,300))
>dta$classf <- factor(dta$class, levels=1:3,
>labels=c("motorcycle","car","truck"))
>dta2 <- ddply( dta, c("slot","classf","vehicle"),
>function(DF){data.frame( TimeMeanVelocity=mean(DF$velocity) ) } )
>dta3 <- ddply( dta2, c("slot","classf"), function(DF){data.frame(
>MeanVelocity=mean( Total=nrow(DF), DF$TimeMeanVelocity ) ) } )
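
One detail worth checking in ?cut before relying on the slot column: by default the intervals are open on the left and closed on the right, so a frame exactly equal to the lowest break is not assigned to any slot. The sketch below uses a few hand-picked frame numbers (my choice, purely for illustration) to show the difference that include.lowest makes.

```r
# Breaks as in the example above; the frame values are illustrative only
breaks <- seq(22, 9322, 300)
frames <- c(22L, 23L, 322L, 323L, 9233L)

# Default: intervals are (lower, upper], so frame 22 falls outside every slot
default_slots <- cut(frames, breaks)

# include.lowest = TRUE closes the first interval on the left as well
closed_slots <- cut(frames, breaks, include.lowest = TRUE)

# Compare the two assignments side by side
print(data.frame(frames, default_slots, closed_slots))
```

Whether the first frame should count is a decision about your data; the point is only that the default silently yields NA for it.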
>
>Then you need to fold the total and mean velocity into wide form using
>the dcast function from the reshape2 package (read the documentation)
>and merge them with the merge or cbind functions.
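
A rough sketch of that last step follows. The small dta3 here is a hypothetical stand-in with made-up numbers, built by hand so the example runs on its own, and the underscore column names are my assumption rather than anything required by the question.

```r
library(reshape2)

# Hypothetical long-form summary shaped like dta3 (values are made up)
dta3 <- data.frame(
  slot         = factor(c("(22,322]", "(22,322]", "(322,622]", "(322,622]")),
  classf       = factor(c("car", "truck", "car", "truck"),
                        levels = c("motorcycle", "car", "truck")),
  Total        = c(10L, 2L, 8L, 3L),
  MeanVelocity = c(30.5, 25.0, 28.0, 24.5)
)

# Fold each measure into wide form: one row per slot, one column per class
totals <- dcast(dta3, slot ~ classf, value.var = "Total")
means  <- dcast(dta3, slot ~ classf, value.var = "MeanVelocity")

# Prefix the class columns so the two tables can be told apart after merging
names(totals)[-1] <- paste0("Total_", names(totals)[-1])
names(means)[-1]  <- paste0("MeanVelocity_", names(means)[-1])

# Combine both wide tables on the time slot
wide <- merge(totals, means, by = "slot")
print(wide)
```

Classes that never occur (motorcycle in this toy data) get no column by default; see the drop argument in ?dcast if you want them kept.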
>
>Good luck, and keep working on making your questions clear.
>
>umair durrani <umairdurrani at outlook.com> wrote:
>>Hello everyone,
>>Here is the version using dput. I am sorry for the junk
>>I posted before. I have a large vehicle trajectory data set, of which
>>the following is a small part:
>>structure(list(vehicle = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,2L, 2L),
>>frame = c(221L, 222L, 223L, 224L, 115L, 116L, 117L, 118L, 119L, 120L,
>>121L), globalx = c(6451259.685, 6451261.244, 6451262.831, 6451264.362,
>>6451181.179, 6451183.532, 6451185.884, 6451188.237, 6451190.609,
>>6451192.912, 6451195.132), class = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>2L, 2L, 2L), velocity = c(23.37, 23.16, 22.94, 22.85, 35, 35.01, 35.03,
>>34.92, 34.49, 33.66, 32.5), lane = c(5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L,
>>4L, 4L, 4L)), .Names = c("vehicle", "frame", "globalx", "class",
>>"velocity", "lane"), row.names = c(85L, 86L, 87L, 88L, 447L, 448L,
>>449L, 450L, 451L, 452L, 453L), class = "data.frame")
>>Explanation of columns:
>>vehicle = unique ID of vehicle. It is repeated (in column) for every
>>frame in which it was observed;
>>frame = ID of the frame in which the vehicle was observed. One frame is
>>0.1 seconds long;
>>class = class of vehicle, i.e. 1=motorcycle, 2=car, 3=truck;
>>velocity = velocity of vehicle in feet per second;
>>lane = lane number in which vehicle is present in a particular frame.
>>
>>'frame' number can also repeat, e.g. in frame 120 the example data shows
>>vehicle 2 was observed but in the original data many more vehicles
>>might have been observed in this frame. Similarly, 'class' is defined
>>above and all three classes are present in the original data (here
>>example data only shows classes 2 and 3 i.e. cars and trucks).
>>I need to determine two things:
>>1) Number of vehicles observed in every 30 seconds i.e. 300 frames
>>2) Average velocity of each vehicle class in every 30 seconds
>>This means that the first step might be to determine the minimum and
>>maximum frame numbers and then divide them into slots so that every slot
>>has 300 frames. In my original data I found 22 as min and 9233 as max
>>frame number. This makes 30 time slots as 22-322, 322-622, ...,
>>9022-9233. I need the following columns in one table as an output (note
>>that the TimeSlot column should contain the time intervals as described
>>before): TimeSlot, Total-Cars, Total-Trucks, Total-Motorcycles,
>>MeanVelocity-Cars, MeanVelocity-Trucks, MeanVelocity-Motorcycles
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.