[R] Corrected - R 3.0.2 How to Split-Apply-Combine using various Columns

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Sat Jan 25 22:28:31 CET 2014


While you seem to be making some progress in communicating your problem, the format is still HTML (so it is a mess) and the subject and approach of the question are still a poor fit for this list. We are not here to DO your work for you, yet you seem to have an overly long list of "needs" that suggests you want a complete solution. What you should be looking for here are suggestions for how to solve pieces of this task so that you can do the work of creating your own solution.
Some tools that I find useful for this kind of problem are the cut function, the plyr package, and the reshape2 package. Others might find the aggregate function, the sqldf package, the data.table package, or the new dplyr package helpful. Each function and package has documentation with examples that you should read before using it (e.g. ?cut).
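To illustrate what ?cut says about interval boundaries, here is a minimal base R sketch. The frame numbers are made up for illustration; only the break sequence (22 to 9322 in steps of 300) follows the slots described in the question. The include.lowest and dig.lab arguments are my suggestion for this data, not something the question requires.

```r
# Frame numbers at and around slot boundaries (made-up values for illustration)
frames <- c(22L, 23L, 322L, 323L, 9233L)
breaks <- seq(22, 9322, 300)  # 22, 322, 622, ..., 9322

# Default: intervals are (lower, upper], so the minimum frame 22 falls outside
slot_default <- cut(frames, breaks)

# include.lowest = TRUE closes the first interval on the left;
# dig.lab = 4 keeps four-digit breaks out of scientific notation in the labels
slot_closed <- cut(frames, breaks, include.lowest = TRUE, dig.lab = 4)

print(data.frame(frames, slot_default, slot_closed))
```

With the default arguments the smallest frame number ends up as NA, which is easy to miss in a large data set.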

Some example calculations are (with dta as your sample data frame):

library(plyr)
dta$slot <- cut( dta$frame, seq(22,9322,300), include.lowest=TRUE )
dta$classf <- factor(dta$class, levels=1:3, labels=c("motorcycle","car","truck"))
dta2 <- ddply( dta, c("slot","classf","vehicle"), function(DF){data.frame( TimeMeanVelocity=mean(DF$velocity) ) } )
dta3 <- ddply( dta2, c("slot","classf"), function(DF){data.frame( Total=nrow(DF), MeanVelocity=mean(DF$TimeMeanVelocity) ) } )

Then you need to fold the total and mean velocity into wide form using the dcast function from the reshape2 package (read the documentation) and merge them with the merge or cbind functions.
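A rough sketch of that folding step, under some assumptions: the data frame is rebuilt from the dput output in the quoted message (only the columns used here), the second summary is assumed to carry both a Total and a MeanVelocity column, include.lowest=TRUE is my addition so that frame 22 is not dropped, and the Total_/MeanVelocity_ column prefixes are names I picked for illustration. Treat it as a starting point, not a finished solution.

```r
library(plyr)
library(reshape2)

# Sample data from the quoted message (only the columns used here)
dta <- data.frame(
  vehicle  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  frame    = c(221L, 222L, 223L, 224L, 115L, 116L, 117L, 118L, 119L, 120L, 121L),
  class    = 2L,
  velocity = c(23.37, 23.16, 22.94, 22.85, 35, 35.01, 35.03, 34.92, 34.49, 33.66, 32.5)
)

dta$slot   <- cut(dta$frame, seq(22, 9322, 300), include.lowest = TRUE)
dta$classf <- factor(dta$class, levels = 1:3, labels = c("motorcycle", "car", "truck"))

# One row per vehicle per slot, then one row per class per slot
dta2 <- ddply(dta, c("slot", "classf", "vehicle"), function(DF) {
  data.frame(TimeMeanVelocity = mean(DF$velocity))
})
dta3 <- ddply(dta2, c("slot", "classf"), function(DF) {
  data.frame(Total = nrow(DF), MeanVelocity = mean(DF$TimeMeanVelocity))
})

# Wide form: one row per slot, one column per class that occurs in the data
totals <- dcast(dta3, slot ~ classf, value.var = "Total")
means  <- dcast(dta3, slot ~ classf, value.var = "MeanVelocity")

# Prefix the class columns so the two tables can be told apart after merging
names(totals)[-1] <- paste0("Total_", names(totals)[-1])
names(means)[-1]  <- paste0("MeanVelocity_", names(means)[-1])

wide <- merge(totals, means, by = "slot")
print(wide)
```

The sample data only contains cars in a single slot, so the result has one row and one column per measure; with the full data each class present would get its own pair of columns.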

Good luck, and keep working on making your questions clear.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

umair durrani <umairdurrani at outlook.com> wrote:
>Hello everyone, here is the version using dput. I am sorry for the junk
>I posted before. I have a large vehicle trajectory data of which
>following is a small part:  
>structure(list(vehicle = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,2L, 2L),
>frame = c(221L, 222L, 223L, 224L, 115L, 116L, 117L, 118L, 119L, 120L,
>121L), globalx = c(6451259.685, 6451261.244, 6451262.831, 6451264.362,
>6451181.179, 6451183.532, 6451185.884, 6451188.237, 6451190.609,
>6451192.912, 6451195.132), class = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>2L, 2L, 2L), velocity = c(23.37, 23.16, 22.94, 22.85, 35, 35.01, 35.03,
>34.92, 34.49, 33.66, 32.5), lane = c(5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L,
>4L, 4L, 4L)), .Names = c("vehicle", "frame", "globalx", "class",
>"velocity", "lane"), row.names = c(85L, 86L, 87L, 88L, 447L, 448L,
>449L, 450L, 451L, 452L, 453L), class = "data.frame")
>Explanation of columns:
>vehicle = unique ID of the vehicle; it is repeated (in the column) for every frame in which the vehicle was observed
>frame = ID of the frame in which the vehicle was observed; one frame is 0.1 seconds long
>class = class of vehicle, i.e. 1=motorcycle, 2=car, 3=truck
>velocity = velocity of the vehicle in feet per second
>lane = lane number in which the vehicle is present in a particular frame
>
>'frame' number can also repeat e.g. in frame 120 the example data shows
>vehicle 2 was observed but in the original data many more vehicles
>might have been observed in this frame. Similarly, 'class' is defined
>above and all three classes are present in the original data (here
>example data only shows classes 2 and 3 i.e. cars and trucks).
>I need to determine two things:
>1) Number of vehicles observed in every 30 seconds, i.e. 300 frames
>2) Average velocity of each vehicle class in every 30 seconds
>This means that the first step might be to determine the minimum and
>maximum frame numbers and then divide them in slots so that every slot
>has 300 frames. In my original data I found 22 as min and 9233 as max
>frame number. This makes 30 time slots as 22-322, 322-622, ...,
>9022-9233. I need following columns in one table as an output (note
>that Timeslot column should contain the time intervals as described
>before): TimeSlot, Total-Cars, Total-Trucks, Total-Motorcycles,
>MeanVelocity-Cars, MeanVelocity-Trucks, MeanVelocity-Motorcycles
>
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



