[R] First. Last. Data row selection

Sharpie chuck at sharpsteen.net
Tue Feb 23 20:51:38 CET 2010



wookie1976 wrote:
> 
> I am in the process of switching from SAS over to R.  I am working on very
> large CSV datasets that contain vehicle information.  As I am processing
> the data, I need to select the first (or sometimes the second) record (by
> date) for any records that have the same license plate number.  In SAS,
> there is a function called 'first.' that can be used on sorted datasets to
> pull out those first entries for each occurrence of a particular variable 
> (in this case the variable is 'license plate') found in the data.  I have
> spent some time looking around and cannot seem to find an equivalent
> function in R.  Can anyone recommend an efficient technique that would
> pull this off?  I assume the database must first be sorted by vehicle
> plate and date, and then apply the filter or function.  Any help would be
> greatly appreciated.  
> 
> Thanks, Joe
> 

To select the first or last elements of a vector, list, matrix or data
frame, look at the head() and tail() functions.  The split() function can
be used to break a data.frame into smaller pieces based on a factor such
as the year or license plate.
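
As a rough sketch of how split() and head()/tail() could fit together,
assuming a made-up data frame called `vehicles` with columns `plate` and
`date` (these names and values are placeholders, not the poster's data):

```r
# Hypothetical example data; names and values are assumptions for illustration
vehicles <- data.frame(
  plate = c("ABC123", "XYZ789", "ABC123", "XYZ789", "ABC123"),
  date  = as.Date(c("2010-01-15", "2010-02-01", "2010-01-03",
                    "2010-01-20", "2010-02-10")),
  stringsAsFactors = FALSE
)

# Sort by plate, then by date within each plate
vehicles <- vehicles[order(vehicles$plate, vehicles$date), ]

# Split into a list holding one data frame per plate
by_plate <- split(vehicles, vehicles$plate)

# Take the first (earliest) and last (latest) row of each piece,
# then stack the pieces back into a single data frame
first_rows <- do.call(rbind, lapply(by_plate, head, n = 1))
last_rows  <- do.call(rbind, lapply(by_plate, tail, n = 1))
```

Sorting before splitting means head() and tail() see each plate's rows in
date order, which is roughly what SAS's 'first.' and 'last.' rely on.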

The effects of split() can be combined with another function such as head()
by using the base function by() or a function like ddply() from Hadley's
plyr package.  To give an example, I would need some sample data,
preferably pasted as the output of dput().  Tabular data tends to get
mangled in email and requires reprocessing and reformatting before it can
be loaded as an R object.
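
Without real data to work from, here is only a minimal sketch of the by()
approach, using an invented data frame `vehicles` with columns `plate` and
`date` (both names and all values are assumptions).  It picks the first and
the second record by date for each plate:

```r
# Hypothetical example data; names and values are assumptions for illustration
vehicles <- data.frame(
  plate = c("ABC123", "XYZ789", "ABC123", "XYZ789", "ABC123"),
  date  = as.Date(c("2010-01-15", "2010-02-01", "2010-01-03",
                    "2010-01-20", "2010-02-10")),
  stringsAsFactors = FALSE
)

# by() splits the data frame on plate and applies the function to each piece;
# each piece is sorted by date before the first row is taken
first_by_plate <- do.call(rbind, by(vehicles, vehicles$plate, function(d) {
  head(d[order(d$date), ], n = 1)
}))

# Second record by date for each plate; a plate with fewer than two
# records would produce a row of NAs here
second_by_plate <- do.call(rbind, by(vehicles, vehicles$plate, function(d) {
  d[order(d$date), ][2, ]
}))

# With plyr installed, a roughly equivalent call on date-sorted data
# would be: ddply(vehicles, "plate", head, n = 1)
```

The dput() output of a small slice of the real data would make it possible
to check that this handles the actual column types and any duplicate dates.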

-Charlie
-- 
View this message in context: http://n4.nabble.com/First-Last-Data-row-selection-tp1566260p1566418.html
Sent from the R help mailing list archive at Nabble.com.


