[R] how to select the first observation only?

William Dunlap wdunlap at tibco.com
Thu Apr 22 05:41:19 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of gallon li
> Sent: Wednesday, April 21, 2010 7:18 PM
> To: r-help
> Subject: [R] how to select the first observation only?
> 
> Dear r-helpers,
> 
> I have a very simple question. Suppose my data is like
> 
> id=c(rep(1,2),rep(2,2))
> b=c(2,3,4,5)
> m=cbind(id,b)
> 
> > m
>      id b
> [1,]  1 2
> [2,]  1 3
> [3,]  2 4
> [4,]  2 5
> I wish to select the first observation for each id. That is, I want to
> quickly select two rows:
> 
> id b
> 1 2
> 2 4

The following quickly selects the first row in each
run of identical 'id's.  If your data are sorted by
'id', each id forms a single run, so this gives the
first observation for each id.

  > # TRUE for the first element and wherever x differs from the element before it
  > isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
  > m[ isFirstInRun(m[,"id"]), , drop=FALSE]
       id b
  [1,]  1 2
  [2,]  2 4
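To make the "sorted by 'id'" condition concrete, here is a small sketch on hypothetical unsorted data (the matrix m2 is made up for illustration). On unsorted data isFirstInRun() returns the first row of each run, so an id that reappears later is picked again. Sorting with order() first gives one row per id. As an alternative that is not part of the reply above, base R's duplicated() keeps the first occurrence of each id without any sorting.

```r
# isFirstInRun() as defined in the reply above
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])

# Hypothetical unsorted example: id 1 occurs in two separate runs
m2 <- cbind(id = c(1, 1, 2, 2, 1), b = c(2, 3, 4, 5, 6))

# First row of each run: rows 1, 3 and 5 (id 1 is picked twice)
runStarts <- m2[isFirstInRun(m2[, "id"]), , drop = FALSE]

# Sorting by id first makes each id a single run; order() leaves
# ties in their original order, so the original first row survives
s <- m2[order(m2[, "id"]), , drop = FALSE]
firstPerId <- s[isFirstInRun(s[, "id"]), , drop = FALSE]

# Alternative: duplicated() flags repeats of an earlier id, so
# negating it keeps the first occurrence of each id, in original order
firstSeen <- m2[!duplicated(m2[, "id"]), , drop = FALSE]
```

Here firstPerId and firstSeen both contain the rows (1, 2) and (2, 4). The duplicated() version avoids the sort, but isFirstInRun() is the one to use if you really want the start of every run.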

If the 'id' column contains NAs, you need to decide
how a run of NAs should be handled.  For example,
converting 'id' to a factor that keeps NA as a level:
  m[ isFirstInRun(factor(m[,"id"], exclude=NULL)), , drop=FALSE]
selects the first row in a run of NAs, while
  isNaOrTrue <- function(x) is.na(x) | x
  m[ isNaOrTrue(isFirstInRun(m[,"id"])), , drop=FALSE]
treats each NA in 'id' as a unique value (a run
of length 1).
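A small sketch of the two NA strategies on hypothetical data (the matrix mNA is made up for illustration), showing which rows each one keeps. A plain comparison returns NA wherever an NA id is involved, and the two approaches resolve those NAs differently.

```r
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
isNaOrTrue <- function(x) is.na(x) | x

# Hypothetical data containing a run of two NA ids
mNA <- cbind(id = c(1, 1, NA, NA, 2), b = c(2, 3, 4, 5, 6))

# Plain comparison: NA wherever an NA id is involved
cmp <- isFirstInRun(mNA[, "id"])

# Strategy 1: NA kept as a factor level, so a run of NAs is one run
keep1 <- isFirstInRun(factor(mNA[, "id"], exclude = NULL))

# Strategy 2: every NA (and the value right after an NA) starts a new run
keep2 <- isNaOrTrue(cmp)

byFactor <- mNA[keep1, , drop = FALSE]  # rows 1, 3, 5
eachNA   <- mNA[keep2, , drop = FALSE]  # rows 1, 3, 4, 5
```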

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> 
> only. How should I do this?
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
