[R] converting stata's by syntax to R

Mon Aug 1 17:02:56 CEST 2005

Try:
> dat <- dat[order(dat$fam, dat$wt), ]
# sort the data by fam, then by wt, as Stata's bys prefix does
# (dat$fam is used instead of attach(dat), whose copy of fam would be stale after sorting)
> lis <- by(dat, dat$fam, function(x) x[nrow(x), ])
# equivalent to your Stata command, but returns a list
> do.call(rbind, lis)
# bind the list elements into a single data frame
  fam  wt keep
1   1 1.0    1
2   2 1.0    1
3   3 0.6    1
4   4 0.4    0
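
A shorter variant is possible with duplicated(), which avoids the by()/rbind round trip. This is only a sketch, checked against the example data from the question and nothing else. The names sorted, last_per_fam, tie_key and random_pick are made up for illustration, and the seed is arbitrary.

```r
# Example data from the original question
dat <- data.frame(fam  = c(1, 2, 3, 3, 4, 4, 4),
                  wt   = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2),
                  keep = c(1, 1, 1, 0, 1, 0, 0))

# Sort by family, then by weight (ascending), like "bys fam (wt):"
sorted <- dat[order(dat$fam, dat$wt), ]

# Keep the last row within each family, like "keep if _n==_N"
last_per_fam <- sorted[!duplicated(sorted$fam, fromLast = TRUE), ]
last_per_fam

# Optional: choose at random among rows tied on the maximum weight
# by adding a random sort key as the final tie-breaker.
set.seed(1)                      # arbitrary seed, only for reproducibility
tie_key <- runif(nrow(dat))
shuffled <- dat[order(dat$fam, dat$wt, tie_key), ]
random_pick <- shuffled[!duplicated(shuffled$fam, fromLast = TRUE), ]
random_pick
```

Both results have one row per family, with wt equal to the family maximum. Without the random key, ties keep their original order, so the later of the tied rows is the one retained.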

	
======= 2005-08-01 22:24:27 you wrote: =======

>I am struggling with migrating some Stata code to R.  I have a data
>frame containing, sometimes, repeat observations (rows) of the same
>family.  I want to keep only one observation per family, selecting
>that observation according to some other variable.  An example data
>frame is:
>
># construct example data
>fam <- c(1,2,3,3,4,4,4)
>wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
>keep <- c(1,1,1,0,1,0,0)
>dat <- as.data.frame(cbind(fam,wt,keep))
>dat
>
>I want to keep the observation for which wt is a maximum, and where
>this doesn't identify a unique observation, to keep just one anyway,
>not caring which.  Those observations are indicated above by keep==1.
>(Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
>c(1,1,1,0,0,0,1)).
>
>The Stata code I would use is
>bys fam (wt): keep if _n==_N
>
>This is my (long-winded) attempt in R:
>
># first keep those rows where wt=max_fam(wt)
>maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
>maxwt <- sapply(maxwt,"[[",1)
>maxwt.dat <- data.frame("maxwt"=maxwt,"fam"=as.integer(names(maxwt)))
>dat <- merge(dat,maxwt.dat)
>dat <- dat[dat$wt==dat$maxwt,]
>dat
>
>Now I am stuck - I want to keep either row with fam==4, and have tried
>playing around with combinations of sample and apply or by, but with
>no success.  I can only find an inefficient for-loop solution:
>      
># identify those rows with >1 observation
>more <- by(dat,dat$fam,function(x) dim(x)[1])
>more <- sapply(more,"[[",1)
>more.dat <- data.frame("more"=more,"fam"=as.integer(names(more)))
>dat <- merge(dat,more.dat)
>
># sample from those for whom more>1
>result<-dat[dat$more==1,]
>for(f in unique(dat$fam[dat$more>1])) {
>  rows <- rownames(dat[dat$fam==f,])
>  result <- rbind(result,dat[sample(rows,1),])
>}
>result
>
>I am sure that for something so simple in Stata to be so complicated
>in R must indicate ignorance of R on my part, but searches of the help
>files and RSiteSearch haven't led to any better solution.
>
>Any suggestions would be most helpful! Thanks, C.
>

= = = = = = = = = = = = = = = = = = = =
			

2005-08-01

------
Department of Sociology
Fudan University

Blog:http://sociology.yculblog.com