[R] Subset data in long format

Doran, Harold HDoran at air.org
Tue Jun 6 23:15:12 CEST 2006


Apologies, but it seems there were some word-wrap issues in the prior
email. So, here is the code for the sample data again, to avoid confusion:


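# Three students (ids), each with ten random item values
tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol = 10))

# Reshape to long format: one row per id/position pair
long <- reshape(tmp, idvar = 'id', varying = list(names(tmp)[2:11]),
                v.names = 'item', timevar = 'position', direction = 'long')

# Sort so that each id's rows are contiguous
long <- long[order(long$id), ]

# Drop rows 2 and 13 so that the ids have different numbers of observations
long <- long[c(-2, -13), ]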
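As a quick check on the sample data (a sketch, not part of the original post), counting rows per id confirms that the data are unbalanced, and it shows why filtering on the position value, as in the subset(long, position < 7) idea from the original message, falls short. The set.seed(1) call is added here only for reproducibility; the counts do not depend on the random values.

```r
set.seed(1)  # added only for reproducibility; not in the original post

tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol = 10))
long <- reshape(tmp, idvar = "id", varying = list(names(tmp)[2:11]),
                v.names = "item", timevar = "position", direction = "long")
long <- long[order(long$id), ]
long <- long[c(-2, -13), ]

# Observations per id after dropping rows 2 and 13: 9, 9 and 10
counts <- table(long$id)
print(counts)

# Filtering on the position value assumes positions 1-6 exist for every id,
# so ids 1 and 2 each lose a row: 5, 5 and 6 rows remain
by_position <- subset(long, position < 7)
print(table(by_position$id))

# The first six rows of id 1 are positions 1, 3, 4, 5, 6 and 7
print(subset(long, id == 1)[1:6, "position"])
```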

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Doran, Harold
> Sent: Tuesday, June 06, 2006 5:08 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Subset data in long format
> 
> I have data in a "long" format where each row is one observation
> for a student, and each student occupies multiple rows. I need to
> subset these data based on a condition that I am having difficulty
> defining.
> 
> The dataset I am working with is large, but here is a simple
> data structure to illustrate the issue:
> 
> tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol = 10))
> long <- reshape(tmp, idvar = 'id', varying = list(names(tmp)[2:11]),
>                 v.names = 'item', timevar = 'position', direction = 'long')
> long <- long[order(long$id), ]
> long <- long[c(-2, -13), ]
> 
> What I need to do is subset these data so that I have the first 6
> rows for each unique ID. The problem is that the data are
> unbalanced: each ID has a different number of observations
> (which is why I removed rows 2 and 13).
> 
> If the data were balanced, the subset would be trivial and I
> could just do:
> 
> long <- subset(long, position < 7)
> 
> However, the data are not balanced. Consequently, if I were
> to do this for the unbalanced data, I would not get the first
> 6 observations for the first ID; I would only get the first 5.
> Conceptually, what I want for id 1 (and for each unique id) is this:
> 
> ID1 <- subset(long, id==1)
> ID1[1:6,]
> 
> However, the goal is to subset the entire dataframe at once 
> such that the subset returns a new dataframe with the first 6 
> rows for each unique id. Is there a feasible method for doing 
> this subset that anyone can suggest? My actual dataset has 
> more than 24,000 unique ids, so I am hoping to avoid looping 
> through this if possible.
> 
> Thanks,
> Harold
> 
> 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>
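One possible vectorized approach (a sketch, not taken from the original thread): number the rows within each id with ave() and seq_along(), then keep the rows numbered 6 or less. It assumes that "first 6 rows" means the 6 lowest positions, so the data are sorted explicitly by id and then position. No explicit loop over ids is written, although ave() applies seq_along() group by group internally. The set.seed(1) call and the names row_in_id, first6 and first6_alt are illustrative additions.

```r
set.seed(1)  # added only for reproducibility; not in the original post

tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol = 10))
long <- reshape(tmp, idvar = "id", varying = list(names(tmp)[2:11]),
                v.names = "item", timevar = "position", direction = "long")
long <- long[order(long$id), ]
long <- long[c(-2, -13), ]

# Assumption: "first 6 rows" means the 6 lowest positions, so sort explicitly
long <- long[order(long$id, long$position), ]

# Number the rows 1, 2, 3, ... within each id
row_in_id <- ave(seq_along(long$id), long$id, FUN = seq_along)

# Keep at most the first 6 rows of every id
first6 <- long[row_in_id <= 6, ]

# Alternative: split by id, take head() of each piece, and recombine
first6_alt <- do.call(rbind, lapply(split(long, long$id), head, n = 6))
```

Both versions keep every row of an id that has fewer than 6 observations. The ave() version uses a single logical index and preserves the original row names; the split()/head() version is shorter to read but builds one data frame per id before rbind() combines them, which may be slower with 24,000 ids (not timed here).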


