[R] Subset rows over multiple columns
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Apr 14 01:34:50 CEST 2006
Try this:
tt2 <- tt
tt2[,1] <- as.character(tt2[,1])
tt2[,2] <- as.character(tt2[,2])
f <- function(x) with(tt2, mean(righta_a[x == itd_1 | x == itd_45]))
sapply(unique(unlist(tt2[,1:2])), f)
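
Note that f above averages righta_a for every row in which the item appears in either column. If the response to an item sitting in itd_45 is recorded in righta_b, as the question describes, a variant along the following lines might be closer to what is wanted. This is only a sketch: the data frame tt below is rebuilt from the six rows shown by tail(tt) in the question, and the name item_mean is made up for illustration.

```r
# Rebuild the six rows shown by tail(tt) in the question (illustrative data only).
tt <- data.frame(
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# item_mean is a hypothetical helper: it takes righta_a where the item sits in
# itd_1 and righta_b where it sits in itd_45, then averages the pooled responses.
item_mean <- function(x) {
  with(tt, mean(c(righta_a[itd_1 == x], righta_b[itd_45 == x])))
}

# Every distinct item code found in either item column.
items <- unique(unlist(tt[, c("itd_1", "itd_45")]))

# Named vector of proportions correct, one entry per item.
p_values <- sapply(items, item_mean)
print(p_values)
```

On the real data the item columns may be factors, in which case the as.character conversion shown above would still be needed first.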
On 4/13/06, Doran, Harold <HDoran at air.org> wrote:
> I have a data frame where I need to subset certain rows before I compute
> the mean of another variable. However, the value that I need to subset
> by is found in multiple columns. For example, in the data below the
> value R0000160 is found in the first and second columns (itd_1 and
> itd_45). These data are student responses to multiple choice test items
> from a computer adaptive test. So, the variable itd_1 denotes that item
> i was presented to student k in position t, and the variables righta_a
> and righta_b denote a correct (1) or incorrect (0) response to that
> item when it was presented.
>
> My goal is to get the p-value (mean of the binary variable) for each
> item irrespective of when it was presented to the student.
>
> So, in the sample case below, I would use all elements in righta_a
> (except for the second to last) and then only the second to last value
> in righta_b.
>
> > tail(tt)
>          itd_1   itd_45 righta_a righta_b
> 18407 R0000160 R0208470        1        0
> 18412 R0000160 R0238140        0        1
> 18417 R0000160 R0259690        1        1
> 18422 R0000160 R0000730        1        1
> 18450 R0113750 R0000160        1        1
> 18456 R0000160 R0238690        0        1
>
> One thing I can envision doing is using the reshape function so that
> itd_1 and itd_45 would be in the "long" format. This would stack
> itd_1 and itd_45 in a single column, and likewise righta_a and
> righta_b, and I could then use tapply to get what I need without any
> subsetting. That is
>
> testScores <- reshape(tt, idvar='id', varying=list(c('itd_1', 'itd_45'),
> c('righta_a', 'righta_b')), v.names=c('item','answer'),
> timevar='item_position', direction='long')
>
> with(testScores, tapply(answer, item, mean))
>
> Or I could get
>
> with(testScores, tapply(answer, list(item, item_position), mean))
>
> The only problem here is that I have some duplicate IDs in the data and
> reshape doesn't like turning data on its head in that situation, so I
> would need to tinker with those first.
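
One way around the duplicate-ID problem might be to stack the columns by hand rather than through reshape, since plain concatenation needs no id variable. A rough sketch, using only the six rows shown above as stand-in data; the names long, item, answer and item_position are chosen here to mirror the reshape call, and the position codes 1 and 45 are an assumption read off the column names:

```r
# Stand-in data: the six rows shown by tail(tt) (illustrative only).
tt <- data.frame(
  itd_1    = c("R0000160", "R0000160", "R0000160", "R0000160", "R0113750", "R0000160"),
  itd_45   = c("R0208470", "R0238140", "R0259690", "R0000730", "R0000160", "R0238690"),
  righta_a = c(1, 0, 1, 1, 1, 0),
  righta_b = c(0, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Stack the two item columns and the two response columns in the same order,
# so each answer stays aligned with the item it belongs to.
long <- data.frame(
  item          = c(tt$itd_1, tt$itd_45),
  answer        = c(tt$righta_a, tt$righta_b),
  item_position = rep(c(1, 45), each = nrow(tt)),  # assumed position codes
  stringsAsFactors = FALSE
)

# Proportion correct per item, irrespective of position.
p_by_item <- with(long, tapply(answer, item, mean))
print(p_by_item)

# Proportion correct per item and position (NA where an item never appeared there).
p_by_item_pos <- with(long, tapply(answer, list(item, item_position), mean))
print(p_by_item_pos)
```

If the item columns are stored as factors, converting them with as.character before the c() calls would probably be the safer route.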
>
> So, I have what I think would be a solution, but I wonder if there is
> another way that keeps the data in this "wide" format and still gets
> the estimates I need. Maybe it is just easier to use reshape. Any
> suggestions?
>
> Harold
> Windows XP
> R 2.2.1
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help