[R] Conditional looping over a set of variables in R
Peter Ehlers
ehlers at ucalgary.ca
Sun Oct 24 17:17:33 CEST 2010
This won't be as quick as Bill's elegant solution, but it's a one-liner:
apply(d, 1, function(x) match(1, x))
See ?match.
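
For what it's worth, here is a small sketch of what the one-liner returns. The matrix m below is made up for illustration (it is not the original data): match() gives the position of the first 1 in each row, and NA when a row has none.

```r
# Hypothetical 4 x 3 example; rows are cases, columns are items.
m <- matrix(c( 1,  0, 1,
               0, NA, 0,
              NA,  1, 1,
               0,  0, 1),
            nrow = 4, byrow = TRUE)

# match(1, x) returns the index of the first element of x equal to 1,
# or NA if there is none; NA entries in x simply never match.
first1 <- apply(m, 1, function(x) match(1, x))
first1
# [1]  1 NA  2  3
```

The same call should work on a data frame of 0/1/NA columns, since apply() converts it to a matrix first.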
-Peter Ehlers
On 2010-10-22 10:36, David Herzberg wrote:
> Bill, thanks so much for this. I'll get a chance to test it later today, and will post the outcome.
>
>
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com
>
>
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Friday, October 22, 2010 9:52 AM
> To: David Herzberg; r-help at r-project.org
> Subject: RE: [R] Conditional looping over a set of variables in R
>
> You were a bit vague about the format of your data.
> I'm assuming all columns were numeric and the entries are one of 0, 1, and NA (missing value). I made a little function to generate random data of that format for testing purposes:
>
> makeData <- function(nrow = 1500, ncol = 140, pMissing = 0.1) {
>     # pMissing is the proportion of missing values
>     m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
>                 nrow, ncol)
>     m[runif(nrow * ncol) < pMissing] <- NA
>     data.frame(m)
> }
>
> E.g.,
>
> > set.seed(168)
> > d<- makeData(15,3)
> > d
> X1 X2 X3
> 1 1 1 1
> 2 0 0 NA
> 3 0 1 0
> 4 0 0 NA
> 5 0 1 1
> 6 0 0 NA
> 7 1 0 0
> 8 0 1 1
> 9 0 0 1
> 10 1 1 NA
> 11 0 0 1
> 12 0 0 0
> 13 NA NA NA
> 14 0 0 0
> 15 1 0 0
>
> I think the following function does what you want.
> The algorithm is pretty similar to what you showed.
>
> columnOfFirstOne <- function(data) {
>     # col will be the return value, one entry per row of data.
>     # Fill it with NA's: NA in output will mean there were no 1's in row
>     col <- rep(as.integer(NA), nrow(data))
>     for (j in seq_len(ncol(data))) { # loop over columns
>         # For each entry in 'col', if it has not been set yet
>         # and this entry in the j'th column of data is 1 (and not missing)
>         # then set it to the column number.
>         col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
>     }
>     col # return this from function
> }
>
> With the above data we get
> > columnOfFirstOne(d)
> [1] 1 NA 2 NA 2 NA 1 2 3 1 3 NA NA NA 1
>
> It seems quick enough for a dataset of your size
> > dd<- makeData(nrow=1500, ncol=140)
> > system.time(columnOfFirstOne(dd)) # time in seconds
> user system elapsed
> 0.08 0.00 0.08
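
If the explicit loop over columns is not wanted, a loop-free variant is possible with max.col(). This is only a sketch and not Bill's method: the name firstOneByMaxCol and the small data frame are made up here, and it assumes the entries are 0, 1 or NA as described above.

```r
# Sketch of a loop-free variant; firstOneByMaxCol is a made-up name.
firstOneByMaxCol <- function(data) {
  m <- as.matrix(data)
  # TRUE where the entry is a non-missing 1, FALSE otherwise (never NA)
  hit <- !is.na(m) & m == 1
  # max.col() with ties.method = "first" picks the leftmost maximum in
  # each row, i.e. the first TRUE ("+ 0" turns the logicals into 0/1)
  col <- max.col(hit + 0, ties.method = "first")
  # rows without any 1 would otherwise be reported as column 1
  col[rowSums(hit) == 0] <- NA
  col
}

# Four rows in the same 0/1/NA format as the example data
d4 <- data.frame(X1 = c(1, 0, 0, NA),
                 X2 = c(1, 0, 1, NA),
                 X3 = c(1, NA, 0, NA))
res <- firstOneByMaxCol(d4)
res
# [1]  1 NA  2 NA
```

I have not timed this against the loop version, so no claim about which is faster.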
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of David Herzberg
>> Sent: Friday, October 22, 2010 8:34 AM
>> To: r-help at r-project.org
>> Subject: [R] Conditional looping over a set of variables in R
>>
>> Here's the problem I'm trying to solve in R: I have a data frame that
>> consists of about 1500 cases (rows) of data from kids who took a test
>> of listening comprehension. The columns are their scores (1 = correct,
>> 0 = incorrect, . = missing) on 140 test items. The items are numbered
>> sequentially and are ordered by increasing difficulty as you go from
>> left to right across the columns. I want R to go through the data and
>> find the first correct response for each case. Because of basal and
>> ceiling rules, many cases have missing data on many items before the
>> first correct response appears.
>>
>> For each case, I want R to evaluate the item responses sequentially
>> starting with item 1. If the score is 0 or missing, proceed to the
>> next item and evaluate it. If the score is 1, stop the operation for
>> that case, record the item number of that first correct response in a
>> new variable, proceed to the next case, and restart the operation.
>>
>> In SPSS, this operation would be carried out with LOOP, VECTOR, and DO
>> IF, as follows (assuming the data set is already loaded):
>>
>> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
>> RESPONSE, SET IT EQUAL TO 0.
>> numeric LCfirst1.
>> comp LCfirst1 = 0.
>>
>> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
>> vector x=LC1a_score to LC140a_score.
>>
>> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
>> LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME
>> THE LOOP RUNS.
>> loop #i=1 to 140 if (LCfirst1 = 0).
>>
>> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH
>> ELEMENT OF THE VECTOR. THUS, WHEN #i = 1, THE EXPRESSION EVALUATES
>> THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM
>> RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR
>> ELEMENTS ARE EVALUATED.
>> THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE
>> VECTOR UNTIL A '1' IS ENCOUNTERED.
>> + do if x(#i) = 1.
>>
>> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT,
>> WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
>> + comp x(#i) = 99.
>>
>> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE
>> VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM
>> NUMBER OF THE FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE
>> OF LCfirst1 ALSO CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND
>> THE PROGRAM MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
>> + comp LCfirst1 = #i.
>> + end if.
>> end loop.
>> exe.
>>
>> After several hours of trying to translate this procedure to R, I'm
>> stumped. I played around with creating a list to hold the item
>> response variables (analogous to 'vector' in SPSS), but when I tried
>> to use the list in an R procedure, I kept getting a warning along the
>> lines of 'the list contains > 1 element, only the first element will
>> be used'. So perhaps a list is not the appropriate class to 'hold'
>> these variables?
>>
>> It seems that some nested arrangement of 'for' 'while' and/or 'lapply'
>> will allow me to recreate the operation described above? How do I set
>> up the indexing operation analogous to 'loop #i' in SPSS?
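
For comparison, a literal translation of the SPSS LOOP / DO IF logic could look like the sketch below. The function name firstCorrectLoop and the argument scores are made up for illustration; scores stands in for the item columns. The warning quoted above looks like the one if() gives when its condition is a whole vector rather than a single value, but that is a guess since the code was not posted. Testing one element at a time, as here, avoids it.

```r
# Sketch only: 'scores' stands in for the item-response columns.
firstCorrectLoop <- function(scores) {
  scores <- as.matrix(scores)
  # NA (rather than the 0 used in the SPSS code) marks "no correct response"
  first1 <- rep(NA_integer_, nrow(scores))
  for (i in seq_len(nrow(scores))) {      # one case (row) at a time
    for (j in seq_len(ncol(scores))) {    # analogous to 'loop #i' in SPSS
      # isTRUE() treats a missing score as "not correct" and guarantees
      # that if() sees a single TRUE or FALSE
      if (isTRUE(scores[i, j] == 1)) {
        first1[i] <- j                    # record the item number
        break                             # stop scanning this case
      }
    }
  }
  first1
}

# Small made-up example in the 0/1/NA format
d4 <- data.frame(X1 = c(1, 0, 0, NA),
                 X2 = c(1, 0, 1, NA),
                 X3 = c(1, NA, 0, NA))
res <- firstCorrectLoop(d4)
res
# [1]  1 NA  2 NA
```

This will be slower than working on whole columns at once, but it mirrors the SPSS procedure step by step.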
>>
>> Any help is appreciated, and I'm happy to provide more information if
>> needed.
>>
>> David S. Herzberg, Ph.D.
>> Vice President, Research and Development
>> Western Psychological Services
>> 12031 Wilshire Blvd.
>> Los Angeles, CA 90025-1251
>> Phone: (310)478-2061 x144
>> FAX: (310)478-7838
>> email: davidh at wpspublish.com
>>