[R] Conditional looping over a set of variables in R
William Dunlap
wdunlap at tibco.com
Fri Oct 22 18:52:29 CEST 2010
You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries
are one of 0, 1, and NA (missing value). I made a
little function to generate random data of that format
for testing purposes:
makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1)
{
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
                nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}
E.g.,
> set.seed(168)
> d <- makeData(15,3)
> d
   X1 X2 X3
1   1  1  1
2   0  0 NA
3   0  1  0
4   0  0 NA
5   0  1  1
6   0  0 NA
7   1  0  0
8   0  1  1
9   0  0  1
10  1  1 NA
11  0  0  1
12  0  0  0
13 NA NA NA
14  0  0  0
15  1  0  0
I think the following function does what you want.
The algorithm is pretty similar to what you showed.
columnOfFirstOne <- function(data) {
    # col will be the return value, one entry per row of data.
    # Fill it with NA's: NA in output will mean there were no 1's in
    # that row.
    col <- rep(as.integer(NA), nrow(data))
    for (j in seq_len(ncol(data))) { # loop over columns
        # For each entry in 'col', if it has not been set yet
        # and the corresponding entry in the j'th column of data
        # is 1 (and not missing), then set it to the column number.
        col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
    }
    col # return this from function
}
With the above data we get
> columnOfFirstOne(d)
[1] 1 NA 2 NA 2 NA 1 2 3 1 3 NA NA NA 1
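As a cross-check, here is a row-by-row sketch that is closer to the
"evaluate each case in turn" description in the question. It is not
the function above, just an alternative under the same assumptions
(numeric columns holding 0, 1, or NA). The name firstOneByRow is made
up for this example, and the data frame is the 15-row example typed in
literally so the snippet runs on its own.

```r
# The 15 x 3 example data from above, entered by hand.
d <- data.frame(
  X1 = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, NA, 0, 1),
  X2 = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, NA, 0, 0),
  X3 = c(1, NA, 0, NA, 1, NA, 0, 1, 1, NA, 1, 0, NA, 0, 0)
)

# firstOneByRow is a hypothetical name, not part of the original post.
firstOneByRow <- function(data) {
  m <- as.matrix(data)
  # which() ignores NA, so missing entries are skipped just like 0's.
  # Taking [1] of an empty result gives NA, i.e. "no 1 in this row".
  unname(apply(m, 1, function(r) which(r == 1)[1]))
}

firstOneByRow(d)
# [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1
```

The result agrees with columnOfFirstOne(d). Looping over 1500 rows with
apply is likely to be slower than looping over 140 columns, so treat
this as a readability check rather than a replacement.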
columnOfFirstOne seems quick enough for a dataset of your size:
> dd <- makeData(nrow=1500, ncol=140)
> system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed
   0.08    0.00    0.08
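If the column loop ever did become a bottleneck, one loop-free
possibility is max.col() with ties.method = "first". The sketch below
is an assumption-laden alternative, not something I timed: the name
firstOneVectorized and the small data frame x are invented for the
example, and it again assumes every entry is 0, 1, or NA.

```r
# firstOneVectorized is a hypothetical name, not part of the original post.
firstOneVectorized <- function(data) {
  m <- as.matrix(data)
  # TRUE where the entry is a non-missing 1; this matrix contains no NA.
  hit <- !is.na(m) & m == 1
  # Multiply by 1 to get a numeric 0/1 matrix; with ties.method = "first",
  # max.col() returns the leftmost column holding the row maximum.
  col <- max.col(hit * 1, ties.method = "first")
  # Rows with no 1 at all would otherwise report column 1, so blank them out.
  col[rowSums(hit) == 0] <- NA
  col
}

# Small made-up data set for illustration.
x <- data.frame(a = c(0, 1, NA, 0),
                b = c(1, 1, NA, 0),
                c = c(1, 0, 1, 0))
firstOneVectorized(x)
# [1]  2  1  3 NA
```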
Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of David Herzberg
> Sent: Friday, October 22, 2010 8:34 AM
> To: r-help at r-project.org
> Subject: [R] Conditional looping over a set of variables in R
>
> Here's the problem I'm trying to solve in R: I have a data
> frame that consists of about 1500 cases (rows) of data from
> kids who took a test of listening comprehension. The columns
> are their scores (1 = correct, 0 = incorrect, . = missing)
> on 140 test items. The items are numbered sequentially and
> are ordered by increasing difficulty as you go from left to
> right across the columns. I want R to go through the data and
> find the first correct response for each case. Because of
> basal and ceiling rules, many cases have missing data on many
> items before the first correct response appears.
>
> For each case, I want R to evaluate the item responses
> sequentially starting with item 1. If the score is 0 or
> missing, proceed to the next item and evaluate it. If the
> score is 1, stop the operation for that case, record the item
> number of that first correct response in a new variable,
> proceed to the next case, and restart the operation.
>
> In SPSS, this operation would be carried out with LOOP,
> VECTOR, and DO IF, as follows (assuming the data set is
> already loaded):
>
> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST
> CORRECT RESPONSE, SET IT EQUAL TO 0.
> numeric LCfirst1.
> comp LCfirst1 = 0
>
> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
> vector x=LC1a_score to LC140a_score.
>
> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
> LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1
> EACH TIME THE LOOP RUNS.
> loop #i=1 to 140 if (LCfirst1 = 0).
>
> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR
> EACH ELEMENT OF THE VECTOR. THUS, WHEN #i = 1, THE
> EXPRESSION EVALUATES THE FIRST ELEMENT OF THE VECTOR (THAT
> IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS
> AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED.
> THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH
> THE VECTOR UNTIL A '1' IS ENCOUNTERED.
> + do if x(#i) = 1.
>
> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT
> STATEMENT, WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
> + comp x(#i) = 99.
>
> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH
> RECODES THE VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE,
> THUS CAPTURING THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE
> FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO CAUSES
> THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM
> MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
> + comp LCfirst1 = #i.
> + end if.
> end loop.
> exe.
>
> After several hours of trying to translate this procedure to
> R, I'm stumped. I played around with creating a list to hold
> the item response variables (analogous to 'vector' in SPSS),
> but when I tried to use the list in an R procedure, I kept
> getting a warning along the lines of 'the list contains > 1
> element, only the first element will be used'. So perhaps a
> list is not the appropriate class to 'hold' these variables?
>
> It seems that some nested arrangement of 'for' 'while' and/or
> 'lapply' will allow me to recreate the operation described
> above? How do I set up the indexing operation analogous to
> 'loop #i' in SPSS?
>
> Any help is appreciated, and I'm happy to provide more
> information if needed.
>
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com