[R] Conditional looping over a set of variables in R
Petr PIKAL
petr.pikal at precheza.cz
Tue Oct 26 08:41:24 CEST 2010
Hi
r-help-bounces at r-project.org wrote on 25.10.2010 20:41:55:
> Adrienne, there's one glitch when I implement your solution below. When the
> loop encounters a case with no data at all (that is, all 140 item responses
> are missing), it aborts and prints this error message: " ERROR: argument is
> of length zero".
>
> I wonder if there's a logical condition I could add that would enable R to
> skip these empty cases and continue executing on the next case that
> contains data.
>
> Thanks, Dave
>
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com
>
>
>
> From: wootten.adrienne at gmail.com [mailto:wootten.adrienne at gmail.com] On Behalf
> Of Adrienne Wootten
> Sent: Friday, October 22, 2010 9:09 AM
> To: David Herzberg
> Cc: r-help at r-project.org
> Subject: Re: [R] Conditional looping over a set of variables in R
>
> David,
>
> Here I'm referring to your data as testmat, a matrix of 140 columns and 1500
> rows, but the same or similar notation can be applied to data frames in R. If
> I understand correctly, you are looking for the first response (column) where
> you got a value of 1. I'm also assuming that, since your missing values are
> characters, your two numeric values are characters as well. Keeping all this
> in mind, try something like this.
If you really only want to know which column in each row holds the first
occurrence of 1 (or any other value), you can drop the loop and use
R's vectorised functions instead.
> set.seed(111)
> mat <- matrix(sample(1:3, 20, replace = TRUE), 5, 4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    1    2    1
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2
> mat.w <- which(mat == 1, arr.ind = TRUE)
> tapply(mat.w[, 2], mat.w[, 1], min)
2 3 4 5
2 3 3 2
> mat[2, ] <- NA
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]   NA   NA   NA   NA
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2
This approach also works smoothly with NA values: which() skips them, so rows
without a 1 are simply absent from the result.
> mat.w<-which(mat==1, arr.ind=T)
> tapply(mat.w[,2], mat.w[,1], min)
3 4 5
3 3 2
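If one value per row is needed, the tapply() result can be written back into a
full-length vector by using its names as row indices. This is only a minimal
sketch: the matrix is retyped from the printout above, and `first` is an
illustrative name, not something from the original posts.

```r
# Example matrix retyped from the printout above (row 2 set to NA)
mat <- matrix(c( 2,  2,  2,  2,
                NA, NA, NA, NA,
                 2,  2,  1,  3,
                 2,  2,  1,  1,
                 2,  1,  1,  2), nrow = 5, byrow = TRUE)

mat.w <- which(mat == 1, arr.ind = TRUE)   # which() drops NA comparisons
res <- tapply(mat.w[, 2], mat.w[, 1], min) # first column with a 1, per row

first <- rep(NA_integer_, nrow(mat))       # NA = no 1 found in that row
first[as.integer(names(res))] <- res       # names(res) are the row numbers
first
```

Rows 1 and 2 stay NA here because neither contains a 1.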
You can then modify this output as needed, since it carries both the row and
the column information. I am sure there are other, maybe better, options, e.g.
> lll <- as.list(as.data.frame(t(mat)))
> unlist(lapply(lll, function(x) min(which(x == 1))))
 V1  V2  V3  V4  V5
Inf Inf   3   3   2
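The Inf entries come from min() being called on an empty index set, which also
produces a warning. A possible variant that returns NA instead is sketched
below; `first_one` is a hypothetical helper name, and the matrix is again
retyped from the printout above.

```r
# Hypothetical helper: position of the first 1 in a vector, NA if there is none
first_one <- function(x) {
  hits <- which(x == 1)                    # which() ignores NA comparisons
  if (length(hits) == 0) NA_integer_ else hits[1]
}

mat <- matrix(c( 2,  2,  2,  2,
                NA, NA, NA, NA,
                 2,  2,  1,  3,
                 2,  2,  1,  1,
                 2,  1,  1,  2), nrow = 5, byrow = TRUE)

first <- apply(mat, 1, first_one)          # one value per row
first
```

Working row by row with apply() avoids the transpose and the list conversion.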
Regards
Petr
>
> first = c() # your extra variable which will eventually contain the first
> # correct response for each case
>
> for(i in 1:nrow(testmat)){
>
> c = 1
>
> while( c<=ncol(testmat) | testmat[i,c] != "1" ){
>
> if( testmat[i,c] == "1"){
>
> first[i] = c
> break # will exit the while loop once it finds the first correct answer,
> # and then jump to the next case
>
> } else {
>
> c=c+1 # proceed to the next column if not
>
> }
>
> }
>
> }
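If you prefer to keep an explicit loop, two things in the code above can stop
it: the `|` in the while() condition lets the index run past the last column
when a row has no "1", and if() cannot evaluate a comparison with a missing
value. A guarded version might look like the sketch below. It assumes missing
responses are stored as NA, and the small `testmat` is made up for illustration.

```r
# Made-up example data: "1" = correct, "0" = incorrect, NA = missing
testmat <- matrix(c("0", NA,  "1", "1",
                    NA,  NA,  NA,  NA,
                    "0", "0", "0", "0",
                    "1", "0", "1", NA), nrow = 4, byrow = TRUE)

first <- rep(NA_integer_, nrow(testmat))   # NA = no correct response found

for (i in seq_len(nrow(testmat))) {
  for (j in seq_len(ncol(testmat))) {
    # isTRUE() turns an NA comparison into FALSE, so missing items are skipped
    if (isTRUE(testmat[i, j] == "1")) {
      first[i] <- j
      break                                # stop at the first correct answer
    }
  }
}
first
```

Empty cases and cases without a correct answer simply keep NA, and the loop
carries on with the next case.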
>
>
> Hope this helps you out a bit.
>
> Adrienne Wootten
> NCSU
>
> On Fri, Oct 22, 2010 at 11:33 AM, David Herzberg <davidh at wpspublish.com> wrote:
> Here's the problem I'm trying to solve in R: I have a data frame that consists
> of about 1500 cases (rows) of data from kids who took a test of listening
> comprehension. The columns are their scores (1 = correct, 0 = incorrect, . =
> missing) on 140 test items. The items are numbered sequentially and are
> ordered by increasing difficulty as you go from left to right across the
> columns. I want R to go through the data and find the first correct response
> for each case. Because of basal and ceiling rules, many cases have missing
> data on many items before the first correct response appears.
>
> For each case, I want R to evaluate the item responses sequentially starting
> with item 1. If the score is 0 or missing, proceed to the next item and
> evaluate it. If the score is 1, stop the operation for that case, record the
> item number of that first correct response in a new variable, proceed to the
> next case, and restart the operation.
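The rule described here can also be sketched without any explicit loop. The
example assumes the scores are read in as character columns with "." for
missing; the data frame `dat` and its column names are invented for
illustration, and 0 is used for cases with no correct response, matching the
starting value of LCfirst1 in the SPSS code.

```r
# Invented example: 4 cases, 5 items, "." = missing
dat <- data.frame(item1 = c(".", "0", ".", "1"),
                  item2 = c(".", "0", ".", "0"),
                  item3 = c("0", "1", ".", "1"),
                  item4 = c("1", "1", ".", "."),
                  item5 = c("1", "0", ".", "."),
                  stringsAsFactors = FALSE)

m <- as.matrix(dat)
hit <- !is.na(m) & m == "1"                # TRUE where the item is correct

# max.col() with ties.method = "first" gives the first TRUE in each row;
# rows without any TRUE are set to 0 instead
first <- ifelse(rowSums(hit) > 0,
                max.col(hit * 1, ties.method = "first"),
                0)
first
```

The result can be attached to the data frame as a new variable, for example
with `dat$LCfirst1 <- first`.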
>
> In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF, as
> follows (assuming the data set is already loaded):
>
> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
> RESPONSE, SET IT EQUAL TO 0.
> numeric LCfirst1.
> comp LCfirst1 = 0.
>
> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
> vector x=LC1a_score to LC140a_score.
>
> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. "#i" IS
> AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
> loop #i=1 to 140 if (LCfirst1 = 0).
>
> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF
> THE VECTOR. THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT OF
> THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS
> AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if
> STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1' IS
> ENCOUNTERED.
> + do if x(#i) = 1.
>
> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH
> RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
> + comp x(#i) = 99.
>
> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF
> LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE
> FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO
> CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO THE
> NEXT CASE AND RESTARTS THE LOOP.
> + comp LCfirst1 = #i.
> + end if.
> end loop.
> exe.
>
> After several hours of trying to translate this procedure to R, I'm stumped. I
> played around with creating a list to hold the item responses variables
> (analogous to 'vector' in SPSS), but when I tried to use the list in an R
> procedure, I kept getting a warning along the lines of 'the list contains > 1
> element, only the first element will be used'. So perhaps a list is not the
> appropriate class to 'hold' these variables?
>
> It seems that some nested arrangement of 'for' 'while' and/or 'lapply' will
> allow me to recreate the operation described above? How do I set up the
> indexing operation analogous to 'loop #i' in SPSS?
>
> Any help is appreciated, and I'm happy to provide more information if needed.
>
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.