[R] recoding data with loops

Erik Iverson iverson at biostat.wisc.edu
Tue May 20 00:49:12 CEST 2008


Got it, I did not know of the 'recode' function in car.

So you would like to recode those specific columns then?  Once again, we 
can do it without a loop, this time with the help of a function called 
lapply, which applies a function to each item in a list in turn.

Try:

library(car)  ## provides recode()

reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")

## See ?paste

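## Note: later car releases appear to name this argument 'as.factor'
## rather than 'as.factor.result'; check ?recode for your version.
mdf[reversed_varnames] <-
   lapply(mdf[reverse_me_varnames],
          function(x) recode(x, recodes = "5:7=NA; 1=4; 2=3; 3=2; 4=1;",
                             as.factor.result = FALSE))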

Now what does this actually mean?  To the left of '<-' are simply the
new columns of our data.frame.  On the right, we use lapply to apply a
function to each element of a list.  The first argument to lapply is
that list.  In this case, it is the set of data.frame columns you want
reversed, which works because a data.frame is a list of columns in R.
See ?list and ?data.frame.  The next argument to lapply is the function
we want to call on each element of the list.  So, we create a function
that accepts as input a variable I simply call 'x'.  This 'x' will be
one item from the list we passed to lapply, that is, one of the columns
of mdf named in 'reverse_me_varnames'.
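
To make the "a data.frame is a list" point concrete, here is a small
sketch.  The data are a made-up stand-in for mdf (the set.seed value and
the col_lengths name are only for illustration), and length() is used
just to show that the function is called once per column:

```r
## Hypothetical stand-in for mdf: three integer columns, 100 rows each
set.seed(1)
mdf <- data.frame(HEQUAL   = sample(7, 100, replace = TRUE),
                  HREVDIS1 = sample(7, 100, replace = TRUE),
                  HREVDIS2 = sample(7, 100, replace = TRUE))

is.list(mdf)                      ## a data.frame is a list of columns
cols <- mdf[c("HEQUAL", "HREVDIS1")]
is.list(cols)                     ## still a list, so lapply can walk over it

## lapply calls the function once per column and returns a named list
col_lengths <- lapply(cols, function(x) length(x))
names(col_lengths)
```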

We then use the recode function in the car package to recode x, in a
similar way to what you tried before.  The function of x we define gets
called three times in the above example, once for each column named in
reverse_me_varnames.  lapply returns the three recoded columns as a
list, and '<-' assigns them to the three newly named columns on the
left-hand side.
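
If you want to try the same pattern without the car package installed,
the following is a rough sketch.  It assumes the random integer data
from your message, and reverse_1to4 is a hypothetical helper I made up
as a stand-in for the recode call; it is not part of car:

```r
set.seed(42)
var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM", "HREVDIS2")
mdf <- data.frame(replicate(length(var_list), sample(7, 100, replace = TRUE)))
names(mdf) <- var_list

reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")

## reverse_1to4 is a hypothetical helper, not part of car:
## 1..4 become 4..1, and everything else (here 5:7) becomes NA
reverse_1to4 <- function(x) ifelse(x %in% 1:4, 5 - x, NA)

mdf[reversed_varnames] <- lapply(mdf[reverse_me_varnames], reverse_1to4)

## cross-tabulate old against new to check the mapping
table(mdf$HEQUAL, mdf$RHEQUAL, useNA = "ifany")
```

The cross-tabulation at the end is a quick way to confirm that each old
value landed where you expected.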

To see why your earlier for loop did not work, try:

mdf$HEQUAL

contrasted with

t1 <- c("HEQUAL")
mdf$t1

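From the help page ?Extract: $ does not allow 'computed' indices.  So
mdf$t1 looks for a column literally named "t1" (and returns NULL),
whereas mdf[[t1]] or mdf[t1] uses the value stored in t1.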
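
If you do prefer a loop, a sketch along these lines should work, using
[[ ]] instead of $.  The three-row data.frame is toy data invented for
the example, and the ifelse() call is a hypothetical base-R stand-in for
the car recode:

```r
mdf <- data.frame(HEQUAL = c(1, 2, 7), HREVDIS1 = c(4, 6, 3))   ## toy data

t1 <- "HEQUAL"
mdf$t1        ## NULL: looks for a column literally named "t1"
mdf[[t1]]     ## the HEQUAL column: [[ evaluates t1 and uses its value

## The loop, rewritten with [[ ]] so the names are computed each time
reverse_me_varnames <- c("HEQUAL", "HREVDIS1")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")
for (i in seq_along(reverse_me_varnames)) {
  x <- mdf[[reverse_me_varnames[i]]]
  ## 1..4 become 4..1, anything else becomes NA
  mdf[[reversed_varnames[i]]] <- ifelse(x %in% 1:4, 5 - x, NA)
}
mdf
```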

I hope this helps!

Erik


Donald Braman wrote:
> Erik,
> 
> Your example was just what I needed to generate the data -- many, many 
> thanks!  The names() function was something I had not grasped fully. I 
> now have this and it works very nicely:
> 
> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM", 
> "HREVDIS2")
> mdf <- data.frame(replicate(length(var_list), sample(7,100, replace = 
> TRUE))) ## generate random data
> names(mdf) ## default names
> names(mdf) <- var_list ## use our names
> mdf
> 
> I'm still trying to figure out how to recode (using the car package) 
> data into new variables using a similar loop. Basically, I'm not sure 
> how to call the variable name and append it to the dataframe name in a 
> loop.  In Stata I'd do this using single quotes, but clearly that's not 
> how R works.  I tried several variations on this:
> 
> reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
> reversed_varnames <- c("RHEQUAL", "RHREVDIS1", "RHREVDIS2")
> for(i in 1:length(reverse_me_varnames))
>  {mdf$reversed_varnames[i] <- recode(mdf$reverse_me_varnames[i], 
> '5:7=NA; 1=4; 2=3; 3=2; 4=1;', as.factor.result=FALSE)
> 
> While I don't get an error message, the data don't change.  Any advice 
> on reverse coding non-contiguous variables?
> 
> 
> 
> On Mon, May 19, 2008 at 4:12 PM, Donald Braman <donald.braman at gmail.com> wrote:
> 
>     Many thanks --
> 
>     You are right; I had rnorm() and sample() mixed up in my code. I'll
>     work on generating a normal ordinal sample next.
> 
>     Cheers, Don
> 
> 
>     On Mon, May 19, 2008 at 4:07 PM, Erik Iverson
>     <iverson at biostat.wisc.edu> wrote:
> 
>         Hello -
> 
> 
>         Donald Braman wrote:
> 
>             # I'm new to R and am trying to get the hang of how it handles
>             # dataframes & loops. If anyone can help me with some simple
>             # tasks, I'd be much obliged.
> 
>             # First, i'd like to generate some random data in a dataframe
>             # to efficiently illustrate what I'm up to.
>             # let's say I have six variables as listed below (I really
>             # have hundreds, but a few will illustrate the point).
>             # I want to generate my dataframe (mdf)
>             # with the 6 variables X 100 values with rnorm(7).
>             # How do I do this?  I tried many variations on the following:
> 
>             var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1",
>             "EDISCRIM",
>             "HREVDIS2")
>             for(i in 1:length(var_list)) {var_list[1] <- rnorm(100)}
>             mdf <- data.frame(cbind(varlist[1:length(var_list)])
>             mdf
> 
>         There are many ways to do this. Do you mean that you want 6
>         columns, 100 observations in each column, each a sample from a
>         normal distribution with mean = 7 and sd = 1?  You can do this
>         without looping in one of several ways.  If you are coming from
>         a SAS environment (my guess since you talk of looping over
>         data.frames), you may be used to looping through a data object.
>          In R, you can usually avoid this since many functions are
>         vectorized, or take a 'whole object' approach.
> 
> 
>         var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1",
>         "EDISCRIM", "HREVDIS2")
> 
>         ## generate random data
>         mdf <- data.frame(replicate(6, rnorm(100, 7)))
>         names(mdf) ## default names
>         names(mdf) <- var_list ## use our names
> 
> 
> 
>             # Then, I'd like to recode the variables that begin with the
>             # letter "H".
>             # I've tried many variations of the following, but to no avail:
> 
>             reverse_list <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
>             reversed_list <- c("RHEQUAL", "RHREVDIS1", "RHREVDIS2")
>             for(i in 1:length(reverse_list))
>              {mdf[ ,e_reversed_list][[i]] <- recode(mdf[
>             ,e_reverse_list][[i]],
>             '5:99=NA; 1=4; 2=3; 3=2; 4=1; ', as.factor.result=FALSE)
> 
> 
>         I'm not quite sure what you are after here.  What do you mean by
>         recode? What package is your 'recode' function located in?
> 
>         It appears that you may be under the impression that the
>         data.frame contains integers, but it will not, since it was
>         generated with rnorm.  sample can generate samples of the
>         type you may be after, for example,
> 
>          > sample(7, 100, replace = TRUE)
> 
>         Best,
>         Erik Iverson
> 
> 
> 
> 
>     -- 
>     Donald Braman
>     http://www.law.gwu.edu/Faculty/profile.aspx?id=10123
>     http://research.yale.edu/culturalcognition
>     http://ssrn.com/author=286206 
> 
> 
> 
> 


