[R] recoding data with loops
Erik Iverson
iverson at biostat.wisc.edu
Tue May 20 00:49:12 CEST 2008
Got it, I did not know of the 'recode' function in car.
So you would like to recode those specific columns then? Once again, we
can do it without a loop, this time with the help of a function called
lapply, which applies a function to each item in a list in turn.
Try:
reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")
## See ?paste

mdf[reversed_varnames] <-
  lapply(mdf[reverse_me_varnames],
         function(x) recode(x, recodes = "5:7=NA; 1=4; 2=3; 3=2; 4=1;",
                            as.factor.result = FALSE))
Now what does this actually mean? To the left of '<-' are simply the new
columns of our data.frame. We then use lapply to apply a function to
each element of a list. The first argument to lapply is that list. In
this case, it is simply the columns of the data.frame you want
reversed. A data.frame is a list in R. See ?list and ?data.frame.
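If you want to convince yourself of that, here is a small sketch. The data.frame 'toy' and its values are made up purely for illustration and are not part of your data:

```r
## 'toy' is a hypothetical two-column data.frame used only for this sketch
toy <- data.frame(a = 1:3, b = c(10, 20, 30))

is.list(toy)   # TRUE: a data.frame is a list of columns
length(toy)    # 2: one list element per column

## lapply visits each column in turn and returns a list of results
res <- lapply(toy, function(x) x * 2)
str(res)
```

Each element of 'res' carries the name of the column it came from, which is why the result can be assigned straight back into a data.frame.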
Then, the next argument to lapply is a function that we want to perform
on each element in our list. So, we create a function that accepts as
input a variable I simply call 'x'. This 'x' is going to be an item
from the list we passed lapply, which is one of the columns of mdf
named in 'reverse_me_varnames'.
We then use the recode function in the car package to recode x, in a
similar way to what you tried before. The function of x we define
gets called three times in the above example, once for each of
reverse_me_varnames. lapply returns the three recoded columns as a
list, and '<-' assigns them to the three newly-named columns on the
left-hand side.
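In case it helps to run the whole thing end to end without installing car, here is a rough sketch. Note that reverse_code() is a hypothetical base-R stand-in that I made up to mimic the recode specification above (1 through 4 reversed, everything else NA); it is not part of car, and the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM",
              "HREVDIS2")
mdf <- data.frame(replicate(length(var_list),
                            sample(7, 100, replace = TRUE)))
names(mdf) <- var_list

## Hypothetical stand-in for recode(x, "5:7=NA; 1=4; 2=3; 3=2; 4=1;"):
## values 1..4 are reversed, anything else becomes NA.
reverse_code <- function(x) ifelse(x %in% 1:4, 5 - x, NA)

reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")

## One call, no loop: three columns in, three new columns out
mdf[reversed_varnames] <- lapply(mdf[reverse_me_varnames], reverse_code)

head(mdf[c("HEQUAL", "RHEQUAL")])
```

With car loaded, you would put the recode call from above in place of reverse_code.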
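To see why what you tried before with the for loop did not work, try:
mdf$HEQUAL
contrasted with
t1 <- c("HEQUAL")
mdf$t1
From the help for ?Extract, $ does not allow 'computed' indices, so
mdf$t1 looks for a column literally named "t1" and returns NULL. Use
mdf[[t1]] when the column name is stored in a variable.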
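If you do prefer a loop, something along these lines should work. This is only a sketch on made-up toy data, and reverse_code() is again a hypothetical base-R stand-in for the recode call, not a car function:

```r
## Toy data; the values are made up for illustration
mdf <- data.frame(HEQUAL = c(1, 4, 6), HREVDIS1 = c(2, 3, 7))

t1 <- "HEQUAL"
is.null(mdf$t1)   # TRUE: `$` looks for a column literally named "t1"
mdf[[t1]]         # the HEQUAL column, looked up via the value of t1

## Hypothetical stand-in for the recode() call (not from car)
reverse_code <- function(x) ifelse(x %in% 1:4, 5 - x, NA)

reverse_me_varnames <- c("HEQUAL", "HREVDIS1")
reversed_varnames <- paste("R", reverse_me_varnames, sep = "")

## `[[` accepts a computed name, so it works inside the loop
for (i in seq_along(reverse_me_varnames)) {
  mdf[[reversed_varnames[i]]] <- reverse_code(mdf[[reverse_me_varnames[i]]])
}
mdf
```

The only real change from your loop is indexing with [[ ]] instead of $ on both sides of the assignment.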
I hope this helps!
Erik
Donald Braman wrote:
> Erik,
>
> Your example was just what I needed to generate the data -- many, many
> thanks! The names() function was something I had not grasped fully. I
> now have this and it works very nicely:
>
> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1", "EDISCRIM",
> "HREVDIS2")
> mdf <- data.frame(replicate(length(var_list), sample(7,100, replace =
> TRUE))) ## generate random data
> names(mdf) ## default names
> names(mdf) <- var_list ## use our names
> mdf
>
> I'm still trying to figure out how to recode (using the car package)
> data into new variables using a similar loop. Basically, I'm not sure
> how to call the variable name and append it to the dataframe name in a
> loop. In Stata I'd do this using single quotes, but clearly that's not
> how R works. I tried several variations on this:
>
> reverse_me_varnames <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
> reversed_varnames <- c("RHEQUAL", "RHREVDIS1", "RHREVDIS2")
> for(i in 1:length(reverse_me_varnames))
> {mdf$reversed_varnames[i] <- recode(mdf$reverse_me_varnames[i],
> '5:7=NA; 1=4; 2=3; 3=2; 4=1;', as.factor.result=FALSE)
>
> While I don't get an error message, the data don't change. Any advice
> on reverse coding non-contiguous variables?
>
>
>
> On Mon, May 19, 2008 at 4:12 PM, Donald Braman <donald.braman at gmail.com
> <mailto:donald.braman at gmail.com>> wrote:
>
> Many thanks --
>
> You are right; I had rnorm() and sample() mixed up in my code. I'll
> work on generating a normal ordinal sample next.
>
> Cheers, Don
>
>
> On Mon, May 19, 2008 at 4:07 PM, Erik Iverson
> <iverson at biostat.wisc.edu <mailto:iverson at biostat.wisc.edu>> wrote:
>
> Hello -
>
>
> Donald Braman wrote:
>
> # I'm new to R and am trying to get the hang of how it handles
> # dataframes & loops. If anyone can help me with some simple
> tasks,
> # I'd be much obliged.
>
> # First, i'd like to generate some random data in a dataframe
> # to efficiently illustrate what I'm up to.
> # let's say I have six variables as listed below (I really
> # have hundreds, but a few will illustrate the point).
> # I want to generate my dataframe (mdf)
> # with the 6 variables X 100 values with rnorm(7).
> # How do I do this? I tried many variations on the following:
>
> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1",
> "EDISCRIM",
> "HREVDIS2")
> for(i in 1:length(var_list)) {var_list[1] <- rnorm(100)}
> mdf <- data.frame(cbind(varlist[1:length(var_list)])
> mdf
>
> There are many ways to do this. Do you mean that you want 6
> columns, 100 observations in each column, each a sample from a
> normal distribution with mean = 7 and sd = 1? You can do this
> without looping in one of several ways. If you are coming from
> a SAS environment (my guess since you talk of looping over
> data.frames), you may be used to looping through a data object.
> In R, you can usually avoid this since many functions are
> vectorized, or take a 'whole object' approach.
>
>
> var_list <- c("HEQUAL", "EWEALTH", "ERADEQ", "HREVDIS1",
> "EDISCRIM", "HREVDIS2")
>
> mdf <- data.frame(replicate(6, rnorm(100, 7))) ## generate
> random data
> names(mdf) ## default names
> names(mdf) <- var_list ## use our names
>
>
>
> # Then, I'd like to recode the variables that begin with the
> letter "H".
> # I've tried many variations of the following, but to no avail:
>
> reverse_list <- c("HEQUAL", "HREVDIS1", "HREVDIS2")
> reversed_list <- c("RHEQUAL", "RHREVDIS1", "RHREVDIS2")
> for(i in 1:length(reverse_list))
> {mdf[ ,e_reversed_list][[i]] <- recode(mdf[
> ,e_reverse_list][[i]],
> '5:99=NA; 1=4; 2=3; 3=2; 4=1; ', as.factor.result=FALSE)
>
>
> I'm not quite sure what you are after here. What do you mean by
> recode? What package is your 'recode' function located in?
>
> It appears that you may be under the impression that the
> data.frame contains integers, but it certainly will not, since it
> was generated with rnorm. sample() can generate samples of the
> type you may be after, for example,
>
> > sample(7, 100, replace = TRUE)
>
> Best,
> Erik Iverson
>
>
>
>
> --
> Donald Braman
> http://www.law.gwu.edu/Faculty/profile.aspx?id=10123
> http://research.yale.edu/culturalcognition
> http://ssrn.com/author=286206