[R] Non-Parametric Adventures in R

Peter Dalgaard pdalgd at gmail.com
Sun Oct 3 14:31:32 CEST 2010


> ----------------------------------------------------------------------------------------------------
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> ----------------------------------------------------------------------------------------------------
> 4) The coding for all of my drug variables is identical, and I'd like to
> create a loop that goes through and labels accordingly
> 
> I'm not having good success with this yet, but here's what I'm trying.
> 
> X[1,] <- factor(X[1,], levels = c(0,1,2,3,4,5), labels= c("none","last
> week","last 3 month","last year","regular use at least 3 months","unknown
> length of usage"))
> 
> I know I would need to replace the [1,] with something that gives me the
> column, but I'm not sure what to put syntactically at the moment.

[I assume you meant X[,1] there]

Well, a for loop like in 5) is not out of reach; you just need to figure
out what to loop over. It's probably neatest to do it by name, but you
could also do it by number (and that may be more convenient if the drug
variables are listed sequentially).

drugvar <- c(5,7,9,13)
--OR--
drugvar <- c("aspirin","warfarin", "heroin", "nicotine")

in either case,

mylabels <- c("none", "last week", "last 3 month", "last year",
              "regular use at least 3 months",
              "unknown length of usage")

for (i in drugvar)
   X[[i]] <- factor(X[[i]], levels = 0:5, labels= mylabels)

(Or X[,i]. Note that X[i] with single brackets gives a one-column data
frame rather than the column itself, and factor() needs the column.)

Or, using a more advanced idiom:

X[drugvar] <- lapply(X[drugvar], factor, levels=0:5, labels=mylabels)
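Here is a small self-contained sketch of both approaches. The data frame
and its column names are made up purely for illustration, so substitute
your own:

```r
# Toy data: two drug columns coded 0-5 plus an unrelated column
# (all names and values here are hypothetical)
X <- data.frame(id      = 1:4,
                aspirin = c(0, 1, 5, 2),
                heroin  = c(3, 0, 0, 4))

drugvar  <- c("aspirin", "heroin")
mylabels <- c("none", "last week", "last 3 month", "last year",
              "regular use at least 3 months",
              "unknown length of usage")

# Loop version: [[ ]] extracts the column itself
X1 <- X
for (i in drugvar)
  X1[[i]] <- factor(X1[[i]], levels = 0:5, labels = mylabels)

# lapply version: converts all listed columns in one assignment
X2 <- X
X2[drugvar] <- lapply(X2[drugvar], factor, levels = 0:5, labels = mylabels)

str(X1)
str(X2)
```

Both versions should leave the drug columns as factors with the six
labels, and the id column untouched.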


> ----------------------------------------------------------------------------------------------------
> 5) I had more success creating new variables based on the old ones.  So I
> end up with yes/no answers to drug usage
> 
> for (i in 24:56)
> {
>   X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
> }

(Don't use c(0). Not that it is that harmful, it is just unnecessary and
marks you as a newbie...).

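I'd write the ifelse() bit as as.numeric(X[,i] > 0), and the whole thing
is very close to

X <- cbind(X, (X[24:56] > 0) + 0)

(adding 0 rather than calling as.numeric(), since the comparison on
several columns gives a logical matrix and as.numeric() would flatten
it into one long vector), except for colnames issues,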

> 
> I'd like to have been able to make a new variable name based off of the old
> variable name (i.e. dropping "_when" from the end of each and replace it
> with "_yn")


sub() is your friend:

Z <- as.data.frame((X[24:56] > 0) + 0)
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)
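As a minimal sketch on invented data, the renaming step looks like
this. The column names are hypothetical, and picking the columns with
grep() instead of by position 24:56 is my assumption, not something
from your data:

```r
# Toy data; the column names are invented for illustration
X <- data.frame(id           = 1:3,
                aspirin_when = c(0, 2, 5),
                heroin_when  = c(1, 0, 0))

# Select the "_when" columns by name rather than by position
whencols <- grep("_when$", names(X), value = TRUE)

# Comparing a data frame with 0 gives a logical matrix;
# adding 0 turns it into a numeric 0/1 matrix
Z <- as.data.frame((X[whencols] > 0) + 0)

# Replace the trailing "_when" with "_yn" in each name
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)

names(X)
```

The "$" in the pattern anchors the match at the end of the name, so a
"_when" occurring elsewhere in a name is left alone.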

> 
> ---------------------------------------------------------------------------------------------------
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> ---------------------------------------------------------------------------------------------------
> 6)  I'm able to make a cross-tabulated table and perform a X-squared test
> just fine with my recoded variable
> 
> table(X$race,X[,197])
> prop.test(table(X$race,X[,197]))
> 
> but I would like to be able to do so with all of my drugs, although I can't
> seem to make that work
> 
> for (i in 197:229)
> {
>   table(X$race,X[,i])
>   prop.test(table(X$race,X[,i]))
> }

That's basically fine, just remember to print() the results when they
are generated in a loop.
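
A rough sketch of what that could look like, on invented data (the
variable names and values are hypothetical, and looping over names
instead of 197:229 is just for the toy example):

```r
# Toy data; variable names and values are invented for illustration
X <- data.frame(race     = c("a", "a", "b", "b", "a", "b", "a", "b"),
                drug1_yn = c(0, 1, 0, 1, 1, 0, 0, 1),
                drug2_yn = c(1, 1, 0, 0, 1, 0, 1, 1))

results <- list()
for (v in c("drug1_yn", "drug2_yn")) {
  tab <- table(X$race, X[[v]])
  print(tab)   # inside a loop nothing is shown unless you print() it
  # The toy counts are tiny, so the approximation warning is suppressed here
  results[[v]] <- suppressWarnings(prop.test(tab))
  print(results[[v]])
}
```

Storing each test in a list as well as printing it means you can pull
out the p-values afterwards instead of reading them off the screen.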

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


