[R] Non-Parametric Adventures in R

Sun Oct 3 14:47:10 CEST 2010

Jamesp <james.jrp015 at gmail.com> [Sat, Oct 02, 2010 at 11:27:09PM CEST]:
> 
[...]
> ----------------------------------------------------------------------------------------------------
> 1) I was thinking I'd have to go through each nominal variable (i.e.
> table(X$race) ), but I think I have it figured out now.  summary(X) is nice,
> but I need to recode nominal data with labels so the results are meaningful.
> 

Variable labels are not a concept that comes with base R (value labels
are handled by factor(), as in your point 3). You may want to try
the Hmisc package and its label and describe functions. Unfortunately,
the reporting functions in base R make no use of variable labels.
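
A minimal sketch of what that could look like, assuming Hmisc is
installed. The data and the label text are made up for illustration;
only label() and describe() come from Hmisc.

```r
# Made-up data; the variable and the label text are hypothetical.
X <- data.frame(race = factor(c(0, 2, 2, 0, 2), levels = c(0, 2),
                              labels = c("African American", "White, Non-Hispanic")))

# Only runs if Hmisc is available on this machine.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  Hmisc::label(X$race) <- "Self-reported race"  # attach a variable label
  print(Hmisc::describe(X))                     # summary that shows the label
}
```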

> -----------------------------------------------------------------------------------------------------
> 2) I had an issue with multiple plots overwriting each other, and I managed
> to bypass that with:
> par(mfrow=c(2,1))
> I have to update it to correspond to the number of plots I think.  There's
> probably a better way to do this.
> 

Try for example
pdf("yourfilename.pdf")

 ... plotting routines ...

dev.off()

R does not provide a graphics browser by itself, only one graphics window,
so you may want to use the capabilities of external programs such as
your favourite PDF viewer. In a pdf() device each new plot starts a new
page, so nothing gets overwritten.
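
Spelled out with made-up data, the idea might look like this. The
temporary file name is only a placeholder; use any path you like.

```r
# Write several plots to one PDF file; each high-level plot call
# (hist, barplot, ...) starts a new page.
outfile <- tempfile(fileext = ".pdf")  # placeholder output location
x <- c(1, 3, 3, 4, 7, 7, 7)            # made-up data

pdf(outfile)
hist(x)             # page 1
barplot(table(x))   # page 2
invisible(dev.off())  # close the device so the file is written completely
```

Open the resulting file in a PDF viewer and page through the plots.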

> barplot(table(X$race))  prints out a barplot so that's great 

plot(table(numeric variable)) draws barplots with scaled x axis, which I
think is even greater when looking at integer random variables.
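
A small comparison with made-up counts, as a sketch. The values and the
temporary file are assumptions for illustration only.

```r
counts <- c(0, 1, 1, 2, 2, 2, 5, 9)  # made-up integer data
tab <- table(counts)

outfile <- tempfile(fileext = ".pdf")  # placeholder output location
pdf(outfile)
barplot(tab)  # bars are equally spaced; the gaps between 2, 5 and 9 are hidden
plot(tab)     # vertical lines drawn at x = 0, 1, 2, 5, 9 on a numeric axis
invisible(dev.off())
```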

> 
> -----------------------------------------------------------------------------------------------------
> 3) I was able to code my data so it shows up in tables better with
> X$race <- factor(X$race, levels = c(0,2), labels = c("African
> American","White,Non-Hispanic"))
> 
> ----------------------------------------------------------------------------------------------------
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> ----------------------------------------------------------------------------------------------------
> 4) The coding for all of my drug variables is identical, and I'd like to
> create a loop that goes through and labels accordingly
> 

Cycle over the column names, one example:

# stringsAsFactors=TRUE is needed in R >= 4.0 to get factor columns
x <- data.frame(replicate(8, sample(c("Black", "Asian", "White", "Hispanic", "Native"),
                                    20, replace=TRUE)),
                stringsAsFactors=TRUE)

for (col in c("X2", "X3", "X4")) { 
    levels(x[[col]])[c(2, 5)] <- c("African American", "White, non-Hispanic") }

Generally, the use of loops is not encouraged. Here it is a simple thing 
to do as you need the modification of x as a side effect.
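
For comparison, a loop-free variant could look roughly like the sketch
below. The helper name relabel is my own invention, and it renames
levels by name rather than by position, so it also works if a level
happens to be missing from a column.

```r
set.seed(1)
races <- c("Asian", "Black", "Hispanic", "Native", "White")
x <- data.frame(replicate(8, sample(races, 20, replace = TRUE)),
                stringsAsFactors = TRUE)

# Hypothetical helper: rename two levels by name, leave the others alone.
relabel <- function(f) {
  lv <- levels(f)
  lv[lv == "Black"] <- "African American"
  lv[lv == "White"] <- "White, non-Hispanic"
  levels(f) <- lv
  f
}

cols <- c("X2", "X3", "X4")
x[cols] <- lapply(x[cols], relabel)  # replace the three columns in one go
```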

> ----------------------------------------------------------------------------------------------------
> 5) I had more success creating new variables based on the old ones.  So I
> end up with yes/no answers to drug usage
> 
> for (i in 24:56)
> {
>   X[,i+173] <- ifelse(X[,i] >0,c(1),c(0))
> }
> 
> I'd like to have been able to make a new variable name based off of the old
> variable name (i.e. dropping "_when" from the end of each and replace it
> with "_yn")
> 

Untested, but along these lines (please provide a small data example with
your questions so they can be addressed more directly):

for (col in grep("_when$", colnames(X), value=TRUE)) {
    X[, sub("_when$", "_yn", col)] <- ifelse(X[, col] > 0, 1, 0)
}

if you insist on coding your _yn variables as numeric. R has a logical
data type (TRUE/FALSE), so it would be more idiomatic to simply store
X[, col] > 0 without the ifelse() construct.
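
A sketch of the logical variant on made-up data. The column names
heroin_when and cocaine_when are invented for illustration; substitute
your own.

```r
# Made-up data with two hypothetical "_when" columns.
X <- data.frame(id           = 1:4,
                heroin_when  = c(0, 2, 0, 5),
                cocaine_when = c(1, 0, 0, 3))

when_cols <- grep("_when$", names(X), value = TRUE)  # existing column names
yn_cols   <- sub("_when$", "_yn", when_cols)         # matching new names

# One logical column per "_when" column: TRUE if the value is above 0.
X[yn_cols] <- lapply(X[when_cols], function(v) v > 0)
```

table() and the test functions treat such TRUE/FALSE columns like any
other two-level variable.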

> ---------------------------------------------------------------------------------------------------
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> ---------------------------------------------------------------------------------------------------
> 6)  I'm able to make a cross-tabulated table and perform a X-squared test
> just fine with my recoded variable
> 
> table(X$race,X[,197])
> prop.test(table(X$race,X[,197]))
> 
> but I would like to be able to do so with all of my drugs, although I can't
> seem to make that work
> 
> for (i in 197:229)
> {
>   table(X$race,X[,i])
>   prop.test(table(X$race,X[,i]))
> }
> 

in my toy example:

apply(x[, -1], 2, function(vec) fisher.test(table(x[, 1], vec)))

Note the non-use of a loop here, the upside being that a list
of test results is returned (which you'd have to build yourself
if using a loop). I couldn't apply a prop test here as I didn't
have vectors of trials and successes, and I wonder how you got
them out of your table() function.
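
For what it is worth, ?prop.test says that x may also be a
two-dimensional table with 2 columns, giving the counts of successes
and failures, so a race by yes/no table is accepted. A sketch on made-up
data follows; the column names and counts are assumptions.

```r
# Made-up data: 50 people per group, two hypothetical yes/no drug columns.
X <- data.frame(
  race       = factor(rep(c("African American", "White, non-Hispanic"), each = 50)),
  heroin_yn  = rep(c(TRUE, FALSE, TRUE, FALSE), times = c(10, 40, 20, 30)),
  cocaine_yn = rep(c(TRUE, FALSE, TRUE, FALSE), times = c(15, 35, 15, 35))
)

yn_cols <- grep("_yn$", names(X), value = TRUE)

# One test per drug column; the results are collected in a named list.
# Table columns are ordered FALSE, TRUE, so FALSE counts as "success" here.
tests <- lapply(X[yn_cols], function(yn) prop.test(table(X$race, yn)))

# Pull a single number out of every test result.
pvals <- sapply(tests, function(t) t$p.value)
```

Note also that nothing is printed automatically inside a for loop, so
the loop in your question would need print(prop.test(...)) to show
anything.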

If you don't understand each single command, type ?commandname.
If you have any further questions after reading up on the 
descriptions, feel free to post them here, but please provide
toy examples of your own.
-- 
Johannes Hüsing               There is something fascinating about science. 
                              One gets such wholesale returns of conjecture 
mailto:johannes at huesing.name  from such a trifling investment of fact.                
http://derwisch.wikidot.com         (Mark Twain, "Life on the Mississippi")