[R] Non-Parametric Adventures in R

Jamesp james.jrp015 at gmail.com
Tue Oct 5 22:57:10 CEST 2010


Thanks for the feedback, this is really helpful.

4) Your solution works like a charm. I opted to reference the columns by
number since I had so many.

ynFields = c(12:22,58:229)
ynLabel = c("No","Yes")
X[ynFields] <- lapply(X[ynFields], factor, levels=0:1, labels=ynLabel)

whenFields = c(24:56)
whenLabel = c("Never", "<=Week", "<=3 months", "<=year",
              "regular use >=3 mo", "user for unknown time")
X[whenFields] <- lapply(X[whenFields], factor, levels=0:5, labels=whenLabel)
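For anyone following along, here is a small self-contained check of the same lapply()/factor() idiom on made-up data (the data frame and the column names q1..q3 are hypothetical). One thing worth knowing: any code not listed in `levels` silently becomes NA, so it may pay to count NAs after the conversion.

```r
# Hypothetical toy data: q1 and q2 are clean 0/1 fields, q3 has a stray code 2
X <- data.frame(q1 = c(0, 1, 1), q2 = c(1, 1, 0), q3 = c(0, 2, 1))
ynFields <- 1:3
ynLabel <- c("No", "Yes")

# lapply() calls factor() on each selected column, passing levels/labels along;
# assigning into X[ynFields] keeps X a data frame
X[ynFields] <- lapply(X[ynFields], factor, levels = 0:1, labels = ynLabel)

str(X)
# codes outside 0:1 (the 2 in q3) have become NA
colSums(is.na(X[ynFields]))
```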


5) For creating new columns with names based on the old ones, I adjusted
the code to:

Z <- as.data.frame(X[24:56]>0)
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)

instead of:

Z <- as.data.frame(as.numeric(X[24:56]>0))
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)

because R would literally name column 197 "as.numeric(X[24:56]>0)" and
leave the rest alone.

As is, it does not put in the proper values (maybe because I had to drop the
"as.numeric()" portion).

I added:

for (i in 24:56)
{
  X[,i+173] <- as.numeric(X[,i] > 0)
}

afterwards, and I get the values I want, but maybe a slight change to the
previous code could eliminate my need for the for loop.
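The loop can probably be avoided. A hedged diagnosis: as.numeric() flattens the 33-column comparison into one long vector (hence the single, oddly named column), while the version without it yields TRUE/FALSE rather than 1/0. Also, if columns 24:56 were already converted to factors in step 4, `>` is not meaningful for factors and returns NA. A minimal sketch, assuming the *_when columns still hold the numeric 0-5 codes; the toy data and column names are hypothetical:

```r
# Hypothetical toy data: an id column plus two "_when" columns coded 0-5
X <- data.frame(id = 1:4,
                drug1_when = c(0, 2, 5, 0),
                drug2_when = c(1, 0, 3, 4))
whenCols <- grep("_when$", names(X))  # in the real data this would be 24:56

# X[whenCols] > 0 is a logical matrix that keeps its column names;
# multiplying by 1 turns TRUE/FALSE into 1/0 without dropping the dimensions
# (as.numeric() would collapse it into a single long vector)
Z <- as.data.frame((X[whenCols] > 0) * 1)
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)
X
```

If the columns are already factors, comparing against the first label, e.g. `(X[whenCols] != "Never") * 1`, should give the same indicator.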

6) print() is exactly what I needed to get output from inside loops; that
helps me greatly.
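For other readers: at the top level R prints the value of each expression automatically, but nothing is auto-printed inside a for loop, so print() or cat() has to be called explicitly. A tiny demonstration, using capture.output() to collect whatever gets printed:

```r
# The loop body evaluates i, but nothing is shown: loops don't auto-print
silent <- capture.output(for (i in 1:2) i)

# With an explicit print() each value is written out
shown <- capture.output(for (i in 1:2) print(i))

length(silent)  # 0
shown           # "[1] 1" "[1] 2"
```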

I'm making more of a mess with the code at the moment, trying nasty things
like:

for (j in 3:5)
{
  print (names(X[j]))
  for (i in 197:229)
  {
    print (names(X[i]))
    print(table(X[,j],X[,i]))
    #print(prop.test(table(X$race,X[,i])))
    print ("--------------------------------")
  }
}

My intent is to look at drug usage by demographic variables (frequencies and
chi-square tests). I sort of get that in a piecemeal way, but the output is
quite messy.

a) I commented out "prop.test(table(X$race,X[,i]))" because it works until
it runs into a drug with no successes in a column, and then the program
halts with an error. My first instinct would be to add an if statement, but
I bet R has something more elegant.
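One common base-R idiom is tryCatch(), which evaluates an expression and returns a fallback value if it signals an error, so the loop keeps going. A minimal sketch, assuming NA is an acceptable "no test" result; the helper name safe_prop_p and the toy data are hypothetical. It still keeps one cheap explicit check, because a drug nobody used produces a one-column table and there is nothing to compare:

```r
# Hypothetical helper: p-value of prop.test() for a demographic-by-drug table,
# or NA when the test cannot be run, so a loop over drugs keeps going
safe_prop_p <- function(tab) {
  # a drug nobody used yields a table with only one column; nothing to compare
  if (!is.matrix(tab) || ncol(tab) != 2L) return(NA_real_)
  # any other error from prop.test() becomes NA instead of stopping the loop;
  # small-count warnings are suppressed here but may deserve a separate look
  tryCatch(suppressWarnings(prop.test(tab)$p.value),
           error = function(e) NA_real_)
}

# Toy data: three groups, nobody used the drug
race <- c("a", "b", "c", "a", "b", "c")
none <- rep(0, 6)
safe_prop_p(table(race, none))   # NA

# Toy data: two groups with different usage rates
grp <- rep(c("a", "b"), each = 10)
used <- c(rep(1, 8), rep(0, 2), rep(1, 2), rep(0, 8))
safe_prop_p(table(grp, used))
```

Inside the loop, the commented-out line could then become `print(safe_prop_p(table(X$race, X[, i])))`, and the values could be stored in a vector instead of printed.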

b) My final goal would be to get output similar to the following, for a
sample size of, say, 50.


_______________________________________________________________________________
variable           drug1      drug2      drug3      drug4      drug5   ...  drug n
                   (n=5)      (n=10)     (n=8)      (n=7)      (n=5)   ...  (n=0)
                   no (%)     no (%)     no (%)     no (%)     no (%)  ...  no (%)
_______________________________________________________________________________
Gender             *          **
  Male             2 (0.04)   9 (0.18)   ...
  Female           3 (0.06)   1 (0.02)   ...

Ethnicity          ...
  Caucasian
  African American

...

Demographic
  Level1
  Level2
  ...
  LevelN
_______________________________________________________________________________
* p < 0.05
** p < 0.01

I figure some of this would need to be done by hand, but the closer I can
get, the better. At the moment, I plan on reading up on table and xtabs, and
trying to find a way to skip the chi-square tests that would fail (maybe one
of the apply functions works for this). If I can store the resulting
p-values, then in the worst case I can output a custom table.
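Storing the p-values and building the table can be combined. A rough sketch of one demographic block of the layout above, assuming the drug columns are 0/1 indicators and that proportions are relative to the whole sample (as in 2/50 = 0.04). It swaps in chisq.test() (the chi-square test of independence) rather than prop.test(); chisq_p, demo_block and the toy data are hypothetical:

```r
# Hypothetical helper: chi-square p-value, NA when the table is degenerate
chisq_p <- function(tab) {
  if (nrow(tab) < 2L || ncol(tab) < 2L) return(NA_real_)
  tryCatch(suppressWarnings(chisq.test(tab)$p.value),
           error = function(e) NA_real_)
}

# Hypothetical helper: one block of the layout for a single demographic.
# Cells are "users (users / N)" with N the total sample size; the first row
# holds * for p < 0.05 and ** for p < 0.01.
demo_block <- function(df, demo, drugs) {
  N <- nrow(df)
  g <- factor(df[[demo]])
  cells <- sapply(drugs, function(d) {
    # users per level; g[...] keeps all levels, so empty levels count as 0
    users <- as.vector(table(g[which(df[[d]] == 1)]))
    sprintf("%d (%.2f)", users, users / N)
  })
  cells <- matrix(cells, nrow = nlevels(g), dimnames = list(levels(g), drugs))
  p <- sapply(drugs, function(d) chisq_p(table(g, df[[d]])))
  stars <- ifelse(is.na(p), "", ifelse(p < 0.01, "**", ifelse(p < 0.05, "*", "")))
  out <- rbind(stars, cells)
  rownames(out)[1] <- demo
  out
}

# Toy data: 20 respondents, drug2 never used
df <- data.frame(gender = rep(c("Male", "Female"), each = 10),
                 drug1 = c(rep(1, 8), rep(0, 12)),
                 drug2 = rep(0, 20))
print(demo_block(df, "gender", c("drug1", "drug2")), quote = FALSE)
```

The (n=...) header row could come from colSums() of the 0/1 drug columns, and rbind()-ing one block per demographic would give the full table; the final touches could still be done by hand.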

Thanks again for the help.  The discussion helps me understand how to use R
quite a bit better.
-- 
View this message in context: http://r.789695.n4.nabble.com/Non-Parametric-Adventures-in-R-tp2952754p2956852.html
Sent from the R help mailing list archive at Nabble.com.


