[R] Non-Parametric Adventures in R
Jamesp
james.jrp015 at gmail.com
Tue Oct 5 22:57:10 CEST 2010
Thanks for the feedback, this is really helpful.
4) Your solution works like a charm. I opted for reference by column number
since I had so many.
ynFields = c(12:22,58:229)
ynLabel = c("No","Yes")
X[ynFields] <- lapply(X[ynFields], factor, levels=0:1, labels=ynLabel)
whenFields = c(24:56)
whenLabel = c("Never","<=Week","<=3 months","<=year",
"regular use >=3 mo","user for unknown time")
X[whenFields] <- lapply(X[whenFields], factor, levels=0:5, labels=whenLabel)
5) To create new columns with names based on the old ones, I adjusted the
code to:
Z <- as.data.frame(X[24:56]>0)
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)
instead of:
Z <- as.data.frame(as.numeric(X[24:56]>0))
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)
because R would literally name column 197 "as.numeric(X[24:56] > 0)" and
leave the rest alone.
As is, the adjusted version does not put in the proper values (maybe because
I had to drop the "as.numeric()" portion).
I added:
for (i in 24:56)
{
  X[,i+173] <- as.numeric(X[,i] > 0)
}
afterwards and get the values I want, but maybe a slight change to the
previous code can eliminate my need of the for loop.
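
A minimal sketch of one way the loop might be avoided, on a toy data frame (the column names are made up, and it assumes the "_when" columns still hold the numeric codes 0-5 rather than factors). Comparing a data frame with 0 gives a logical matrix; adding 0 turns that into 0/1 while keeping its shape and column names, whereas as.numeric() flattens it into one long vector, which would explain the single oddly named column.

```r
# Toy stand-in for X; the names are hypothetical
X <- data.frame(id = 1:4,
                drugA_when = c(0, 2, 5, 0),
                drugB_when = c(1, 0, 0, 3))
whenCols <- 2:3                       # would be 24:56 in the real data

# (X[whenCols] > 0) is a logical matrix; "+ 0" makes it numeric 0/1
# without dropping its dimensions or column names
Z <- as.data.frame((X[whenCols] > 0) + 0)
names(Z) <- sub("_when$", "_yn", names(Z))
X <- cbind(X, Z)
```

With this, the new "_yn" columns are appended with 0/1 values in one step, so no separate loop over column positions should be needed.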
6) Print is exactly what I needed to get output from loops, that helps me
greatly.
I'm making more of a mess with the code at the moment, trying nasty things
like:
for (j in 3:5)
{
  print(names(X[j]))
  for (i in 197:229)
  {
    print(names(X[i]))
    print(table(X[,j],X[,i]))
    #print(prop.test(table(X$race,X[,i])))
    print("--------------------------------")
  }
}
My intent is to look at the drug usage by demographic data (frequency and
chi-square). I sort of get that in a piecemeal way, but the output is quite
messy.
a) I commented out "prop.test(table(X$race,X[,i]))" because it works until
it runs into a drug with no successes on a column, then the program halts.
My first instinct would be to add an if statement, but I bet R has something
more elegant.
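
A possible alternative to the if statement, sketched under the assumption that the halt is an ordinary R error raised by prop.test() (for example when the table ends up with a single column): wrap the call in tryCatch() and return NA instead. The helper name safe_p and the two matrices are made up for illustration.

```r
# Hypothetical helper: return the prop.test p-value, or NA if the test
# cannot be run (e.g. the table has only one column)
safe_p <- function(tab) {
  tryCatch(prop.test(tab)$p.value,
           error = function(e) NA_real_)
}

ok  <- matrix(c(30, 20, 15, 35), nrow = 2)   # two groups, two outcomes
bad <- matrix(c(50, 50), ncol = 1)           # outcome never observed

safe_p(ok)    # a p-value
safe_p(bad)   # NA instead of an error
```

Inside the loop, print(safe_p(table(X$race, X[,i]))) would then keep going past drugs that nobody reported.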
b) My final goal would be to get output similar to the following, for a
sample size of, say, 50.
_____________________________________________________________________
variable            drug1     drug2     drug3     drug4     drug5    ...  drug n
                    (n=5)     (n=10)    (n=8)     (n=7)     (n=5)         (n=0)
                    no (%)    no (%)    no (%)    no (%)    no (%)        no (%)
_____________________________________________________________________
Gender              *         **
  Male              2 (0.04)  9 (0.18)  ...
  Female            3 (0.06)  1 (0.02)  ...
Ethnicity           ...
  Caucasian
  African American
  ...
Demographic
  Level1
  Level2
  ...
  LevelN
_____________________________________________________________________
* p < 0.05
** p < 0.01
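
A rough sketch of how the "no (%)" cells might be built for one demographic variable, on made-up data. It assumes the drug columns are 0/1 and that the figure in parentheses is the proportion of the whole sample, as in the mock-up above; all names are hypothetical.

```r
# Toy data; all names and values are made up for illustration
X <- data.frame(gender   = c("Male", "Female", "Male", "Female", "Male"),
                drug1_yn = c(1, 0, 1, 1, 0),
                drug2_yn = c(0, 0, 1, 0, 0))
drugCols <- c("drug1_yn", "drug2_yn")

# For one demographic variable, count users per level and format each
# cell as "count (proportion of the whole sample)"
cells <- sapply(drugCols, function(d) {
  n <- tapply(X[[d]], X$gender, sum)
  sprintf("%d (%.2f)", as.integer(n), n / nrow(X))
})
rownames(cells) <- levels(factor(X$gender))
print(noquote(cells))
```

Repeating this for each demographic column and stacking the results with rbind() would give the body of the table; the header rows and significance marks would still need to be added separately.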
I figure some of this will need to be done by hand, but the closer I can get
the better. At the moment, I plan on reading up on table and xtabs, and
trying to find a way to skip the chi-square tests that would halt the run
(maybe one of the apply functions works for this). If I can store the
resulting p-values then maybe I can output a custom table in the worst case.
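
A minimal sketch of storing the p-values with sapply(), on made-up data. It assumes prop.test() is the test of interest and that a failing test should simply be recorded as NA; the column names are hypothetical.

```r
# Toy data; names are hypothetical
X <- data.frame(gender   = rep(c("Male", "Female"), each = 25),
                drug1_yn = rep(c(1, 0, 1, 0), c(20, 5, 5, 20)),
                drug2_yn = rep(c(1, 0), 25),
                drug3_yn = 0)                 # nobody used drug3
drugCols <- c("drug1_yn", "drug2_yn", "drug3_yn")

# One p-value per drug; NA where the test raises an error
pvals <- sapply(drugCols, function(d) {
  tryCatch(prop.test(table(X$gender, X[[d]]))$p.value,
           error = function(e) NA_real_)
})

# Turn p-values into the footnote markers used in the table
stars <- ifelse(is.na(pvals), "",
         ifelse(pvals < 0.01, "**",
         ifelse(pvals < 0.05, "*", "")))
```

Both pvals and stars are named by drug column, so they can be looked up when the custom table is assembled.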
Thanks again for the help. The discussion helps me understand how to use R
quite a bit better.