[R] Generating summary statistics and simple statistical analysis from my data-set: how can I automate the analysis?
dereksloan
djsloan at liv.ac.uk
Tue May 3 16:06:13 CEST 2011
I am fairly new to R and have a (for me) slightly complicated set of data to
analyse. It contains several continuous and categorical variables for a
group of individuals, e.g.:
ID  Sex  Age  Familysize  Phone  Education
1   M    23   3           Yes    Primary
2   F    25   4           Yes    Secondary
3   M    33   5           No     Tertiary
4   F    45   1           Yes    Secondary
5   F    67   10          Yes    Secondary
I want to summarise it in a table as follows, with the p-values for the comparison between sexes in the last column:
              All individuals  Male            Female          Comparison between sexes (p-value)
Age           Median (range)   Median (range)  Median (range)  Wilcoxon rank sum test
Family size   Median (range)   Median (range)  Median (range)  Wilcoxon rank sum test
Phone         Number Yes (%)   Number Yes (%)  Number Yes (%)  Chi-squared test
Education                                                      Chi-squared test
  Primary     Number (%)       Number (%)      Number (%)
  Secondary   Number (%)       Number (%)      Number (%)
  Tertiary    Number (%)       Number (%)      Number (%)
How can I use R to do this?
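For the continuous variables I know I can write code like:
# summaries by sex and a Wilcoxon rank sum test for each continuous variable
summary(data$Age)
by(data$Age,data["Sex"],summary)
wilcox.test(Age~Sex,data=data)
summary(data$Familysize)
by(data$Familysize,data["Sex"],summary)
wilcox.test(Familysize~Sex,data=data)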
but is there any way of automating/looping the analysis so that I get
summaries and comparative statistical analysis of all of the continuous
variables in a single command? I’m sure this could be done by some kind of
‘looping’ given that the analysis is always the same. Presumably I then
still have to copy the output of interest (medians, ranges, p-values) into
the summary table manually?
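A minimal sketch of such a loop, assuming the data frame is called data, Sex is a factor coded "M"/"F", and every numeric column other than ID is a continuous variable to summarise. The data frame below is just the five example rows typed in, and med_range, cont_vars and cont_table are hypothetical names chosen for illustration. It builds one row per variable with the median (range) for all individuals, males and females, plus the p-value from wilcox.test(), so nothing has to be copied by hand.

```r
# Hypothetical reconstruction of the five example rows from the post
data <- data.frame(
  ID         = 1:5,
  Sex        = factor(c("M", "F", "M", "F", "F")),
  Age        = c(23, 25, 33, 45, 67),
  Familysize = c(3, 4, 5, 1, 10),
  Phone      = factor(c("Yes", "Yes", "No", "Yes", "Yes")),
  Education  = factor(c("Primary", "Secondary", "Tertiary", "Secondary", "Secondary"))
)

# Hypothetical helper: format a numeric vector as "median (min-max)"
med_range <- function(x) {
  sprintf("%g (%g-%g)", median(x, na.rm = TRUE),
          min(x, na.rm = TRUE), max(x, na.rm = TRUE))
}

# Assumption: every numeric column except ID is a continuous variable
cont_vars <- setdiff(names(data)[sapply(data, is.numeric)], "ID")

# One row per variable: all / male / female summaries plus a Wilcoxon p-value
cont_table <- do.call(rbind, lapply(cont_vars, function(v) {
  x <- data[[v]]
  male <- x[data$Sex == "M"]
  female <- x[data$Sex == "F"]
  data.frame(
    Variable = v,
    All      = med_range(x),
    Male     = med_range(male),
    Female   = med_range(female),
    p.value  = wilcox.test(male, female)$p.value,
    stringsAsFactors = FALSE
  )
}))

cont_table
```

Because the result is an ordinary data frame, it can be written out with write.csv() and pasted into a report instead of copying numbers across manually. With real data containing ties, wilcox.test() warns that it cannot compute an exact p-value and falls back to the normal approximation.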
For each categorical variable I have some really cumbersome code from which I
can extract the information I need for the summary table, e.g.:
# counts of Phone by Sex
tphone<-xtabs(~Phone+Sex,data=data)
# add a row of group sizes (N) and a Total column
N<-margin.table(tphone,2)
tphone1<-rbind(tphone,N)
Total<-margin.table(tphone1,1)
tphone1<-cbind(tphone1,Total)
# one row per group (F, M, Total) as a data frame
tphone1<-as.data.frame(t(tphone1))
# percentages within each group
tphone2<-within(tphone1,{
per.No<-100*(No/N)
per.Yes<-100*(Yes/N)
})
tphone2<-tphone2[,c("N","Yes","per.Yes","No","per.No")]
tphone2
chisq.test(tphone)
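For a single variable, the counting and percentage steps above can be condensed with base R's table(), rowSums() and prop.table(). This is a minimal sketch for Phone only, assuming the same coding as the example rows (typed in below); tab, tab_all and pct are hypothetical names. With counts this small, chisq.test() warns that the approximation may be incorrect, and fisher.test(tab) may be the more appropriate test.

```r
# Hypothetical reconstruction of the example rows (only Sex and Phone are needed)
data <- data.frame(
  Sex   = factor(c("M", "F", "M", "F", "F")),
  Phone = factor(c("Yes", "Yes", "No", "Yes", "Yes"))
)

# Counts: rows = Phone (No, Yes), columns = Sex (F, M)
tab <- table(data$Phone, data$Sex)

# Add an "All" column with the totals over both sexes
tab_all <- cbind(tab, All = rowSums(tab))

# Column percentages: each column (F, M, All) sums to 100
pct <- 100 * prop.table(tab_all, margin = 2)

tab_all
round(pct, 1)

# Chi-squared test of Phone vs Sex (2x2 table, so Yates' correction is applied)
chisq.test(tab)
```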
but there must be better ways of generating the counts, percentages, and
simple statistical analysis which I need. Again, can I loop it to do all of
my categorical variables at once?
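One way to loop over all categorical variables is sketched below, assuming every factor other than Sex is a categorical variable to summarise and Sex is coded "M"/"F". The data frame is the example rows typed in; n_pct, cat_vars and cat_table are hypothetical names. It returns one row per level with "n (%)" for all individuals, males and females, and puts the chi-squared p-value on the first row of each variable, which matches the layout of the Education block in the table above.

```r
# Hypothetical reconstruction of the example rows (categorical columns only)
data <- data.frame(
  ID        = 1:5,
  Sex       = factor(c("M", "F", "M", "F", "F")),
  Phone     = factor(c("Yes", "Yes", "No", "Yes", "Yes")),
  Education = factor(c("Primary", "Secondary", "Tertiary", "Secondary", "Secondary"))
)

# Hypothetical helper: format counts as "n (xx.x%)"
n_pct <- function(n, total) sprintf("%d (%.1f%%)", as.integer(n), 100 * n / total)

# Assumption: every factor except Sex is a categorical variable to summarise
cat_vars <- setdiff(names(data)[sapply(data, is.factor)], "Sex")

cat_table <- do.call(rbind, lapply(cat_vars, function(v) {
  tab <- table(data[[v]], data$Sex)   # rows = levels of v, columns = F, M
  tot <- rowSums(tab)
  # chisq.test warns when expected counts are small; fisher.test() may suit better then
  p <- suppressWarnings(chisq.test(tab)$p.value)
  data.frame(
    Variable = v,
    Level    = rownames(tab),
    All      = n_pct(tot, sum(tot)),
    Male     = n_pct(tab[, "M"], sum(tab[, "M"])),
    Female   = n_pct(tab[, "F"], sum(tab[, "F"])),
    p.value  = c(p, rep(NA, nrow(tab) - 1)),  # p-value only on each variable's first row
    stringsAsFactors = FALSE
  )
}))
rownames(cat_table) <- NULL

cat_table
```

Adding a new categorical variable to the data frame (as a factor) is enough for it to appear in the output; no extra code is needed per variable.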
Obviously my dataset has more continuous and categorical variables than
those shown above, but I’ve abbreviated it for simplicity of explanation. I
need to write simpler, looped code so that the whole thing is not crazily
long-winded.
Sorry that my approach so far is so bad and long-winded! R is a long uphill
climb to start with, so I’d be very grateful for any help I can get from
anyone who won’t laugh at me.
Derek
--
View this message in context: http://r.789695.n4.nabble.com/Generating-summary-statistics-and-simple-statistical-analysis-from-my-data-set-how-can-I-automate-th-tp3492537p3492537.html
Sent from the R help mailing list archive at Nabble.com.