[R] R for simple stats
Frank E Harrell Jr
fharrell at virginia.edu
Fri Jun 28 20:57:45 CEST 2002
You might also take a look at some functions in the Hmisc package, e.g.:
library(Hmisc)
set.seed(1)
x <- runif(1000)
g <- factor(sample(letters[1:4], 1000, replace=TRUE))
describe(x)
x
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
    1000       0    1000  0.5043 0.06128 0.11650 0.26521 0.50441 0.74055 0.90252 0.95984
lowest : 0.003536 0.004208 0.004228 0.006153 0.006443
highest: 0.998321 0.998607 0.998766 0.999014 0.999439
options(digits=3)
s <- function(y) c(Mean=mean(y),Median=median(y),SD=sqrt(var(y)))
summary(x ~ g, fun=s)
x N=1000
+-------+-+----+-----+------+-----+
|       | |N   |Mean |Median|SD   |
+-------+-+----+-----+------+-----+
|g      |a| 254|0.495|0.469 |0.283|
|       |b| 243|0.523|0.533 |0.294|
|       |c| 249|0.495|0.481 |0.278|
|       |d| 254|0.505|0.514 |0.289|
+-------+-+----+-----+------+-----+
|Overall| |1000|0.504|0.504 |0.286|
+-------+-+----+-----+------+-----+
summarize(x, g, s)  # to cross-classify by two factors, replace g with llist(g1, g2)
  g     x Median    SD   # the x column holds the Mean
1 a 0.495  0.469 0.283
2 b 0.523  0.533 0.294
3 c 0.495  0.481 0.278
4 d 0.505  0.514 0.289
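
For comparison, here is a rough sketch of the same by-group table using only base R, in case Hmisc is not installed. It reuses the x, g and s defined above (with sd() in place of sqrt(var())); the name by_group is just an illustrative choice. The exact values depend on the random number generator of your R version, so they need not match the output shown above.

```r
set.seed(1)
x <- runif(1000)
g <- factor(sample(letters[1:4], 1000, replace = TRUE))

# Same three statistics as s() above
s <- function(y) c(Mean = mean(y), Median = median(y), SD = sd(y))

# split() gives one vector of x per level of g; sapply() applies s() to each
# and returns a 3 x 4 matrix, which t() turns into one row per group
by_group <- t(sapply(split(x, g), s))

print(round(by_group, 3))
```

This returns a plain numeric matrix rather than the data frame that summarize() produces, which is often enough for a quick look.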
Frank Harrell
On Fri, 28 Jun 2002 11:21:32 -0700
Brett Magill <bmagill at earthlink.net> wrote:
> The code attached creates a function for descriptive statistics called
> dstats. Enter the name of the column you want to summarize and dstats will
> produce a nice summary. If you have a data frame of numeric variables and
> want to summarize by column, you can use something like:
>
> apply(data.frame.name,2,dstats)
>
> Wrap t() around the above to get the output in a format that I find more
> usable.
>
> Brett
>
>
>
> dstats <- function(x, na.rm=TRUE, digits=3) {
>
>   # Collect the statistics in a named numeric vector
>   out <- c(mean     = mean(x, na.rm=na.rm),
>            sd       = sd(x, na.rm=na.rm),
>            variance = var(x, na.rm=na.rm),
>            min      = min(x, na.rm=na.rm),
>            max      = max(x, na.rm=na.rm),
>            unique   = length(unique(x)),   # counts NA as a distinct value if present
>            n        = sum(!is.na(x)),      # number of non-missing values
>            miss     = sum(is.na(x)))       # number of missing values
>
>   # Round everything to the requested number of decimal places
>   round(out, digits=digits)
> }
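
A small sketch of the apply() and t() usage described in the quoted message. The data frame df and its columns are made up for illustration, and dstats() is repeated in condensed form only so that the block runs on its own.

```r
# Condensed copy of the quoted dstats(), so this block is self-contained
dstats <- function(x, na.rm = TRUE, digits = 3) {
  out <- c(mean     = mean(x, na.rm = na.rm),
           sd       = sd(x, na.rm = na.rm),
           variance = var(x, na.rm = na.rm),
           min      = min(x, na.rm = na.rm),
           max      = max(x, na.rm = na.rm),
           unique   = length(unique(x)),
           n        = sum(!is.na(x)),
           miss     = sum(is.na(x)))
  round(out, digits = digits)
}

# Hypothetical all-numeric data frame (illustration only)
df <- data.frame(height = c(1.62, 1.75, NA, 1.80),
                 weight = c(58, 72, 80, NA))

by_stat <- apply(df, 2, dstats)  # statistics in rows, variables in columns
by_var  <- t(by_stat)            # one row per variable

print(by_var)
```

Note that apply() converts the data frame to a matrix first, so this only makes sense when every column is numeric.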
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat