[R] sas to r
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Sat Jul 17 14:36:20 CEST 2004
Greg Adkison wrote:
> I would be incredibly grateful to anyone who'll help me translate some
> SAS code into R code.
>
> Say for example that I have a dataset named "dat1" that includes five
> variables: wshed, site, species, bda, and sla. I can calculate with the
> following SAS code the mean, CV, se, and number of observations of
> "bda" and "sla" for each combination of "wshed," "species," and "site,"
> restricting the species considered to only three of several species in
> dat1 (b, c, and p). Moreover, I can output these calculations and
> grouping variables to a dataset named "dat2" that will reside in RAM
> and include the variables wshed, site, species, mBdA, msla, cBda,
> sBdA, ssla, nBda, and nsla.
>
> proc sort data=dat1;
> by wshed site species;
> proc means data=dat1 noprint mean cv stderr n;
> by wshed site species;
> where species in ('b', 'c', 'p');
> var BdA sla;
> output out=dat2
> mean=mBdA msla
> cv=cBdA csla
> stderr=sBdA ssla
> n=nBdA nsla;
>
> Thanks,
> Greg
The following handles any number of analysis variables, with automatic
naming of all statistics computed from them. It requires the Hmisc package.
# Generate some data. Put one NA in sla.
set.seed(1)
dat1 <- expand.grid(wshed=1:2, site=c('A','B'),
species=c('a','b','c','p'),
reps=1:10)
n <- nrow(dat1)
dat1 <- transform(dat1,
BdA = rnorm(n, 100, 20),
sla = c(rnorm(n-1, 200, 30), NA))
# Can use upData function in Hmisc in place of transform
# Summarization function, per stratum, for a matrix of analysis
# variables
g <- function(y) {
n <- apply(y, 2, function(z) sum(!is.na(z)))
m <- apply(y, 2, mean, na.rm=TRUE)
s <- apply(y, 2, sd, na.rm=TRUE)
cv <- s/m
se <- s/sqrt(n)
w <- c(m, cv, se, n)
names(w) <- t(outer(c('m','c','s','n'), colnames(y), paste, sep=''))
w
}
library(Hmisc)
dat2 <- with(dat1,
summarize(cbind(BdA, sla),
llist(wshed, site, species),
g,
subset=species %in% c('b','c','p'),
stat.name='mBdA')
)
options(digits=3)
dat2 # is a data frame
wshed site species mBdA msla cBdA csla sBdA ssla nBdA nsla
1 1 A b 100.5 195 0.133 0.1813 4.23 11.20 10 10
2 1 A c 99.7 206 0.101 0.1024 3.17 6.68 10 10
3 1 A p 101.4 188 0.239 0.1580 7.65 9.39 10 10
4 1 B b 109.9 203 0.118 0.1433 4.09 9.21 10 10
5 1 B c 98.4 221 0.193 0.1250 6.01 8.72 10 10
6 1 B p 102.9 203 0.216 0.1446 7.03 9.29 10 10
7 2 A b 95.8 195 0.241 0.2011 7.31 12.40 10 10
8 2 A c 98.7 207 0.194 0.1274 6.04 8.33 10 10
9 2 A p 102.2 191 0.217 0.1709 7.01 10.31 10 10
10 2 B b 97.8 191 0.235 0.2079 7.27 12.58 10 10
11 2 B c 100.9 194 0.164 0.0987 5.24 6.07 10 10
12 2 B p 103.0 209 0.144 0.0769 4.69 5.35 10 9
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
More information about the R-help
mailing list