[R] vectorization
Rau, Roland
Rau at demogr.mpg.de
Fri Jun 17 20:53:08 CEST 2005
Hi,
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dimitri Joe
> Sent: Friday, June 17, 2005 7:01 PM
> To: R-Help
> Subject: [R] vectorization
>
> Hi there,
>
> I have a data frame (mydata) with 1 numeric variable (income)
> and 1 factor (education). I want a new column in this data frame
> with the median income for each education level. An obviously
> inefficient way to do this is
>
The code below (which also simulates your data structure) is probably not
the most efficient way to do this, but I hope it at least does what you
want:
####################### Beginning of Example Code
## Simulate a data structure like yours
income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"),
                              size=length(income), replace=TRUE))
mydata <- data.frame(inc=income, edu=education)
## Median income for each education level (one value per level, named by level)
mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)
## Fill in each row's median, one education level at a time
mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
                         mydata$medians)
mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
                         mydata$medians)
head(mydata)
mymedians
####################### End of Example Code
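The three ifelse() calls hard-code the three education levels. A sketch of a more general variant, assuming the same simulated mydata, looks up each row's median by indexing the named result of tapply() with the row's level label, so it works for any number of levels. The seed is an arbitrary assumption added only for reproducibility.

```r
## Sketch (not from the original reply): index the named tapply() result
## by each row's level label instead of chaining ifelse() calls.
set.seed(1)                      # arbitrary seed, reproducibility only
income    <- runif(100)
education <- factor(sample(c("high", "middle", "low"),
                           size = length(income), replace = TRUE))
mydata <- data.frame(inc = income, edu = education)

## One median per education level, named by level
mymedians <- tapply(X = mydata$inc, INDEX = mydata$edu, FUN = median)

## as.character() indexes by level name rather than the factor's integer
## codes; as.vector() drops the repeated names from the result.
mydata$medians <- as.vector(mymedians[as.character(mydata$edu)])

head(mydata)
```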
The speed could perhaps be improved, but I think it is fast enough for
your 30,000 cases, as the timing on my desktop computer (WinXP Pro SP2,
P4, 3GHz, 512MB RAM) shows:
> time.check <- function(){
+ income <- runif(30000)
+ education <- as.factor(sample(c("high", "middle", "low"),
+                               size=length(income), replace=TRUE))
+ mydata <- data.frame(inc=income, edu=education)
+
+ mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)
+
+ mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
+ mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
+                          mydata$medians)
+ mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
+                          mydata$medians)
+ return(NULL)
+ }
> system.time(time.check())
[1] 0.36 0.02 0.38 NA NA
>
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status   beta
major    2
minor    1.0
year     2005
month    04
day      04
language R
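For comparison on data of the same size, base R's ave() returns, for every element, FUN applied to that element's group, so a single call replaces the tapply() plus ifelse() chain. A minimal sketch follows; the helper names by_ifelse() and by_ave() and the seed are illustrative assumptions, and no particular timings are claimed since they depend on the machine.

```r
## Sketch (not from the original reply): compare the ifelse() approach
## with ave() on 30,000 simulated rows. Helper names are hypothetical.
set.seed(42)                     # arbitrary seed
income    <- runif(30000)
education <- factor(sample(c("high", "middle", "low"),
                           size = length(income), replace = TRUE))
mydata <- data.frame(inc = income, edu = education)

## Approach from the reply: per-level medians, then three ifelse() calls
by_ifelse <- function(d) {
  m   <- tapply(d$inc, d$edu, median)
  out <- ifelse(d$edu == "high", m["high"], 0)
  out <- ifelse(d$edu == "middle", m["middle"], out)
  ifelse(d$edu == "low", m["low"], out)
}

## One ave() call; FUN must be named because ave() takes groups via ...
by_ave <- function(d) ave(d$inc, d$edu, FUN = median)

## Both should give the same per-row medians
stopifnot(isTRUE(all.equal(by_ifelse(mydata), by_ave(mydata))))

system.time(by_ifelse(mydata))
system.time(by_ave(mydata))
```

Besides any speed difference, ave() does not need the level names written out, so it keeps working if education gains more categories.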
Best,
Roland