[R] More efficient option to append()?
Timothy Bates
timothy.c.bates at gmail.com
Thu Aug 18 09:46:35 CEST 2011
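This takes a few seconds for 1 million rows and keeps the explicit for-loop form. The difference from the append() version is that the result vector is allocated once at its final length and then filled in place: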
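numberofSalaryBands = 1000000 # 2000000 in the original question
x = sample(1:15, numberofSalaryBands, replace = TRUE)
y = sample((1:10) * 1000, numberofSalaryBands, replace = TRUE)
df = data.frame(x, y)
finalN = sum(df$x)            # total length of the expanded vector
myVar = rep(NA_real_, finalN) # preallocate once instead of growing with append()
outIndex = 1                  # next free position in myVar
for (i in seq_len(numberofSalaryBands)) {
  kount = df$x[i]
  myVar[outIndex:(outIndex + kount - 1)] = rep(df$y[i], kount) # Make x[i] copies of value y[i]
  outIndex = outIndex + kount
}
head(myVar)
plyr::count(myVar) # frequency table of the expanded values; needs the plyr package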
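As for why the append() loop is so slow: each call to append() copies the whole vector built so far, and sum(x[1:i]) is recomputed from scratch on every pass, so the total work grows roughly with the square of the number of rows. If the explicit loop is not a requirement, base R's rep() accepts a vector for its times argument and should do the whole expansion in one call. A minimal sketch, using a small n and an arbitrary seed (both chosen here only for illustration) and checking the result against the preallocated loop:

```r
# Sketch: vectorized expansion with base R's rep(), checked against the loop.
set.seed(1)   # arbitrary seed, only so the example is reproducible
n = 1000      # small illustrative size; the real data has 2 million rows
x = sample(1:15, n, replace = TRUE)
y = sample((1:10) * 1000, n, replace = TRUE)

# With a vector `times`, rep() repeats y[i] exactly x[i] times, in order
expanded = rep(y, times = x)

# Same result from the preallocate-and-fill loop
looped = numeric(sum(x))
outIndex = 1
for (i in seq_len(n)) {
  kount = x[i]
  looped[outIndex:(outIndex + kount - 1)] = y[i]
  outIndex = outIndex + kount
}

stopifnot(length(expanded) == sum(x), identical(expanded, looped))
```

With the data in a data frame as above, the equivalent call would be rep(df$y, times = df$x).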
On Aug 18, 2011, at 12:17 AM, Alex Ruiz Euler wrote:
>
>
> Dear R community,
>
> I have a 2 million by 2 matrix that looks like this:
>
> x<-sample(1:15,2000000, replace=T)
> y<-sample(1:10*1000, 2000000, replace=T)
> x y
> [1,] 10 4000
> [2,] 3 1000
> [3,] 3 4000
> [4,] 8 6000
> [5,] 2 9000
> [6,] 3 8000
> [7,] 2 10000
> (...)
>
>
> The first column is a population expansion factor for the number in the
> second column (household income). I want to expand the second column
> with the first so that I end up with a vector beginning with 10
> observations of 4000, then 3 observations of 1000 and so on. In my mind
> the natural approach would be to create a NULL vector and append the
> expansions:
>
> myvar<-NULL
> myvar<-append(myvar, replicate(x[1],y[1]), 1)
>
> for (i in 2:length(x)) {
> myvar<-append(myvar,replicate(x[i],y[i]),sum(x[1:i])+1)
> }
>
> to end with a vector of sum(x), which in my real database corresponds
> to 22 million observations.
>
> This works fine --if I only run it for the first, say, 1000
> observations. If I try to perform this on all 2 million observations
> it takes long, way too long for this to be useful (I left it running
> 11 hours yesterday to no avail).
>
>
> I know R performs well with operations on relatively large vectors. Why
> is this so inefficient? And what would be the smart way to do this?
>
> Thanks in advance.
> Alex
>