[R] Help
jim holtman
jholtman at gmail.com
Wed Apr 27 03:40:48 CEST 2011
Is this what you were looking for as output? You did not show what
the output would look like:
> x
var1 var2 X. varN
1 122 nnn1 … 1
2 213 nnn2 … 2
3 422 nnn4 … 2
4 432 … … 3
5 441 … … 4
6 500 … … 4
7 550 … … 4
> str(x)
'data.frame': 7 obs. of 4 variables:
$ var1: int 122 213 422 432 441 500 550
$ var2: Factor w/ 4 levels "…","nnn1","nnn2",..: 2 3 4 1 1 1 1
$ X. : Factor w/ 1 level "…": 1 1 1 1 1 1 1
$ varN: int 1 2 2 3 4 4 4
> x$newCol <- ave(x$var1, x$varN, FUN=sum)
> x
var1 var2 X. varN newCol
1 122 nnn1 … 1 122
2 213 nnn2 … 2 635
3 422 nnn4 … 2 635
4 432 … … 3 432
5 441 … … 4 1491
6 500 … … 4 1491
7 550 … … 4 1491
>
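ave() splits var1 by the distinct values of varN and writes each group's sum back into every row of that group. Because it matches rows by the value of varN rather than by position, the data should not need to be sorted with order() first. A minimal sketch to check this, using the example columns typed in by hand and a random row order (the names `df`, `shuffled` and `restored` are illustrative only):

```r
# Example data re-typed from the post (only the two columns that matter)
df <- data.frame(var1 = c(122, 213, 422, 432, 441, 500, 550),
                 varN = c(1, 2, 2, 3, 4, 4, 4))

# Group sums computed on the data in its original (sorted) order
df$newCol <- ave(df$var1, df$varN, FUN = sum)

# Shuffle the rows; ave() groups by the value of varN, not by adjacent runs
set.seed(1)
shuffled <- df[sample(nrow(df)), c("var1", "varN")]
shuffled$newCol <- ave(shuffled$var1, shuffled$varN, FUN = sum)

# Subsetting kept the original row names, so sort on them to undo the shuffle
restored <- shuffled[order(as.integer(rownames(shuffled))), ]
print(restored$newCol)
print(df$newCol)
```

Both printed columns should agree, which suggests the order() step is only needed if you want the output sorted, not for the grouping itself.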
On Tue, Apr 26, 2011 at 6:31 PM, петрович <bistanz at gmail.com> wrote:
> Hey Everyone!
> I'm quite a new R user. I've run into a problem that I'd like to share with
> you, and I hope you can help me find a solution.
> I have a large .txt file which I opened with the read.table command. What I
> understood from many R manuals is that read.table gives me a kind of matrix.
> I've used order() to sort my data, and now my problem is this: I have a
> variable that has many repeated values, and I would like to operate on the
> rows that share each of these repeated values. For example, suppose I have:
>
> var1 var2 … varN
> 122 nnn1 … 1
> 213 nnn2 … 2
> 422 nnn4 … 2
> 432 … … 3
> 441 … … 4
> 500 … … 4
> 550 … … 4
>
> So I want to obtain a new column in which the var1 values are summed over the
> rows where varN is repeated: for varN=2 the entry in the new column will be
> 213+422, and for varN=4 it will be 441+500+550. Where a varN value is not
> repeated there is obviously nothing to add, and the entry is just that row's
> var1 value.
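The requested column can also be built in two explicit steps: compute one sum per distinct varN, then look each row's sum up by its varN value so the result is as long as var1. A minimal sketch using tapply() on the example values; the names `group_sums` and `new_col` are hypothetical, and it assumes varN has no missing values:

```r
var1 <- c(122, 213, 422, 432, 441, 500, 550)
varN <- c(1, 2, 2, 3, 4, 4, 4)

# Step 1: one sum per distinct varN, named by those values ("1", "2", "3", "4")
group_sums <- tapply(var1, varN, sum)

# Step 2: index the named sums by each row's varN to get one entry per row
new_col <- as.numeric(group_sums[as.character(varN)])
print(new_col)
```

Step 1 on its own gives one value per distinct varN; step 2 is what turns it into a column aligned with the rows.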
> I wrote a function to do this, but it is not very good (I have a database
> with around 1 million rows and 5 columns); it only works for data that is not
> so large:
>
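> suma.rep=function(X,Y){
>   # X: values to add up (var1); Y: grouping variable (varN)
>   resp=numeric(0)
>   Z=unique(Y)                            # distinct values of Y
>   for (i in seq_along(Z))                # one full scan of Y per distinct value
>     resp=c(resp,sum(X[which(Y==Z[i])]))  # c() copies resp on every iteration
>   return(resp)}                          # one sum per distinct Y, not one per row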
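The loop above is slow because every iteration compares all of Y against one value and then copies resp to grow it by one element, so the work grows roughly with (number of rows) × (number of distinct values). A minimal sketch of a vectorised replacement, assuming Y contains no NA; `suma_rep_fast` is a hypothetical name, and the original loop is repeated (with `<-` and `seq_along`) only so the two can be compared:

```r
# Original loop from the post, kept for comparison
suma.rep <- function(X, Y) {
  resp <- numeric(0)
  Z <- unique(Y)
  for (i in seq_along(Z))
    resp <- c(resp, sum(X[which(Y == Z[i])]))
  resp
}

# Vectorised version: a single tapply() call replaces the loop.
# levels = unique(Y) keeps groups in order of first appearance, which is the
# order suma.rep() returns them in. Assumes Y has no NA values.
suma_rep_fast <- function(X, Y) {
  as.numeric(tapply(X, factor(Y, levels = unique(Y)), sum))
}

# Small random example: 10,000 rows spread over 200 groups
set.seed(42)
Y <- sample(1:200, 1e4, replace = TRUE)
X <- runif(1e4)
slow <- suma.rep(X, Y)
fast <- suma_rep_fast(X, Y)
print(all.equal(slow, fast))
```

Like the original, this returns one value per distinct Y; to get a column with one entry per row, ave(X, Y, FUN = sum) does the expansion directly.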
>
> When I run this function on my large data, R keeps calculating, and I think
> it would take a very long time (maybe 4 days) to build the new column I need.
> Question 1: I "feel" that there may be a command that does this "simple"
> operation more elegantly. I googled it but couldn't find one. Is there such a
> command?
> Question 2: Is it a good idea to handle large database files with R, as in my
> example?
>
> Thank you so much for your help.
> Christian Paúl
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?