[R] searching for data.frame rows / processing of rows
chao gai
chaogai at duineveld.demon.nl
Thu Oct 5 11:33:45 CEST 2006
I tought that aggregate was the way to go, but only for large dataframes it is
faster.
> df <- read.table(stdin(),header=TRUE)
0: Location Time X Y
1: 1 0 1.6 9.3
2: 1 3 4.2 10.4
3: 1 6 2.7 16.3
4: 2 0 0.5 2.1
5: 2 3 NA 3.6
6: 2 3 5.0 0.06
7: 2 6 3.4 14.0
8:
> aggregate(df[,3:4],df[,1:2],FUN=mean,na.rm=TRUE)
Location Time X Y
1 1 0 1.6 9.30
2 2 0 0.5 2.10
3 1 3 4.2 10.40
4 2 3 5.0 1.83
5 1 6 2.7 16.30
6 2 6 3.4 14.00
> system.time( aggregate(df[,3:4],df[,1:2],FUN=mean,na.rm=TRUE) )
[1] 0.008 0.000 0.008 0.000 0.000
>
> system.time( do.call(rbind, by(df, df[2:1], colMeans, na.rm = TRUE)))
[1] 0.005 0.000 0.005 0.000 0.000
>
>
> df <- data.frame(Location=rep(1:50,50),
+ Time=sample(rep(1:50,each=10),2500,replace=TRUE),
+ X=runif(2500),Y=runif(2500))
>
> system.time( aggregate(df[,3:4],df[,1:2],FUN=mean,na.rm=TRUE) )
[1] 0.162 0.000 0.163 0.000 0.000
>
> system.time( do.call(rbind, by(df, df[2:1], colMeans, na.rm = TRUE)))
[1] 2.179 0.006 2.216 0.000 0.000
Kees
Gabor Grothendieck wrote:
> Grouping the data frame by the first two columns, apply colMeans
> and then rbind the resulting by-structure together:
>
> do.call(rbind, by(DF, DF[2:1], colMeans, na.rm = TRUE))
>
>
> On 10/5/06, Greg Tarpinian <sasprog474 at yahoo.com> wrote:
>> R 2.3.1, WinXP:
>>
>> I have a puzzling problem that I suspect may be solved using
>> grep or a regular expression but am at a loss how to actually do it...
>> My data.frame looks like
>>
>> Location Time X Y
>> -------- ---- --- ---
>> 1 0 1.6 9.3
>> 1 3 4.2 10.4
>> 1 6 2.7 16.3
>> 2 0 0.5 2.1
>> 2 3 NA 3.6
>> 2 3 5.0 0.06
>> 2 6 3.4 14.0
>>
>> and so forth. I would like to search for duplicate Time values
>> within a Location and take the numerical average (where possible)
>> of the elements in X and Y. These numerical averages should
>> then be used to create a single row where multiple rows once
>> existed. So, I would like to obtain
>>
>> 2 3 5.0 1.83
>>
>> for the two Time = 3 rows for Location = 2 and use it to replace
>> these two rows. Ideally, avoiding for(i in 1:blah) loops would be
>> nice because the data.frame has about 10,000 rows that need to
>> be searched and processed. My intent is to do some comparing of
>> SAS to R -- the DATA step processing in SAS is quite fast and
>> using the RETAIN statement along with the LAG( ) function allows
>> this sort of thing to be done rapidly.
>>
>>
>> Thanks in advance,
>>
>> Greg
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list