[R] using ddply but preserving some of the outside data

xavier.chardon at free.fr xavier.chardon at free.fr
Thu Aug 6 09:36:08 CEST 2009


Hi,

Apart from everything that's been answered already, it seems from your first mail that you were confusing merge and rbind.

rbind appends rows to a data set.
merge performs joins, as in a relational database.
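
A minimal sketch of the difference, in case it helps. The data frames x, y and z are made-up toy data, not taken from the original post:

```r
# Hypothetical toy data, for illustration only
x <- data.frame(sites = c("a", "b"), value = c(1, 2))
y <- data.frame(sites = c("c", "d"), value = c(3, 4))
z <- data.frame(sites = c("a", "b"), depth = c(10, 20))

# rbind stacks rows; both data frames need the same columns
stacked <- rbind(x, y)   # 4 rows, columns: sites, value

# merge joins on the shared column(s), here only "sites"
joined <- merge(x, z)    # 2 rows, columns: sites, value, depth

# x and y share BOTH columns, so merge keeps only rows that match
# on sites and value together; none do, so the result is empty
empty <- merge(x, y)     # 0 rows
```

The last case is probably close to what happens in a loop that merges each new result into an accumulator: the shared columns rarely match exactly, so rows are dropped rather than appended.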


----- Original Message -----
From: "Jarrett Byrnes" <byrnes at msi.ucsb.edu>
To: "R help" <R-help at stat.math.ethz.ch>
Sent: Wednesday, 5 August 2009 21:00:40 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: [R] using ddply but preserving some of the outside data

I have a bit of a quandary.  I'm working with a data set for which I
have sampled sites at a variety of dates.  I want to use this data
to get a running average of the sampled values for the current and
previous date.

I originally thought something like ddply would be ideal for this,  
however, I cannot break up my data by date, and then apply a function  
that requires information about the previous dates.

I had thought to use a for loop and merge, but that doesn't quite seem  
to be working.

So, my questions are twofold

1) Is there a way to use something like the plyr library to do this
efficiently?
	1a) Indeed, is there a way to use ddply or its ilk to have a function
that returns a vector of values, and then assign the variables you are
sorting by to the whole vector?  Or maybe making each value its own
column in the new data frame, and then using reshape, is the answer.
Hrm.  Seems clunky.

2) Or, can a for loop around a plyr-kind of statement do the trick
(and if so, pointers on why the code below won't work)?  It, too,
seems clunkier than I would like.


sites<-c("a", "b", "c")
dates<-1:5

a.df<-expand.grid(sites=sites, dates=dates)
a.df$value<-runif(15,0,100)
a.df<-as.data.frame(a.df)


#now, I want to get the average of the values for the current and previous date
mean2<-function(df, date){
	sub.df<-subset(df, df$dates-date<1 &
				df$dates-date>-1 )
	return(mean(df$value))
	}

my.df<-data.frame(sites=NA, dates=NA, V1=NA)
for(a.date in a.df$dates){
	new.df<-ddply(a.df, "sites", function(df) mean2 (df, a.date))
	my.df<-merge(my.df, new.df) #doesn't seem to work
}

my.df
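
One possible reading of why the loop above does not give the intended result, sketched under assumptions rather than offered as the definitive fix: mean2 averages the full df instead of sub.df, its window keeps only the current date, the loop visits every row's date instead of each distinct date, and merge drops rows where rbind would append them. The version below uses base R only so it runs without plyr; the inner loop over sites stands in for the ddply call, and set.seed is added purely for reproducibility.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
a.df <- expand.grid(sites = c("a", "b", "c"), dates = 1:5)
a.df$value <- runif(15, 0, 100)

# mean of the values for the current and the previous date;
# note that it averages the subset, not the full data frame
mean2 <- function(df, date) {
  sub.df <- df[df$dates <= date & df$dates >= date - 1, ]
  mean(sub.df$value)
}

my.df <- NULL
for (a.date in unique(a.df$dates)) {                 # each date once
  for (a.site in as.character(unique(a.df$sites))) {
    site.df <- a.df[a.df$sites == a.site, ]
    new.row <- data.frame(sites = a.site, dates = a.date,
                          V1 = mean2(site.df, a.date))
    my.df <- rbind(my.df, new.row)                   # append, do not merge
  }
}

my.df
```

For the first date there is no previous date, so V1 is simply that date's value.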
