[R-sig-Geo] combining values taken at nearly identical locations

Wed Apr 4 08:03:48 CEST 2012

?zerodist

from sp. You can set its zero argument to a value > 0, so that points within that distance of each other are reported as well.
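
A minimal sketch of how that could look, assuming sp is installed. The data frame dat simply retypes the toy example from the question below, and the threshold of 1 is the radius used there; neither name comes from sp itself.

```r
library(sp)

# Toy data retyped from the question: three monitors near (1, 1), one far away
dat <- data.frame(x  = c(1, 1, 1.3, 15),
                  y  = c(1, 1, 1.5, 17),
                  d1 = c(NA, 14, 8, 11),
                  d2 = c(12, NA, NA, 21),
                  d3 = c(3, 5, NA, 7))

# zerodist() expects a SpatialPoints* object
pts <- SpatialPoints(dat[, c("x", "y")])

# Two-column matrix of row-index pairs whose distance is <= zero
pairs <- zerodist(pts, zero = 1)
pairs
```

Note that this returns pairs of points, not clusters; grouping the pairs into clusters is a separate step.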

?dnearneigh

from spdep helps you find neighbours within a specified distance. Surely
rgeos has something ...
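
Finding neighbours is only the first half of the task; the points still have to be grouped and their values combined. One possible way to do the whole thing in base R is sketched below. It deliberately swaps in single-linkage clustering (hclust() cut at the radius) for the neighbour search, which is an assumption on my part and not what zerodist or dnearneigh do. The names dat, radius, meas and combine_cluster are made up for illustration.

```r
# Toy data retyped from the question (x, y = location; d1..d3 = daily values)
dat <- data.frame(x  = c(1, 1, 1.3, 15),
                  y  = c(1, 1, 1.5, 17),
                  d1 = c(NA, 14, 8, 11),
                  d2 = c(12, NA, NA, 21),
                  d3 = c(3, 5, NA, 7))
radius <- 1                    # neighbourhood radius from the toy example
meas   <- c("d1", "d2", "d3")  # measurement columns

# Single-linkage clustering cut at `radius`: two monitors end up in the same
# cluster if a chain of steps, each no longer than `radius`, connects them
hc <- hclust(dist(dat[, c("x", "y")]), method = "single")
dat$cluster <- cutree(hc, h = radius)

combine_cluster <- function(g) {
  # "Majority rules": use the most frequent coordinate pair in the cluster;
  # on a tie (or no repeats) the first pair encountered wins
  key    <- paste(g$x, g$y)
  tab    <- table(factor(key, levels = unique(key)))
  winner <- match(names(which.max(tab)), key)

  # Per-day mean over the monitors that reported; days with no report stay NA
  vals <- colMeans(g[, meas, drop = FALSE], na.rm = TRUE)
  vals[is.nan(vals)] <- NA

  data.frame(x = g$x[winner], y = g$y[winner], as.list(vals))
}

result <- do.call(rbind, lapply(split(dat, dat$cluster), combine_cluster))
result
```

On the toy data this gives the two rows asked for in the question: (1, 1) with 11, 12, 4 and (15, 17) with 11, 21, 7. Be aware that single linkage chains points together, so a cluster can end up wider than the radius if monitors form a string.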

HTH,
Tom

On 03.04.2012 22:40, Molly Davies wrote:
> Hello,
>
> I apologize if my question has already been answered and I failed to find it. There must be a formal word for what I'm trying to do and I just don't know it ...
>
> My data: PM2.5 measurements, taken from 144 different monitors over a period of 78 days. Some monitors have daily values, others report at regular intervals (once every m days, m in {3, 6, 15, etc}, varies by type of monitor).
>
> The wrinkle: A number of these monitors have one or more monitors right next to them (little clusters of identical locations) or very nearby (within 100 meters).
>
> My objective: I would like to combine the values of these clusters of closely spaced monitors. On days when more than one monitor within the cluster reports a PM2.5 measurement, I would like to take the average. On days when only one monitor reports a measurement within the cluster, I'd like to use that one measurement. I do NOT want to average the location of the monitors, though! Rather, I want to use a "majority rules" voting system: if I have a cluster of 4 closely spaced monitors and 2 of them have the same coordinates, I'd like to assign the combined vector of PM2.5 measurements to the coordinates those 2 points have in common. If there are no repeated locations in a cluster, I'd still like to be able to assign the vector of measurements to an existing set of locations in my original data and not an average location. (Note: I am aware that I can use ddply{plyr} to take care of exactly duplicated points, but I want to do more than that.)
>
> A toy example:
>
> Original data: Suppose x and y are location and d1, d2 and d3 are measurements taken on different days.
> x   y   d1  d2  d3
> 1   1   NA  12  3
> 1   1   14  NA  5
> 1.3 1.5 8   NA  NA
> 15  17  11  21  7
>
> I would like to average the rows that are within a radius of 1 from each other and use the coordinates associated with the majority in the combination.
> x   y   d1  d2  d3
> 1   1   11  12  4
> 15  17  11  21  7
>
> Does what I've described have a name? Are there any built-in functions in R that will do it? If not, I would very much appreciate suggestions about how best to implement such a task.
>
> Some toy data to set me straight with:
> Each row is a unique monitor.
> My neighborhood radius of interest is 0.1.
> ###### BEGIN SNIP ############
> toyDat <- data.frame(x = runif(90), y = runif(90))  # set up a data frame of locations
> toyDat[91:100, ] <- toyDat[sample(90, 10), ]        # create some exactly duplicated locations
> toyDat[101:105, ] <- toyDat[96:100, ] + 0.02        # now at least 5 triads, each including one duplicated location
> toyDat$d1 <- rnorm(105)                             # give all the monitors data
> toyDat$d2 <- rnorm(105)
> toyDat$d1[sample(105, 15)] <- NA                    # sprinkle in missing values to keep it realistic
> toyDat$d2[sample(105, 17)] <- NA
> ###### END SNIP ############
>
> Thanks so much,
>
> Molly Davies

-- 
Technische Universität München
Department für Pflanzenwissenschaften
Lehrstuhl für Grünlandlehre
Alte Akademie 12
85350 Freising / Germany
Phone: ++49 (0)8161 715324
Fax:   ++49 (0)8161 713243
email: tom.gottfried at wzw.tum.de
http://www.wzw.tum.de/gruenland