[R-sig-Geo] Subsetting dataframe by all factor levels
Justin H.
j@tnhllrd @ending from gm@il@com
Fri Sep 14 20:03:41 CEST 2018
Hi Rich,
For the sake of example, here's a solution for a simple aggregation.
> aggregate(rainfall, list(rainfall$name), mean)
This aggregates every column by station name and returns the means, leaving
you with 58 rows (one per station).
> aggregate(rainfall[, #:#], list(rainfall$name), mean)
Use this form if you only want to aggregate over selected columns; #:# stands
for the range of column numbers you want to keep.
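If you would rather name the column than count column numbers, the formula interface of aggregate() does the same job. This is only a minimal sketch on made-up data: the data frame `toy` and its values are assumptions standing in for your real rainfall data frame.

```r
# Toy stand-in for the real rainfall data frame (values are invented).
toy <- data.frame(
  name = factor(c("A", "A", "B", "B")),
  elev = c(228, 228, 100, 100),
  prcp = c(0.5, 0.1, 0.2, 0.4)
)

# Mean of prcp only, grouped by station name: one row per station.
station.mean <- aggregate(prcp ~ name, data = toy, FUN = mean)
print(station.mean)
```

With the real data this should again give one row per station, with only the name and prcp columns.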
I am assuming you want one row for every combination of year and station,
with the average precipitation for each. To aggregate that way you first
need to create a new column that holds the year (or month and year, if the
data are appropriate for that resolution).
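One way to build those columns, sketched on made-up data, is to pull the year and month out of the Date column with format(). The data frame `toy` and the new column names `year` and `month` are assumptions for illustration; sampdate and prcp follow your str() output.

```r
# Toy stand-in for the real rainfall data frame (values are invented).
toy <- data.frame(
  name = factor(c("A", "A", "A", "B")),
  sampdate = as.Date(c("2005-01-01", "2005-01-02", "2006-02-01", "2005-01-01")),
  prcp = c(0.59, 0.08, 0.10, 0.20)
)

# Derive year and month as character columns from the Date column.
toy$year <- format(toy$sampdate, "%Y")
toy$month <- format(toy$sampdate, "%m")

# Monthly mean precipitation per station, already in "long" form.
monthly <- aggregate(prcp ~ name + year + month, data = toy, FUN = mean)
print(monthly)
```

Dropping `+ month` from the formula gives yearly means per station instead.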
> rainfall.year <- with(rainfall, tapply(prcp, list(name = name, year = year), mean))
This does the aggregation, but the result is a "wide" table with stations in
rows and years in columns.
> rainfall.year <- as.data.frame(as.table(rainfall.year), responseName = "prcp")
This makes it "long", with columns name, year, and prcp, as you probably want
it.
A for loop option:
for (i in levels(rainfall.year[, #year])) {
  print(i)
  print(mean(rainfall.year[rainfall.year$year == i, #prcp]))
}
The loop prints the mean rainfall for each year; replace #year and #prcp
with the column numbers of the year and precipitation columns.
Try running that loop to see that it is properly looping through the factor
you want and then stick in the interpolation function.
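If you want the separate data frames themselves rather than a loop over levels, split() returns one data frame per factor level as a named list, and sapply() or lapply() then applies a function to each piece. A minimal sketch on made-up data follows; `toy` and `summarise.subset` are hypothetical names, and the function body is just a placeholder where the interpolation step would go.

```r
# Toy stand-in for the long yearly table (values are invented).
toy <- data.frame(
  name = factor(c("A", "A", "B", "B", "B")),
  year = factor(c("2005", "2006", "2005", "2005", "2006")),
  prcp = c(0.5, 0.1, 0.2, 0.4, 0.3)
)

# One data frame per level of year, returned as a named list.
by.year <- split(toy, toy$year)

# Hypothetical placeholder for the per-subset analysis (e.g. interpolation).
summarise.subset <- function(d) mean(d$prcp, na.rm = TRUE)

# Apply the function to every subset; the result is named by factor level.
year.result <- sapply(by.year, summarise.subset)
print(year.result)
```

Splitting on toy$name instead gives one data frame per station, which is what the original question asked for.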
I hope that helps!
Cheers,
Justin
On Fri, Sep 14, 2018 at 1:13 PM Rich Shepard <rshepard using appl-ecosys.com>
wrote:
> I need to learn geospatial analyses in R to complement my GIS knowledge.
> I've just re-read the subsetting chapter in Hadley's 'Advanced R' without
> seeing how to create separate data frames based by extracting all rows for
> each site name in the parent data frame in one step. I believe that what I
> need to do is create a list of the factor names and feed them to a loop
> subsetting each to a new dataframe. Perhaps there's a better way unknown to
> me and I need advice, suggestions, and recommendations how to proceed.
>
> The inclusive data frame has this structure:
>
> str(rainfall)
> 'data.frame': 113569 obs. of 6 variables:
> $ name : Factor w/ 58 levels "Blazed Alder",..: 20 20 20 20 20 20 20 ...
> $ easting : num 2370575 2370575 2370575 2370575 2370575 ...
> $ northing: num 199338 199338 199338 199338 199338 ...
> $ elev : num 228 228 228 228 228 228 228 228 228 228 ...
> $ sampdate: Date, format: "2005-01-01" "2005-01-02" ...
> $ prcp : num 0.59 0.08 0.1 0 0 0.02 0.05 0.1 0 0.02 ...
>
> My goal is to use the monthly mean rainfall at each of the 58 reporting
> stations to interpolate/extrapolate rainfall over the entire county for
> selected years to show variability. The data points are not evenly
> distributed but clustered in more populated areas and dispersed in rural
> areas. My geochemical data typically are like this and I need to also learn
> how this distribution affects how the data are analyzed.
>
> TIA,
>
> Rich