[R-sig-Geo] Subsetting dataframe by all factor levels

Justin H. j@tnhllrd @ending from gm@il@com
Fri Sep 14 20:03:41 CEST 2018


Hi Rich,

For the sake of example, here's a solution for a simple aggregation.

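# Mean of every column except the grouping factor itself (column 1, `name`);
# this leaves one row per station, i.e. 58 rows
aggregate(rainfall[, -1], by = list(name = rainfall$name), FUN = mean)
# In case you only want to aggregate over selected columns, e.g. elevation
# and precipitation
aggregate(rainfall[, c("elev", "prcp")], by = list(name = rainfall$name), FUN = mean)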
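The same per-station means can also be written with the formula interface of aggregate(). Here is a minimal sketch on a hypothetical toy data frame. The station names S1 to S3 and all values are invented and are not the real rainfall data:

```r
# Hypothetical toy stand-in for `rainfall`: 3 stations x 4 daily readings
rainfall <- data.frame(
  name = factor(rep(c("S1", "S2", "S3"), each = 4)),
  elev = rep(c(100, 200, 300), each = 4),
  prcp = c(0.5, 0.1, 0.2, 0.0,  1.0, 0.0, 0.4, 0.4,  0.3, 0.3, 0.0, 0.2)
)

# Formula interface: mean prcp and elev for each station, one row per level
station_means <- aggregate(cbind(prcp, elev) ~ name, data = rainfall, FUN = mean)
print(station_means)
```

One difference matters here. The formula method drops rows that have a missing value in any of the named variables, because its default na.action is na.omit. The data-frame method passes NAs on to mean(), which then returns NA unless you supply na.rm = TRUE.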


I am assuming you want one row for every combination of station and year,
with the average precipitation for each. To aggregate that way you will
first need a column that represents the year (or month/year, if the data
support that resolution).
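You can derive the year, or month/year, column from the Date column with format(). Below is a minimal sketch on hypothetical toy data, with invented stations S1 and S2 and invented values. It also computes the per-station monthly means that the quoted question asks for:

```r
# Hypothetical toy data with the same name/sampdate/prcp columns as `rainfall`
rainfall <- data.frame(
  name     = factor(c("S1", "S1", "S1", "S2", "S2", "S2")),
  sampdate = as.Date(c("2005-01-01", "2005-01-31", "2005-02-15",
                       "2005-01-10", "2005-02-01", "2006-02-02")),
  prcp     = c(0.4, 0.2, 1.0, 0.0, 0.3, 0.1)
)

# Calendar year, e.g. "2005"
rainfall$year <- factor(format(rainfall$sampdate, "%Y"))
# Month/year label, e.g. "2005-01"; this text form also sorts chronologically
rainfall$yearmon <- factor(format(rainfall$sampdate, "%Y-%m"))

# Mean daily precipitation per station and month (only combinations with data)
monthly <- aggregate(prcp ~ name + yearmon, data = rainfall, FUN = mean)
print(monthly)
```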

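# Add the year column first (use "%Y-%m" instead for month/year)
rainfall$year <- format(rainfall$sampdate, "%Y")
# This does the aggregation; naming the list names the matrix dimensions
rainfall.year <- with(rainfall, tapply(prcp, list(name = name, year = year), mean))
# However, tapply() returns a "wide" station x year matrix. This makes it a
# "long" data frame (columns name, year, prcp), as you probably want it.
rainfall.year <- as.data.frame(as.table(rainfall.year), responseName = "prcp")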
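Naming the grouping list is what gives the long data frame readable column names. Without names, as.table() falls back to the generic Var1, Var2 and Freq. Here is a sketch on hypothetical toy data. It also shows that aggregate() with a two-variable formula produces the same long table directly:

```r
# Hypothetical toy data: station, year and daily precipitation
rainfall <- data.frame(
  name = factor(c("S1", "S1", "S2", "S2", "S2")),
  year = factor(c("2005", "2006", "2005", "2005", "2006")),
  prcp = c(0.2, 0.6, 0.1, 0.3, 0.5)
)

# Wide station x year matrix whose dimensions are named "name" and "year"
wide <- with(rainfall, tapply(prcp, list(name = name, year = year), mean))

# Long form: the dimension names become columns, responseName names the values
long <- as.data.frame(as.table(wide), responseName = "prcp")

# Equivalent long result in one call with the formula interface
long2 <- aggregate(prcp ~ name + year, data = rainfall, FUN = mean)
print(long)
print(long2)
```

The two routes differ when a station has no readings in some year. The tapply() route keeps that combination as an NA row, while the formula route omits it.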

A for-loop option:

for (i in levels(rainfall.year[, 2])) {      # column 2 holds the year
  print(i)
  # column 3 holds the station means; na.rm skips station-years with no data
  print(mean(rainfall.year[rainfall.year[, 2] == i, 3], na.rm = TRUE))
}

The loop prints the mean rainfall for each year, averaged over the stations.
Try running it to check that it loops through the factor you want, and then
put the interpolation function inside it.
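The quoted question asks for one data frame per station in a single step, and split() does exactly that. It returns a named list of data frames, one per factor level. Below is a minimal sketch on hypothetical toy data. The per-year mean computed with vapply() mirrors the loop above, and the same lapply()/vapply() pattern is one place an interpolation function could go:

```r
# Hypothetical toy data: station, year and precipitation
rainfall <- data.frame(
  name = factor(c("S1", "S1", "S2", "S2", "S3")),
  year = factor(c("2005", "2006", "2005", "2006", "2005")),
  prcp = c(0.2, 0.6, 0.1, 0.5, 0.9)
)

# One data frame per station in one step; list names are the factor levels
by_station <- split(rainfall, rainfall$name)

# One data frame per year, then a per-year summary applied to each subset
by_year <- split(rainfall, rainfall$year)
year_means <- vapply(by_year, function(d) mean(d$prcp, na.rm = TRUE), numeric(1))
print(year_means)
```

For example, lapply(by_year, f) would run a per-year function f on each year's subset and return the results as a list named by year. Here f is a hypothetical stand-in for whatever interpolation you choose.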

I hope that helps!

Cheers,
Justin

On Fri, Sep 14, 2018 at 1:13 PM Rich Shepard <rshepard using appl-ecosys.com> wrote:

>    I need to learn geospatial analyses in R to complement my GIS knowledge.
> I've just re-read the subsetting chapter in Hadley's 'Advanced R' without
> seeing how to create separate data frames by extracting all rows for
> each site name in the parent data frame in one step. I believe that what I
> need to do is create a list of the factor names and feed them to a loop
> subsetting each to a new dataframe. Perhaps there's a better way unknown to
> me and I need advice, suggestions, and recommendations on how to proceed.
>
>    The inclusive data frame has this structure:
>
> str(rainfall)
> 'data.frame':   113569 obs. of  6 variables:
>   $ name    : Factor w/ 58 levels "Blazed Alder",..: 20 20 20 20 20 20 20 ...
>   $ easting : num  2370575 2370575 2370575 2370575 2370575 ...
>   $ northing: num  199338 199338 199338 199338 199338 ...
>   $ elev    : num  228 228 228 228 228 228 228 228 228 228 ...
>   $ sampdate: Date, format: "2005-01-01" "2005-01-02" ...
>   $ prcp    : num  0.59 0.08 0.1 0 0 0.02 0.05 0.1 0 0.02 ...
>
>    My goal is to use the monthly mean rainfall at each of the 58 reporting
> stations to interpolate/extrapolate rainfall over the entire county for
> selected years to show variability. The data points are not evenly
> distributed but clustered in more populated areas and dispersed in rural
> areas. My geochemical data typically are like this and I need to also learn
> how this distribution affects how the data are analyzed.
>
> TIA,
>
> Rich
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>



