This tutorial introduces the WaterML R package. It walks through an example of retrieving data from the CUAHSI Hydrologic Information System (HIS) and running a statistical analysis on that data in R.
#import required libraries
library(WaterML)
#get the list of supported CUAHSI HIS services
services <- GetServices()
This example uses the service at http://hydroportal.cuahsi.org/ipswich/cuahsi_1_1.asmx?WSDL hosted by the Ipswich River Watershed Association, which enlists volunteers to collect data on the health of the Ipswich River and its tributaries in Massachusetts, USA. We can use the GetVariables() and GetSites() functions to get the tables of variables and sites on the server.
#point to a CUAHSI HIS service and get a list of the variables and sites
server <- "http://hydroportal.cuahsi.org/ipswich/cuahsi_1_1.asmx?WSDL"
variables <- GetVariables(server)
sites <- GetSites(server)
#get full site info for the site IRWA:FB-BV using the GetSiteInfo method
siteinfo <- GetSiteInfo(server, "IRWA:FB-BV")
Next, we use the GetValues() function to retrieve the water temperature (full variable code IRWA:Temp) and dissolved oxygen (full variable code IRWA:DO) values at this site. In this example we get the values for all available days. Note that we can also use the startDate and endDate parameters to restrict the time period of interest. To get help on the GetValues function, you can type ?GetValues in the R console. Note that for this particular site there are 21 temperature and 22 dissolved oxygen observations.
#get the temperature and dissolved oxygen values for the site using the GetValues method
Temp <- GetValues(server, siteCode="IRWA:FB-BV", variableCode="IRWA:Temp")
DO <- GetValues(server, siteCode="IRWA:FB-BV", variableCode="IRWA:DO")
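If the full record has already been downloaded, the period of interest can also be narrowed locally. The sketch below is not a WaterML call: it uses base R on a small made-up table (the `values` data frame, its numbers, and the date bounds are all hypothetical) that mimics the `time` and `DataValue` columns used in this tutorial.

```r
# Hypothetical stand-in for a table returned by GetValues(); all values are made up
values <- data.frame(
  time = as.POSIXct(c("2010-05-01 09:00:00", "2010-06-01 09:00:00",
                      "2010-07-01 09:00:00", "2010-08-01 09:00:00"), tz = "UTC"),
  DataValue = c(11.2, 9.8, 8.1, 7.9)
)

# Illustrative bounds of the period of interest
start <- as.POSIXct("2010-06-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2010-07-31 23:59:59", tz = "UTC")

# Keep only the observations whose time falls inside the window
subset_values <- values[values$time >= start & values$time <= end, ]
print(subset_values)
```

Filtering locally avoids a second request to the server, while passing the date parameters to GetValues keeps the download itself small.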
We can use the plot() function to plot the temperature data, and the points() function for adding the dissolved oxygen data points to the existing plot.
plot(DataValue~time, data=Temp, col="red")
points(DataValue~time, data=DO, col="blue")
Note that the “time” represents the local time, and “DateTimeUTC” represents the UTC time. The “DateTimeUTC” columns are in POSIXct format. POSIXct is a special format in R for storing date and time. POSIXct represents the number of seconds since the beginning of 1970. You can use the strftime function to get the year, month, day, hour, minute and second corresponding to each time as shown below:
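years <- strftime(DO$time, "%Y")
months <- strftime(DO$time, "%m")
days <- strftime(DO$time, "%d")
#use upper-case %H and %S: lower-case %h is the abbreviated month name and %s is not the second of the minute
hours <- strftime(DO$time, "%H")
minutes <- strftime(DO$time, "%M")
seconds <- strftime(DO$time, "%S")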
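To make the POSIXct representation concrete, here is a small self-contained sketch in base R. The timestamp is an arbitrary illustrative value, not one taken from the Ipswich data, and the time zone is fixed to UTC so the result does not depend on the local machine.

```r
# POSIXct stores time as seconds since 1970-01-01 00:00:00 UTC
one_day <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
print(as.numeric(one_day))  # one day after the origin, i.e. 24 * 60 * 60 seconds

# An arbitrary example timestamp (illustrative only)
stamp <- as.POSIXct("2015-06-01 14:30:15", tz = "UTC")

# Extract each component as a character string;
# hour, minute and second use the upper-case codes %H, %M, %S
parts <- c(
  year   = strftime(stamp, "%Y", tz = "UTC"),
  month  = strftime(stamp, "%m", tz = "UTC"),
  day    = strftime(stamp, "%d", tz = "UTC"),
  hour   = strftime(stamp, "%H", tz = "UTC"),
  minute = strftime(stamp, "%M", tz = "UTC"),
  second = strftime(stamp, "%S", tz = "UTC")
)
print(parts)
```

Because strftime returns character strings, wrap the results in as.integer() if numeric values are needed, for example to group observations by month.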
#merge our two tables based on the time column
data <- merge(DO, Temp, by="time")
#rename the column DataValue.x in the merged table to "DO"
names(data)[names(data)=="DataValue.x"] <- "DO"
#rename the column DataValue.y in the merged table to "Temp"
names(data)[names(data)=="DataValue.y"] <- "Temp"
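The column names DataValue.x and DataValue.y come from how merge() handles columns that share a name in both tables. The following sketch illustrates this on two small made-up tables (the values are hypothetical and only the `time` and `DataValue` columns are mimicked). It also shows that, by default, merge() keeps only the times present in both tables.

```r
# Made-up observations; the 10:00 dissolved oxygen reading has no matching temperature
DO <- data.frame(
  time = as.POSIXct(c("2010-06-01 09:00:00", "2010-06-01 10:00:00",
                      "2010-07-01 09:00:00"), tz = "UTC"),
  DataValue = c(9.8, 9.5, 8.1)
)
Temp <- data.frame(
  time = as.POSIXct(c("2010-06-01 09:00:00", "2010-07-01 09:00:00"), tz = "UTC"),
  DataValue = c(15.2, 21.4)
)

# Inner join on time: the shared non-key column gets the suffix .x for the
# first table (DO) and .y for the second table (Temp)
merged <- merge(DO, Temp, by = "time")
print(names(merged))
print(nrow(merged))  # the unmatched 10:00 row is dropped
```

If unmatched observations should be kept, merge() accepts all = TRUE, which fills the missing side with NA.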
# Perform a linear regression on the dissolved oxygen vs. temperature values
model <- lm(DO~Temp, data=data)
The code creates two outputs when run in RStudio. First, it creates a scatter plot of dissolved oxygen concentration versus water temperature with the linear regression line.
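Second, it outputs the results from the regression analysis. From these results, there appears to be a significant negative linear relationship between water temperature and dissolved oxygen at this site: the estimated slope is about -0.18 (p < 0.001), and temperature explains roughly 44% of the variance in dissolved oxygen (R-squared = 0.4376).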
#>
#> Call:
#> lm(formula = DO ~ Temp, data = data)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.1091 -1.1377 -0.5184  1.1897  3.1494
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  9.00177    0.69022  13.042 1.54e-11 ***
#> Temp        -0.17927    0.04435  -4.042 0.000587 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.566 on 21 degrees of freedom
#> Multiple R-squared:  0.4376, Adjusted R-squared:  0.4108
#> F-statistic: 16.34 on 1 and 21 DF,  p-value: 0.0005871
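The lm() call on its own only fits the model. One plausible way to produce a scatter plot with a regression line and a printed summary like the ones described above is sketched here in base R. The `obs` data frame and its values are made up for illustration, so the numbers will differ from the Ipswich results, and these may not be the exact calls used to generate the output shown.

```r
# Made-up paired observations (not the Ipswich data)
obs <- data.frame(
  Temp = c(5, 10, 15, 20, 25),
  DO   = c(12.1, 10.8, 9.9, 8.7, 8.0)
)

# Fit dissolved oxygen as a linear function of water temperature
model <- lm(DO ~ Temp, data = obs)

# Scatter plot of the observations with the fitted regression line on top
plot(DO ~ Temp, data = obs,
     xlab = "Water temperature", ylab = "Dissolved oxygen")
abline(model, col = "red")

# Print coefficients, standard errors, p-values and R-squared
print(summary(model))
```

abline() accepts a fitted lm object directly and draws the line from its intercept and slope, so the line always matches the model that summary() reports.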
This tutorial shows how you can use the WaterML package in R to access data from a CUAHSI HIS web service directly within R, without first downloading the data to your local computer. While this was demonstrated for a data service hosted by the Ipswich River Watershed Association, the WaterML R package can be used to access data from any compliant CUAHSI HIS web service, including the 100+ data services listed on the HIS Central website.
For additional information on the tutorial and the WaterML R Package, please refer to:
Jiri Kadlec, Bryn StClair, Daniel P. Ames, Richard A. Gill (2015). WaterML R package for managing ecological experiment data on a CUAHSI HydroServer. Ecological Informatics, 28, 19-28. http://www.sciencedirect.com/science/article/pii/S1574954115000801