Pilot implementation of the US EPA interoperable watershed network
© The Author(s). 2017
Received: 23 December 2016
Accepted: 20 April 2017
Published: 24 May 2017
The mission of the United States Environmental Protection Agency (EPA) is to protect human health and the environment, including air, water and land. Understanding the extent of pollution in waters and identifying waters for protection has been based in part on water quality monitoring data collected and shared by parties (federal, state, tribal, and local) throughout the U.S. To date, this monitoring data has been largely represented by data collected as a water quality sample (data collected by a technician in the field or analyzed in a lab). EPA’s “STORage and RETrieval” (STORET) and the Water Quality Exchange (WQX) have served as the repository for all this sampling data. However, these tools and systems were not designed to handle today’s continuous water quality sensors. EPA has therefore embarked on the Interoperable Watersheds Network (IWN) project, which is focused on identifying a common set of formats and standards for data, and on testing and validating these standards as well as new ways of sharing data and metadata. The completed IWN will greatly expand the sharing of data and its use, thereby streamlining the assessment, restoration, and protection of surface water quality at all levels of government.
Stakeholder workgroups were engaged to assist with developing requirements for the three major project components: required attributes and query capability for a centralized metadata catalog, technological and data requirements for data providers, and desired functionality for a web-based discovery tool that provides access to the catalog services and provider data.
The pilot implementation of IWN uses the Open Geospatial Consortium (OGC) Sensor Observation Service (SOS) 2.0 and WaterML2 standards as the foundation for a distributed sensor data sharing network. Data owners in locations across the United States have worked with EPA to publish their continuous sensor data and related metadata either through “data appliances” running the open-source 52° North implementation of SOS or using commercial software like Kisters’ KiWIS product.
Metadata are harvested into a centralized catalog that provides a REST Service API for sensor discovery. Users can discover data by querying for specific parameters, or using spatial boundaries such as HUC, county, a buffered point, or a user defined polygon. The sensor results are returned as GeoJSON, which can be used to create maps. The API also provides the service endpoints for the sensors, which can be used to access the continuous data to create charts or download the data for other analysis.
The pilot IWN demonstrates that standards-based interoperability can provide a sound basis for a national-scale clearinghouse for continuous sensor data, though scalability of the approach will need further testing. Selected technical detail, lessons learned, and future plans for the IWN are included in the discussion.
Keywords: OGC SOS, OGC WaterML, Discoverability, Access, Sensor Data, Water Resources
The United States Environmental Protection Agency (EPA) mission is the protection of human health and the environment, including the waters of the United States. EPA’s “STORage and RETrieval” (STORET) data system has been used to collect and hold millions of water quality sample measurements and associated metadata collected since the 1960s. Additional systems like the Water Quality Exchange (WQX) and the Water Quality Portal (WQP) have facilitated the communication and exchange of water quality sampling data between data providers and promoted discoverability of and access to data across agencies. However, STORET, WQX and WQP emphasize the handling of discretely sampled “grab” data and are not well-suited to manage high-frequency “continuous” data generated by modern, affordable water quality monitoring sensors. The use of these sensors is becoming ubiquitous, with a proliferation of telemetered ‘real-time’ data on the internet, and the development of new sensor technology for nutrients and other parameters of interest promises to expand and diversify applications.
The worst-case “as-is” scenario is widespread, with collected data not passing beyond the organization and discoverability minimized or non-existent;
EPA’s STORET/WQX/WQP system for water quality data makes centralized grab sample data readily discoverable and accessible, but is not well-structured for handling continuous data;
EPA’s AirNow centralized system handles continuous data well, but is currently focused on a highly controlled, homogenous set of parameters;
The US Geological Survey’s National Water Information System delivers centralized water data using OGC services; and
The Integrated Ocean Observing System is built around OGC standards such as SOS and combines a centralized catalog with distributed data;
Organizations publish their data using SOS services through a variety of means;
Data services and organizations are registered in a centralized catalog;
Discovery and analysis are supported through a portal complementary to WQP for human use and through an application programming interface (API) for machine-to-machine use cases.
Hackensack-Passaic, New Jersey. The New Jersey Department of Environmental Protection (NJDEP) and the Meadowlands Environmental Research Institute (MERI) operate sensors in and around the Passaic River.
Little Miami, Ohio. EPA’s Office of Research and Development (ORD) and Clermont County each operate sensors on reservoirs, tributaries and the main stem of the Little Miami River.
Stakeholders in the watersheds were engaged in site visits and on monthly calls to develop use cases, to define data workflows and attendant technology stacks, and to provide feedback throughout.
A straightforward software development approach was used that first elicited requirements for the major projected components and then iteratively implemented the components with many opportunities for stakeholder input.
Identify required attributes and query capability for a centralized metadata catalog,
Specify technological and data requirements for data providers, and
Define desired functionality for a web-based discovery tool that provides access to the catalog services and provider data.
Short, simple descriptions were solicited from representatives of the pilot watersheds to define user stories. These descriptions of desirable features presented from the stakeholder perspective were used as the launching point for an agile development process. Regular interactions with stakeholders served to inform the implementation towards its responsive endpoint.
Results and discussion
Catalog Development. A metadata catalog was developed to contain and serve necessary information to meet user expectations.
SOS Services. Partner organizations identified how best to make their data available through SOS services and then implemented—or helped to implement—the services.
Discovery Tool. The Currents discovery tool consumes metadata and data services to enable data discovery and access.
As implemented, IWN data is currently made available using WaterML 2 and SOS 2.0 through either 52 N or Kisters servers with the SOS 2.0 Hydrology Profile enabled, so data services are compliant with the requirements in the Best Practice document for the OGC SOS 2.0 hydrology profile for SOS 2.0 implementations serving OGC WaterML 2.0. In addition, the related catalog and Currents discovery tool fulfill the common-case requirements for data discovery and download established in the Scope section of the Best Practice document.
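For readers unfamiliar with SOS 2.0, a data request against such a service can be sketched as a key-value-pair (KVP) GetObservation URL. This is only an illustration: the endpoint, offering and observed-property identifiers below are hypothetical placeholders rather than values from the pilot, and the sketch assumes that the server has the SOS 2.0 KVP binding enabled and accepts the WaterML 2.0 namespace as a response format.

```python
from urllib.parse import urlencode

# WaterML 2.0 namespace, assumed here to be accepted as the responseFormat value.
WATERML2_FORMAT = "http://www.opengis.net/waterml/2.0"


def build_get_observation_url(endpoint, offering, observed_property, start, end):
    """Build a SOS 2.0 KVP GetObservation URL for one offering and time window.

    start and end are ISO 8601 timestamps given as strings.
    """
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # "During" filter on phenomenon time, written as <value reference>,<start>/<end>.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
        "responseFormat": WATERML2_FORMAT,
    }
    return f"{endpoint}?{urlencode(params)}"


# All identifiers below are hypothetical and only follow the general shape
# of the naming scheme described in this paper.
example_url = build_get_observation_url(
    endpoint="https://example.org/sos/service",
    offering="urn:x-epaiwpp:offering:usepa:ord:esf-weather:raw:light-3",
    observed_property="urn:x-epaiwpp:observable:light-3",
    start="2017-01-01T00:00:00Z",
    end="2017-01-02T00:00:00Z",
)
print(example_url)
```

The resulting URL can be fetched with any HTTP client; the response would be a WaterML 2.0 document that the caller still needs to parse.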
User Stories From Pilot Watersheds
Hackensack-Passaic River Watershed
A. Drinking Water/Source Water Protection Early Warning
“As a water manager, I want to view trends in selected parameters so that I can predict and remediate a water quality issue before it occurs.”
B. Water Quality Assessment (for Clean Water Act Integrated Reporting)
“As a water quality manager, I want to download continuous monitoring parameter data so that I can compare it to numeric criteria and evaluate if the water is meeting state standards.”
Little Miami River Watershed
A. Water Safety (Drinking Water and Recreation) optimization: Maximizing output while minimizing cost with a Harmful Algal Bloom focus
“As a water manager, I need to detect water quality issues, such as a harmful algal bloom, so that I can alert the public.”
B. TMDL Implementation
“As a water quality manager, I need to be able to download parameter data for use in running TMDL models.”
The Source Water Protection (Hackensack-Passaic A) and Water Safety (Little Miami A) user stories share a need for discovery and visualization, while the Water Quality Assessment (Hackensack-Passaic B) and TMDL Implementation (Little Miami B) user stories call for large multiple-site, multiple-parameter downloads.
Standardized vocabulary for parameter names. Parameter names supplied by data providers are mapped to the appropriate name in the Substance Registry Service, which is EPA’s “authoritative resource for basic information about chemicals, biological organisms, and other substances of interest to EPA and its state and tribal partners.” 
Quality Assurance/Quality Control (QA/QC) field. The “Sensor QAQC” field provides a simplified mechanism for linking to appropriate QA/QC data such as sensor maintenance reports. The expectation is that data providers will populate this field with a hyperlink that points to the providers’ collection of relevant QA/QC data and metadata.
QA/QC status. Although some providers (e.g. the US Geological Survey) are able to provide observation-specific data qualifiers, QC status is generally not consistently available, and is not directly represented in the catalog data model. QC status is instead encoded as part of the SOS procedures.
GetOrganizations retrieves the list of organizations that are currently registered as data providers along with service end point, the date of the most recent data harvest, when the server was last pinged, and an indication of whether the endpoint is available. The service accepts an optional organization id (org_id) parameter which limits the results to the requested organization.
AvailableParameters returns the list of parameters that are available for query via the metadata catalog.
GetSensors (multiple) returns a feature collection which specifies the siteId, siteName, orgId and geometry (type and coordinates) of a sensor. There are separate services for spatial filtering by county, hydrologic unit, circular buffer, bounding box, and upstream/downstream relationship. All of these services accept an organization id (derivable from the getOrganizations service) and parameter id (from availableParameters), as well as a minimum and maximum observation date to constrain results.
GetSensorParameters returns the list of parameters that are registered in the catalog for the specified input sensor ID.
GetOrganizationParameters returns the list of parameters that are registered in the catalog for the input organization ID.
GetSensorParameters and GetOrganizationParameters results both include the organization’s parameter IDs for use in querying data by parameter directly from the organization’s service endpoint. The catalog harvests metadata from registered organizations’ service endpoints daily.
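As an illustration of how a client might consume a GetSensors result, the following sketch groups the sensors in a GeoJSON feature collection by organization. The sample response is invented for the example, and the placement of siteId, siteName and orgId under each feature's "properties" member is an assumption based on common GeoJSON practice, not a documented detail of the catalog.

```python
import json

# Invented sample shaped like a GetSensors feature collection (assumed layout).
SAMPLE_RESPONSE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"siteId": "site-001", "siteName": "Example Reservoir", "orgId": "usepa"},
     "geometry": {"type": "Point", "coordinates": [-84.10, 39.00]}},
    {"type": "Feature",
     "properties": {"siteId": "site-002", "siteName": "Example Tributary", "orgId": "usepa"},
     "geometry": {"type": "Point", "coordinates": [-84.20, 39.10]}},
    {"type": "Feature",
     "properties": {"siteId": "site-101", "siteName": "Example River Station", "orgId": "njmeri"},
     "geometry": {"type": "Point", "coordinates": [-74.10, 40.80]}}
  ]
}
"""


def sensors_by_org(feature_collection):
    """Group (siteId, siteName, lon, lat) tuples by orgId."""
    grouped = {}
    for feature in feature_collection["features"]:
        props = feature["properties"]
        # GeoJSON point coordinates are ordered longitude, latitude.
        lon, lat = feature["geometry"]["coordinates"][:2]
        record = (props["siteId"], props["siteName"], lon, lat)
        grouped.setdefault(props["orgId"], []).append(record)
    return grouped


grouped = sensors_by_org(json.loads(SAMPLE_RESPONSE))
for org_id, sites in grouped.items():
    print(org_id, [site_id for site_id, _, _, _ in sites])
```

The same feature collection could be passed unchanged to a web mapping library, since it is already valid GeoJSON.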
Partner Configurations Implemented for Pilot IWN
Linux server hosted at Rutgers University
Manual batch updates only
Linux server hosted on Amazon Web Services t2.micro instance
Near-realtime updates pulled from third-party site
Linux server hosted in EPA private cloud
Manual batch updates only
Windows server on premises
Near-realtime updates pulled from Flowlink (vendor) database on same server
Details of setup and configuration are provided in a supporting GitHub repository at https://github.com/IWN-Currents/OGD-materials.
Pilot IWN “Partners of Opportunity”
EPA Region 1, Region 10
Linux server hosted in EPA private cloud (same as EPA ORD)
Near-realtime updates pulled from EPA Region 1 and third-party sites
EPA Region 7
Kisters WISKI/KiWIS installation
SOS 2.0/WaterML 2 services provided by KiWIS
National Water Information System
Provides WaterML2-formatted data retrieval; required custom code for metadata harvesting
There is a common naming scheme for procedures, offerings, features and templates that reflects the IWN project, object type, data provider organization and sub-organization, location, data status, and parameter (e.g. urn:x-epaiwpp:template:epa:ord:esf-weather:raw:light-3).
Observed data from each sensor in the system are presented to the user as a SOS Observation Offering.
Offerings are each linked to an SOS Procedure describing the sensor that produced the data in the offering.
Sensor procedures for all of the sensors at a station are grouped together as children of a station procedure. Each station procedure has an offering that is “undefined”.
Station and sensor procedures contain sub-organizational contact information, while the Provider section for the SOS installation contains organizational contact information.
The ingestion code inserts stations, sensors, and observations using the SOS API, which allows it to be run locally or remotely, though local operation is recommended to simplify security settings for the SOS client. The code checks the SOS database to identify the most recent available observation for a given parameter and station, and only uploads observations that are more recent. Two typical use cases have been identified in the IWN pilot project: direct manual (batch) use for the occasional injection of long, typically historical records, and scheduled invocation of a .sh (Linux) or .bat/.vbs (Windows) script for continuous near-realtime updates.
The ingestion script, example supporting files, and data appliance setup instructions are available from the GitHub repository at https://github.com/IWN-Currents/OGD-materials.
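The incremental-upload check described above can be sketched as follows. This is a minimal illustration and not the pilot's ingestion script: the function name and the representation of observations as (timestamp, value) pairs are assumptions, and the lookup of the most recent stored observation through the SOS service is left out.

```python
from datetime import datetime, timezone


def select_new_observations(observations, latest_stored):
    """Return observations strictly newer than latest_stored, oldest first.

    observations: iterable of (timestamp, value) pairs for one station and parameter.
    latest_stored: timestamp of the newest observation already in the SOS database,
                   or None if nothing has been stored yet.
    """
    if latest_stored is None:
        newer = list(observations)
    else:
        newer = [obs for obs in observations if obs[0] > latest_stored]
    return sorted(newer, key=lambda obs: obs[0])


# Invented readings for one hypothetical sensor.
readings = [
    (datetime(2017, 1, 1, 0, 0, tzinfo=timezone.utc), 7.1),
    (datetime(2017, 1, 1, 0, 15, tzinfo=timezone.utc), 7.2),
    (datetime(2017, 1, 1, 0, 30, tzinfo=timezone.utc), 7.4),
]
latest = datetime(2017, 1, 1, 0, 0, tzinfo=timezone.utc)

new_obs = select_new_observations(readings, latest)
print(len(new_obs), "observations to upload")
```

Using a strict comparison means that rerunning the script on the same input file uploads nothing, which suits both the manual batch and the scheduled near-realtime use cases.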
Features selected for query functionality were identified using input from the partner workgroups. The Currents tool allows users to filter data by organization, parameters monitored, and by identifying a date range for the observation results. Users can additionally use spatial parameters, such as the current map window, a user defined polygon and HUC-8 watershed or county boundaries to refine their selections. Partners also expressed a desire to select sites using a point and specified buffer distance and using stream network navigation; these features are included in the metadata catalog services, but are not yet available in the Currents tool.
The successful implementation of the pilot IWN demonstrates the feasibility of the original strategy for sharing continuous data, although scalability of the approach will be a concern. In particular, bandwidth, storage, and CPU requirements for the catalog server will likely increase as data providers engage with the IWN and register more data appliances. Data providers are deemed unlikely to run into scalability issues, as data appliances configured for this pilot ran successfully with minimal resources (e.g. Amazon Web Services’ most lightweight hardware configuration, t2.micro).
Deployment of data appliances with varying configurations matching providers’ data output formats.
Implementation of an automatically updating metadata catalog and attendant API for web-based queries.
Standards-based integration into the catalog of metadata both from IWN data appliances and from other interoperable data sources, demonstrating that a standards-based approach can address data source heterogeneity.
Design and development of the web-based discovery and access Currents tool to fully leverage the catalog and data source APIs, e.g. by adding upstream/downstream selection and access to all metadata elements.
The DeleteObservation request added by 52 North as an extension to the SOS standard is of high value and worth adding to the standard. Data partners sometimes identified errors in their data after posting, and DeleteObservation supports the replacement of erroneous data.
Observation-specific data qualifiers would be useful for the IWN to support user quality control information needs, but data qualifiers as defined in the WaterML2 standard (e.g. <wml2:qualifier xlink>) are not yet supported in the 52 N SOS database model. Observation-specific qualifiers can be included with InsertObservation requests using the <om:parameter> tag in the current development branch for 52 N SOS, but cannot be entered in InsertResult requests. Implementation of wml2 qualifiers and/or om parameters is desirable.
Downloading of results from large GetObservation requests can be time-consuming, and it would be useful to provide the user with feedback on the progress of their request. One way SOS might help is to allow a temporal filter to be placed on the GetDataAvailabilityRequest to allow the querying individual/software to assess roughly how large a given retrieval might be to set expectations and perhaps strategy, such as breaking down the retrieval into smaller subretrievals.
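The idea of breaking a large retrieval into smaller subretrievals, mentioned in the last item above, can be sketched on the client side as a simple time-window splitter. The function name and the seven-day window are illustrative assumptions; each window would then be used as the temporal filter of a separate GetObservation request.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, max_span):
    """Split [start, end] into consecutive windows no longer than max_span."""
    if max_span <= timedelta(0):
        raise ValueError("max_span must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_span, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


start = datetime(2017, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 1, 31, tzinfo=timezone.utc)
windows = split_time_range(start, end, timedelta(days=7))

for number, (window_start, window_end) in enumerate(windows, start=1):
    # Reporting progress per window gives the user feedback during a long download.
    print(f"window {number}/{len(windows)}: {window_start.isoformat()} to {window_end.isoformat()}")
```

Adjacent windows share a boundary timestamp, so a client may need to drop duplicate observations if the server treats both ends of the temporal filter as inclusive.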
Guidance on harmonizing data appliance deployment with organizational IT policies.
Improved handling of QA/QC.
Multiple-parameter, multiple-station visualization and download capability in Currents.
Addition of sub-organizational contacts to metadata catalog and Currents discovery tool.
Selection of stations in Currents using the API’s point-and-buffer and upstream/downstream services.
A mobile Currents application.
To align complementary efforts and promote interoperability, the next IWN phase will encompass coordination and cooperation with other Federal agencies (e.g. USGS, NOAA) and academia (e.g. Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI)). Additionally, EPA hopes to engage with the private sector to encourage sensor and data management vendors to provide SOS and WaterML 2 access to data.
A description is provided here of the naming scheme for SOS objects on IWN data appliances. Basic recipes and other information for installing and configuring 52°North SOS and the pilot IWN ingestion script are provided at https://github.com/IWN-Currents/OGD-materials.
IWN uniform resource name (URN) scheme
Uniform Resource Names (URNs) are used extensively to provide unique machine-readable identifiers for different entities represented in 52 N SOS-based data appliances deployed on behalf of Pilot partners. URNs were chosen instead of Uniform Resource Locators (URLs) to simplify data provider requirements by removing the need to provide resolvable endpoints on data appliances.
In general, URNs consist of the term “urn:” followed by a namespace ID and a namespace-specific string. The namespace ID is currently “x-epaiwpp”, so all URNs will begin with the text “urn:x-epaiwpp”. The organization, suborganization, station, and parameter IDs are specified in metadata files for the data appliance.
Organization, suborganization and station IDs
usepa – United States Environmental Protection Agency
njdep – New Jersey Department of Environmental Protection
njmeri – Meadowlands Environmental Research Institute (located in New Jersey)
ohclecty – Clermont County, Ohio – OR OH39025 (FIPS-BASED)
usepa:ord – Office of Research and Development (EPA)
ohclecty:wrd – Water Resources Division of Clermont County, Ohio
njmeri:meri – no suborganization, organization acronym repeated
Station IDs are assigned by the organization, and consist of alphanumeric characters.
Parameter IDs are used to uniquely identify observable properties, sensors, features, offerings and templates within 52 N SOS. The parameter IDs must be consistent across all organizations.
Observable property URNs
Station, offering, sensor, feature, and template URNs
Station URNs identify platforms deployed for sensors, and consist of the namespace ID followed by the classifier “station”, and the organization, suborganization, and station IDs.
Sensor URNs identify sensors deployed at a platform, and consist of the namespace ID followed by the classifier “sensor”, the organization, suborganization, and station IDs, a data quality status indicator (“raw”, “provisional” or “final”), and the sensor parameter.
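The station and sensor URN patterns described above can be sketched in code as follows. The helper names are hypothetical, and the example IDs are combined from values that appear elsewhere in this paper purely for illustration; they are not taken from an actual data appliance configuration.

```python
NAMESPACE_ID = "x-epaiwpp"
DATA_STATUSES = ("raw", "provisional", "final")


def station_urn(org, suborg, station):
    """urn:<namespace>:station:<org>:<suborg>:<station>"""
    return ":".join(["urn", NAMESPACE_ID, "station", org, suborg, station])


def sensor_urn(org, suborg, station, status, parameter):
    """urn:<namespace>:sensor:<org>:<suborg>:<station>:<status>:<parameter>"""
    if status not in DATA_STATUSES:
        raise ValueError(f"status must be one of {DATA_STATUSES}")
    return ":".join(["urn", NAMESPACE_ID, "sensor", org, suborg, station, status, parameter])


def parse_sensor_urn(urn):
    """Split a sensor URN back into its named components."""
    parts = urn.split(":")
    if len(parts) != 8 or parts[:3] != ["urn", NAMESPACE_ID, "sensor"]:
        raise ValueError(f"not a sensor URN: {urn}")
    keys = ("org", "suborg", "station", "status", "parameter")
    return dict(zip(keys, parts[3:]))


example_station = station_urn("usepa", "ord", "esf-weather")
example_sensor = sensor_urn("usepa", "ord", "esf-weather", "raw", "light-3")
print(example_station)
print(example_sensor)
print(parse_sensor_urn(example_sensor))
```

Because the components are colon-delimited, this sketch assumes that none of the individual IDs contains a colon.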
The work reported on in this paper was funded by the U.S. Environmental Protection Agency (EPA). EPA staff participated in the work, providing substantial input on design specifications and reporting requirements, and also identified and coordinated with stakeholders in the pilot watersheds for this project. Additionally, EPA staff (D. Young and B. Dean) are co-authors.
All authors participated directly in the work reported on in this paper and contributed materially to the manuscript. Specifically: TS led the implementation of “data appliances.” He drafted the outline, abstract, results (data appliance) and methods, and coordinated synthesis, editing and submittal of the manuscript. KS was responsible for development of the Currents discovery tool and also led the elicitation of specifications and user stories from pilot watershed partners. She wrote the results (discovery tool) section of the manuscript and provided editorial review. BB was responsible for design and implementation of the metadata catalog and deployment of the catalog on EPA-supplied hardware. He was responsible for the writeup of the catalog in the manuscript. DY was the EPA project lead and also served as liaison to EPA and other federal agencies and offices with similar needs. He was responsible for the paper’s introduction and conclusions. BD supported D. Young in liaison roles, and also worked on the use of the EPA SRS as a basis for a standardized parameter set. She provided editorial review on the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- 52° North. “Sensor Observation Service”. 2016. http://52north.org/communities/sensorweb/sos/. Accessed 20 Dec 2016.
- Advisory Committee on Water Information. “Open Water Data Initiative Overview”. 2014. https://acwi.gov/spatial/owdi/. Accessed 28 Mar 2017.
- EPA. “Continuous Monitoring Data Sharing Strategy.” Prepared by Michael Baker International. Washington: LimnoTech and MapTech, Inc; 2015. under EPA Contract EP‐C‐12‐052 Order No. 0005.
- EPA. “What is STORET and how does it relate to WQX”. 2016a. https://www.epa.gov/waterdata/frequent-questions-about-storage-and-retrieval-storet#101. Accessed 20 Dec 2016.
- EPA. “STORET/WQX: What is WQX?”. 2016b. https://www.epa.gov/waterdata/frequent-questions-about-storage-and-retrieval-storet#103. Accessed 20 Dec 2016.
- EPA. “About Substance Registry Services”. 2016c. https://iaspub.epa.gov/sor_internet/registry/substreg/home/overview/home.do. Accessed 20 Dec 2016.
- Open Geospatial Consortium. “OGC® Sensor Observation Service 2.0 Hydrology Profile”. 2014. http://docs.opengeospatial.org/bp/14-004r1/14-004r1.html#requirement_1. Accessed 28 March 2017.
- Open Geospatial Consortium. “Sensor Observation Service”. 2016a. http://www.opengeospatial.org/standards/sos. Accessed 20 Dec 2016.
- Open Geospatial Consortium. “OGC® WaterML”. 2016b. http://www.opengeospatial.org/standards/waterml. Accessed 20 Dec 2016.
- Slawecki TAD, Young D, Perez B, McLellan P. “A Draft EPA Strategy for Sharing Continuous Monitoring Data”. In: Proceedings of the Water Environment Federation, WEFTEC 2015: Session 610 through Session 611. 2015. p. 5291–5303(13). doi:https://doi.org/10.2175/193864715819522919.
- USGS, EPA, USDA. “What is the WQP”. 2016. http://www.waterqualitydata.us/wqp_description/. Accessed 20 Dec.