Harvard WorldMap is an open source Geospatial Content Management System (GeoCMS), containing a large number of geospatial datasets, which requires a framework to return to end users the most relevant and reliable results.
In this section we provide an overview of GeoCMSs, the problem domain they address, and related technologies.
Geospatial content management systems
A Content Management System (CMS) is a web application that allows users to work within a collaborative environment to create and update digital content. This content can take a number of forms (blog posts, articles, images, videos, etc.); it is typically created, reviewed and published through a revision workflow, and can be shared with other users and/or groups of users through a granular permission system. In the last two decades a number of open source frameworks have gained popularity; notable ones include Joomla, Drupal and WordPress (based on the PHP language), Liferay (based on the Java language), and Plone and Django CMS (based on Python).
A different class of CMS, which we refer to as Data Content Management Systems (DCMS), is oriented toward the storage and distribution of open data and their metadata. CKAN is an open source web-based management system built on the Python/Pylons web framework, the PostgreSQL database, and Solr for search. CKAN is used by a number of public institutions to create data catalogues and registries, enabling users to share open data and make them discoverable and presentable. Data.gov, for example, is a United States government website built with CKAN; it has been in operation since 2009 and publishes open data at a national scale [7].
Another example of a DCMS is Dataverse (built on Java/Glassfish), an open source web application that enables users to share, cite, explore and analyze research data. Its development started in 2006 at the Institute for Quantitative Social Science (IQSS) at Harvard University [8]. A growing number of universities and organizations around the world are running their own Dataverse repository instances.
In the geospatial world, the focus of this paper, a number of frameworks are used to implement open geospatial web platforms. GeoCMSs are DCMSs that let users create and update geospatial content and its related metadata. By using these frameworks it is possible to deploy Spatial Data Infrastructures (SDI) and/or geoportals.
A typical GeoCMS can provide users with some or all of the following abilities:
- Upload vector and raster datasets (layers), or create a vector dataset (layer) from scratch. Layers can be stored in a spatial database and rendered with a map rendering engine
- Create thematic styles for a layer
- Edit the metadata of a layer
- Edit the features of a vector layer
- Set appropriate granular permissions for a layer to a user or a group of users. Permissions can be of different types: for example a user can be enabled to access and edit the metadata of a vector dataset, but not enabled to edit its features and its styles
- Create web maps combining geospatial layers which have been uploaded to the GeoCMS
- Provide interoperability with external clients using standards and open protocols
- Harvest layers from external map services in order to make them available to the GeoCMS users
A GeoCMS can be developed as a spatial extension of an existing CMS or as an independent framework that focuses mainly on the management of spatial information. These systems are typically developed within a web framework that uses a spatial database to store the data and a map rendering engine to generate the map tiles.
Most open source GeoCMSs use PostgreSQL (footnote 1) with PostGIS (footnote 2) as the spatial database and GeoServer (footnote 3) or MapServer (footnote 4) as the map rendering engine.
A JavaScript mapping library is used to implement the web maps; the most common are OpenLayers (footnote 5) and Leaflet (footnote 6).
A more sophisticated GeoCMS can also include in its stack a map caching engine and a Catalogue Service for the Web (CSW; footnote 7) instance.
One of the first GeoCMS frameworks was PrimaGIS (early 2000s), built on the Plone CMS and using MapServer as its map rendering engine.
In the last decade the most widely adopted open source GeoCMS implementations have been GeoNode (footnote 8), Mapbender (footnote 9), Cartaro (footnote 10) and GeoNetwork (footnote 11), the last of which evolved from a CSW implementation into a complete GeoCMS solution. CARTO (formerly CartoDB; footnote 12) is a powerful cloud computing mapping platform. Its underlying source code, based on JavaScript, Node.js and PostgreSQL/PostGIS, is released as open source. There are, however, few CARTO instances running other than the main commercial one operated by the CARTO company.
CKAN itself offers a number of geospatial features. For example, a non-spatial dataset can be geo-indexed and made searchable by location by associating a GeoJSON geometry with it. Spatial search is performed by the spatial features of the CKAN search engine (Solr) or, alternatively, by a PostGIS spatial database. CKAN also provides a way to harvest external CSW servers (federation). Thanks to its integration with pycsw, it can expose a fully compliant CSW interface for the harvested records.
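As an illustration of geo-indexing a non-spatial dataset, the following is a minimal sketch that attaches a GeoJSON geometry to a CKAN-style dataset dictionary. It assumes the convention used by the CKAN spatial extension, in which the geometry is stored as a JSON string in an extra field named `spatial`; the dataset name, title and bounding box are hypothetical values chosen for the example.

```python
import json


def bbox_to_polygon(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Polygon covering the given bounding box."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # a GeoJSON linear ring must be closed
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def with_spatial_extra(dataset, geometry):
    """Return a copy of the dataset dict with the geometry stored as an extra.

    Assumption: the search backend reads the geometry from an extra whose
    key is "spatial" and whose value is a GeoJSON string.
    """
    extras = [e for e in dataset.get("extras", []) if e["key"] != "spatial"]
    extras.append({"key": "spatial", "value": json.dumps(geometry)})
    return {**dataset, "extras": extras}


# Hypothetical dataset metadata, used only for illustration.
dataset = {"name": "boston-census-tracts", "title": "Boston census tracts"}
geo_dataset = with_spatial_extra(
    dataset, bbox_to_polygon(-71.19, 42.23, -70.92, 42.40)
)
print(geo_dataset["extras"][0]["key"])
```

The resulting dictionary could then be submitted to a CKAN instance through its dataset creation API; that step is omitted here because it depends on the deployment.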
With a GeoCMS built on a full stack of components (a spatial database, a map rendering engine, a map caching engine, a CSW catalogue), it is possible to implement an SDI and/or a solution for the storage and distribution of open data.
Search engines
CMSs first, and later GeoCMSs and open data portals, started embedding a search engine in their architecture in order to make content and data easily discoverable. The most widely adopted open source search engine frameworks are Solr (footnote 13) and Elasticsearch (footnote 14), both of which are based on the Java Lucene (footnote 15) search library. Search engine technology provides fast, scored results to end users and includes features similar to those of web search engines such as Google, Yahoo and Bing.
It is very common to pair a CMS with Solr or Elasticsearch. GeoCMSs have adopted this trend as well: in GeoNode, since the earliest versions, it has been possible to add Solr or Elasticsearch to the stack in order to make content more easily discoverable.
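To make the idea of scored, filtered search concrete, the sketch below builds a Solr `select` request that combines a keyword query with an optional bounding-box filter. The core URL and the field names (`title`, `abstract`, `bbox`) are hypothetical and depend entirely on the schema of a given deployment; the query parser and the `ENVELOPE(minX, maxX, maxY, minY)` filter syntax are standard Solr features, but this is not the configuration of any particular GeoCMS.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_solr_query(base_url, text, bbox=None, rows=10):
    """Build a Solr select URL for a keyword search with an optional bbox filter.

    bbox is (min_x, min_y, max_x, max_y) in longitude/latitude.
    Field names are assumptions made for this example.
    """
    params = [
        ("q", text),
        ("defType", "edismax"),       # query parser supporting field boosts
        ("qf", "title^2 abstract"),   # matches in the title count double
        ("fl", "id,title,score"),     # return the relevance score with each hit
        ("rows", rows),
        ("wt", "json"),
    ]
    if bbox is not None:
        min_x, min_y, max_x, max_y = bbox
        # Solr envelopes are written as ENVELOPE(minX, maxX, maxY, minY).
        envelope = f"ENVELOPE({min_x}, {max_x}, {max_y}, {min_y})"
        params.append(("fq", f'bbox:"Intersects({envelope})"'))
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_query(
    "http://localhost:8983/solr/layers",  # hypothetical core
    "land use",
    bbox=(-71.2, 42.2, -70.9, 42.4),
)
print(parse_qs(urlparse(url).query)["fq"][0])
```

Keeping the spatial constraint in a filter query (`fq`) rather than in `q` means it restricts the result set without influencing the relevance score.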
Web map services
On the web there is a large number of web map services exposing useful geospatial information through Open Geospatial Consortium (OGC) standards or open protocols. Some popular OGC standards are Web Map Service (WMS) [9], Web Map Service Tile Caching (WMS-C; footnote 16), Web Feature Service (WFS; footnote 17), Web Coverage Service (WCS) [10] and Catalogue Service for the Web (CSW). A widely used set of open protocols is the Esri ArcGIS Representational State Transfer (REST) family: MapServer, ImageServer and FeatureServer (footnote 18).
Thanks to these standards and open protocols, a tremendous volume of geospatial information can be accessed by clients such as desktop platforms (QGIS, gvSIG, GRASS GIS, Esri ArcGIS, etc.) and by web platforms (GeoCMS and SDI), which can federate the services. For example, GeoNode allows users to register a remote web map service in order to gain access to all of its published layers. These layers can then be used and combined with the native GeoNode layers to create web maps.
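Registering a remote service typically starts from its capabilities document, which lists the layers it publishes. The following minimal sketch extracts named layers from a WMS 1.3.0 capabilities document using only the Python standard library. The sample XML is a hand-written, heavily reduced stand-in for a real `GetCapabilities` response, and the function is illustrative rather than the code GeoNode itself uses.

```python
import xml.etree.ElementTree as ET

# Namespace used by WMS 1.3.0 capabilities documents.
WMS_NS = {"wms": "http://www.opengis.net/wms"}

# Hand-written sample; a real response contains many more elements.
SAMPLE_CAPABILITIES = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Service><Name>WMS</Name><Title>Example service</Title></Service>
  <Capability>
    <Layer>
      <Title>Root</Title>
      <Layer queryable="1"><Name>roads</Name><Title>Roads</Title></Layer>
      <Layer><Name>landuse</Name><Title>Land use</Title></Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""


def list_named_layers(capabilities_xml):
    """Return (name, title) pairs for every layer that can be requested.

    Layers without a Name element are grouping containers and are skipped.
    """
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.find("wms:Name", WMS_NS)
        title = layer.find("wms:Title", WMS_NS)
        if name is not None:
            layers.append((name.text, title.text if title is not None else ""))
    return layers


for name, title in list_named_layers(SAMPLE_CAPABILITIES):
    print(name, "-", title)
```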
Harvard WorldMap
Since 2010 the CGA has been developing and maintaining WorldMap [11] (footnote 19), a GeoCMS and open data platform that enables registered users to publish geospatial content on the web. Users can upload geospatial vector and raster datasets to the platform and combine them with existing datasets to create web maps. Existing datasets can be data which other users have uploaded, or data exposed by external web servers.
WorldMap is based on GeoNode; at the time of writing it has been used by more than 20,000 registered users and provides access to about 120,000 map layers and 5,000 maps (collections of layers).
In a WorldMap map object, users can combine local layers and remote layers (Fig. 1).
A local layer is a geospatial dataset managed by GeoNode: the data is stored as a PostgreSQL/PostGIS table if the layer is a vector dataset, or as a GeoTIFF file on disk if the layer is a raster dataset. In both cases the layer is displayed in the mapping client, which is based on the OpenLayers JavaScript library, using the OGC WMS standard [9] or the WMS-C specification, both of which are implemented by the GeoNode rendering engine, GeoServer. Whenever the client needs to access the coordinates of a feature's geometry, it uses the OGC WFS (footnote 20) and Web Feature Service - Transaction (WFS-T) standards, as implemented in GeoServer.
A remote layer is a layer published in an external web map service. Most web map services are implemented using OGC web standards or specifications (such as WMS, WMS-C, Tile Map Service (TMS), Web Map Tile Service (WMTS) and WFS) or custom open protocols such as the Esri ArcGIS REST MapServer and ImageServer.
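The two kinds of request described above can be sketched as URL builders: a WMS `GetMap` request returns a rendered image of a layer, while a WFS `GetFeature` request returns the features themselves, including their geometry. The endpoint and layer name below are hypothetical, and `application/json` as a WFS output format is assumed to be available on the server (GeoServer offers it, but it is not mandated by the WFS standard).

```python
from urllib.parse import urlencode, urlparse, parse_qs


def wms_getmap_url(endpoint, layer, bbox, width=256, height=256, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_x, min_y, max_x, max_y)."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value asks for the layer's default style
        "srs": srs,
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
        "transparent": "true",
    }
    return f"{endpoint}?{urlencode(params)}"


def wfs_getfeature_url(endpoint, type_name, max_features=10):
    """Build a WFS 1.0.0 GetFeature URL returning at most max_features features."""
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": max_features,
        "outputFormat": "application/json",  # assumed to be supported by the server
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
map_url = wms_getmap_url(
    "https://example.org/geoserver/wms", "geonode:roads", (-71.2, 42.2, -70.9, 42.4)
)
feature_url = wfs_getfeature_url("https://example.org/geoserver/wfs", "geonode:roads")
print(map_url)
print(feature_url)
```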
Hypermap registry
In a WorldMap map object it is possible to combine local GeoNode layers and remote layers from external web map services. Because there is a very large number of local and remote layers to search, a search platform is needed to enable WorldMap users to discover the most appropriate and reliable dataset for their specific need. The requirements of such a platform are:
- Enable creation and maintenance of a registry of web map services, exposed as OGC standards and Esri REST endpoints
- Make the collected geospatial information easily discoverable
- Constantly monitor the status of layers in order to filter unreliable layers out of users' search results: for example, layers published by web map services that are not consistently up and running
- Collect usage statistics to enable crowd curation of local and remote layers
- Provide instant previews (thumbnails) of local and remote layers
- Support visualization of the geographic distribution of the results returned
- Support robust search by time as well as space
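The monitoring requirement in the list above can be illustrated with a simple reliability score. The sketch below is one possible approach, not the algorithm used by any specific registry: it assumes each layer keeps a history of boolean health-check outcomes, scores the layer by the fraction of successful checks within a recent window, and hides layers that fall below a threshold. The class, the window size and the threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    name: str
    # Outcome of each periodic health check, oldest first (True = success).
    checks: list = field(default_factory=list)


def reliability(layer, window=10):
    """Fraction of successful checks among the most recent `window` checks.

    A layer that has never been checked scores 0.0, so it is not shown
    until it has been monitored at least once.
    """
    recent = layer.checks[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def reliable_layers(layers, threshold=0.8, window=10):
    """Keep only the layers whose recent reliability meets the threshold."""
    return [layer for layer in layers if reliability(layer, window) >= threshold]


layers = [
    Layer("roads", [True] * 9 + [False]),
    Layer("flaky", [True, False, False, True]),
    Layer("new", []),
]
print([layer.name for layer in reliable_layers(layers)])
```

Using a sliding window rather than the full history lets a service that has recovered from an outage regain visibility after enough successful checks.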
In 2015 CGA started the design and development of Hypermap Registry (also referred to as Hypermap), a registry platform to harvest and manage a large catalogue of web map services. Hypermap is released under the Massachusetts Institute of Technology (MIT) open source license and is hosted on GitHub (footnote 21).
CGA runs a public instance of this platform, named Harvard Hypermap (HHypermap; footnote 22), which is used by WorldMap to enable users to search for layers and use them in their maps.
HHypermap implements a number of features in an attempt to provide high quality and reliable results to WorldMap users searching geospatial content published in web map services.
CGA staff has harvested a large number of web map services and layers using HHypermap. This information is handled in a relational database, and more services and layers can be added using the Hypermap user interface by system administrators or external users who can suggest map service endpoints. Hypermap is a web application with a public user interface providing users with access to the service information collected. Administrators can manage the information, for example adding a new map service to harvest or checking the health of known services.