The GRASPgfs has therefore focused on designing and implementing a flexible, interoperable platform based on open source software (footnote 4), compliant with GEOSS (footnote 5) and using OGC (footnote 6) standards and services for data and processing capabilities. By delivering a flexible, integrative and sharing eGRASP web platform based on openness, the project aims to enable researchers in crop modelling and agro-ecological modelling, whether they develop new models or evaluate agricultural strategies (agro-economic modelling), to seamlessly re-use existing models and specific data such as genetic-trait information. For efficiency and for control of quality, in terms of the uncertainty and variability of the outcomes, the platform design includes functionalities to easily browse and visualise metadata as well as to geo-computationally evaluate the uncertainties of workflow outputs [15, 25, 28]. Spatial analysis of the spatial variations of both the predicted outcomes and their uncertainties was also included in the design. That way the modelling part and the decision-making part are interlinked, allowing more flexibility and adaptability. The approach and the concept of the eGRASP platform have resulted from multidisciplinary exchanges leading to a real transdisciplinary vision [4, 21, 38] that is highlighted in the next section.
Emergence of a transdisciplinary vision
Whilst building up a core collaboration on this topic across a range of disciplines (environmental and human geography, crop science, geospatial information, and computing science) at the University of Nottingham, through regular meetings and small funding for a few summer internships in 2010, the common vision expressed in Fig. 1 started to emerge. Later on, thanks to 18 months of pump-prime funding from the BBSRC, the research work could start. The workflow of Fig. 1 encapsulates the vision put into the design of the eGRASP platform as much as it is a template of potential modelling scenarios, envisioning the various components as the data and processes needed to fulfil our objectives for GRASPgfs. If at first it may have seemed that the geospatial sciences brought tools enabling this research within a cross-disciplinary perspective, their role rapidly transformed into acting as a medium for a more holistic integrated approach [16], which in turn challenged their specific developments within a context beyond the disciplines involved. In addition to providing more opportunities for expanding the capabilities and applications looked for in the first place, this advancement also created new avenues for interdisciplinary research and practices in the use of GIS in agricultural research.
Beyond the global concept and the concepts encapsulated in it, Fig. 1 is a truly transverse vision that not only puts each specialist of a sub-model within a contextual flow but also enriches the geospatial e-infrastructure modelling framework. It resulted from translating various flow diagrams of conceptual information into a technical and standardised representation using a cross-disciplinary encoding standard, the BPMN standard (Business Process Modelling Notation, from the OMG standards organisation). As far as the cross-disciplinary aspect is concerned, Fig. 1 as a BPMN representation is also a scientific geo-computational model seen from a meta-level description, which can be linked to a workflow engine enabling its computational execution once instantiated (Fig. 2).
In order to instantiate such models (the entire Fig. 1 or the sub-models encapsulated in it), the design of the eGRASP platform is based on the use case model in Fig. 2, which translates the requirements set out earlier. In this figure only general use cases are presented, with different colours expressing the different domains or disciplines concerned: the green use cases reflect crop genetics with genetic-trait information aspects, the yellow use cases concern geospatial science with visualisation and selection of environmental constraints, the blue use cases deal with geocomputational modelling and scientific workflow composition and evaluation, and the pink use cases concern crop epidemiology with the risk factors associated with crop modelling, including pest and disease risks derived from pathogen information.
Just as UML (Unified Modelling Language; footnote 7), particularly through class diagrams for object modelling and use case diagrams such as in Fig. 2, has enabled cross-disciplinary exchanges in data modelling [22], the BPMN language establishes a bridge from conceptual integrated modelling to the effective execution of the models [44]. Facilitating the composition of such workflows using existing resources is paramount [11].
Crop modelling complexity
Well-known crop modelling approaches such as APSIM (footnote 8) [19] and AquaCrop (footnote 9) are considered here as expressing, or being a sub-model of, the “trait variation forecast integration”. The purpose of the GRASPgfs is to directly re-use these established models within a flexible platform; they can be wrapped into OGC Web Processing Services (WPS) and made available to the platform as such [10, 35] or via a brokering system [7, 39]. When the models can be broken down into sub-components, and if the crop-trait variation scenario requires it, these sub-components can also be made available as processing services. Because the interaction of these models can be complex to set up and to combine, the BPMN editor is seen as a simplification, particularly when a few models are to be combined. Ultimately it brings interoperability in interfacing heterogeneous data and processing models without necessarily imposing standardisation on each of them. This does not, of course, remove the need for a good understanding of the models used, but the goal of the eGRASP platform is to hide this complexity and to focus on the ability to re-use the resources within a more macro scenario for global food security. The models and types of models identified in the introduction can potentially be re-used here, and the platform objectives also include facilitating their encapsulation as WPS services (Fig. 3).
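As an illustration of what calling a crop model wrapped as a WPS could look like from a client, the following minimal sketch builds a WPS 1.0.0 Execute request in key-value-pair encoding. The endpoint URL, the process identifier and the input names are hypothetical placeholders and do not correspond to any service actually deployed in eGRASP.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request URL in key-value-pair (GET) encoding."""
    # DataInputs is a semicolon-separated list of name=value pairs.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"

# All values below are hypothetical and only illustrate the request shape.
url = build_wps_execute_url(
    "https://example.org/wps",      # assumed endpoint
    "crop_growth_simulation",       # assumed process identifier
    {"site_id": "S01", "sowing_day": 120},
)
print(url)
```

The same request could be issued by any WPS-aware client, which is what allows a wrapped model to become a task in a larger workflow.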
When looking at trait variation with genotypic information, the crop modelling may start with building up a selection for trait variation linked to genotype linkage and environment interaction. This corresponds to the “Trait Hypothesis Construction” process task in the generic workflow. It is described in Fig. 2 among the other capabilities of the eGRASP platform; the functionalities associated with this genetic-trait selection, for example before performing the crop modelling, are the green part of the use case model. To achieve this, the platform reuses the CropStoreDB (footnote 10) database, called GeoGermplasmDB in the architecture design (Fig. 4). The GeoGermplasmDB has an extended schema in order to record the geometry associated with a few tables using the OGC standard (Fig. 3), and also to encode the pest and pathogen characteristics along with the model parameters associated with the crop varieties, as stipulated in the requirements. The GeoGermplasmDB allows users to record genotype and trait information with geo-location depending on the origins of the seeds and the trial sites, and implements the “Bio-genetic Knowledge” component of the platform. Geospatial variations associated with genetic variations can inform breed selection programs [18, 33]. An example using an underutilized crop, the Bambara groundnut (Vigna subterranea), is detailed in the example section (Fig. 5).
The other aspects of complexity considered here come, on the one hand, from the interaction of farmers’ knowledge of the landraces with their strategies to make a living [24, 32] and, on the other hand, from the climate forcing interacting with the current land conditions. Due mostly to aggregation and topological properties in these models, spatial complexity can now also be introduced [26, 47, 51]. Specific models for climate forcing, more often mechanistic, can be used to predict future ground conditions but are usually integrated with interactions from general land use categories [43, 50].
The eGRASP capacity
The approach pursued in GRASPgfs and for the design of the eGRASP platform has been as much top-down as bottom-up, starting from leading disciplines such as crop genetics, geospatial information modelling and crop modelling. Besides a strong top-down emphasis on a geolocated genetic-trait database (the GeoGermplasmDB) and on workflow modelling (based on the OGC WPS and BPMN standards), case study analyses were used to gather requirements. By mixing these two aspects, and by envisaging direct use of the top-down elements in the bottom-up approach, the UML use case diagram of the required functionalities of the eGRASP platform was obtained (Fig. 2). Following the adoption of the use case diagram, disciplinary research took place to refine the case studies, with a focus on use case matching and potential new developments, whilst the computing architecture was designed to fit these requirements.
The architecture designed for the eGRASP platform to enable global spatial data infrastructure functionalities, as well as the ones described above, is given in Fig. 4. This viewpoint gives an overview of the different components without detailing how specific analytical functionalities are implemented. The objective for this pump-prime funding was to establish the design and to demonstrate a prototype. Therefore, specific functionalities are still to be developed; further funding is required to pursue these efforts. In Fig. 4, front-end services with their clients are represented as square boxes, and back-end services, often associated with specific information (e.g., databases, repositories), are represented as cylindrical boxes. The eGRASP system appears in this design as a sub-architecture of the CropBASE (footnote 11) initiative led by CFF (Crops For the Future), a wiki-knowledge sharing platform integrating multiple CFF programs, which is also in development.
To demonstrate the architecture, a set of services has been implemented and some facilities are currently available (footnote 12), but the platform and the CropBASE portal are not yet operational. The OGC services, for example WPS and WFS, can also be used directly in other clients such as QGIS, from the OSGeo stack (footnote 13). Currently:
- the Geovisualisation is supported by QGIS and by the WMS client provided by the GeoServer instance serving the GeoGermplasmDB.
- the Discovery via the Metadata Catalogue service (OGC CSW) is supported by GeoNetwork (footnote 12); queries on GEOSS-registered catalogues can bring up re-usable resources (data or processing services) as well as local ones.
- the GeoWorkflow is supported by a bespoke specification for OGC services using the jBPM (footnote 14) suite with a web editor and a workflow engine [35].
- the GeoGermplasmDB services as well as local environmental data are served using GeoServer (footnote 12); the results of the simulations or other workflows can be stored in the local environmental data storage.
- a set of ontologies can be used to enrich the data and processes, enabling refined queries via the metadata catalogue client.
Quality information available for data and processes in the metadata catalogue is used for uncertainty assessment through error propagation, thereby allowing better decision-making. This is currently available as an added functionality of the web editor, which re-uses the MetaPUnT WPS (footnote 15) service [27, 28] to meta-propagate the uncertainties.
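To make the idea of error propagation through a workflow concrete, the sketch below pushes an uncertain input through a chain of processing steps by Monte Carlo sampling and summarises the spread of the output. It is only a generic illustration under stated assumptions: the Gaussian input, the two linear steps and all names are hypothetical, and the sketch does not reproduce the MetaPUnT method.

```python
import random
import statistics

def propagate(workflow, input_mean, input_sd, n_samples=20000, seed=42):
    """Monte Carlo propagation of a Gaussian input uncertainty through a
    chain of steps; returns the mean and standard deviation of the output."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        value = rng.gauss(input_mean, input_sd)  # sample an uncertain input
        for step in workflow:                    # run the steps in order
            value = step(value)
        outputs.append(value)
    return statistics.mean(outputs), statistics.stdev(outputs)

# Hypothetical two-step workflow: z = 0.5 * (2 * x + 1) = x + 0.5
workflow = [lambda x: 2.0 * x + 1.0, lambda y: 0.5 * y]
out_mean, out_sd = propagate(workflow, input_mean=10.0, input_sd=1.0)
print(round(out_mean, 2), round(out_sd, 2))
```

For this linear chain the output standard deviation should stay close to that of the input; with non-linear steps the sampled spread would differ from a simple analytical guess, which is where such a propagation becomes informative for decision-making.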
First applications
Two illustrative examples are presented here to highlight the potential of eGRASP. The first one, a landscape genetic modelling, directly uses the GeoGermplasmDB and the associated WFS to describe spatially the genetic distances of germplasms. The second one illustrates the crop disease modelling of the eGRASP facility by designing an exemplar wheat eyespot disease model [1]. Both examples, the landscape genetic association analysis and the crop disease modelling, use a BPMN scientific workflow representation, thereby demonstrating the range of modelling situations that eGRASP intends to cover.
For the landscape genetic modelling, a glasshouse trial with 128 plants, from 4 repetitions each of 32 landraces, was analysed (Figs. 5 and 6). Here only the genotypic information was used to retrace geo-location associations of similar genetic profiles based on 20 microsatellite molecular markers (SSR) [37, 46]. Five genetic profiles were identified by k-means clustering on the main principal components of the SSR response data. In Fig. 6, the green and red profiles, which capture most of the genetic variability, are relatively clustered spatially, with an east-west gradient in the Sahel for the reds and a north-south gradient in East and South-East Africa for the greens. Adaptation to similar climatic environments can be thought of as explaining these zones, with the Sahel zone for the reds and a more humid tropical zone in East Africa for the greens. Trade routes may also be involved. Further analysis including the phenological data, with comparison to local data, will be needed to confirm these sorts of hypotheses.
Each task of the workflow in Fig. 5 was performed using R scripts based on existing packages. These R scripts are in the process of being encapsulated as WPS in order to be used and shared from the eGRASP platform.
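The profile identification task mentioned above (k-means on principal-component scores) can be sketched as follows. The original analysis used R packages; this Python version is a minimal illustration on synthetic two-dimensional scores with two clusters, and the deterministic initialisation from the first k points is a simplification assumed here for reproducibility.

```python
import math

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means, initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Synthetic principal-component scores (hypothetical, not the SSR data).
scores = [(0.0, 0.1), (5.0, 5.1), (0.2, -0.1), (5.2, 4.9), (-0.1, 0.0), (4.9, 5.0)]
labels, centroids = kmeans(scores, k=2)
print(labels)
```

Each label would then be joined back to the geo-location of the corresponding accession in order to map the profiles, as in Fig. 6.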
The second example, illustrated in Fig. 7, is a scientific workflow for crop modelling with potential occurrence of eyespot disease. The purpose was to integrate specific epidemiological disease modelling within a normal growth simulation model. Eyespot disease is modelled using a few sub-models that interfere with the normal development of the crop:
- The inoculation potential model (IPM) determines the amount of inoculum available for infection of the host depending on land condition risks and weather data.
- The disease development model (DDM) is based on the inoculation level and key environmental factors related to infection and disease development.
- Finally, at a key developmental growth stage, the severity of the disease is determined (DSM), based on estimates from the previous two models.
- The impact of the severity of the disease is then evaluated iteratively (HRM) at the subsequent growth stages until the crop has been harvested.
Each of the models (IPM, DDM, DSM and HRM) is a stochastic model, estimated at the growth stages that were identified on controlled data as crucial during the development of the crop: GS13, GS32, GS39 and GS65 [1]. The models are to be combined with a physiologically based model for crop growth, as in the BPMN representation in Fig. 7. The disease evolution models have been implemented in R (footnote 16) and APSIM was chosen as the crop growth model. Within APSIM, R scripts can be run using the script manager, making APSIM the orchestrating engine. Nonetheless, encapsulating APSIM within a WPS could be a future solution using the workflow engine within eGRASP. Details of the first results and of the variables involved in the IPM, DDM, DSM and HRM models can be found in [1], as well as the full validation of the models. However, despite the capacity of APSIM to run R scripts, the variables targeted by the disease modelling could not be updated during simulations, which led to a much simpler adaptation of Fig. 7.
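The chaining of the four sub-models can be sketched as a simple pipeline. This is only a structural illustration: the functional forms, the parameter values and the Gaussian noise terms below are hypothetical placeholders, not the fitted models of [1] (which were implemented in R), and for brevity only the first two steps carry a stochastic term here.

```python
import random

def ipm(land_risk, weather_index, rng):
    # Inoculation potential from land condition risk and weather (placeholder form).
    return max(0.0, land_risk * weather_index + rng.gauss(0.0, 0.05))

def ddm(inoculum, env_factor, rng):
    # Disease development from inoculum level and environment, clamped to [0, 1].
    return min(1.0, max(0.0, inoculum * env_factor + rng.gauss(0.0, 0.05)))

def dsm(inoculum, development):
    # Severity at the key growth stage, combining the two previous estimates.
    return 0.5 * (min(1.0, inoculum) + development)

def hrm(potential_yield, severity, n_remaining_stages, loss_per_stage=0.1):
    # Iterative yield reduction over the remaining growth stages until harvest.
    current = potential_yield
    for _ in range(n_remaining_stages):
        current *= 1.0 - loss_per_stage * severity
    return current

rng = random.Random(1)  # fixed seed so the sketch is reproducible
inoculum = ipm(land_risk=0.6, weather_index=0.8, rng=rng)
development = ddm(inoculum, env_factor=0.9, rng=rng)
severity = dsm(inoculum, development)
final_yield = hrm(potential_yield=8.0, severity=severity, n_remaining_stages=2)
print(severity, final_yield)
```

In a workflow engine each function would become one task, with the crop growth model supplying the growth stage at which each task is triggered.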
For eGRASP, the interest lies in the fact that such composition and conceptualisation of the models can be facilitated and controlled, e.g. by checking model adequacy. Interoperability ensures that models designed according to the BPMN standard can then be shared, using a standard graphical representation for better communication but also an XML encoding that enables any BPMN-compliant workflow engine to run the scientific model represented as a workflow.
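As a minimal sketch of what such an XML encoding looks like, the following code serialises a linear sequence of tasks as a BPMN 2.0 process using the standard BPMN model namespace. The process identifier, the target namespace and the element ids are hypothetical, the task names simply reuse the sub-model acronyms from the eyespot example, and a real engine may require additional attributes or service bindings that are not shown here.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)

def q(tag):
    """Qualify a tag name with the BPMN 2.0 model namespace."""
    return f"{{{BPMN_NS}}}{tag}"

def build_process(process_id, task_names):
    """Serialise start event -> tasks in order -> end event as BPMN 2.0 XML."""
    definitions = ET.Element(
        q("definitions"),
        {"targetNamespace": "http://example.org/egrasp-sketch"},  # assumed
    )
    process = ET.SubElement(
        definitions, q("process"), {"id": process_id, "isExecutable": "true"}
    )
    ET.SubElement(process, q("startEvent"), {"id": "start"})
    node_ids = ["start"]
    for i, name in enumerate(task_names):
        task_id = f"task_{i}"
        ET.SubElement(process, q("task"), {"id": task_id, "name": name})
        node_ids.append(task_id)
    ET.SubElement(process, q("endEvent"), {"id": "end"})
    node_ids.append("end")
    # One sequence flow between each pair of consecutive nodes.
    for i, (src, dst) in enumerate(zip(node_ids, node_ids[1:])):
        ET.SubElement(
            process,
            q("sequenceFlow"),
            {"id": f"flow_{i}", "sourceRef": src, "targetRef": dst},
        )
    return ET.tostring(definitions, encoding="unicode")

xml_text = build_process("eyespot_sketch", ["IPM", "DDM", "DSM", "HRM"])
print(xml_text)
```

The same file can be opened in a BPMN editor for the graphical view, which is the dual use (communication and execution) described above.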
Like UML (Unified Modelling Language), which is used as a computing science tool to design application systems, leading to both database and object programming implementations, the meta-language of BPMN can be understood very rapidly by the scientists involved [22, 29]. This transdisciplinary process made it possible to conceptualise the disease evolution and impact in a comprehensive way, which was also efficient to put into practice once each sub-model (a task in the BPMN diagram) had been established and fitted.