3D city modelling and CityGML
The City Geography Markup Language (CityGML) is an international standard for the interoperable representation and exchange of virtual 3D city and landscape models. CityGML defines a conceptual schema for the most relevant entities of the urban space like buildings, roads, railways, tunnels, bridges, city furniture, water bodies, vegetation, and the terrain. The conceptual schema specifies how and into which parts and pieces physical objects of the real world should be decomposed and classified. All objects can be represented with respect to their semantics, 3D geometry, 3D topology, and appearances in five predefined levels of detail (LOD 0–4). CityGML is formally specified using UML class diagrams, explanations of the object classes and attributes, and an XML schema for the file exchange format. CityGML is issued by the Open Geospatial Consortium (OGC). The first official version of CityGML was released in the year 2008 and the current version 2.0.0 was published in 2012 (cf. [17]).
In CityGML, all classes and data types are grouped into a number of thematic modules. The modules and their relationships are shown in the UML package diagram in Fig. 1. The Core module defines the basic CityGML components and is, hence, a mandatory package that must always be referenced by the packages of the other modules including Building, Bridge, Transportation, CityObjectGroup, Appearance, Generic, CityFurniture, Relief, Vegetation, Tunnel, LandUse, and WaterBody. Since CityGML is based on OGC’s Geography Markup Language (GML) in version 3.1.1, the Core module depends on the GML3 schema, which must always be imported into the CityGML schemas. Another mandatory package is the Extensible Address Language (xAL) issued by OASIS, which maps the address formats of different countries onto a unified XML schema for encoding the address information of a building object in a standardised XML structure.
The geometric-topological model of CityGML is realized using a subset of the GML3 geometry model, which is based on the ISO 19107 standard ‘Spatial Schema’ for representing the spatial properties of real-world objects. Supported geometric primitives include Point, Curve, Surface, and Solid, which allow the spatial properties of city objects to be represented in dimensions ranging from zero to three. Volumetric geometries are modeled using the well-known boundary representation (B-Rep, cf. [13]), where each Solid geometry object is defined by a closed outer shell (composed of individual Surface objects) and an arbitrary number of inner shells (representing any enclosures). The orientation of surfaces can be specified explicitly when using the geometry type OrientableSurface.
For each geometry type, more complex geometries with composite or aggregated hierarchies (cf. Fig. 2) can be constructed. The difference between aggregate and composite geometries lies in the topological relationships between the respective geometry components. For aggregate geometries such as MultiCurve, MultiSurface, and MultiSolid, the spatial relationships between components are not restricted and primitives can, hence, overlap, touch, or be disjoint. In contrast, a composite geometry like CompositeCurve, CompositeSurface, or CompositeSolid is a special case of the aggregate geometry which must be isomorphic to a single geometric primitive of the respective type. This implies that the underlying elements must be topologically connected along their boundaries. In addition, the GML geometry type GeometryComplex can be used to represent a complex consisting of geometric primitives of different types (e.g. Point and Curve). The members of a geometric complex must not overlap and can touch at their boundaries only. GeometryComplex is used in CityGML to represent the geometric network of streets and railways.
CityGML allows appearances to be assigned to individual surfaces (like Polygons) as well as to composite and aggregate surfaces. Appearances can be specified by colours or textures, and each surface can be assigned any number of appearances. Textures are represented by raster images.
In order to represent topological relationships between geometries, CityGML utilizes the XLink concept according to the GML specification. Each geometry object can have a unique identifier and can form a shared part of different aggregate or composite geometries. For example, one polygon may be a member of the outer shells of two solids in order to explicitly express that the two solids touch along one side. The shared polygon is then not represented redundantly, but is referenced from the outer shell of the second solid by an XLink to the shared polygon.
CityGML is very flexible regarding the expression of spatial properties of semantic objects. For example, the geometry of a Building object may be given in any of the LODs 0, 1, 2, 3, and 4 (also simultaneously). Most LODs allow representing the geometry by a Solid, a MultiSurface, or a combination of both. Besides geometrical decompositions, CityGML can also decompose objects semantically into parts. For example, a building can consist of building parts, which again can consist of roof, wall, and ground surfaces etc. When objects are decomposed in the same way regarding their semantic as well as their spatial structure, they are considered to be spatio-semantically coherent. This is illustrated for a building model in Fig. 3. Most of the LOD2 CityGML building models available today describe the wall, roof, and ground surfaces semantically and additionally provide a solid geometry for the geometric representation of the building hull and its 3D shape. The semantic objects are usually used to query and analyse the building components and their thematic attributes, whereas the solid geometry represents the whole body and is useful for geometric calculations such as the building volume and surface areas. Both aspects of describing the building are complementary and provide a very flexible modelling structure ranging from simple geometric models to semantically rich models.
A 3D geodatabase for CityGML must be able to cope with all the aspects presented above. This means that each semantic object like a building or a tunnel can be decomposed into parts and subparts. Each semantic object can have a number of geometric properties of different geometry types and LODs. Some geometry elements can be shared by different aggregate geometries. Semantic objects can likewise be part of multiple semantic aggregate objects. Each surface can be assigned an arbitrary number of appearances. The geodatabase must also be able to handle appearance data like individual surface textures, which typically are given in binary image file formats (e.g. JPEG or PNG). All semantic objects have predefined thematic attributes and, in addition, can have an arbitrary number of generic attributes. Finally, since 3D city models cover large areas up to entire countries, the geodatabase must be able to manage the large data volumes and provide efficient access to the stored data for thematic and spatial queries.
Database solutions for CityGML
Besides 3DCityDB, several other database solutions support the management of CityGML data. In the following, a selection of these software packages is listed, along with their major characteristics with respect to CityGML support. The Open Source software frameworks deegree (footnote 1) and GDAL/OGR (footnote 2) as well as the commercial software packages CPA SupportGIS (footnote 3) and Snowflake GO LOADER / GO PUBLISHER (footnote 4) offer generic support for GML application schemas. Since CityGML is a GML application schema, these software systems are able to automatically create database schemas for storing CityGML data for various database management systems like ORACLE Spatial or PostgreSQL/PostGIS, using the CityGML XML Schema definition files. For importing and exporting CityGML data sets into/from the database, deegree and SupportGIS offer an OGC Web Feature Service (WFS) interface whereas Snowflake GO Loader provides a desktop tool.
The named systems all extract CityGML data from CityGML files and insert the data into tables of spatially-extended relational database management systems. In recent years, however, researchers have also examined different NoSQL solutions (de Souza Baptista et al. [11]). Document stores such as BaseX (footnote 5) for XML or MongoDB (footnote 6) for JSON seem like an obvious choice for storing instance documents of CityGML [31]. In fact, data ingest and retrieval are reported to be considerably faster than with RDBMS due to the smaller serialization effort [20]. While document stores have their weaknesses with more complex queries including joins and spatial operations, they can be a good choice for web application backends sitting in between the RDBMS and a client. GeoRocket (footnote 7), for example, decomposes CityGML XML files and stores the XML fragments in a (distributed) file system or data store such as Amazon S3 or MongoDB. GeoRocket is available in an Open Source and a commercial version. Furthermore, solutions for storing CityGML data using the graph database Neo4j (footnote 8) have been presented by Agoub et al. [1] as well as by Nguyen et al. [33]. The latter software has been made available as Open Source software on GitHub (footnote 9).
Relational database modelling for CityGML
There are strong reasons to employ spatially-extended relational database management systems (SRDBMS) to store and manage complex 3D city models. First, SRDBMS support all required geometry types and provide means for proper spatial indexing as well as for geometric and topological analyses. Second, SRDBMS can directly be used by most geoinformation systems (GIS) or spatially enabled ETL (Extract, Transform, Load) tools. As described above, there is a variety of non-relational databases like object-oriented databases, document-oriented databases, and graph databases, which are increasingly investigated and employed in many application fields (cf. [35]). However, they are currently still limited in their capabilities and performance regarding spatial operations and coordinate transformations, which are of great importance for enterprise use in GIS applications (cf. [1]). Therefore, SRDBMS such as the commercial software ORACLE Spatial/Locator and the Open Source software PostgreSQL with PostGIS extension play a major role for GIS due to their extensive capabilities in handling 3D spatial data.
The conceptual solution for handling object-oriented data models like CityGML in SRDBMS can be reduced to the problem of mapping the object-oriented data model onto a relational data model. This has been extensively studied and discussed in the literature over the past 25 years. Golobisky & Vecchietti [16] summarized the fundamental concepts for deriving relational database schemas using different mapping rules according to the source UML class structures. For example, a class shall be mapped onto one table where each row represents an instance of the respective class. Thus, the mapped table shall have at least one primary key column, which can be named “ID” and defined with the long integer data type, for storing the object identifier, which must be unique within the table. Additional columns can be added to the mapped table for storing the spatial and non-spatial attribute values of the respective class objects. To handle class associations in relational models, a foreign key constraint or, in the case of an M:N relationship, an associative table shall be utilized to link the tables mapped from the associated classes. Moreover, the inheritance relationship between two classes can either be implemented using a foreign key constraint that links the subclass and superclass tables via their primary keys, or be mapped to a single table that represents both classes at the same time. Further discussion and comparison of, among others, the aforementioned mapping rules are given in [19].
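The generic mapping rules above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The classes Structure, House, and Owner and all table and column names are hypothetical and chosen only for illustration; they are not part of CityGML or of any schema discussed here. The sketch maps each class onto one table with an integer primary key, links subclass and superclass tables via their primary keys, and resolves an M:N association with an associative table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Rule 1: one class -> one table with an integer primary key column ID.
CREATE TABLE owner (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Superclass table (hypothetical class Structure).
CREATE TABLE structure (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
-- Rule 3: the subclass table shares its primary key with the superclass row.
CREATE TABLE house (
    id      INTEGER PRIMARY KEY REFERENCES structure(id),
    storeys INTEGER
);
-- Rule 2: an M:N association becomes an associative table of foreign keys.
CREATE TABLE structure_owner (
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    owner_id     INTEGER NOT NULL REFERENCES owner(id),
    PRIMARY KEY (structure_id, owner_id)
);
""")
conn.execute("INSERT INTO structure VALUES (1, 'Town hall')")
conn.execute("INSERT INTO house VALUES (1, 3)")
conn.executemany("INSERT INTO owner VALUES (?, ?)", [(10, "City"), (11, "Trust")])
conn.executemany("INSERT INTO structure_owner VALUES (?, ?)", [(1, 10), (1, 11)])


def house_with_owners(conn, house_id):
    """Reassemble one House object; note that this already needs three joins."""
    return conn.execute(
        """
        SELECT s.name, h.storeys, o.name
        FROM house h
        JOIN structure s        ON s.id = h.id
        JOIN structure_owner so ON so.structure_id = s.id
        JOIN owner o            ON o.id = so.owner_id
        WHERE h.id = ?
        ORDER BY o.id
        """,
        (house_id,),
    ).fetchall()
```

Even for this small model, reading a single object with its owners requires three joins, which hints at how the number of joins can grow when such rules are applied mechanically to a large model.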
However, although these mapping rules from the literature allow the CityGML data model to be mapped onto a relational database model, they may easily lead to a large number of database tables with many join relations. An analysis of existing relational database systems indicated that a more compact database schema is much more efficient for querying and processing large and complexly structured data, which facilitates good performance when interacting with the database in a real-time application (cf. [39]). To this end, the CityGML database schema shall result from a careful manual process that identifies and simplifies the complex CityGML classes and data types and maps them onto fewer tables, taking into account database complexity, operating performance, and semantic interoperability. To meet this requirement, [24] proposed a set of fine-grained mapping rules, which have been successfully adopted for designing the 3DCityDB database schema and are briefly reviewed in the following subsections.
Mapping an inheritance hierarchy onto one table
With this approach, multiple CityGML classes belonging to an inheritance hierarchy can be mapped onto one single table. For example, a table named CITYOBJECT can be used to store the instances, together with their attribute values, of the GML classes _GML and _Feature as well as the CityGML class _CityObject (cf. Fig. 4). For each CityGML top-level class such as AbstractBuilding, AbstractBridge, and AbstractTunnel, a separate table associated with the CITYOBJECT table shall be created to hold the feature attributes. This way, the CITYOBJECT table can be used as a central registry of all CityGML top-level features and allows a list of CityObjects to be retrieved rapidly through a query on their attributes, such as their spatial extent via a user-selected bounding box.
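A minimal sketch of this idea, again using Python's sqlite3 module, might look as follows. Only the table name CITYOBJECT is taken from the text; all column names, the BUILDING and TUNNEL table layouts, and the sample data are assumptions made for illustration. In particular, the envelope is stored here as four plain numeric columns, whereas a spatial DBMS would use a geometry column with a spatial index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table for the hierarchy _GML -> _Feature -> _CityObject.
CREATE TABLE cityobject (
    id            INTEGER PRIMARY KEY,
    gmlid         TEXT,                          -- inherited from _GML
    name          TEXT,                          -- inherited from _GML
    xmin REAL, ymin REAL, xmax REAL, ymax REAL,  -- envelope, from _Feature
    creation_date TEXT                           -- from _CityObject
);
-- One table per top-level class, keyed by the CITYOBJECT row.
CREATE TABLE building (
    id      INTEGER PRIMARY KEY REFERENCES cityobject(id),
    storeys INTEGER
);
CREATE TABLE tunnel (
    id    INTEGER PRIMARY KEY REFERENCES cityobject(id),
    usage TEXT
);
""")
conn.executemany(
    "INSERT INTO cityobject VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "B1", "Depot", 0, 0, 10, 10, "2012-04-04"),
        (2, "B2", "Mill", 100, 100, 120, 110, "2012-04-04"),
        (3, "T1", "Subway", 5, 5, 50, 8, "2012-04-04"),
    ],
)
conn.executemany("INSERT INTO building VALUES (?, ?)", [(1, 2), (2, 4)])
conn.execute("INSERT INTO tunnel VALUES (3, 'rail')")


def objects_in_bbox(conn, xmin, ymin, xmax, ymax):
    """Return the gml:id of every city object whose envelope intersects the box.

    Only the central CITYOBJECT table is touched, regardless of feature type.
    """
    rows = conn.execute(
        """
        SELECT gmlid FROM cityobject
        WHERE xmin <= ? AND xmax >= ? AND ymin <= ? AND ymax >= ?
        ORDER BY id
        """,
        (xmax, xmin, ymax, ymin),
    ).fetchall()
    return [r[0] for r in rows]
```

Because the attributes shared by all features live in one table, the bounding-box query returns buildings and tunnels alike without joining any of the class-specific tables.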
Mapping classes at the same inheritance hierarchy level onto one table
This mapping approach utilizes only one table to represent multiple classes which are subtyped from a common class and at the same time belong to the same inheritance hierarchy level (cf. Fig. 5). This way, the subclasses are logically mapped onto the super class table, such that the data of all subclasses can be retrieved with a single query on that table, which avoids multiple table joins and speeds up overall performance. To distinguish the different types of instance objects stored in the table, an additional column OBJECTCLASS_ID is required which stores a numeric value in each row representing the respective class type. This type information is static and can be documented in an additional table OBJECTCLASS whose primary key values enumerate the object class IDs and are referenced by the OBJECTCLASS_ID columns of the class tables. Moreover, additional columns describing meta-information of each feature class, such as the class name and the parent class name, can be added to the OBJECTCLASS table, which allows third-party applications to retrieve the class information directly from the database when interpreting the queried feature objects.
Note that this mapping approach is not generally applicable, since it has limitations in particular cases. For example, if the subclasses have very different attributes or associations to other classes, a large number of empty cells will occur in the database table, which lowers storage efficiency, especially as the number of subclasses increases. This mapping approach shall therefore only be used when the model definitions and structures satisfy certain conditions, which typically are the following:
- The super class shall be an abstract class that holds all attributes and associations which will be inherited by the concrete subclasses.
- None of the subclasses shall have any further attributes or associations with other classes.
With these conditions, storage efficiency is retained to the highest degree, because only one additional column, e.g. OBJECTCLASS_ID, storing the class type information needs to be added to the table. An analysis of the CityGML model structure shows that this mapping approach can be applied well to the relational database modelling for CityGML to improve the overall database performance and efficiency. For example, the thematic surfaces like wall surfaces, roof surfaces, and ground surfaces etc. of each feature type like Building, Tunnel, and Bridge are abstracted to an abstract class called _BoundarySurface which holds the relevant attributes and association information. For each type of thematic surface, a concrete class, i.e. WallSurface, RoofSurface, GroundSurface etc., is defined individually as a subtype of the class _BoundarySurface. This model definition exactly satisfies the conditions outlined above and thus allows for fast data retrieval. For example, a typical query is the export of a semantically rich building (LOD ≥ 2) to a 3D graphics format. In this case, the thematic surfaces like roof and wall surfaces forming the outer shell of the building object can be queried directly by joining the surface table with the building table instead of using multiple database joins.
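The following sketch, using Python's sqlite3 module, illustrates how all _BoundarySurface subclasses could share one table. The names OBJECTCLASS and OBJECTCLASS_ID come from the text; the table name THEMATIC_SURFACE, the remaining columns, and the numeric class IDs are assumptions chosen for illustration and do not claim to reproduce the values used by any actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Static lookup table documenting the class behind each numeric ID.
CREATE TABLE objectclass (
    id            INTEGER PRIMARY KEY,
    classname     TEXT NOT NULL,
    superclass_id INTEGER REFERENCES objectclass(id)
);
CREATE TABLE building (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
-- One table for WallSurface, RoofSurface, GroundSurface, ...
CREATE TABLE thematic_surface (
    id             INTEGER PRIMARY KEY,
    objectclass_id INTEGER NOT NULL REFERENCES objectclass(id),
    building_id    INTEGER NOT NULL REFERENCES building(id)
);
""")
# The class IDs below are arbitrary illustrative values.
conn.executemany(
    "INSERT INTO objectclass VALUES (?, ?, ?)",
    [
        (1, "_BoundarySurface", None),
        (2, "WallSurface", 1),
        (3, "RoofSurface", 1),
        (4, "GroundSurface", 1),
    ],
)
conn.execute("INSERT INTO building VALUES (1, 'Depot')")
conn.executemany(
    "INSERT INTO thematic_surface VALUES (?, ?, ?)",
    [(100, 2, 1), (101, 2, 1), (102, 3, 1), (103, 4, 1)],
)


def surface_counts(conn, building_id):
    """Count the thematic surfaces of one building, grouped by class name."""
    rows = conn.execute(
        """
        SELECT oc.classname, COUNT(*)
        FROM thematic_surface ts
        JOIN building b     ON b.id = ts.building_id
        JOIN objectclass oc ON oc.id = ts.objectclass_id
        WHERE b.id = ?
        GROUP BY oc.classname
        """,
        (building_id,),
    ).fetchall()
    return dict(rows)
```

All surface types of a building are obtained from one surface table; without this mapping, one table per surface class would have to be joined or queried separately.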
Mapping aggregations and compositions onto one table
In object-oriented data models, recursive aggregation relations of features can be properly modelled by means of a well-known design pattern called ‘Composite Pattern’ (cf. [14]) which typically uses three interrelated classes (cf. Fig. 6) for constructing a tree-like data structure. According to the concept of this design pattern, each instance of the class CompositeObject can contain an arbitrary number of, but at least one, instances of the class BasicObject or CompositeObject. The BasicObject corresponds to a leaf in the aggregation hierarchy and shall not have child components. The conventional solution for mapping such a data model onto a relational structure is to use a foreign key that joins each object with its parent object in order to query all the aggregated objects. In this case, recursive database queries must be performed, which may be costly in terms of performance, especially if the recursion depth is unknown.
In order to achieve good performance when retrieving the elements of a tree of objects, a specific optimization approach has been developed. The key idea of the database design is to utilize a single database table for the mapping of all the involved feature classes along with their inheritance relationships. A foreign key column PARENT_ID is used for representing the composition relationship. Additionally, this database table receives a foreign key column ROOT_ID which holds the ID of the root element of each composite hierarchy and hence allows for fast retrieval of all its child elements by querying on the attribute ROOT_ID in order to avoid time-costly recursive database joins. Moreover, since three classes are mapped onto one table, an additional column OBJECTCLASS_ID is required for supporting the automatic determination of class affiliation information. This mapping approach can benefit the relational database modelling for the CityGML data modules like Building, Bridge, and Tunnel.
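The difference between the two retrieval strategies can be sketched with Python's sqlite3 module. The column names PARENT_ID, ROOT_ID, and OBJECTCLASS_ID are taken from the text; the table name, the class ID values, and the sample hierarchy are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Buildings and building parts share one table (composite pattern).
CREATE TABLE building (
    id             INTEGER PRIMARY KEY,
    objectclass_id INTEGER NOT NULL,                 -- illustrative values only
    parent_id      INTEGER REFERENCES building(id),  -- direct parent
    root_id        INTEGER REFERENCES building(id)   -- top of the hierarchy
);
CREATE INDEX building_root_idx ON building(root_id);
""")
conn.executemany(
    "INSERT INTO building VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, 1),  # building A (root)
        (2, 2, 1, 1),     # part of A
        (3, 2, 2, 1),     # nested part of A
        (4, 1, None, 4),  # building B (root, no parts)
    ],
)


def tree_recursive(conn, root_id):
    """Conventional approach: walk PARENT_ID links with a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM building WHERE id = ?
            UNION ALL
            SELECT b.id FROM building b JOIN tree t ON b.parent_id = t.id
        )
        SELECT id FROM tree ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [r[0] for r in rows]


def tree_by_root(conn, root_id):
    """Optimized approach: one flat lookup on the ROOT_ID column."""
    rows = conn.execute(
        "SELECT id FROM building WHERE root_id = ? ORDER BY id", (root_id,)
    ).fetchall()
    return [r[0] for r in rows]
```

Both functions return the same rows, but the second needs no recursion and its cost does not depend on the depth of the hierarchy. The price is that ROOT_ID must be kept consistent whenever parts are inserted or moved.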
Mapping CityGML’s B-Rep geometries onto a single table
The optimization approach for mapping the composite pattern can also be applied to complex data types such as the B-Rep geometries, e.g. aggregated/composite surfaces and solids (cf. Fig. 7).
With this optimization step, all surface-based geometry types can be represented in a simplified data model according to the composite pattern (cf. the previous subsection) and consequently mapped onto a compact table, allowing for high-performance database queries of all the geometry elements of an aggregation hierarchy. Instead of using a class ID column, the class affiliation is realized using a number of flag columns characterizing the different types of geometry and aggregation. For example, the flag IS_SOLID distinguishes between surface and solid geometry, and the flag IS_COMPOSITE can be used to determine whether a geometry element is an aggregate (e.g. MultiSolid, MultiSurface) or a composite (e.g. CompositeSolid, CompositeSurface). This approach offers semantic clarity of the table structure and also allows surface and solid geometries to be managed within a single table. Consequently, interacting with and querying the geometry data in this table becomes much simpler. For example, if a feature object owns a MultiSurface or MultiSolid property, a foreign key column can be added to the class table that references the primary key column of the geometry table in order to access the geometry data. Furthermore, since each surface geometry element is explicitly stored in a tuple, it can easily be augmented with appearance information like texture images, colors, or materials by associating the geometry table with the appearance data table via the corresponding row ID.
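A minimal sketch of such a geometry table, using Python's sqlite3 module, is given below. The flag names IS_SOLID and IS_COMPOSITE and the PARENT_ID/ROOT_ID columns follow the text; the table name SURFACE_GEOMETRY, the foreign key column LOD2_GEOMETRY_ID, and the storage of polygons as WKT strings are assumptions for illustration only, since a spatial DBMS would use a native geometry type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table for all surface-based geometries; flags replace a class ID.
CREATE TABLE surface_geometry (
    id           INTEGER PRIMARY KEY,
    parent_id    INTEGER REFERENCES surface_geometry(id),
    root_id      INTEGER REFERENCES surface_geometry(id),
    is_solid     INTEGER NOT NULL DEFAULT 0,  -- 1: solid, 0: surface
    is_composite INTEGER NOT NULL DEFAULT 0,  -- 1: composite, 0: aggregate
    geometry     TEXT                         -- WKT polygon; NULL for containers
);
-- The class table references the root row of its geometry hierarchy.
CREATE TABLE building (
    id               INTEGER PRIMARY KEY,
    lod2_geometry_id INTEGER REFERENCES surface_geometry(id)
);
""")
wall = "POLYGON Z ((0 0 0, 1 0 0, 1 0 1, 0 0 1, 0 0 0))"
roof = "POLYGON Z ((0 0 1, 1 0 1, 1 1 1, 0 1 1, 0 0 1))"
conn.executemany(
    "INSERT INTO surface_geometry VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, None, 1, 0, 0, None),  # aggregate surface container (root)
        (2, 1, 1, 0, 0, wall),     # member polygon
        (3, 1, 1, 0, 0, roof),     # member polygon
        (4, None, 4, 1, 0, None),  # root of a solid geometry
        (5, 4, 4, 0, 0, wall),     # polygon of the solid's shell
    ],
)
conn.execute("INSERT INTO building VALUES (1, 1)")


def polygon_ids_of_building(conn, building_id):
    """Fetch every polygon row of a building's geometry with a single join."""
    rows = conn.execute(
        """
        SELECT sg.id
        FROM building b
        JOIN surface_geometry sg ON sg.root_id = b.lod2_geometry_id
        WHERE b.id = ? AND sg.geometry IS NOT NULL
        ORDER BY sg.id
        """,
        (building_id,),
    ).fetchall()
    return [r[0] for r in rows]


def solid_root_ids(conn):
    """Use the IS_SOLID flag to find the roots of all solid geometries."""
    rows = conn.execute(
        "SELECT id FROM surface_geometry "
        "WHERE is_solid = 1 AND parent_id IS NULL ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]
```

Since every polygon occupies its own row, an appearance table could reference individual rows of the geometry table by their ID, as described above.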