The current version of CityJSON implements most of the CityGML data model, and all of the CityGML modules have been mapped. The parts that were left out were omitted because they would have unnecessarily complicated the encoding and because they are not used in practice (at least not in the files that are publicly available).
We explain in the following the main engineering choices that were made, and we also describe where and how the data model differs from that of CityGML. The full specifications are available online at https://cityjson.org/specs/.
The JSON data format defines simple data types for boolean values, numbers, and strings, as well as two data structures:
1. An ordered list of elements, which are separated by commas and enclosed in square brackets, i.e. []. We refer to it as an “array”.
2. An object consisting of key/value pairs (a key is often called a “property”), which have the form key: value and are enclosed in curly brackets, i.e. {}. We refer to it as a “dictionary”. It is also often called a map, a hash table, an associative array, or, in the context of JSON, simply an object.
A JSON object can be any combination and nesting of the above elements.
A CityJSON file represents a given geographical area; the file contains one JSON object of type ~CityJSON~ and would typically contain the following JSON properties:
City objects are “flattened out”
The property ~CityObjects~ contains a dictionary where the properties are the identifiers of the city objects (IDs). The schema of CityGML has been flattened out and all hierarchies removed. Figure 2 shows the city objects that are supported in CityJSON; both 1st- and 2nd-level city objects are stored in the dictionary ~CityObjects~.
As an example, for a building containing 2 parts, the 3 objects will be represented at the same level and linked by their IDs.
Each city object can have a "parents" and/or a "children" property, and this is how in the snippet the building ~id-1~ is linked to its 2 parts. The fact that a dictionary is used means that developers have direct access to the city objects through their IDs (and also in constant time if a hashmap is used to implement the dictionary).
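To illustrate the flat structure, the following Python sketch mirrors a ~CityObjects~ dictionary for a building with 2 parts. Only the ID ~id-1~ and the "parents"/"children" properties come from the text; the part IDs, the empty geometries, and the helper `children_of` are assumptions made for this example.

```python
# Hypothetical flattened CityObjects dictionary: one building and its 2 parts,
# all stored at the same level and linked only through their IDs.
city_objects = {
    "id-1": {"type": "Building", "geometry": [], "children": ["id-2", "id-3"]},
    "id-2": {"type": "BuildingPart", "geometry": [], "parents": ["id-1"]},
    "id-3": {"type": "BuildingPart", "geometry": [], "parents": ["id-1"]},
}


def children_of(city_objects, object_id):
    """Return the children of a city object, each looked up directly by its ID."""
    child_ids = city_objects[object_id].get("children", [])
    return {child_id: city_objects[child_id] for child_id in child_ids}


parts = children_of(city_objects, "id-1")
```

Because Python dictionaries are hash maps, each lookup by ID takes constant time on average, which is the direct access mentioned above.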
A city object can be of any of the types defined in Fig. 2, and each of them must have the same structure, and at a minimum contain a "geometry" property. If attributes are to be stored, they have to be in the "attributes" property. This simplifies the work of the developer because there is a single point of entry for all geometries and attributes, unlike with CityGML.
Geometry
CityJSON defines the same 3D geometric primitives used in CityGML, with the same restrictions for linearity/planarity. However, since they are rarely used in a 3D context, Point and LineString only have their Multi* counterparts; a single Point is a MultiPoint with only one object. When a geometry is defined, it must contain a value for the LoD. In order to avoid ambiguities, we encourage the use of the refined LoDs, as defined in [4], over the five standard CityGML ones. A city object can have several LoDs, and thus CityJSON, like CityGML, allows several LoDs to be stored concurrently for the same object.
It should be noted that CityJSON uses a different approach from (City)GML to store the (x,y,z) coordinates of geometric primitives. A geometric primitive does not list all the coordinates of its vertices; rather, the coordinates of the vertices are stored in a separate array (the "vertices" property of the CityJSON object), and geometric primitives refer to the position of a vertex in that array.
The indexing mechanism of the Wavefront OBJ format (footnote 4) is reused, because it has been used for many years, with success, in the computer graphics community. There are several advantages to this approach. First, the files can be compressed: 3D vertices are often shared by several surfaces, and repeating them can be costly (especially if they are stored with high precision; sub-millimetre precision is often used). Second, this increases the topological relationships that are explicitly stored in the file, and several operations can be sped up and made more robust (e.g. are two buildings adjacent?). Third, it is very easy to convert to a representation listing all coordinates; the inverse is not true.
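The indexing mechanism can be sketched in a few lines of Python. The coordinates, the two rings, and the helper `ring_coordinates` are made up for this example; the sketch only shows that shared vertices are stored once, that shared indices expose adjacency, and that expanding a ring into explicit coordinates is a simple lookup.

```python
# Shared vertex list (illustrative coordinates only).
vertices = [
    [0.0, 0.0, 0.0],  # 0
    [1.0, 0.0, 0.0],  # 1
    [1.0, 1.0, 0.0],  # 2
    [0.0, 1.0, 0.0],  # 3
    [0.0, 0.0, 1.0],  # 4
    [1.0, 0.0, 1.0],  # 5
]

# Two rings that refer to positions in the vertex list instead of repeating coordinates.
floor = [0, 1, 2, 3]
wall = [0, 1, 5, 4]


def ring_coordinates(ring, vertices):
    """Expand a ring of vertex indices into explicit (x, y, z) coordinates."""
    return [vertices[index] for index in ring]


# Vertices used by both rings: comparing integers is enough, no coordinate tolerance needed.
shared = set(floor) & set(wall)

# Conversion to a representation that lists all coordinates.
explicit_wall = ring_coordinates(wall, vertices)
```

Going the other way (from explicit coordinates back to indices) would require deciding when two coordinates are "the same" vertex, which is why the inverse conversion is harder.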
The geometry is based on an enumeration of the vertices forming each ring of a surface, as follows. A ~MultiSurface~ has an array containing surfaces, where each surface is modelled by an array of arrays, the first array being the exterior boundary of the surface, and the others the interior boundaries. A ~Solid~ has an array of shells, the first array being the exterior shell of the solid, and the others being the interior shells; each shell has an array of surfaces, modelled in the exact same way as a ~MultiSurface~. Notice that unlike with (City)GML, there is only one variation per geometry type, which (greatly) simplifies the life of developers.
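The nesting described above can be made concrete with a small Python sketch. The property names "type", "lod", and "boundaries" follow our reading of the online specifications and should be treated as assumptions here, as should the vertex indices, the LoD values, and the helper `surfaces_of`.

```python
# A MultiSurface: array of surfaces; each surface is an array of rings
# (first ring = exterior boundary, following rings = interior boundaries).
multisurface = {
    "type": "MultiSurface",
    "lod": "2",  # illustrative value; the exact encoding depends on the spec version
    "boundaries": [
        [[0, 1, 2, 3]],                    # surface without holes
        [[4, 5, 6, 7], [8, 9, 10, 11]],    # surface with one interior ring
    ],
}

# A Solid: array of shells (first shell = exterior); each shell is an array
# of surfaces modelled exactly as in a MultiSurface. Here: a cube.
solid = {
    "type": "Solid",
    "lod": "2",
    "boundaries": [
        [
            [[0, 3, 2, 1]], [[4, 5, 6, 7]], [[0, 1, 5, 4]],
            [[1, 2, 6, 5]], [[2, 3, 7, 6]], [[3, 0, 4, 7]],
        ]
    ],
}


def surfaces_of(geometry):
    """Return a flat list of surfaces for the two geometry types sketched here."""
    if geometry["type"] == "MultiSurface":
        return list(geometry["boundaries"])
    if geometry["type"] == "Solid":
        return [surface for shell in geometry["boundaries"] for surface in shell]
    raise ValueError(f"geometry type not covered by this sketch: {geometry['type']}")


cube_faces = surfaces_of(solid)
holes = multisurface["boundaries"][1][1:]  # interior rings of the second surface
```

Since each geometry type has a single fixed nesting depth, a reader only needs one branch per type, which is the simplification for developers mentioned above.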
Semantic surfaces
In one given city object (say a ~Building~), several surfaces can have the same semantics (think for instance of a complex building that has been triangulated: there can be many triangles for one given surface). Because of this, a semantic surface, which is a pivotal concept in CityGML, becomes a JSON object that is stored separately from the geometry of a city object. By doing so, a semantic surface object has to be declared only once, and each of the surfaces used to represent it can point to it. This is achieved by first declaring all the semantic surfaces in a "surfaces" array, and then declaring an added "values" array that links each surface to its corresponding semantic surface using their respective positions in the arrays.
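The link between the "surfaces" and "values" arrays can be sketched as follows. The enclosing "semantics" key, the semantic types, the vertex indices, and the helper `surfaces_by_semantic_type` are assumptions made for this example; only the two array names come from the text.

```python
# Hypothetical MultiSurface: a roof split into two triangles plus one wall.
geometry = {
    "type": "MultiSurface",
    "boundaries": [[[0, 1, 2]], [[0, 2, 3]], [[4, 5, 6, 7]]],
    "semantics": {
        # Each semantic surface is declared only once...
        "surfaces": [{"type": "RoofSurface"}, {"type": "WallSurface"}],
        # ...and "values" gives, for each surface in "boundaries" (same position),
        # the position of its semantic surface in "surfaces".
        "values": [0, 0, 1],
    },
}


def surfaces_by_semantic_type(geometry):
    """Group the surfaces of a MultiSurface by the type of their semantic surface."""
    semantics = geometry["semantics"]
    grouped = {}
    for surface, value in zip(geometry["boundaries"], semantics["values"]):
        semantic_type = semantics["surfaces"][value]["type"]
        grouped.setdefault(semantic_type, []).append(surface)
    return grouped


grouped = surfaces_by_semantic_type(geometry)
```

Here the two roof triangles share a single semantic surface object instead of each carrying its own copy.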
Geometry templates
CityGML’s Implicit Geometries, better known in computer graphics as templates, are one method to compress files since identical geometries (e.g. benches, lamp posts, and trees) need only be defined once (and translations/rotations/scaling are applied). In CityJSON, they are implemented slightly differently than in CityGML: they are stored at one specific location in the file, and each template can be reused. In CityGML, one reuses the geometry used for another city object, and thus there is no structured way to store them; furthermore, one has to search for them in the file (with XLinks) because they can be located anywhere (the link could even point to an external reference that needs to be resolved).
A given city object can have a geometry of type "GeometryInstance" (instead of those defined above), which defines the (x,y,z) location, a link to the geometry template, and the transformation matrix.
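A minimal Python sketch of how such an instance could be resolved is given below. The property names ("template", "boundaries", "transformationMatrix", "templates", "vertices-templates"), the row-major 4x4 matrix layout, and the order of operations (apply the matrix, then move to the reference location) follow our reading of the online specifications and are assumptions here; the values and the helper `instantiate` are invented for illustration.

```python
# Templates are stored once, with their own vertex list (hypothetical triangle).
geometry_templates = {
    "templates": [{"type": "MultiSurface", "boundaries": [[[0, 1, 2]]]}],
    "vertices-templates": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
}

# Vertex list of the file; index 0 is the (x, y, z) location of the instance.
vertices = [[10.0, 20.0, 0.0]]

instance = {
    "type": "GeometryInstance",
    "template": 0,       # link to the template (position in "templates")
    "boundaries": [0],   # position of the reference location in `vertices`
    # Row-major 4x4 matrix; here a uniform scaling by 2.
    "transformationMatrix": [2, 0, 0, 0,  0, 2, 0, 0,  0, 0, 2, 0,  0, 0, 0, 1],
}


def instantiate(instance, geometry_templates, vertices):
    """Return real-world coordinates of the template vertices for one instance."""
    m = instance["transformationMatrix"]
    anchor = vertices[instance["boundaries"][0]]
    placed = []
    for x, y, z in geometry_templates["vertices-templates"]:
        tx = m[0] * x + m[1] * y + m[2] * z + m[3]
        ty = m[4] * x + m[5] * y + m[6] * z + m[7]
        tz = m[8] * x + m[9] * y + m[10] * z + m[11]
        placed.append([tx + anchor[0], ty + anchor[1], tz + anchor[2]])
    return placed


world = instantiate(instance, geometry_templates, vertices)
```

The boundaries of the template itself are left untouched; only its vertices are transformed, so the same template can be placed many times.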
Appearance
Both textures and materials are supported, and the same mechanisms as in CityGML are used for these: materials are represented with the X3D specifications (footnote 5), and textures reuse the COLLADA specifications (footnote 6).
Just as for the geometry templates, all materials and textures must be located at the same entry point in a CityJSON file; this is in contrast to CityGML where they can be located anywhere.
Schema validation
CityJSON uses schemas defined in JSON Schema (footnote 7) to document its data model and to validate whether a CityJSON file respects the allowed structure and syntax. All the city objects, their attributes, the allowed geometries, and other constraints are defined in schemas that are openly available at https://cityjson.org/schemas/.
It should be noted that JSON Schemas are less flexible than XML Schemas; inheritance and namespaces, for instance, are not supported. They nevertheless allow us to document most of what is possible with XML, and we have added extra validation functions to the software cjio for the properties and constraints that cannot be expressed with JSON Schemas (see the section about software below for details). The extra constraints can be seen as validating the internal consistency of a given CityJSON file; examples of these are:
- are the links between 1st- and 2nd-level city objects consistent?
- are the arrays for the boundaries and the semantics coherent? (i.e. same structure)
- are there duplicate IDs for city objects?
- are there duplicate or orphan vertices?
- are there vertex indices that do not exist?
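Some of these internal-consistency checks can be sketched in Python as follows. This is not the implementation used in cjio; the function names, the error messages, and the tiny test model are assumptions made for this example, and only three of the checks listed above are covered (parent/child links, duplicate or orphan vertices, and non-existent vertex indices).

```python
def iter_indices(boundaries):
    """Yield every vertex index found in an arbitrarily nested boundaries array."""
    for item in boundaries:
        if isinstance(item, list):
            yield from iter_indices(item)
        else:
            yield item


def check_consistency(cm):
    """Return a list of human-readable problems found in a parsed CityJSON-like dict."""
    errors = []
    nb_vertices = len(cm["vertices"])
    used = set()
    for object_id, obj in cm["CityObjects"].items():
        # Vertex indices that do not exist.
        for geometry in obj.get("geometry", []):
            for index in iter_indices(geometry["boundaries"]):
                used.add(index)
                if not 0 <= index < nb_vertices:
                    errors.append(f"{object_id}: vertex index {index} does not exist")
        # Links between 1st- and 2nd-level city objects.
        for child_id in obj.get("children", []):
            child = cm["CityObjects"].get(child_id)
            if child is None or object_id not in child.get("parents", []):
                errors.append(f"{object_id}: child {child_id} does not link back")
    # Orphan vertices (never referenced by any geometry).
    orphans = set(range(nb_vertices)) - used
    if orphans:
        errors.append(f"orphan vertices: {sorted(orphans)}")
    # Duplicate vertices (identical coordinates stored more than once).
    seen = set()
    for vertex in cm["vertices"]:
        key = tuple(vertex)
        if key in seen:
            errors.append(f"duplicate vertex: {vertex}")
        seen.add(key)
    return errors


# Deliberately broken model: index 7 is out of range, child "b" is missing,
# vertex 3 is unused, and vertices 2 and 3 have identical coordinates.
broken = {
    "CityObjects": {
        "a": {
            "type": "Building",
            "children": ["b"],
            "geometry": [{"type": "MultiSurface", "boundaries": [[[0, 1, 2, 7]]]}],
        }
    },
    "vertices": [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0]],
}
errors = check_consistency(broken)
```

Duplicate IDs are not checked here because a standard JSON parser silently keeps only the last of two identical keys; detecting them would require inspecting the key/value pairs while parsing.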
CityGML support
CityJSON implements most of the data model, and all the CityGML modules have been mapped to CityJSON objects. However, for the sake of simplicity and efficiency, some modules and features have been omitted and/or simplified. If a module is supported, it does not mean that there is a 1-to-1 mapping between the classes and features in CityGML and CityJSON, but rather that it is possible to represent the same information, but in a different manner. CityJSON is thus conformant to a subset of CityGML, although technically only CityGML files (encoded with the XML format) can be conformant to the specifications of CityGML [22, Clause 2 about Conformance].
The main features that are not supported are:
- The LoD4 of CityGML, which was mostly designed to represent the interior of buildings (including details and furniture), is not implemented. The main reason is that this concept will be revamped completely in the next CityGML version [19], and currently there are virtually no datasets having LoD4 buildings.
- No support for arbitrary coordinate reference systems (CRSs). Only an EPSG code (footnote 8) can be used.
- All geometries in a given CityJSON object must use the same CRS.
- In CityGML most objects can have an ID (usually gml:id). That is, not only can one building have an ID, but also each 3D primitive forming its geometry can have an ID. In CityJSON, only city objects and semantic surfaces can have IDs.
Compression of CityJSON files
To reduce the size of a file, it is possible to represent the coordinates of the vertices with integer values, and store the scale factor and the translation needed to obtain the original coordinates (stored with floats/doubles). If compressed, a CityJSON file contains a "transform" property:
and the real-world coordinates of a given vertex v are obtained easily, for example for the x component:
$$ x = (v_{x} \times \text{transform.scale}_{x}) + \text{transform.translate}_{x} $$
Several file formats use this, for instance LAS [1] and TopoJSON [7]. For CityJSON, it typically compresses the files by around 5–10%; we give below examples with real-world datasets. It should be noted that it also makes files more “robust”, in the sense that the coordinates are not prone to rounding caused by the floating-point representation in a computer [10]. This is the favoured way to store CityJSON files.
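The compression and its inverse can be sketched in Python as follows. The "scale" and "translate" names come from the formula above; the numerical values, the sample vertex, and the helpers `compress` and `decompress` are assumptions made for this example.

```python
# Hypothetical transform: millimetre resolution, translated close to the data.
transform = {
    "scale": [0.001, 0.001, 0.001],
    "translate": [85000.0, 446000.0, 0.0],
}


def compress(vertices, transform):
    """Convert real-world coordinates to integers using the scale and translation."""
    scale, translate = transform["scale"], transform["translate"]
    return [
        [round((c - t) / s) for c, s, t in zip(vertex, scale, translate)]
        for vertex in vertices
    ]


def decompress(int_vertices, transform):
    """Recover real-world coordinates: x = v_x * scale_x + translate_x, same for y and z."""
    scale, translate = transform["scale"], transform["translate"]
    return [
        [c * s + t for c, s, t in zip(vertex, scale, translate)]
        for vertex in int_vertices
    ]


original = [[85012.345, 446003.21, 12.5]]
compressed = compress(original, transform)
restored = decompress(compressed, transform)
```

The integers are shorter to write than the full coordinates, and two vertices that should coincide compare as exactly equal once stored as integers.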
Handling and streaming (large) CityJSON files
One drawback of representing geometries by having references to a list of vertices is that large files are difficult to handle (one needs to read all of the file in memory to reconstruct the geometries) and that streaming of large files is thus complicated.
There exists a misconception that CityGML, since it uses the Simple Features paradigm [20], can be easily and directly streamed. We claim that while it is easier, this is not completely true. CityGML files also often contain references between objects in a given file (XLinks), and before such a file can be streamed, these references need to be resolved and the referenced objects copied to the locations pointing to them. This also increases the size of the file.
Isenburg and Lindstrom [12] propose to reorganise the order of the information in the file so that the vertices are not all at the end but are instead located close to the geometries that need them. Special tags in the file indicate that a vertex will not be used anymore, thus allowing the memory to be freed.
This cannot be used with the current structure of CityJSON, but we propose instead to partition a CityJSON file into several files. The rule can be based on a spatial partition, on the type of city objects, or simply randomly. It suffices to update the list of vertices and the indices, which is a simple operation. The open-source software cjio has an implementation of this.
Partitioning a given CityJSON file into several files usually does not increase the storage needed. Several properties (e.g. the CRS, metadata, etc.) will be repeated in each of the files, but the indices in each file will be smaller (always starting at 0), and in practice we have noticed that the total size actually decreases.
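The update of the vertex list and of the indices can be sketched in Python as follows. This is not the implementation found in cjio; the helper `extract_subset` and the sample model are assumptions made for this example, and the sketch ignores geometry templates, appearances, and parent/child links that would cross file boundaries.

```python
def extract_subset(cm, object_ids):
    """Copy the chosen city objects into a new model with its own re-indexed vertices."""
    old_to_new = {}     # old vertex index -> new vertex index
    new_vertices = []

    def remap(boundaries):
        # Walk the nested boundaries and renumber each vertex index on first use.
        result = []
        for item in boundaries:
            if isinstance(item, list):
                result.append(remap(item))
            else:
                if item not in old_to_new:
                    old_to_new[item] = len(new_vertices)
                    new_vertices.append(cm["vertices"][item])
                result.append(old_to_new[item])
        return result

    new_objects = {}
    for object_id in object_ids:
        obj = dict(cm["CityObjects"][object_id])  # shallow copy of the city object
        obj["geometry"] = [
            {**geometry, "boundaries": remap(geometry["boundaries"])}
            for geometry in obj.get("geometry", [])
        ]
        new_objects[object_id] = obj

    # Properties other than the objects and vertices are repeated in each part.
    subset = {k: v for k, v in cm.items() if k not in ("CityObjects", "vertices")}
    subset["CityObjects"] = new_objects
    subset["vertices"] = new_vertices
    return subset


model = {
    "type": "CityJSON",
    "CityObjects": {
        "a": {"type": "Building",
              "geometry": [{"type": "MultiSurface", "boundaries": [[[0, 1, 2]]]}]},
        "b": {"type": "Building",
              "geometry": [{"type": "MultiSurface", "boundaries": [[[3, 4, 5]]]}]},
    },
    "vertices": [[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 0, 0], [6, 0, 0], [5, 1, 0]],
}
part = extract_subset(model, ["b"])
```

In the resulting part, the indices of object "b" start again at 0, which illustrates why the indices in each file become smaller after partitioning.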
Support for metadata
CityGML has very limited support for metadata [16]. Only a few elements are supported, such as the bounding box and the CRS, and most elements are on the city model level and not on the module or city feature level. While there exists a metadata ADE for CityGML (footnote 9), in CityJSON metadata is incorporated into the core schema. CityJSON metadata is developed with ISO 19115 (the metadata standard specifically for geographic information developed by the International Organization for Standardization) as the base and further includes elements important for 3D city models, such as the levels of detail present, extensions (and their metadata), presence of textures and/or materials, etc. It also supports metadata at the city model level, the module level and the city feature level.
This is the only addition that CityJSON makes to the CityGML data model.