Copyright © 2024 Zazuko® and CC-BY rules apply.
A good amount of data in organizations is maintained in tables with multiple columns. Typically you can think of a multitude of Excel or CSV tables. This data often has a dimension which describes the time, sometimes other dimensions for classifying the data, and in most cases some actual observed values or counts.
Such tables hold the data in a structured form, but most of the time the information needed to understand the columns is missing, as is the metadata required to create useful representations in charts and visualizations.
With the creation of Cubes you, as data provider and domain specialist, are able to augment and annotate your data with everything necessary to understand the input data, directly in the dataset to be published. Fully annotated Cubes can also be used to visualize your data with proper tooling.
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This section describes the Cube Schema model.
The Cube Schema defines a minimal set of classes and properties necessary to represent multi-dimensional arrays of data in RDF 1.1 Concepts and Abstract Syntax.
We describe the model, an elaborate example, and scripts to validate observations based on the constraint (SHACL shape) provided.
PREFIX | IRI | Description |
---|---|---|
cube | https://cube.link/ | Cube Schema. |
meta | https://cube.link/meta/ | Cube Schema meta data extension. |
relation | https://cube.link/relation/ | Cube Schema relation vocabulary. |
PREFIX | IRI | Description |
---|---|---|
schema | http://schema.org/ | To describe basic properties. |
sh | http://www.w3.org/ns/shacl# | Inherited from the Cube Schema for constraints. |
qudt | http://qudt.org/schema/qudt/ | Describes the scale of measures. |
unit | http://qudt.org/vocab/unit/ | Describes units on values. |
time | http://www.w3.org/2006/time# | A time description ontology. |
geo | http://www.opengis.net/ont/geosparql# | OGC GeoSPARQL 1.0. |
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema Datatypes. |
skos | http://www.w3.org/2004/02/skos/core# | SKOS Simple Knowledge Organization System. |
The core schema is very simple and almost unconstrained from an RDF perspective. This ensures it can be used in many ways and does not restrict its usefulness due to too rigorous definitions.
There are 4 classes defined in the Cube Schema
cube:Cube: Represents the entry point for a collection of one or more observation sets, conforming to some common dimensional structure.
cube:Constraint: Specifies constraints that need to be met on the Cube. Used for metadata and validation. (Optional) For more information see Cube Schema: Constraints. A Constraint is optional but recommended. It is used, among other things, so that a Cube can be validated.
cube:Observation: A single observation in the cube may have one or more associated dimensions. An Observation can appear in one or more ObservationSets.
cube:ObservationSet: An ObservationSet is a structure that acts as a container for multiple Observations. It can be used to group any set of Observations, as long as they use the same dimensions. On purpose, no stronger semantics are attached to this set, so that it can be used in almost any scenario. A cube can have one or more ObservationSets and an Observation can appear in multiple ObservationSets.
cube:Undefined: An observation which is not defined. For more information see NULL and Empty values.
The resources described by the various classes are connected by a small set of properties.
cube:observationSet: Connects a cube with a set of observations.
cube:observationConstraint: Connects a cube with a constraint for metadata and validation.
cube:observation: Connects a set of observations with a single observation.
cube:observedBy: Connects an observation with the agent that created the observation. The agent can be a person, organization, device, or software. A description of the method to gather the data could be attached to the agent.
A dimension is a structure that categorizes facts and measures to enable users to answer business questions. Commonly used dimensions are people, products, place, and time (Source: Wikidata).
In Cube Schema, facts, measures, and categories are all considered dimensions.
All Observations need to provide the same set of dimensions; dimensions cannot be optional. This ensures that cubes can be queried efficiently.
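The rule that every observation carries the same set of dimensions can be checked mechanically. The following is a minimal sketch in Python, assuming observations are modelled as plain dictionaries that map a dimension to its value. The ex: names and the helper find_incomplete_observations are hypothetical and not part of Cube Schema.

```python
def find_incomplete_observations(observations):
    """Return {observation_id: missing_dimensions} for every observation
    that lacks a dimension used by at least one other observation."""
    # The expected dimension set is the union over all observations.
    all_dimensions = set()
    for dimensions in observations.values():
        all_dimensions.update(dimensions)

    problems = {}
    for obs_id, dimensions in observations.items():
        missing = all_dimensions - set(dimensions)
        if missing:
            problems[obs_id] = sorted(missing)
    return problems


# Hypothetical sample data: ex:obs2 has no temperature.
observations = {
    "ex:obs1": {"ex:date": "2024-01-01", "ex:room": "ex:room1", "ex:temperature": 21.5},
    "ex:obs2": {"ex:date": "2024-01-02", "ex:room": "ex:room1"},
}
print(find_incomplete_observations(observations))
```

An empty result means that all observations provide the same dimensions.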
Unlike other RDF vocabularies in that domain, there is no specific class for a dimension. Creating a new Cube Schema dimension would be the same as defining a new RDF Property.
This encourages re-use and makes it much easier to cherry-pick existing RDF properties and use them as dimensions. Obvious examples are temporal properties like schema:validFrom, dcterms:date, dcterms:temporal, etc.
In general, any RDF Property can be considered for describing a dimension, except for heavily constrained properties that might lead to unwanted conclusions (inference) by reasoners.
Note that choosing a particular dimension will have implications. For querying cubes via SPARQL, spatial and temporal dimensions might only be filtered properly if the datatype used (e.g. xsd:date, geo:asWKT) is supported/optimized by the SPARQL endpoint. This is, for example, mostly not the case for fragment datatypes like xsd:gYear.
It has to be ensured that properties are not attached at the wrong level. Spatial dimensions, for example, are most likely not attached to the observation directly but to an instance of a dimension referenced in the observation. This is the case when the observation refers to a static location where the observation is made, e.g. a sensor location. When, however, the actual location is the observation, it can be recorded as a dimension.
Instances of a dimension can be RDF literals with data types (sometimes called typed literals) or IRIs. For each dimension, an observation must have exactly one value: either a single literal or a single IRI.
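The single-value rule can be illustrated with a small check over triples. This is a minimal sketch in Python, assuming the observation data is available as (subject, predicate, object) tuples. The helper name find_multi_valued and the ex: identifiers are hypothetical.

```python
from collections import Counter


def find_multi_valued(triples):
    """Return the (observation, dimension) pairs that carry more than one value."""
    counts = Counter((subject, predicate) for subject, predicate, _ in triples)
    return sorted(pair for pair, count in counts.items() if count > 1)


# Hypothetical sample data: ex:temperature is stated twice on ex:obs1.
triples = [
    ("ex:obs1", "ex:room", "ex:room1"),
    ("ex:obs1", "ex:temperature", "21.5"),
    ("ex:obs1", "ex:temperature", "22.0"),
]
print(find_multi_valued(triples))
```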
Language-tagged literals and all other (meta)data should be modeled as terms/concepts. For this purpose, IRIs should be used, and the literal(s) are attached to that particular instance of a term/concept. This can be done e.g. by using SKOS Simple Knowledge Organization System Primer (https://www.w3.org/TR/skos-primer/) or schema.org DefinedTerm. As shown in the following example, a typical cube structure is a combination of dimensions with typed literals attached to the "observation" itself and dimensions that refer to concept groups via IRIs.
In RDF 1.1 Turtle syntax, the observation above looks like this:
room1 is an IRI that has labels attached as language-tagged strings.
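As a rough illustration of this structure, the following Python sketch prints a Turtle snippet for one observation and for the concept it refers to. Only room1 and dc:date are taken from the text. The ex: namespace, the other property names, the values, the schema:Place type and the helper resource_to_turtle are assumptions made for this example, and prefix declarations are omitted.

```python
def resource_to_turtle(subject, rdf_type, properties):
    """Serialize one resource as Turtle.

    `properties` is a list of (predicate, term) pairs; every term must
    already be a valid Turtle term (prefixed IRI or literal)."""
    pairs = [("a", rdf_type)] + list(properties)
    body = " ;\n".join(f"  {predicate} {term}" for predicate, term in pairs)
    return f"{subject}\n{body} ."


# Dimensions with typed literals sit on the observation itself,
# the room dimension points to a concept via its IRI.
observation = resource_to_turtle(
    "ex:obs1",
    "cube:Observation",
    [
        ("dc:date", '"2019-01-08T12:00:00Z"^^xsd:dateTime'),
        ("ex:temperature", '"21.5"^^xsd:decimal'),
        ("ex:room", "ex:room1"),
    ],
)

# Language-tagged labels are attached to the concept, not to the observation.
room = resource_to_turtle(
    "ex:room1",
    "schema:Place",
    [("schema:name", '"Room 1"@en'), ("schema:name", '"Raum 1"@de')],
)

print(observation)
print(room)
```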
Nesting of relations can be expressed in a machine-readable form as well but is not part of the core Cube Schema.
In Cube Schema, all dimensions are mandatory for a cube. If a value could not be measured, it should be expressed as such.
There is no generic "built-in" way to solve this in RDF. For some numeric datatypes, XML and thus RDF defines "not a number" (NaN) as a value. According to the specs, this is only valid for xsd:float and xsd:double and not for xsd:decimal and xsd:integer.
To provide a generic solution that works for all numbers and IRIs, Cube Schema provides cube:Undefined. The following example shows how to use it in cube:Observation and in the attached shape:
If it is necessary to state why the value is cube:Undefined, annotations should be used.
Additional constraints (like sh:minLength in the example) may be placed within the real data type so that they do not apply to undefined values.
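The datatype rule for NaN can be summarized in a few lines. This is a minimal sketch in Python: the helper missing_value_marker is hypothetical, and preferring NaN where the datatype allows it is an assumption of this sketch. Using cube:Undefined for every datatype would be equally consistent with the text above.

```python
# Datatypes for which NaN is a valid lexical value.
NAN_CAPABLE = {"xsd:float", "xsd:double"}


def missing_value_marker(datatype):
    """Pick a marker for a dimension value that could not be measured."""
    if datatype in NAN_CAPABLE:
        return "NaN"
    # xsd:decimal, xsd:integer, IRIs and everything else have no NaN.
    return "cube:Undefined"


for datatype in ("xsd:double", "xsd:decimal", "xsd:integer"):
    print(datatype, "->", missing_value_marker(datatype))
```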
From a high-level point of view, the core classes and properties are enough to publish a valid cube. However, the absence of any metadata might make it hard for consumers to understand what the data is about.
To add the title and a short description of the cube, add the following properties directly on an instance of a cube:Cube:
schema:name: A descriptive name of the Cube. The name can be multilingual.
schema:description: A description of the Cube. The description can be multilingual.
You are free to add other properties related to your Cube or the publication process of your Cube directly on the instance of the Cube.
From a pure publishing point of view and according to the Open World Model, publishing observations using the core Cube Schema is enough. However, in reality, one might want to use the data for other purposes as well, like visualizing it in web applications and other publications. To be useful, such tools might require additional metadata and cubes that adhere to certain constraints.
Cube Schema supports attaching a "constraint" to a cube. The constraints themselves are expressed using the Shapes Constraint Language (SHACL).
Providing constraints for a cube facilitates the documentation, interpretation, and validation of the cube for tooling and libraries. Only one constraint is allowed per cube. This keeps the documentation overhead very low: the cube and its observations do not need any documentation on the data level.
It is up to the consumer of the data to decide how the constraint is used. If provided, it should be possible to validate the observations in the cube with it.
A cube:Constraint is also a sh:NodeShape. Shapes Constraint Language (SHACL) is used to restrict the cube to a particular structure. The Constraint presented in this section can be used to validate all Observations in the cube.
If additional restrictions are needed, they can be provided as further Shapes Constraint Language (SHACL) shapes.
The following snippet defines a new constraint that applies to every cube:Observation:
Dimension Constraints are linked to Cube Constraints through the sh:property property. A Dimension Constraint uses the Shapes Constraint Language (SHACL) vocabulary to express the constraints for the specific dimension.
The dimension/property is referenced using sh:path. The value of the path must be a single value and not an RDF list.
Additional constraints can be added according to the SHACL specification, for example, data types, min/max values, etc.
The following snippet defines a dimension (property) dc:date with a literal value (xsd:dateTime):
Dimensions that point to objects like code lists (e.g. taxonomies represented in vocabularies like SKOS Simple Knowledge Organization System Primer) can be expressed as well:
Dimensions can have further types that are not defined by this vocabulary, either to support other kinds of Dimensions (e.g. Precision, Statistical Measures) or to mark additional Attributes to filter on that are not part of the key, i.e. that are not a cube:KeyDimension.
To facilitate the visualization of Cubes or any other user-experience-related activity, it is possible to extend the Constraints with additional metadata that describes the characteristics of the cube and its dimensions. When this information is provided in the Constraints, tools used for displaying the data do not need to process and interpret the actual data in the cube to configure the visualization.
To be able to understand the nature of a dimension we can type the dimension in the constraints. In general, we have at least two mandatory types per cube, the cube:MeasureDimension and the cube:KeyDimension.
The following classes are used to define the various visualization elements of the Cube Schema:
The KeyDimension tags one or multiple dimensions that together uniquely identify an observation. You can think of them as the key in a relational database.
The MeasureDimension tags at least one dimension, but potentially multiple, that holds the actual measurement or statistical count attached to an observation.
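The role of key dimensions can be illustrated by checking that no two observations share the same key. This is a minimal sketch in Python, assuming observations are dictionaries that map a dimension to its value. The helper find_duplicate_keys and all ex: names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_keys(observations, key_dimensions):
    """Return {key_tuple: [observation ids]} for keys used more than once."""
    seen = defaultdict(list)
    for obs_id, dimensions in observations.items():
        # The key is the combination of all key dimension values.
        key = tuple(dimensions[name] for name in key_dimensions)
        seen[key].append(obs_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Hypothetical sample data: ex:obs1 and ex:obs3 share date and room.
observations = {
    "ex:obs1": {"ex:date": "2024-01-01", "ex:room": "ex:room1", "ex:temperature": 21.5},
    "ex:obs2": {"ex:date": "2024-01-02", "ex:room": "ex:room1", "ex:temperature": 20.0},
    "ex:obs3": {"ex:date": "2024-01-01", "ex:room": "ex:room1", "ex:temperature": 19.0},
}
print(find_duplicate_keys(observations, ["ex:date", "ex:room"]))
```

Here ex:date and ex:room play the role of key dimensions and ex:temperature is the measure dimension.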
The following properties are used to define the various visualization elements of the Cube Schema:
schema:name: A descriptive name of the Dimension. The name can be multilingual.
schema:description: A description of the Dimension. The description can be multilingual.
To describe the unit of the values in a dimension, the respective qudt:Unit instance can be attached to a Dimension Constraint with the qudt:unit property.
To provide more information on the statistical scale of measure, it can be described by qudt:NominalScale, qudt:OrdinalScale, qudt:IntervalScale or qudt:RatioScale, which is attached through qudt:scaleType to the Dimension Constraint. There can be only one qudt:scaleType per dimension.
The different scale types hint at features that can be used for visualization:
qudt:NominalScale: Expects the dimension to be either a Resource (connected by a URI) or a value with any kind of datatype.
qudt:OrdinalScale: Expects the dimension to be either:
  a Resource with schema:position on the elements, which provides a lexical ordering (best to use integer numbers). It can further be expected that the order in sh:in of the constraint is correct.
  or an ordered value (e.g. 1st, 2nd, 3rd).
qudt:IntervalScale: Expects the dimension to be values with a numeric datatype and the unit not to contradict the Scale.
qudt:RatioScale: Expects the dimension to be values with a numeric datatype and the unit not to contradict the Scale.
To describe the datatype used by the dimension, attach sh:datatype to the Dimension Constraint.
Be aware that this implies the presence of a typed literal as the dimension value.
To express that the dimension provides a specific kind of data, which is necessary to select the correct visual representation, you can add a meta:dataKind resource with one of the following structures:
schema:GeoCoordinates: To hint that the dimension provides Resources with latitude and longitude which can be shown on a map.
schema:GeoShape: To hint that the dimension provides Resources that have a shape that can be shown on a map.
time:GeneralDateTimeDescription: To hint that the dimension provides Resources that can be shown on a timeline.
It is further possible to add time:unitType to hint at the precision in which the dimension should be presented. A time:TemporalUnit is expected: time:unitYear, time:unitMonth, time:unitWeek, time:unitDay, time:unitHour, time:unitMinute and time:unitSecond.
The sh:order can be used to indicate the relative order of the dimension, for use in visualizations. It should be used according to the specification by using ascending order, for example, so that properties with smaller order are placed above (or to the left) of properties with a larger order.
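A consumer could apply this ordering as follows. This is a minimal sketch in Python, assuming the Dimension Constraints have been loaded into dictionaries. Placing dimensions without sh:order last is an assumption of this sketch, and the ex: paths are hypothetical.

```python
def order_key(constraint):
    """Sort key: ascending sh:order, constraints without sh:order last."""
    order = constraint.get("sh:order")
    return (order is None, 0 if order is None else order)


def sort_dimensions(dimension_constraints):
    # sorted() is stable, so unordered dimensions keep their input order.
    return sorted(dimension_constraints, key=order_key)


constraints = [
    {"sh:path": "ex:temperature", "sh:order": 3},
    {"sh:path": "ex:note"},
    {"sh:path": "dc:date", "sh:order": 1},
    {"sh:path": "ex:room", "sh:order": 2},
]
print([c["sh:path"] for c in sort_dimensions(constraints)])
```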
The previous section described the properties of basic cubes. There are, however, situations where more complex observations or relations between observations need to be expressed. This section provides a set of best practices for various subjects.
To keep a continuous history of a published cube, a meta construct based on a schema:CreativeWork can be put around the cube. It describes one line of the cube's history.
Each version is attached to the version history through schema:hasPart as a fully described cube that can be interpreted independently. Cubes in the same history line are expected not to change the number of dimensions. All other descriptions can change.
To record a version, the schema:version property can be used.
A cube can be invalidated or unlisted by adding schema:expires with the expiry date to the cube itself.
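An application listing cubes could evaluate this property as follows. This is a minimal sketch in Python, assuming the cube description has been loaded into a dictionary with the expiry date as a datetime.date. The helper is_listed is hypothetical, and treating the expiry date itself as already expired is an assumption of this sketch.

```python
from datetime import date


def is_listed(cube, today):
    """Return True if the cube has no schema:expires or it lies in the future."""
    expires = cube.get("schema:expires")
    return expires is None or today < expires


today = date(2024, 6, 1)
print(is_listed({"schema:name": "Current cube"}, today))
print(is_listed({"schema:expires": date(2024, 1, 1)}, today))
```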
As a guideline, we propose the following predicates to annotate a cube with its status (Draft, Published, Hidden, ...) on the level of the cube description.
A status of the cube, like Draft or Published, can be added to the cube through schema:creativeWorkStatus. The status is expected to be a schema:DefinedTerm.
Further, to hint at the usage of a cube for a specific application, and potentially to filter it out for other applications, we propose the usage of schema:workExample attached to the cube. (The logic behind this is that if the cube is shown in an Application, it becomes an example of how to interpret the cube and therefore a work example of that cube.)
Finally, we propose meta:applicationIgnores attached to the Constraint of the dimension to hide the respective dimension for the specified Application. (It shall use the same objects as used on schema:workExample.)
Observations may hold dimensions that are quantitatively related to each other. Expressing this on the observation with blank nodes creates properly structured RDF, but causes performance and complexity issues when querying the Cube.
To overcome this limitation, the relation can be expressed on the relevant Dimension Constraints.
There is one sh:property definition per dimension, so the lookup only needs to be done once and is valid for all observations of that particular cube.
A relation between dimensions is described only with the cube and meta vocabularies. The relation classes themselves can be extended based on specific use cases.
The controlled vocabulary introduced with the namespace PREFIX relation: <https://cube.link/relation/> provides the most common relation Classes and is proposed as a guideline.
This is an advanced usage of the cube and increases its complexity. But it gives the expressiveness needed to describe the complex relationship between data in a machine-processable way.
Observations can be structured in hierarchies inside cubes. It is possible to define hierarchies which reside inside one dimension (e.g. categories, classifications) or also hierarchies which span over multiple dimensions. It is also possible to have hierarchies using external concepts.
To allow reuse of existing hierarchies described with e.g. schema:hasPart / schema:isPartOf, SKOS Simple Knowledge Organization System Primer (https://www.w3.org/TR/skos-primer/) or similar ontologies, the following solution simply annotates one or multiple possible hierarchies.
Hierarchy definitions are always written top-down, starting from one or multiple roots and following predicates of your choosing. The leaves must be the final observations inside one dimension.
The hierarchy annotation is attached to a cube dimension as a meta:Hierarchy through meta:inHierarchy (similar to a meta:Relation).
It holds a name describing the hierarchy as such with schema:name and at least one, but potentially multiple, meta:hierarchyRoot. Finally, the connection between the root nodes and the first level below in the hierarchy is attached through meta:nextInHierarchy.
The simplest example above puts the two concepts countries and cantons in relation.
With the use of Property Paths (sh:path), the connection between two levels in the hierarchy is expressed.
As a guideline, we suggest supporting at least one-step Predicate Paths and inverse one-step Predicate Paths.
More complex paths will depend on the support of the used applications.
If the predicate used in sh:path is not distinct enough, it is possible to add sh:targetClass to additionally specify the Class to which the sh:path is pointing.
With the use of meta:nextInHierarchy, it is possible to extend the number of levels indefinitely. Once a path points to an instance of a concept which is attached by the defined dimension, the hierarchy for this element is complete. It is therefore possible that the number of levels differs between elements.
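The level-by-level traversal described above can be sketched as follows. This is a minimal illustration in Python, assuming the data is available as (subject, predicate, object) tuples and each level is given as a one-step path that is either followed forwards or inversely. The helpers follow and expand_hierarchy, the dictionary layout of a level and the ex: identifiers are hypothetical.

```python
def follow(triples, node, path, inverse=False):
    """Follow a one-step predicate path (or its inverse) from a node."""
    if inverse:
        return sorted(s for s, p, o in triples if p == path and o == node)
    return sorted(o for s, p, o in triples if p == path and s == node)


def expand_hierarchy(triples, roots, levels):
    """Return one list of nodes per hierarchy level, starting with the roots."""
    result = [list(roots)]
    current = list(roots)
    for level in levels:
        next_nodes = []
        for node in current:
            for child in follow(triples, node, level["path"], level.get("inverse", False)):
                if child not in next_nodes:
                    next_nodes.append(child)
        result.append(next_nodes)
        current = next_nodes
    return result


# Countries and cantons, described in both directions.
triples = {
    ("ex:Switzerland", "schema:containsPlace", "ex:Bern"),
    ("ex:Switzerland", "schema:containsPlace", "ex:Zurich"),
    ("ex:Bern", "schema:containedInPlace", "ex:Switzerland"),
    ("ex:Zurich", "schema:containedInPlace", "ex:Switzerland"),
}
forward = expand_hierarchy(triples, ["ex:Switzerland"], [{"path": "schema:containsPlace"}])
inverse = expand_hierarchy(
    triples, ["ex:Switzerland"], [{"path": "schema:containedInPlace", "inverse": True}]
)
print(forward)
print(inverse)
```

Both calls yield the same two levels, which shows why support for inverse one-step paths is useful when the data only states the relation from the lower level upwards.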
It is possible to generate a minimal constraint given a Cube and a set of Observations.
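The idea can be sketched outside of SPARQL as well. The following Python example is an illustration only and not the CONSTRUCT queries mentioned in this section. It assumes each observation is a dictionary that maps a dimension to a (value, datatype) pair, with datatype None for IRI values. The helper minimal_constraint, the dictionary layout of the result and the ex: names are hypothetical.

```python
def minimal_constraint(observations):
    """Derive one property shape per dimension from the observation data."""
    collected = {}
    for observation in observations:
        for dimension, (value, datatype) in observation.items():
            info = collected.setdefault(dimension, {"datatypes": set(), "iris": set()})
            if datatype is None:
                info["iris"].add(value)
            else:
                info["datatypes"].add(datatype)

    shapes = []
    for dimension in sorted(collected):
        info = collected[dimension]
        # Every dimension is mandatory and single-valued.
        shape = {"sh:path": dimension, "sh:minCount": 1, "sh:maxCount": 1}
        if len(info["datatypes"]) == 1 and not info["iris"]:
            shape["sh:datatype"] = next(iter(info["datatypes"]))
        elif info["iris"] and not info["datatypes"]:
            shape["sh:nodeKind"] = "sh:IRI"
            shape["sh:in"] = sorted(info["iris"])
        shapes.append(shape)
    return shapes


observations = [
    {"dc:date": ("2024-01-01T00:00:00Z", "xsd:dateTime"),
     "ex:room": ("ex:room1", None),
     "ex:temperature": ("21.5", "xsd:decimal")},
    {"dc:date": ("2024-01-02T00:00:00Z", "xsd:dateTime"),
     "ex:room": ("ex:room2", None),
     "ex:temperature": ("20.0", "xsd:decimal")},
]
for shape in minimal_constraint(observations):
    print(shape)
```

Dimensions with mixed datatypes only receive the cardinality constraints in this sketch, since no single sh:datatype would fit them.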
SPARQL CONSTRUCT queries will be provided in this repository.
Cube Viewer (Homepage) is both an app and a reusable component to visualize data cubes based on Cube Schema.
A demo of the app is deployed at cube-viewer.zazuko.com.
An example Cube is specified in cube.ttl. The cube provides a constraint in constraint.ttl.
This section is a work in progress, the wording and terminology still need some thought.
The validation process of the cube can be divided into three different aspects.
The node package barnard59-cube includes commands to validate cubes and their constraints. Validation commands can be used with SHACL shapes defined here.
To use them, install the barnard59 CLI and the barnard59-cube package globally:
npm install -g barnard59 barnard59-cube
Validation commands provide SHACL validation reports in case of violations:
To get a human-readable summary of the report, chain another command, available with the barnard59-shacl package:
Even though Cube Schema is a very lightweight vocabulary, there is a minimal set of rules to make a valid Cube. We provide a basic cube constraint to check this (notice that input includes both cube and constraint).
When a cube provides an optional observation constraint through the observationConstraint property, this can be tested as well.
The constraint should be a SHACL shape but it's expected to not have any target declaration. The validation tool takes care of making all the observations a target for the constraint.
Notice that a single file is suitable only for small cubes: the check-observations command can process big cubes by splitting the input into chunks, but the constraint file is expected to fit in memory.
Further options and details are described in the documentation.
If constraints are used to provide guidance for interaction with the cube (e.g. visualizations), the constraints themselves should conform to a structure that interaction can deal with.
The standalone constraint can be extended to meet the specific requirements for the intended interaction.
The RDF Data Cube Vocabulary is probably the oldest vocabulary in the domain of RDF for representing cubes. The authors of this document used RDF Data Cubes extensively in the past and ran into multiple issues with it.
The authors of this specification are grateful for the work done by the original authors of the RDF Data Cube Vocabulary specification. This work would not have been possible without it and some parts look pretty much like the RDF Data Cube Vocabulary.
It was considered to either clarify or update the RDF Data Cube Vocabulary specification. For the sake of simplicity, it was decided to start from scratch.
There are at least two efforts that extend the RDF Data Cube Vocabulary to address some of its limitations:
Both efforts could likely be addressed within the Cube Schema approach; this needs to be validated by interested parties.
The Semantic Sensor Network Ontology defines a simplified model for describing observations from a sensor in RDF.
The Observation model is at least inspired by the RDF Data Cube Vocabulary but it is not very useful for use-cases outside of sensor networks.
It should be relatively easy to replace SSN observations with the Cube Schema.