Property Assertion Constraints for ontologies and knowledge graphs
DOI: https://doi.org/10.1108/DTA-05-2022-0209
Published date: 21 April 2023
Pages: 157-176
Author: Henrik Dibowski
Henrik Dibowski
Robert Bosch GmbH, Bosch Center for Artificial Intelligence, Renningen, Germany
Abstract
Purpose – The curation of ontologies and knowledge graphs (KGs) is an essential task for industrial
knowledge-based applications, as they rely on the contained knowledge being correct and error-free. Often,
a significant part of a KG is curated by humans. Established validation methods, such as Shapes
Constraint Language, Shape Expressions or Web Ontology Language, can detect wrong statements only
after their materialization, which can be too late. Instead, an approach is required that avoids errors
and adequately supports users.
Design/methodology/approach – To solve this problem, Property Assertion Constraints (PACs) have
been developed. PACs extend the range definition of a property with additional logic expressed in
SPARQL. For the context of a given instance and property, a tailored PAC query is dynamically built and
executed against the KG. It determines all values that will result in valid property value assertions.
Findings – PACs can effectively prevent the expansion of KGs with invalid property value assertions, as the
expertise they contain narrows down the valid options a user can choose from. This simplifies knowledge
curation and, most notably, relieves users or machines of the need to know and apply this expertise,
as a computer takes care of it instead.
Originality/value – PACs are fundamentally different from existing approaches. Instead of detecting
erroneous materialized facts, they can determine all semantically correct assertions before materializing
them. This avoids invalid property value assertions and provides users with informed, purposeful assistance.
To the author’s knowledge, PACs are the only such approach.
Keywords SPARQL, Ontology, Constraint validation, Error prevention, Knowledge graph curation,
Property value assertion, SHACL
Paper type Research paper
1. Introduction
In industrial use cases, the need for having richer and more comprehensive information
available is continuously growing. Ontologies and knowledge graphs (KGs) provide the
technical answer to that need, as they can surpass traditional databases and data formats
such as XML and structured text in many respects. They can exceed them in size and
complexity; the semantics of the contained information is explicitly available and the
information is hence machine-interpretable; information silos can be overcome; and
artificial intelligence can be realized via semantic search and reasoning.
In industries, KGs are often used for modeling products, systems, factories, processes or
data and the complex interactions and interrelationships between them. Two prominent use
cases are digital twins and semantic data lakes (Dibowski et al., 2020; Dibowski and
Schmid, 2021), where the KG defines a digital copy of a real-world entity and a semantic
representation of its data. Many KG-driven applications require the defined information to
be comprehensive and free of errors to function accurately. Errors in knowledge can lead to
failure of applications, which can have severe consequences in terms of damage and cost, in
particular when affecting production lines, control systems, vehicles, etc. This imposes an
even bigger challenge in the provisioning, curation and expansion of KGs.
In the beginning, when introducing ontologies and KGs in an enterprise or project for the
first time, a high(er) investment is required. First of all, starting to use a new technology
requires new skill sets, tools, best practices and processes. Secondly, KGs can comprise
much richer, more heterogeneous knowledge than conventional data models, but creating them
requires more effort and time. Often, this higher investment is an impediment to
deciding to use KGs in an enterprise, even though their utilization pays off in the mid and long
term. KGs can help to facilitate enhanced use cases and functionality, and they can assist or
even help to automate tasks that require human experts, making these processes more
efficient and cost-effective.
Received 20 May 2022
Revised 4 July 2022
Accepted 20 July 2022
Data Technologies and Applications, Vol. 57 No. 2, 2023, pp. 157-176
© Emerald Publishing Limited, 2514-9288
DOI 10.1108/DTA-05-2022-0209
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
A significant amount of the cost for building a KG is related to the provisioning of
knowledge to be stored in the KG. While certain parts can be imported from existing
information already available in digitized data models via automated data ingestion
pipelines, utilizing Resource Description Framework (RDF) mapping techniques such as
RDB to RDF Mapping Language (R2RML; Das et al., 2012) or RDF Mapping Language
(RML; Meester et al., 2020), other parts need to be populated by human experts via
dedicated KG tools. This human-curated expansion of KGs is time-consuming, costly
and error-prone. Therefore, tools that can aid and support users in this demanding task
are required.
However, no adequate means exist that can reasonably support a human-curated expansion of
KGs. Mature KG validation methods are available, such as reasoning or
constraint-checking methods like Shapes Constraint Language (SHACL; Knublauch and
Kontokostas, 2017) or Shape Expressions (ShEx; Prud’hommeaux et al., 2019). These methods,
however, can only detect errors that are already materialized in the KG and are of little or no
value for guiding human-curated expansions and for preventing errors.
To close this gap, this paper presents Property Assertion Constraints (PACs). Instead
of detecting (already materialized) errors, PACs can prevent errors by guiding humans (but
also machines) during the expansion of KGs. PACs can effectively avoid invalid property value
assertions, as the expertise they contain narrows down the options a user can
choose from to only the valid property values. This reduces the number of options from
possibly several hundreds or thousands of values to a much smaller number, which
simplifies knowledge curation. Notably, PACs relieve users or machines of the need to
apply or even know this expertise, as a computer takes care of it instead.
Altogether, PACs enable an informed, error-preventing expansion of KGs.
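The narrowing-down idea can be illustrated with a minimal sketch. It is not the implementation described in this paper, which expresses the additional logic in SPARQL; here, plain Python sets stand in for the KG and the query engine. All names (the triples, `valid_values`, `same_building`, the `partOf` property) are hypothetical and chosen for illustration only. The assumed constraint is that a sensor may only be assigned to a room that is part of the same building.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy KG: (subject, predicate, object) triples.
KG: Set[Triple] = {
    ("sensor1", "type", "Sensor"),
    ("sensor1", "partOf", "buildingA"),
    ("room1", "type", "Room"),
    ("room1", "partOf", "buildingA"),
    ("room2", "type", "Room"),
    ("room2", "partOf", "buildingB"),
    ("pump1", "type", "Pump"),
    ("pump1", "partOf", "buildingA"),
}


def instances_of(kg: Set[Triple], cls: str) -> Set[str]:
    """All subjects typed with the given class (plain range check)."""
    return {s for (s, p, o) in kg if p == "type" and o == cls}


def objects(kg: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects of the given subject/predicate pair."""
    return {o for (s, p, o) in kg if s == subject and p == predicate}


def valid_values(
    kg: Set[Triple],
    instance: str,
    range_cls: str,
    constraint: Callable[[Set[Triple], str, str], bool],
) -> Set[str]:
    """Range members that also satisfy the extra constraint for this instance."""
    return {c for c in instances_of(kg, range_cls) if constraint(kg, instance, c)}


def same_building(kg: Set[Triple], instance: str, candidate: str) -> bool:
    """Assumed constraint: instance and candidate share a building."""
    return bool(objects(kg, instance, "partOf") & objects(kg, candidate, "partOf"))


# The range alone would offer room1 and room2; the constraint leaves room1.
print(sorted(valid_values(KG, "sensor1", "Room", same_building)))  # ['room1']
```

In this sketch the candidate list is computed before any assertion is written, so an invalid value such as `room2` is never offered to the user in the first place.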
This paper is a substantially extended version of the study by Dibowski (2021), in which PACs
for object properties were introduced for the first time. The coverage of PACs is
extended from object properties to data type properties in this paper, thus now covering all
property value assertions possible in a KG. While in Dibowski (2021) only the concept of the
whitelist PAC existed, this paper adds new types of PACs: blacklist PAC, lower/upper
bound PAC and regular expression (regex) PAC. In addition, an evaluation section and a
discussion of limitations have been added. This enhanced scope of the paper comes along
with several new figures, query examples and tables.
This paper is structured as follows: Section 2 describes the work related to the validation
and completion of KGs, and Section 3 criticizes major drawbacks and explains the problem
statement. Section 4 is the main part of the paper and describes the PAC approach in detail.
Section 5 finally concludes the paper.
2. Related work
This section presents the work related to the validation and completion of KGs. As the
primary focus of KGs and of the proposed approach is on the A-box, this section
concentrates on A-box approaches.
Knowledge validation is a critical task and measures whether statements from KGs are
semantically correct and correspond to the so-called “real” world (Huaman et al., 2020).
Semantically wrong statements can be referred to as errors. Through validation, errors in
a KG can be detected (error detection), and through specific strategies, they can be resolved