KG Quality Flashcards
(23 cards)
What does ‘Garbage In, Garbage Out’ imply for KG quality?
A KG’s quality directly depends on input data; poor-quality inputs yield poor-quality graphs.
Why is KG quality multi-dimensional?
It involves multiple aspects (accuracy, completeness, consistency, etc.) and is use-case dependent.
How does KG quality differ from software quality?
Unlike software, KGs lack modularization, feed downstream systems that are often unknown when the KG is built, and follow their own development processes.
Define ‘Accuracy’ in a KG.
Closeness of recorded values to the true values, measured either as correctness (a boolean: right or wrong) or with a distance metric.
How can Accuracy be aggregated across a KG?
As the proportion of accurate triples or as a weighted average accuracy.
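A minimal sketch of both aggregations. It assumes per-triple accuracy has already been judged, either as booleans (correctness) or as scores in [0, 1] derived from a distance metric. The weights are a hypothetical choice (for example, weighting triples by importance to the use case); the card does not prescribe a weighting scheme.

```python
def proportion_accurate(correct_flags):
    """Share of triples judged correct (boolean correctness)."""
    if not correct_flags:
        raise ValueError("need at least one judged triple")
    return sum(1 for ok in correct_flags if ok) / len(correct_flags)


def weighted_accuracy(scores, weights):
    """Weighted average of per-triple accuracy scores in [0, 1].

    The weights are an assumption, e.g. higher weight for triples
    that matter more to the use case.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


print(proportion_accurate([True, True, False, True]))       # 0.75
print(weighted_accuracy([1.0, 0.5, 0.0], [2.0, 1.0, 1.0]))  # 0.625
```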
Define ‘Completeness’ in a KG.
Degree to which required knowledge is present: schema, property, and population completeness.
What is ‘Population Completeness’?
Extent to which all entities in the scoped domain are present in the KG.
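Population completeness can be sketched as set overlap against reference values (expert ground truth). The entity IRIs below are made-up examples. Entities in the KG that fall outside the reference set do not raise the score.

```python
def population_completeness(kg_entities, reference_entities):
    """Fraction of reference (ground-truth) entities present in the KG."""
    reference = set(reference_entities)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(kg_entities)) / len(reference)


kg_entities = {"ex:Berlin", "ex:Paris", "ex:Rome", "ex:Atlantis"}  # ex:Atlantis is ignored
gold_standard = {"ex:Berlin", "ex:Paris", "ex:Rome", "ex:Madrid"}
print(population_completeness(kg_entities, gold_standard))  # 0.75 (ex:Madrid is missing)
```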
Define ‘Consistency’ in a KG.
Absence of conflicting statements; measured by the number of inconsistencies detected.
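A minimal sketch of counting inconsistencies, assuming two illustrative kinds of conflict: a functional predicate with several distinct values for one subject, and an entity typed with two disjoint classes. Which predicates are functional and which classes are disjoint is hypothetical input here, roughly corresponding to owl:FunctionalProperty and owl:disjointWith declarations; in practice such checks are usually delegated to a reasoner or a constraint validator.

```python
from collections import defaultdict


def count_inconsistencies(triples, functional_predicates, disjoint_classes):
    """Count two kinds of conflicting statements:

    1. a functional predicate with more than one distinct value for a subject;
    2. an entity typed with two classes declared disjoint.
    """
    values = defaultdict(set)  # (subject, predicate) -> distinct objects
    types = defaultdict(set)   # subject -> asserted classes
    for s, p, o in triples:
        if p in functional_predicates:
            values[(s, p)].add(o)
        if p == "rdf:type":
            types[s].add(o)
    conflicts = sum(1 for objs in values.values() if len(objs) > 1)
    for classes in types.values():
        conflicts += sum(1 for a, b in disjoint_classes if a in classes and b in classes)
    return conflicts


triples = [
    ("ex:Alice", "ex:birthDate", "1990-01-01"),
    ("ex:Alice", "ex:birthDate", "1991-05-05"),  # conflicts with the line above
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Bob", "rdf:type", "ex:Organization"),   # declared disjoint with ex:Person below
    ("ex:Carol", "ex:birthDate", "1985-03-03"),
]
print(count_inconsistencies(triples, {"ex:birthDate"}, [("ex:Person", "ex:Organization")]))  # 2
```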
What are ‘Reference Values’?
Ground-truth facts from experts used to measure accuracy and population completeness.
What are ‘Competency Questions’?
Questions the KG should be able to answer, formalized as SPARQL queries with known expected results; they serve as unit tests for the KG's capabilities.
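To show the "unit test" idea without depending on a SPARQL engine, this sketch swaps in a tiny, hypothetical in-memory matcher for basic graph patterns (terms starting with `?` are variables). The equivalent SPARQL query is given in a comment; a real setup would run that query against the KG's endpoint or store and compare the result set in the same way.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern (a tiny BGP matcher).

    Terms starting with '?' are variables; everything else must match exactly.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


def run_competency_question(patterns, variable, triples, expected):
    """Treat a competency question like a unit test: compare answers to expected results."""
    answers = {b[variable] for b in match(patterns, triples)}
    return answers == expected, answers


triples = [
    ("ex:Berlin", "rdf:type", "ex:City"),
    ("ex:Berlin", "ex:locatedIn", "ex:Germany"),
    ("ex:Munich", "rdf:type", "ex:City"),
    ("ex:Munich", "ex:locatedIn", "ex:Germany"),
    ("ex:Paris", "rdf:type", "ex:City"),
    ("ex:Paris", "ex:locatedIn", "ex:France"),
]
# CQ: "Which cities are located in Germany?"
# SPARQL: SELECT ?city WHERE { ?city rdf:type ex:City ; ex:locatedIn ex:Germany . }
cq = [("?city", "rdf:type", "ex:City"), ("?city", "ex:locatedIn", "ex:Germany")]
passed, answers = run_competency_question(cq, "?city", triples, {"ex:Berlin", "ex:Munich"})
print(passed)  # True
```

If a fact is missing (say, Munich's location), the same check returns False, which is exactly the signal a failing unit test gives.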
What role does SHACL play in KG quality?
Defines shape constraints (cardinality, datatype, class) for automated quality validation.
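The sketch below mimics a few SHACL Core constraints (sh:minCount, sh:maxCount, sh:datatype, sh:class) in plain Python to show what automated shape validation checks. The dict-based shape format and the `validate` function are hypothetical; real shapes are written in RDF (typically Turtle) and run through a SHACL processor, which also handles subclass inference that this sketch ignores.

```python
from collections import defaultdict


def validate(triples, shape):
    """Check focus nodes of shape['target_class'] against simple property constraints.

    Literals are (lexical_value, datatype) tuples; IRIs are plain strings.
    rdfs:subClassOf inference is ignored.
    """
    types = defaultdict(set)
    objects = defaultdict(list)  # (subject, predicate) -> objects
    for s, p, o in triples:
        objects[(s, p)].append(o)
        if p == "rdf:type":
            types[s].add(o)

    violations = []
    focus_nodes = sorted(s for s, classes in types.items() if shape["target_class"] in classes)
    for node in focus_nodes:
        for prop, c in shape["properties"].items():
            values = objects.get((node, prop), [])
            if len(values) < c.get("min_count", 0):  # ~ sh:minCount
                violations.append(f"{node} {prop}: fewer than {c['min_count']} values")
            if len(values) > c.get("max_count", len(values)):  # ~ sh:maxCount
                violations.append(f"{node} {prop}: more than {c['max_count']} values")
            for v in values:
                if "datatype" in c and not (isinstance(v, tuple) and v[1] == c["datatype"]):
                    violations.append(f"{node} {prop}: {v!r} is not {c['datatype']}")  # ~ sh:datatype
                if "class" in c and c["class"] not in types.get(v, set()):
                    violations.append(f"{node} {prop}: {v!r} is not a {c['class']}")  # ~ sh:class
    return violations


person_shape = {
    "target_class": "ex:Person",
    "properties": {
        "ex:birthDate": {"min_count": 1, "max_count": 1, "datatype": "xsd:date"},
        "ex:worksFor": {"class": "ex:Organization"},
    },
}
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:birthDate", ("1990-01-01", "xsd:date")),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "rdf:type", "ex:Organization"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:birthDate", ("yesterday", "xsd:string")),  # wrong datatype
    ("ex:bob", "ex:worksFor", "ex:alice"),                    # not an organization
]
for violation in validate(triples, person_shape):
    print(violation)
```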
Define ‘Syntactic Validity’.
Conformance to RDF syntax: well-formed triples, correct prefixes, literals, and serialization grammar.
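A simplified, line-level syntax check for the N-Triples serialization, which uses full IRIs (so prefix handling does not arise). The regular expressions approximate the grammar only: they skip `\u` escapes in IRIs and restrict blank node labels and language tags to ASCII. In practice a conforming RDF parser is the reliable way to test syntactic validity; this is only an illustration of what such a check rejects.

```python
import re

# Simplified N-Triples term patterns (not the full W3C grammar).
IRI = r'<[^<>"{}|^`\\\s]*>'
BNODE = r'_:[A-Za-z0-9_]+'
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"'                              # quoted string with escapes
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')  # optional lang tag or datatype

TRIPLE = re.compile(
    r'^\s*(?:' + IRI + '|' + BNODE + r')'                   # subject: IRI or blank node
    + r'\s+' + IRI                                           # predicate: IRI only
    + r'\s+(?:' + IRI + '|' + BNODE + '|' + LITERAL + r')'   # object
    + r'\s*\.\s*$'                                           # terminating dot
)


def invalid_lines(text):
    """Return 1-based numbers of non-blank, non-comment lines that fail the check."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if not TRIPLE.match(line):
            bad.append(number)
    return bad


sample = """<http://example.org/a> <http://example.org/p> "hello"@en .
<http://example.org/a> <http://example.org/p> <http://example.org/b> .
_:b1 <http://example.org/p> "42"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/a> http://example.org/p "x" .
<http://example.org/a> <http://example.org/p> "unterminated .
"lit" <http://example.org/p> <http://example.org/b> ."""
print(invalid_lines(sample))  # [4, 5, 6]
```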
Define ‘Timeliness’ for a KG.
Degree to which KG data is up to date, measured via timestamps, update latency, and volatility.
What is ‘Freshness’ in KG quality?
How current the data is: the coverage and recency of timestamped data, and the ratio of data age to volatility.
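A minimal sketch of these freshness signals. It assumes each fact carries an optional last-update date and a hypothetical `volatility_days` field, read as how long the value typically stays valid. Flagging a fact as stale when its age-to-volatility ratio exceeds 1 is an illustrative threshold, not a standard one.

```python
from datetime import date


def freshness_report(facts, today):
    """Return (timestamp coverage, age/volatility ratios, number of stale facts).

    Each fact is a dict with an optional 'updated' date and a
    'volatility_days' value (assumed: typical validity period in days).
    """
    stamped = [f for f in facts if f.get("updated") is not None]
    coverage = len(stamped) / len(facts)  # share of facts that carry a timestamp
    ratios = [(today - f["updated"]).days / f["volatility_days"] for f in stamped]
    stale = sum(1 for r in ratios if r > 1)  # older than its assumed validity period
    return coverage, ratios, stale


facts = [
    {"id": "ex:population", "updated": date(2024, 1, 1), "volatility_days": 365},
    {"id": "ex:ceo", "updated": date(2024, 5, 1), "volatility_days": 30},
    {"id": "ex:founded", "updated": None, "volatility_days": 36500},
]
coverage, ratios, stale = freshness_report(facts, today=date(2024, 7, 1))
# coverage is 2/3; ex:ceo is 61 days old against a 30-day volatility, so stale == 1
print(coverage, ratios, stale)
```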
Define ‘Conciseness’.
Avoidance of redundant or duplicate schema elements and data instances in the KG.
Name a metric for Conciseness.
Ratio of unique instances to total instances; for the schema, ratio of unique predicates to all predicates.
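Both ratios share one shape: distinct representatives divided by total items. This sketch assumes duplicates are already known through a hypothetical `canonical` mapping (for instances, from duplicate detection; for predicates, from declarations such as owl:equivalentProperty). The IRIs are made-up examples.

```python
def uniqueness_ratio(items, canonical):
    """Distinct representatives / total items.

    `canonical` maps an item to the representative it duplicates (assumed to
    come from duplicate detection or owl:equivalentProperty declarations);
    unmapped items represent themselves.
    """
    if not items:
        return 1.0
    return len({canonical.get(i, i) for i in items}) / len(items)


instances = ["ex:Berlin", "ex:Berlin_City", "ex:Paris", "ex:Rome"]
print(uniqueness_ratio(instances, {"ex:Berlin_City": "ex:Berlin"}))  # 0.75

predicates = ["ex:birthDate", "ex:dateOfBirth", "ex:name"]
print(uniqueness_ratio(predicates, {"ex:dateOfBirth": "ex:birthDate"}))  # 2 of 3 are unique
```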
Define ‘Understandability’.
Ease of human comprehension, supported by labels, comments, and readable IRI patterns.
What metric measures Understandability?
Proportion of classes/properties with rdfs:label and rdfs:comment annotations.
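A minimal sketch of this metric over a list of schema terms, reporting label coverage, comment coverage, and the share of terms that have both. Prefixed names stand in for full IRIs, and the schema triples are illustrative.

```python
def annotation_coverage(triples, terms):
    """Fraction of schema terms with rdfs:label, with rdfs:comment, and with both."""
    annotations = {t: set() for t in terms}  # term -> annotation predicates seen
    for s, p, _ in triples:
        if s in annotations and p in ("rdfs:label", "rdfs:comment"):
            annotations[s].add(p)
    n = len(annotations)
    return {
        "label": sum("rdfs:label" in a for a in annotations.values()) / n,
        "comment": sum("rdfs:comment" in a for a in annotations.values()) / n,
        "both": sum(len(a) == 2 for a in annotations.values()) / n,
    }


schema = [
    ("ex:Person", "rdfs:label", "Person"),
    ("ex:Person", "rdfs:comment", "A human being."),
    ("ex:City", "rdfs:label", "City"),
]
terms = ["ex:Person", "ex:City", "ex:locatedIn"]  # ex:locatedIn has no annotations
print(annotation_coverage(schema, terms))
```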
Why is Human Judgment valid in KG quality?
Quality is subjective and context-dependent, so expert judgment is a valid assessment method as long as its subjectivity is acknowledged.
Define ‘Availability’ in decentralized KGs.
Accessibility of KG data via endpoints, dumps, or dereferenceable URIs.
Define ‘Latency’ in KG performance.
Delay between issuing a query and the start of the KG service's response.
What is ‘Interlinking’ in Linked Data?
Number of external links (e.g., owl:sameAs) per entity, indicating connections to other KGs.
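A minimal sketch of an interlinking metric: the average number of owl:sameAs links per entity that point outside the KG's own namespace. The namespace `http://example.org/` is a hypothetical stand-in for "our" KG, and counting only owl:sameAs is an assumption (other linking predicates could be passed in).

```python
LOCAL_NS = "http://example.org/"  # hypothetical namespace of the KG being assessed


def interlinking_degree(triples, entities, link_predicates=("owl:sameAs",)):
    """Average number of outgoing links per entity that point outside LOCAL_NS."""
    links = {}
    for s, p, o in triples:
        if p in link_predicates and s in entities and not o.startswith(LOCAL_NS):
            links[s] = links.get(s, 0) + 1
    return sum(links.get(e, 0) for e in entities) / len(entities), links


triples = [
    ("http://example.org/Berlin", "owl:sameAs", "http://www.wikidata.org/entity/Q64"),
    ("http://example.org/Berlin", "owl:sameAs", "http://dbpedia.org/resource/Berlin"),
    ("http://example.org/Berlin", "owl:sameAs", "http://example.org/BerlinCity"),  # internal, not counted
    ("http://example.org/Paris", "owl:sameAs", "http://www.wikidata.org/entity/Q90"),
]
entities = {"http://example.org/Berlin", "http://example.org/Paris"}
degree, per_entity = interlinking_degree(triples, entities)
print(degree)  # 1.5
```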
How can interlinking affect KG quality?
It improves completeness, but merging linked KGs can introduce duplicate statements (a conciseness problem) and conflicting values (a consistency problem).
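A minimal sketch of why merging cuts both ways. Entities connected by owl:sameAs are collapsed with a union-find structure (an implementation choice, not something the card specifies), and the merged graph is then checked for duplicate triples and for conflicting values of a functional predicate. The `a:`/`b:` prefixes and the population values are illustrative only.

```python
from collections import defaultdict


def merge_by_same_as(triples, functional_predicates):
    """Merge entities linked by owl:sameAs and report the resulting problems.

    Returns (merged triples, number of duplicate triples, number of
    functional-predicate conflicts).
    """
    parent = {}

    def find(x):  # union-find root lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic representative

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    # Rewrite non-sameAs triples to representative IRIs (literals map to themselves).
    rewritten = [(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"]
    merged = set(rewritten)
    duplicates = len(rewritten) - len(merged)  # conciseness issue

    values = defaultdict(set)
    for s, p, o in merged:
        if p in functional_predicates:
            values[(s, p)].add(o)
    conflicts = sum(1 for objs in values.values() if len(objs) > 1)  # consistency issue
    return merged, duplicates, conflicts


triples = [
    ("a:Berlin", "ex:name", "Berlin"),
    ("a:Berlin", "ex:population", "1000"),
    ("b:Berlin", "ex:name", "Berlin"),
    ("b:Berlin", "ex:population", "1200"),
    ("a:Berlin", "owl:sameAs", "b:Berlin"),
]
merged, duplicates, conflicts = merge_by_same_as(triples, {"ex:population"})
print(len(merged), duplicates, conflicts)  # 3 1 1
```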