
SHACL Validation in FAIR²

What is SHACL?

SHACL (Shapes Constraint Language) is a W3C standard for validating RDF data. FAIR² uses SHACL to ensure that datasets conform to structured metadata requirements, enabling:

- Schema compliance – Ensuring datasets follow FAIR² metadata rules.
- Interoperability – Enforcing structured descriptions for AI-ready datasets.
- Automated validation – Catching missing metadata and inconsistencies.


Why Validate with SHACL?

FAIR² uses SHACL to:

- Ensure datasets meet FAIR² schema constraints.
- Detect missing or incorrect metadata.
- Standardize metadata across ML Croissant & Schema.org.
- Improve dataset quality & machine-actionability.


🚀 How to Validate Your Dataset with SHACL

1️⃣ Install Required Tools

To validate a dataset against FAIR²’s SHACL rules, install:

pip install pyshacl rdflib

2️⃣ Prepare Your Dataset Metadata

Make sure you have a FAIR² metadata file (fair2.json), structured according to FAIR² Schema.

Example dataset metadata (fair2.json):

{
  "@context": "https://fair2.ai/ns/",
  "@type": "Dataset",
  "name": "AI-ready Dataset",
  "description": "A dataset for training machine learning models.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/data.csv",
      "encodingFormat": "text/csv"
    }
  ]
}

3️⃣ Run SHACL Validation

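To validate against FAIR²’s SHACL constraints, pass the shapes file with -s and the metadata file from step 2 as the data graph (-df json-ld tells pyshacl to parse it as JSON-LD):

pyshacl -s fair2-shapes.ttl -df json-ld fair2.json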

✅ Valid dataset:

Validation Report
Conforms: True

❌ Invalid dataset (e.g., missing schema:license):

Validation Report
Conforms: False
Violation: Missing required property schema:license
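
The same check can also be run from Python through pyshacl's validate function, which is convenient inside a data pipeline or CI job. The following is a minimal sketch, not the official FAIR² tooling: validate_fair2 is a hypothetical helper name, and the default file names assume the fair2.json metadata file from step 2 and a local copy of the shapes file named fair2-shapes.ttl.

```python
def validate_fair2(data_path="fair2.json", shapes_path="fair2-shapes.ttl"):
    """Validate a JSON-LD metadata file against a Turtle shapes file.

    Returns (conforms, report_text). The helper name and default paths
    are illustrative assumptions, not part of FAIR² itself.
    """
    # Imported inside the function so this module still loads on machines
    # where pyshacl has not been installed yet.
    from pyshacl import validate

    conforms, _report_graph, report_text = validate(
        data_path,
        shacl_graph=shapes_path,
        data_graph_format="json-ld",   # the metadata file is JSON-LD
        shacl_graph_format="turtle",   # the shapes file is Turtle
        inference="none",              # validate the data exactly as written
    )
    return conforms, report_text


# Example call (requires pyshacl, both files on disk, and network access
# if the metadata uses a remote @context):
#   ok, report = validate_fair2()
#   print(report)
```

Returning the boolean alongside the report text lets a caller fail a build on `not ok` while still logging the human-readable report.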

FAIR² SHACL Rules

FAIR²’s SHACL rules ensure datasets comply with structured metadata requirements.

Validation type and requirements:

Dataset Validation (DatasetShape)
- Must include dataset name, description, author, license, and identifier.
- Must have at least one distribution file (schema:distribution).
- Should include a citation (schema:citation) and a preferred citation format (cr:citeAs).

Data Article Validation (DataArticleShape)
- If provided, must link to a scholarly article (schema:ScholarlyArticle).
- Should have a method section (fair2:methodSection).

Methodology Validation (MethodSectionShape & MethodStepShape)
- Method sections (fair2:MethodSectionShape) must contain at least one step (fair2:step).
- Method steps (fair2:MethodStepShape) should:
  - Have a name (schema:name) and description (schema:description).
  - Optionally reference next steps (schema:nextItem).
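
The DatasetShape requirements listed above can be approximated with a quick pre-check before running full SHACL validation. The sketch below uses only the Python standard library and is not a substitute for the SHACL shapes. It assumes the required properties appear as plain JSON keys (name, description, author, license, identifier, distribution), and that the recommended ones appear as citation and citeAs. The real key names depend on the JSON-LD context, and precheck_dataset is a hypothetical helper.

```python
import json

# Assumed JSON keys for the "must include" properties of DatasetShape.
REQUIRED_KEYS = ("name", "description", "author", "license", "identifier")
# Assumed JSON keys for the "should include" properties.
RECOMMENDED_KEYS = ("citation", "citeAs")


def precheck_dataset(metadata):
    """Return (errors, warnings) for a FAIR²-style metadata dict."""
    errors = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if not metadata.get(key)
    ]
    # JSON-LD allows a single object where a list is expected.
    distribution = metadata.get("distribution")
    if isinstance(distribution, dict):
        distribution = [distribution]
    if not distribution:
        errors.append("missing required key: distribution (at least one entry)")
    warnings = [
        f"missing recommended key: {key}"
        for key in RECOMMENDED_KEYS
        if not metadata.get(key)
    ]
    return errors, warnings


# A record with name, description, license, and one distribution entry.
example = json.loads("""
{
  "@type": "Dataset",
  "name": "AI-ready Dataset",
  "description": "A dataset for training machine learning models.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "distribution": [
    {"@type": "DataDownload", "contentUrl": "https://example.com/data.csv"}
  ]
}
""")

errors, warnings = precheck_dataset(example)
for message in errors + warnings:
    print(message)
```

For this record the pre-check reports author and identifier as missing, since the DatasetShape requirements list both as required, and it flags the two citation properties as recommended.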

Namespaces

The FAIR² schema relies on multiple vocabularies to ensure interoperability with existing standards such as schema.org, ML Croissant, and Contributor Role Ontology (CRO).

Below are the key namespaces used in this schema:

Prefix   Namespace URI
fair2    https://fair2.ai/ontology#
schema   https://schema.org/
cr       https://mlcommons.org/croissant#
sh       http://www.w3.org/ns/shacl#
rdfs     http://www.w3.org/2000/01/rdf-schema#
xsd      http://www.w3.org/2001/XMLSchema#
rai      https://fair2.ai/ontology/responsibleAI#
obo      http://purl.obolibrary.org/obo/
prov     http://www.w3.org/ns/prov#
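
Prefixed names such as schema:license or fair2:step, as used in the rules and validation reports, are shorthand for full IRIs built from the table above. As an illustration, the following sketch expands a prefixed name by concatenating the namespace URI and the local name. The prefix-to-URI pairs are copied from the table, while expand_curie is a hypothetical helper, not part of FAIR² or pyshacl.

```python
# Prefix-to-namespace mapping copied from the namespace table.
PREFIXES = {
    "fair2": "https://fair2.ai/ontology#",
    "schema": "https://schema.org/",
    "cr": "https://mlcommons.org/croissant#",
    "sh": "http://www.w3.org/ns/shacl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rai": "https://fair2.ai/ontology/responsibleAI#",
    "obo": "http://purl.obolibrary.org/obo/",
    "prov": "http://www.w3.org/ns/prov#",
}


def expand_curie(curie):
    """Expand a prefixed name like 'schema:license' to a full IRI."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local_name


print(expand_curie("schema:license"))
print(expand_curie("fair2:step"))
```

Expanding a name this way helps when matching a property reported by the validator against the keys in a metadata file.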

For a detailed description of these namespaces and their usage in the FAIR² ontology, refer to:
📖 Ontology Documentation.


🎯 Why SHACL is Essential for FAIR²

- Guarantees dataset compliance with FAIR² and ML Croissant.
- Reduces metadata errors, improving dataset usability.
- Ensures AI-ready metadata for machine learning pipelines.
- Provides structured validation for research integrity.

🚀 Next Steps