SHACL Validation in FAIR²
What is SHACL?
SHACL (Shapes Constraint Language) is a W3C standard for validating RDF data. FAIR² uses SHACL to ensure that datasets conform to structured metadata requirements, enabling:
- Schema compliance – Ensuring datasets follow FAIR² metadata rules.
- Interoperability – Enforcing structured descriptions for AI-ready datasets.
- Automated validation – Catching missing metadata and inconsistencies.
Why Validate with SHACL?
FAIR² uses SHACL to:
- Ensure datasets meet FAIR² schema constraints.
- Detect missing or incorrect metadata.
- Standardize metadata across ML Croissant & Schema.org.
- Improve dataset quality & machine-actionability.
🚀 How to Validate Your Dataset with SHACL
1️⃣ Install Required Tools
To validate a dataset against FAIR²’s SHACL rules, install:
pip install pyshacl rdflib
2️⃣ Prepare Your Dataset Metadata
Make sure you have a FAIR² metadata file (fair2.json), structured according to FAIR² Schema.
Example dataset metadata (fair2.json):
{
  "@context": "https://fair2.ai/ns/",
  "@type": "Dataset",
  "name": "AI-ready Dataset",
  "description": "A dataset for training machine learning models.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/data.csv",
      "encodingFormat": "text/csv"
    }
  ]
}
3️⃣ Run SHACL Validation
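To validate against FAIR²’s SHACL constraints, pass the shapes file with -s and the metadata file as the data graph. In pyshacl, -d is the debug flag rather than a data-file option, and -df json-ld tells it to parse the .json file as JSON-LD:
pyshacl -s fair2-shapes.ttl -df json-ld fair2.json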
✅ Valid dataset:
Validation Report
Conforms: True
❌ Invalid dataset (e.g., missing schema:license):
Validation Report
Conforms: False
Violation: Missing required property schema:license
FAIR² SHACL Rules
FAIR²’s SHACL rules ensure datasets comply with structured metadata requirements.
| Validation Type | Requirements |
|---|---|
| Dataset Validation (DatasetShape) | Must include dataset name, description, author, license, and identifier. Must have at least one distribution file (schema:distribution). Should include a citation (schema:citation) and a preferred citation format (cr:citeAs). |
| Data Article Validation (DataArticleShape) | If provided, must link to a scholarly article (schema:ScholarlyArticle). Should have a method section (fair2:methodSection). |
| Methodology Validation (MethodSectionShape & MethodStepShape) | Method sections (fair2:MethodSectionShape) must contain at least one step (fair2:step). Method steps (fair2:MethodStepShape) should have a name (schema:name) and description (schema:description), and may optionally reference next steps (schema:nextItem). |
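Before running the full SHACL validation, it can help to do a quick pre-flight check of the top-level keys. The sketch below is not part of FAIR² tooling and is no substitute for SHACL: it assumes that the JSON keys in the metadata file match the property names listed for DatasetShape above, and the function name `missing_dataset_fields` is hypothetical.

```python
# Required top-level fields, taken from the DatasetShape row above.
# Assumption: JSON keys use the same names as the listed properties.
REQUIRED_KEYS = ("name", "description", "author", "license", "identifier")


def missing_dataset_fields(metadata):
    """Return the required fields that are absent or empty in the metadata dict."""
    missing = [key for key in REQUIRED_KEYS if not metadata.get(key)]
    # At least one distribution entry is required as well.
    if not metadata.get("distribution"):
        missing.append("distribution")
    return missing


# The example metadata from step 2 of this page.
example = {
    "@context": "https://fair2.ai/ns/",
    "@type": "Dataset",
    "name": "AI-ready Dataset",
    "description": "A dataset for training machine learning models.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.com/data.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

print(missing_dataset_fields(example))  # ['author', 'identifier']
```

Under these assumptions, the example metadata from step 2 would still need an author and an identifier to satisfy the requirements listed for DatasetShape.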
Namespaces
The FAIR² schema relies on multiple vocabularies to ensure interoperability with existing standards such as schema.org, ML Croissant, and Contributor Role Ontology (CRO).
Below are the key namespaces used in this schema:
| Prefix | Namespace URI |
|---|---|
| fair2 | https://fair2.ai/ontology# |
| schema | https://schema.org/ |
| cr | https://mlcommons.org/croissant# |
| sh | http://www.w3.org/ns/shacl# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| xsd | http://www.w3.org/2001/XMLSchema# |
| rai | https://fair2.ai/ontology/responsibleAI# |
| obo | http://purl.obolibrary.org/obo/ |
| prov | http://www.w3.org/ns/prov# |
For a detailed description of these namespaces and their usage in the FAIR² ontology, refer to:
📖 Ontology Documentation.
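Prefixed names such as schema:license or cr:citeAs are shorthand for full IRIs built from the namespaces above. As an illustration only, the following sketch expands a prefixed name using a dictionary copied from the table; the `expand` helper is hypothetical and not part of any FAIR² library.

```python
# Prefix-to-namespace mapping, copied from the namespace table above.
PREFIXES = {
    "fair2": "https://fair2.ai/ontology#",
    "schema": "https://schema.org/",
    "cr": "https://mlcommons.org/croissant#",
    "sh": "http://www.w3.org/ns/shacl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rai": "https://fair2.ai/ontology/responsibleAI#",
    "obo": "http://purl.obolibrary.org/obo/",
    "prov": "http://www.w3.org/ns/prov#",
}


def expand(curie):
    """Expand a prefixed name like 'schema:license' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"Unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("schema:license"))  # https://schema.org/license
print(expand("cr:citeAs"))       # https://mlcommons.org/croissant#citeAs
```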
🎯 Why SHACL is Essential for FAIR²
- Guarantees dataset compliance with FAIR² and ML Croissant.
- Reduces metadata errors, improving dataset usability.
- Ensures AI-ready metadata for machine learning pipelines.
- Provides structured validation for research integrity.
🚀 Next Steps
- Validate your dataset with SHACL.
- See dataset examples to understand real-world usage.
- Learn about JSON-LD & RDF for AI-ready metadata.