Getting Started with FAIR²
This guide provides the foundational steps to begin using the FAIR² (FAIR Squared) specification for preparing, validating, and integrating AI-ready datasets.
What is FAIR²?
FAIR² extends the original FAIR principles—Findable, Accessible, Interoperable, and Reusable—by introducing capabilities that make datasets:
- AI-Ready: Structured for compatibility with machine learning workflows.
- Context-Rich: Enhanced with metadata that captures provenance, methodology, and ethical considerations.
- Machine-Actionable: Validated using SHACL rules and described using JSON-LD and schema.org standards.
FAIR² builds upon ML Croissant and uses SHACL to enforce schema compliance, enabling robust dataset validation and seamless downstream use in AI pipelines.
Step 1: Install Required Tools
To work with FAIR² metadata, you may require the following:
Core Requirements
- Python 3.8+
- RDF libraries for validation:
  - rdflib
  - pyshacl
- ML Croissant
- TensorFlow and/or PyTorch (optional, for AI training pipelines)
Installation
Use the following command to install recommended packages:
pip install mlcroissant torch tensorflow rdflib pyshacl
This installs the following components:
- mlcroissant: Metadata handling and dataset loading
- torch and tensorflow: AI frameworks
- rdflib, pyshacl: RDF graph processing and SHACL validation
Note: The FAIR² Validator CLI is under development and will be released in a future version.
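Before moving on, it can help to confirm that the tools listed above are importable. The following is a minimal sketch using only the Python standard library; the helper names `check_modules` and `python_is_supported` are hypothetical, and the import names (`mlcroissant`, `rdflib`, `pyshacl`, `torch`, `tensorflow`) are assumed to match the installed packages.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple

# Import names assumed for the packages in this guide (hypothetical grouping).
REQUIRED = ("rdflib", "pyshacl", "mlcroissant")
OPTIONAL = ("torch", "tensorflow")


def python_is_supported(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, found in check_modules(names).items():
            print(f"{group:8} {name:12} {'ok' if found else 'missing'}")
```

A module reported as missing can be installed with pip before continuing; the optional frameworks are only needed for the training examples.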
Step 2: Define a FAIR² Metadata File
A FAIR²-compliant dataset includes a metadata file named fair2.json written in JSON-LD format. This file should include:
- A globally unique identifier (e.g., DOI or URI)
- Dataset description and license
- Distribution entries describing downloadable data assets
- References to validation shapes and method sections (optional)
Minimal Example
{
  "@context": "https://fair2.ai/spec/fair2_context",
  "@type": "Dataset",
  "name": "Example AI-ready Dataset",
  "description": "A dataset demonstrating FAIR² compliance",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/dataset.csv",
      "encodingFormat": "text/csv"
    }
  ]
}
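A quick structural check before running full SHACL validation can catch obvious omissions. The sketch below is an illustration only: the key lists are taken from the minimal example above rather than from the FAIR² shapes, the function name `missing_keys` is hypothetical, and the identifier is not checked because this guide does not name the key that carries it.

```python
from typing import Any, Dict, List

# Keys taken from the minimal example above. These tuples are an assumption
# made for illustration; the FAIR² SHACL shapes define what is actually required.
EXPECTED_KEYS = ("@context", "@type", "name", "description", "license", "distribution")
DISTRIBUTION_KEYS = ("@type", "contentUrl", "encodingFormat")


def missing_keys(metadata: Dict[str, Any]) -> List[str]:
    """Return expected keys that are absent or empty, including per-distribution keys."""
    problems = [key for key in EXPECTED_KEYS if not metadata.get(key)]
    distributions = metadata.get("distribution") or []
    if isinstance(distributions, dict):
        # JSON-LD allows a single object where a list is expected.
        distributions = [distributions]
    for index, entry in enumerate(distributions):
        for key in DISTRIBUTION_KEYS:
            if not entry.get(key):
                problems.append(f"distribution[{index}].{key}")
    return problems


example = {
    "@context": "https://fair2.ai/spec/fair2_context",
    "@type": "Dataset",
    "name": "Example AI-ready Dataset",
    "description": "A dataset demonstrating FAIR² compliance",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.com/dataset.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

print(missing_keys(example))  # prints []
```

To check a file on disk, the same function can be applied to the result of `json.load`. An empty list here means only that the expected keys are present; it does not replace SHACL validation.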
Step 3: Validate FAIR² Metadata Using SHACL
FAIR² metadata is designed to be machine-validated. SHACL validation checks that a metadata file conforms to the structure required by the FAIR² shapes.
Using pySHACL
To run validation locally:
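pyshacl -s shapes/dataset.json -sf json-ld -df json-ld fair2.json
Where:
- -s shapes/dataset.json is your SHACL shape graph (provided by FAIR²)
- -sf json-ld and -df json-ld declare that the shape graph and the data graph are serialized as JSON-LD
- fair2.json, the final positional argument, is your dataset metadata file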
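The same check can be run from Python through pySHACL's `validate` function, which is convenient inside a test suite or a data pipeline. This is a minimal sketch: the wrapper name `validate_metadata` is hypothetical, the file paths are the ones used in the command above, and both files are assumed to be JSON-LD.

```python
from typing import Tuple


def validate_metadata(
    data_path: str, shapes_path: str, rdf_format: str = "json-ld"
) -> Tuple[bool, str]:
    """Validate a metadata file against a SHACL shapes file.

    Returns (conforms, report_text), where report_text is pySHACL's
    human-readable validation report.
    """
    # Imported inside the function so this sketch can be loaded even where
    # pyshacl is not installed yet.
    from pyshacl import validate

    conforms, _report_graph, report_text = validate(
        data_path,
        shacl_graph=shapes_path,
        data_graph_format=rdf_format,
        shacl_graph_format=rdf_format,
    )
    return conforms, report_text


# Example call, using the paths from the command above:
# conforms, report = validate_metadata("fair2.json", "shapes/dataset.json")
# print("Metadata conforms" if conforms else report)
```

Returning the report text alongside the boolean lets a caller fail a build with the full list of violations rather than a bare pass or fail.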
Web-Based Validation (Coming Soon)
A browser-based validator is under development and will allow validation of FAIR² metadata via file upload or URL.
Step 4: Integrate with ML Croissant
FAIR² metadata is compatible with ML Croissant, allowing seamless integration with machine learning pipelines.
Example: Loading FAIR² Dataset into TensorFlow
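import mlcroissant as mlc
import tensorflow as tf

# Placeholder names: replace "default", "image", and "label" with the
# record set and field names defined in your own metadata, and adjust
# the shapes and dtypes below to match your data.
dataset = mlc.Dataset(jsonld="fair2.json")
tensorflow_dataset = tf.data.Dataset.from_generator(
    lambda: ((r["image"], r["label"]) for r in dataset.records(record_set="default")),
    output_signature=(
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)
for image, label in tensorflow_dataset:
    pass  # Train your TensorFlow model here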
This workflow lets you describe your dataset once and reuse that description across multiple AI environments, rather than writing a separate loader for each one.
Next Steps
Once you have created and validated your FAIR² metadata file, you can:
- Explore the full FAIR² Schema to describe your data in greater detail.
- Use SHACL validation to check compliance.
- Combine FAIR² with ML Croissant features for scalable dataset reuse.
- Contribute to the specification by providing feedback or submitting extensions.
FAIR² is actively evolving. For updates, please refer to the project roadmap and join the community discussions on specification development.