FAIR² Responsible AI (RAI) — Integration Guide

This guide explains how FAIR² metadata aligns with the ML Croissant RAI specification and how to author, validate, and query Responsible-AI information in your datasets.

Scope. This document uses the exact property names listed in the Croissant RAI “Overview” table (for example, rai:dataLimitations, rai:dataCollection, and rai:useCases) and shows how they fit within FAIR² JSON‑LD. See the official spec for authoritative definitions:
https://docs.mlcommons.org/croissant/docs/croissant-rai-spec.html#overview-croissant-rai-properties-and-use-cases

1. Namespaces & Conformance

Use the following prefixes in your dataset JSON‑LD:

{
  "@context": {
    "schema": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
    "rai": "http://mlcommons.org/croissant/rai/",
    "dct": "http://purl.org/dc/terms/"
  }
}

We recommend declaring conformance with the RAI profile via dct:conformsTo:

{
  "dct:conformsTo": "http://mlcommons.org/croissant/rai/"
}
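
As a quick pre-flight check before full validation, the two declarations above can be tested with plain Python. This is a minimal sketch, not part of fair2py: the helper name check_rai_declarations is hypothetical, and it assumes an inline object @context and the rai IRI exactly as written in this guide.

```python
RAI_NS = "http://mlcommons.org/croissant/rai/"  # IRI as used in this guide


def check_rai_declarations(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    context = doc.get("@context")
    # Assumption: the context is an inline object, as in the examples here.
    if not isinstance(context, dict):
        problems.append("@context is not an inline object")
    elif context.get("rai") != RAI_NS:
        problems.append("rai prefix is missing or bound to a different IRI")
    # dct:conformsTo may be a single string or a list of strings.
    conforms = doc.get("dct:conformsTo", [])
    if isinstance(conforms, str):
        conforms = [conforms]
    if RAI_NS not in conforms:
        problems.append("dct:conformsTo does not list the RAI profile")
    return problems


doc = {
    "@context": {"rai": RAI_NS, "dct": "http://purl.org/dc/terms/"},
    "dct:conformsTo": RAI_NS,
}
print(check_rai_declarations(doc))  # → []
```

Returning a list of messages rather than raising lets you report every problem in one pass.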

2. Core RAI Properties (exact names)

The table below lists the core Croissant RAI properties this library expects. All are plain strings unless otherwise noted by your governance process. Your project may adopt additional RAI terms as the spec evolves—always consult the official spec for the latest guidance.

| Property | Purpose (short) | Cardinality | Required?* |
| --- | --- | --- | --- |
| `rai:dataLimitations` | Known gaps, caveats, or limitations of the data | 0..1 | Recommended |
| `rai:dataCollection` | How the data was collected (procedures, settings) | 0..1 | Recommended |
| `rai:useCases` | Intended/appropriate uses | 0..1 | Recommended |
| `rai:dataReleaseMaintenance` | Release cadence, maintenance & support commitments | 0..1 | Recommended |
| `rai:annotationPlatform` | Platform(s) used for annotation | 0..1 | Optional |
| `rai:annotationsPerItem` | How many annotations per item | 0..1 | Optional |
| `rai:annotatorDemographics` | Demographic info about annotators (if appropriate/allowed) | 0..1 | Optional |
| `rai:machineAnnotationTools` | ML tools used to pre‑label or assist annotation | 0..1 | Optional |
| `rai:dataBiases` | Known or suspected biases | 0..1 | Recommended |
| `rai:personalSensitiveInformation` | Whether/how PII or sensitive info is handled | 0..1 | Recommended |

* FAIR² does not mandate a minimum set beyond Croissant’s baseline, but strongly encourages filling all Recommended fields to improve transparency and downstream risk assessment.
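
To see which Recommended fields a dataset still lacks, a small stdlib-only check can be run over the JSON‑LD dictionary. This is an illustrative sketch: the function name missing_recommended is hypothetical, and it treats a non-string or blank value as missing because the table describes these properties as plain strings.

```python
# Properties marked "Recommended" in the table above.
RECOMMENDED = [
    "rai:dataLimitations",
    "rai:dataCollection",
    "rai:useCases",
    "rai:dataReleaseMaintenance",
    "rai:dataBiases",
    "rai:personalSensitiveInformation",
]


def missing_recommended(doc: dict) -> list[str]:
    """List Recommended RAI properties that are absent or blank."""
    missing = []
    for key in RECOMMENDED:
        value = doc.get(key)
        # Assumption: values are plain strings; anything else counts as missing.
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


example = {
    "rai:dataLimitations": "Labels may be noisy for low-light scenes.",
    "rai:dataBiases": "   ",  # whitespace only, so reported as missing
}
print(missing_recommended(example))
```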

3. Minimal JSON‑LD example (FAIR² + Croissant RAI)

Below is a compact, copy‑paste‑ready snippet. You can merge it with your dataset’s main JSON‑LD.

{
  "@context": {
    "schema": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
    "rai": "http://mlcommons.org/croissant/rai/",
    "dct": "http://purl.org/dc/terms/"
  },
  "@type": "schema:Dataset",
  "name": "Example FAIR² Dataset with RAI",
  "version": "1.0",
  "dct:conformsTo": "http://mlcommons.org/croissant/rai/",
  "rai:dataLimitations": "Labels may be noisy for low-light scenes.",
  "rai:dataCollection": "Captured via mobile devices in public indoor spaces.",
  "rai:useCases": "Robust indoor navigation and scene understanding research.",
  "rai:dataReleaseMaintenance": "Quarterly updates; issues tracked in public repository.",
  "rai:annotationPlatform": "In-house tool; audit logs retained.",
  "rai:annotationsPerItem": "3",
  "rai:annotatorDemographics": "Adult bilingual annotators from multiple regions.",
  "rai:machineAnnotationTools": "YOLOv8 for pre‑labels; human review required.",
  "rai:dataBiases": "Over‑representation of shopping malls; under‑representation of hospitals.",
  "rai:personalSensitiveInformation": "PII removed per policy; faces blurred at source."
}
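
One way to merge a snippet like this into an existing dataset document is sketched below. The helper merge_rai is hypothetical, not a fair2py function. It assumes both documents use inline object contexts, copies only rai:* properties, and leaves every other key of the main document (including any conformance declaration) untouched.

```python
import copy


def merge_rai(main_doc: dict, rai_doc: dict) -> dict:
    """Return a copy of main_doc with prefixes and rai:* fields from rai_doc added."""
    merged = copy.deepcopy(main_doc)
    # Add prefixes the main context lacks; existing bindings are kept as-is.
    context = merged.setdefault("@context", {})
    for prefix, iri in rai_doc.get("@context", {}).items():
        context.setdefault(prefix, iri)
    # Copy only RAI properties so fields such as "name" are never overwritten.
    for key, value in rai_doc.items():
        if key.startswith("rai:"):
            merged[key] = value
    return merged


main = {
    "@context": {"schema": "https://schema.org/"},
    "@type": "schema:Dataset",
    "name": "Main dataset",
}
snippet = {
    "@context": {
        "schema": "https://schema.org/",
        "rai": "http://mlcommons.org/croissant/rai/",
    },
    "name": "Snippet name (ignored)",
    "rai:useCases": "Indoor navigation research.",
}
merged = merge_rai(main, snippet)
print(merged["name"], sorted(merged["@context"]))
```

Working on a deep copy keeps the original document unchanged, which makes it easy to diff the result before writing it back.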

4. Validating RAI with `fair2py`

Fair2Dataset loads your JSON‑LD, runs Croissant validation and FAIR² SHACL checks (including RAI shape coverage where applicable), and keeps the full JSON‑LD in metadata_full.

from fair2py.dataset import Fair2Dataset

# Path or URL to your dataset JSON-LD
path = "path/to/dataset.json"

ds = Fair2Dataset(path)

# metadata_full preserves the full JSON-LD, including RAI fields; collect them by key:
rai_fields = {
    k: ds.metadata_full.get(k)
    for k in [
        "rai:dataLimitations",
        "rai:dataCollection",
        "rai:useCases",
        "rai:dataReleaseMaintenance",
        "rai:annotationPlatform",
        "rai:annotationsPerItem",
        "rai:annotatorDemographics",
        "rai:machineAnnotationTools",
        "rai:dataBiases",
        "rai:personalSensitiveInformation",
    ]
}
print({k: v for k, v in rai_fields.items() if v})

If you see a warning about a non‑standard @context, it is informational (it comes from Croissant) and does not block FAIR² validation. Ensure that the rai: prefix is declared and that values are strings or IRIs, as appropriate for your governance rules.
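
The "strings or IRIs" rule can be spot-checked on the raw JSON‑LD dictionary before validation. The sketch below is an assumption-laden illustration, not fair2py behavior: invalid_rai_values is a hypothetical helper, and it treats a JSON‑LD node object with a string "@id" as an IRI reference.

```python
def invalid_rai_values(doc: dict) -> dict:
    """Map each rai:* key to its value's type name when it is neither a string nor an IRI node."""
    bad = {}
    for key, value in doc.items():
        if not key.startswith("rai:"):
            continue
        is_string = isinstance(value, str)
        # JSON-LD expresses an IRI reference as an object with an "@id" key.
        is_iri = isinstance(value, dict) and isinstance(value.get("@id"), str)
        if not (is_string or is_iri):
            bad[key] = type(value).__name__
    return bad


doc = {
    "rai:annotationsPerItem": 3,  # a number, not a string
    "rai:dataCollection": {"@id": "https://example.org/collection-protocol"},
    "rai:useCases": "Scene understanding research.",
}
print(invalid_rai_values(doc))  # → {'rai:annotationsPerItem': 'int'}
```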

5. Quick SPARQL to inspect RAI fields

The following SPARQL query lists every RAI predicate and its value for each schema:Dataset node once your JSON‑LD has been parsed into an rdflib graph (here, ds.graph).

# `ds` is the Fair2Dataset loaded in section 4; the query itself needs no extra imports.

q = """
PREFIX schema: <https://schema.org/>
PREFIX rai: <http://mlcommons.org/croissant/rai/>

SELECT ?dataset ?p ?o
WHERE {
  ?dataset a schema:Dataset ;
           ?p ?o .
  FILTER(STRSTARTS(STR(?p), STR(rai:)))
}
ORDER BY ?p
LIMIT 200
"""

for row in ds.graph.query(q):
    print(row)
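
Printing raw rows becomes hard to read with many properties. The sketch below groups (dataset, predicate, value) rows into a nested dictionary keyed by the property's local name. The helper group_rai_rows is hypothetical. It relies only on each row unpacking into three items, which rdflib result rows do, so the example uses plain tuples to stay self-contained. It needs Python 3.9+ for str.removeprefix.

```python
from collections import defaultdict

RAI_NS = "http://mlcommons.org/croissant/rai/"  # IRI as used in this guide


def group_rai_rows(rows):
    """Group (dataset, predicate, value) rows as {dataset: {local_name: [values]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for dataset, predicate, value in rows:
        # Strip the namespace so keys read like "dataBiases".
        local_name = str(predicate).removeprefix(RAI_NS)
        grouped[str(dataset)][local_name].append(str(value))
    # Convert to plain dicts for predictable printing and comparison.
    return {d: dict(props) for d, props in grouped.items()}


# Stand-in rows; with rdflib you would pass the query result instead.
rows = [
    ("https://example.org/ds", RAI_NS + "dataBiases", "Over-representation of shopping malls."),
    ("https://example.org/ds", RAI_NS + "useCases", "Indoor navigation research."),
]
print(group_rai_rows(rows))
```

With a loaded dataset, the equivalent call would be group_rai_rows(ds.graph.query(q)).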

6. Authoring tips & best practices

Be concrete, not generic. Prefer actionable statements over boilerplate (e.g., name bias sources, collection settings, audit processes).
Keep provenance. When feasible, link to documents (IRIs) that describe templated processes or internal policies.
Avoid sensitive disclosures. If describing annotator demographics, comply with local privacy laws and internal policies. Use aggregated or high‑level descriptors when needed.
Version RAI text. Update RAI fields alongside dataset releases; note changes in your changelog.

7. Troubleshooting

RAI fields missing in ds.metadata: This is expected, because ds.metadata follows Croissant’s own schema. Use ds.metadata_full to access all original JSON‑LD (including RAI).
Prefixes look like literals (e.g., "schema:name"): In JSON‑LD, declare every prefix you use in @context. In SHACL/Turtle, write predicates as prefixed names or full IRIs, not as quoted strings. If converting shapes, normalize predicates to IRIs.
Validation passes but shapes don’t trigger: Check your target class (@type: schema:Dataset) and that the RAI predicates are actually used in the data being validated.
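
When shapes do not trigger, it can help to confirm both conditions on the raw JSON‑LD first. The sketch below is a hypothetical diagnostic, not a fair2py API. It assumes the schema prefix is bound to https://schema.org/ as in this guide, so an unprefixed "Dataset" or a different prefix is reported as not matching.

```python
# Type spellings this sketch recognizes (assumes "schema" -> https://schema.org/).
DATASET_TYPES = {"schema:Dataset", "https://schema.org/Dataset"}


def shape_target_report(doc: dict) -> dict:
    """Report whether the node is typed as a Dataset and which rai:* keys it uses."""
    types = doc.get("@type", [])
    # @type may be a single string or a list of strings.
    if isinstance(types, str):
        types = [types]
    return {
        "targets_dataset": any(t in DATASET_TYPES for t in types),
        "rai_keys": sorted(k for k in doc if k.startswith("rai:")),
    }


doc = {"@type": "Dataset", "rai:dataBiases": "Over-representation of shopping malls."}
print(shape_target_report(doc))
# → {'targets_dataset': False, 'rai_keys': ['rai:dataBiases']}
```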

8. References

ML Croissant RAI spec (properties & use cases):
https://docs.mlcommons.org/croissant/docs/croissant-rai-spec.html#overview-croissant-rai-properties-and-use-cases
ML Croissant (main spec): https://docs.mlcommons.org/croissant/
FAIR² project: https://fair2.ai/

Last updated: Generated by fair2py docs automation.