@sdif-format

SDIF Format

Semantic Data Interchange Format. A compact, canonicalizable, AI-friendly data format for structured, auditable machine workflows.

Semantic Data Interchange Format

Compact, semantic and canonicalizable structured data
for AI agents, deterministic workflows and human-auditable records.

What is SDIF? · Ecosystem · Why it exists · Status · Contributing


pip install sdif-format

PyPI package · v1.0.0 release · Documentation


Compact

Less repeated structure. Fewer wasted tokens.

Semantic

Tables, relations, metadata and intent.

Canonical

Stable output for hashing, signing and comparison.

Auditable

Designed to be read, reviewed and trusted.


What is SDIF?

SDIF (Semantic Data Interchange Format) is a compact, canonicalizable and AI-friendly data format for structured information that needs to move cleanly between humans, tools, agents and deterministic workflows.

It is designed for cases where data should be:

  • small enough to be efficient in AI context windows;
  • structured enough for machines;
  • readable enough for humans;
  • deterministic enough for hashing, signing and reproducible workflows;
  • semantic enough to express tables, relations, metadata and intent.

@sdif 1.0

kind Plan
id release.v1
title "Release readiness plan"

items[id,status,owner,evidence]:
  R1 done build "reports/build.md"
  R2 open qa "reports/tests.md"
  R3 done security "reports/audit.md"

rel:
  release.v1 validated_by R1
  release.v1 blocked_by R2
  release.v1 governed_by R3

Structured information closer to a document,
while still behaving like a contract.
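
To make the table shape in the example concrete, here is a toy reader for a single `name[col,...]:` block. It is a sketch only, not the sdif-py parser: the function name `read_table` is hypothetical, and the cell rules (whitespace-separated cells, double quotes grouping a cell) are assumptions read off the example rather than taken from the specification.

```python
import shlex

# Table block copied from the example above.
SDIF_TABLE = '''\
items[id,status,owner,evidence]:
  R1 done build "reports/build.md"
  R2 open qa "reports/tests.md"
  R3 done security "reports/audit.md"
'''


def read_table(block: str) -> tuple[str, list[dict[str, str]]]:
    """Toy reader for one `name[col,...]:` block; NOT the real SDIF parser."""
    header, *rows = [line for line in block.splitlines() if line.strip()]
    # Split "items[id,status,...]:" into the table name and its column list.
    name, _, rest = header.partition("[")
    columns = rest.rstrip(":").rstrip("]").split(",")
    records = []
    for row in rows:
        # Assumption: cells are whitespace-separated; double quotes group a cell.
        cells = shlex.split(row)
        if len(cells) != len(columns):
            raise ValueError(f"expected {len(columns)} cells, got {len(cells)}: {row!r}")
        records.append(dict(zip(columns, cells)))
    return name, records


name, records = read_table(SDIF_TABLE)
print(name, records[1])
```

The column names are written once in the header and reused for every row, which is where the format saves space over record-per-object layouts. For real documents, use the official tooling instead of a reader like this one.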



Ecosystem

This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples, libraries, and editor extensions.

PYTHON CLIENT & CLI

sdif-py

Specification, parser, canonicalizer, and CLI.
The normative reference implementation.

Explore sdif-py →

SPECIFICATION (SSOT)

sdif-spec

Official format specification, canonicalization rules,
and portable conformance test suite.

View specification →

BENCHMARKS

sdif-benchmarks

Reproducible benchmark datasets and reports comparing SDIF with JSON, YAML, XML, and CSV.

View benchmarks →

RUST IMPLEMENTATION

sdif-rs

Pure Rust parser implementation with a span-annotated AST designed for editor tooling.

Explore sdif-rs →

LANGUAGE SERVER (LSP)

sdif-lsp

LSP language server binary (tower-lsp) providing real-time diagnostics and IDE features.

View sdif-lsp →

EDITOR INTEGRATION

vscode-sdif

VS Code extension client providing syntax highlighting, diagnostics, and LSP configuration.

Open extension →

GRAMMAR FOUNDATION

tree-sitter-sdif

Tree-sitter grammar foundation for syntax highlighting and incremental parsing.

Open grammar →

DOCUMENTATION

sdif-format.github.io

Official documentation website containing specification guides, tutorials, and examples.

Read docs →

ORGANIZATION META

.github

Organization profile, assets, and shared community configuration files.

View profile →


Repository map

Repository             Purpose
sdif-py                Core Python parser, validator, canonicalizer, and CLI
sdif-spec              Official format specification and conformance test suite (SSOT)
sdif-benchmarks        Benchmark datasets, reports, and comparison tooling
sdif-rs                Rust parser crate with span-annotated AST
sdif-lsp               LSP language server binary
tree-sitter-sdif       Tree-sitter grammar foundation for syntax highlighting
vscode-sdif            VS Code extension client for SDIF
sdif-format.github.io  Public documentation website (Docusaurus)
.github                Organization profile, assets, and shared GitHub community files


Why it exists

Modern software workflows exchange more structured context than ever.

That context moves through APIs, files, prompts, agents, documentation systems, CI pipelines, benchmarks and human reviews. The usual formats each solve part of the problem, but none quite fits this new middle ground.

JSON

Universal and reliable, but noisy when repeated records dominate.

YAML

Readable, but often too permissive for deterministic workflows.

CSV

Compact, but loses structure, relations and meaning very quickly.

Markdown

Great for humans, but not enough when data must be parsed and verified.

SDIF tries to sit in that gap.

Not as a replacement for every format, but as a focused layer for structured data that needs to remain compact, meaningful, reviewable and reproducible.



Designed for

AI workflows

  • Agent memory snapshots
  • Compact context payloads
  • AI-friendly summaries
  • Tool-to-tool exchange
  • Structured prompt artifacts

Engineering

  • Project plans
  • Roadmaps
  • Registries
  • Manifests
  • Technical specifications

Verification

  • Benchmark reports
  • Canonical records
  • Hashable datasets
  • Golden files
  • Comparison-friendly artifacts


Design principles

Compact by default

Repeated structure should not require repeated noise. SDIF aims to reduce unnecessary bytes and tokens while keeping documents understandable.
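
As a rough illustration of "repeated noise", the sketch below encodes the same three records two ways: as record-per-object JSON, where every row repeats its key names, and as a header-once layout modeled on the `items[...]` example from this page. The header-once text is built by hand here, not produced by sdif-py, and the `quote` helper with its quoting rule is an assumption for illustration.

```python
import json

columns = ["id", "status", "owner", "evidence"]
rows = [
    ["R1", "done", "build", "reports/build.md"],
    ["R2", "open", "qa", "reports/tests.md"],
    ["R3", "done", "security", "reports/audit.md"],
]

# Record-per-object JSON: every row repeats all four key names.
as_json = json.dumps([dict(zip(columns, row)) for row in rows], separators=(",", ":"))


def quote(value: str) -> str:
    # Assumption: bare alphanumeric words stay unquoted, anything else is quoted.
    return value if value.isalnum() else f'"{value}"'


# Header-once layout: column names appear a single time, rows carry only values.
header_once = "items[" + ",".join(columns) + "]:\n" + "\n".join(
    "  " + " ".join(quote(value) for value in row) for row in rows
)

json_size = len(as_json.encode("utf-8"))
table_size = len(header_once.encode("utf-8"))
print(f"JSON: {json_size} bytes, header-once: {table_size} bytes")
```

This counts bytes, not tokens, and a columnar JSON layout (one array of keys plus arrays of values) would narrow the gap. The reproducible comparisons live in sdif-benchmarks; treat this only as intuition for why the header-once shape is smaller.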

Human-auditable

A good SDIF file should be inspectable in a plain text editor. Reviewability is part of the format, not a side effect.

Canonicalizable

Equivalent data should be able to produce deterministic bytes. That matters for hashing, signing, reproducibility and comparison.
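
The idea can be sketched without any SDIF tooling. The example below uses sorted-key JSON as a stand-in canonical form, which is an assumption made for illustration: SDIF's actual canonicalization rules are defined in sdif-spec and are not reproduced here. The helper names `canonical_bytes` and `digest` are hypothetical.

```python
import hashlib
import json


def canonical_bytes(data) -> bytes:
    """Stand-in canonical form: sorted keys, fixed separators, UTF-8 bytes."""
    text = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(data) -> str:
    """SHA-256 over the canonical bytes, so equal data yields equal hashes."""
    return hashlib.sha256(canonical_bytes(data)).hexdigest()


# The same fields written in two different orders.
a = {"kind": "Plan", "id": "release.v1", "title": "Release readiness plan"}
b = {"title": "Release readiness plan", "id": "release.v1", "kind": "Plan"}

print(digest(a) == digest(b))
```

Hashing the raw text of two differently ordered files would give two different digests. Canonicalizing first is what makes a hash or signature a statement about the data rather than about one particular serialization of it.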

Semantic

Data is more than rows and fields. SDIF treats relations, metadata, context and intent as first-class concerns.
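
One way to picture relations as first-class data is as subject, predicate, object triples. The sketch below copies the three lines of the `rel:` block from the example near the top of this page into plain Python tuples and indexes them. The in-memory model and the `index_relations` helper are assumptions for illustration, not the sdif-py data model.

```python
from collections import defaultdict

# Triples copied from the rel: block of the release.v1 example.
RELATIONS = [
    ("release.v1", "validated_by", "R1"),
    ("release.v1", "blocked_by", "R2"),
    ("release.v1", "governed_by", "R3"),
]


def index_relations(triples):
    """Group relation targets by (subject, predicate) for direct lookup."""
    index = defaultdict(list)
    for subject, predicate, target in triples:
        index[(subject, predicate)].append(target)
    return index


index = index_relations(RELATIONS)
blockers = index[("release.v1", "blocked_by")]
print(blockers)  # ['R2']
```

Because the relation is stated in the document itself, a question like "what blocks this release?" becomes a lookup instead of a convention buried in field names.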

AI-friendly

Token efficiency, stable structure and low ambiguity are core design goals, especially for agentic and LLM-assisted workflows.

Practical first

SDIF should be useful, testable and implementable. The format should not require heroics to parse or adopt.



Status

Specification

Stable, v1.0

Python tooling

Parser, CLI, canonicalization and validation

Distribution

Available on PyPI as sdif-format

SDIF v1.0.0 is available as a public Python package:

pip install sdif-format
import sdif
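
In the snippet above, the first line is a shell command and the second is Python. To check which release is installed without relying on any SDIF API, the standard library's `importlib.metadata` can be queried by distribution name. This is a minimal sketch: the `installed_version` helper is hypothetical, and only the distribution name `sdif-format` comes from this page.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "sdif-format") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None when the package is missing.
print(installed_version())
```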

The current focus is on adoption, documentation, conformance and ecosystem tooling:

  • keep the v1.0 format contract stable;
  • improve examples and documentation;
  • expand conformance fixtures;
  • publish reproducible benchmarks;
  • improve editor and syntax tooling;
  • gather feedback from real-world datasets and AI workflows.

We prefer evidence over claims. Benchmarks, golden files and reproducible examples are part of the product, not marketing decoration.



What SDIF is not

SDIF is not trying to replace JSON, YAML, CSV, Markdown, XML, Parquet or Protocol Buffers.

Those formats are useful and battle-tested.

SDIF focuses on a narrower problem:

compact, semantic, canonicalizable structured data
that can move cleanly between humans, machines and AI systems.

That focus is intentional.



Contributing

We are still early, so the most valuable contributions are not only code.

Useful contributions

  • Test SDIF with real datasets
  • Find ambiguous syntax or edge cases
  • Compare SDIF against existing formats
  • Improve documentation and examples
  • Build small tools around the format

Especially welcome

  • Reproducible benchmarks
  • Golden files
  • Parser feedback
  • AI workflow experiments
  • Constructive criticism

Good criticism is welcome. Vague hype is less useful.



Project philosophy

We want SDIF to be boring in the best possible way.

Clear syntax · Small files · Stable output · Readable examples · Useful errors · Reproducible benchmarks

A format should not require heroics to implement. If SDIF works, it should feel obvious after you use it.



Contact

The best place to follow the project is this GitHub organization.

Constructive criticism, real datasets, benchmark ideas and parser feedback are especially welcome.
