OME-IRIS

OME-IRIS is an open bioimage dataset catalog for benchmarking image input/output (IO), transformations, metadata management, and bioimage-linked workflows.

We also provide a small Python package by the same name (ome_iris) to help fetch and validate the datasets in the catalog.

Inspired by both the classic iris.csv dataset and the iris of the eye, which controls how much light reaches the retina, OME-IRIS aims to provide a collection of reference datasets for evaluating interoperable bioimage data formats, tools, and workflows.

What this is

- A lightweight manifest catalog for small benchmark datasets
- A fetch + verify workflow with a single CLI
- LinkML-based schema definitions for dataset manifests

What this is not

- Not a data portal
- Not DVC-based
- Not a large-file Git storage approach
- Not a full ontology or end-to-end benchmark system yet

Quick start

uv run ome-iris fetch --tier small
uv run ome-iris verify
uv run ome-iris export-rocrate --dataset nf1-cellpainting-shrunken

Download a reproducible subset for local development or benchmarking:

uv run ome-iris download nf1 \
  --output .benchmark-data/ome-iris/nf1 \
  --preset tiny \
  --channel DAPI

Python API:

from ome_iris import datasets

datasets.download(
    "nf1",
    output_dir=".benchmark-data/ome-iris/nf1",
    subset={"images": 20, "channels": ["DAPI"]},
)

Fetch output modes:

uv run ome-iris fetch --tier small --verbose  # show per-file labels + downloader progress
uv run ome-iris fetch --tier small --silent   # suppress downloader progress output

What `fetch` does

High-level flow when you run ome-iris fetch:

- Loads dataset manifests from --manifests-dir.
- Applies optional filters (--dataset, --tier).
- Creates local dataset roots under --data-dir/<source_identifier>/.
- Writes ro-crate-metadata.json into each dataset root.
- Iterates over each entry in files:
  - kind: file: downloads the file URL (or skips it if already present)
  - kind: directory: traverses and downloads the directory contents (or extracts archive sources)
- Reports a summary:
  - downloaded count and item list
  - skipped count and item list
  - missing URLs
  - failed downloads
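The per-file part of this flow (skip if present, record missing URLs, catch failures, summarize) can be sketched as below. This is a minimal illustration, not the ome_iris implementation: FetchSummary, fetch_files, and the injected download callable are hypothetical names, and directory entries are left to a separate step.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class FetchSummary:
    # Hypothetical container mirroring the four summary categories above.
    downloaded: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    missing_urls: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def fetch_files(
    entries: list[dict],
    dataset_root: Path,
    download: Callable[[str, Path], None],
) -> FetchSummary:
    """Fetch kind: file entries into dataset_root (illustrative sketch)."""
    summary = FetchSummary()
    for entry in entries:
        if entry.get("kind", "file") != "file":
            continue  # kind: directory entries are handled by a separate step
        path = entry["path"]
        target = dataset_root / path
        if target.exists():
            summary.skipped.append(path)  # already present locally
            continue
        url: Optional[str] = entry.get("url")
        if not url:
            summary.missing_urls.append(path)
            continue
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            download(url, target)  # injected so the sketch stays network-agnostic
            summary.downloaded.append(path)
        except Exception:
            summary.failed.append(path)  # keep going; report failures at the end
    return summary
```

Passing the downloader in as a callable keeps the control flow testable without network access.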

Output layout example:

data/
  NF1_cellpainting_data_shrunken/
    ro-crate-metadata.json
    profiles.parquet
    images/
    masks/

Local files are stored under ./data/ by default. Each dataset directory also gets ro-crate-metadata.json with source/provenance metadata from the manifest.
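For orientation, a minimal RO-Crate 1.1 metadata file contains a metadata descriptor entity plus a root Dataset entity. The sketch below writes one from a manifest dict. The mapping of manifest fields to properties (especially source.url to schema.org isBasedOn) is an assumption, and it omits properties such as datePublished that a fully conformant crate requires. The file OME-IRIS actually writes may differ.

```python
import json
from pathlib import Path


def write_ro_crate_metadata(dataset_root: Path, manifest: dict) -> Path:
    """Write a minimal RO-Crate 1.1 metadata file (hypothetical field mapping)."""
    root_entity = {
        "@id": "./",
        "@type": "Dataset",
        "name": manifest["name"],
        "description": manifest.get("description", ""),
    }
    if "license" in manifest:
        root_entity["license"] = manifest["license"]
    source_url = manifest.get("source", {}).get("url")
    if source_url:
        # Assumed mapping: record the upstream location as schema.org isBasedOn.
        root_entity["isBasedOn"] = {"@id": source_url}
    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # The metadata descriptor that points at the root data entity.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            root_entity,
        ],
    }
    dataset_root.mkdir(parents=True, exist_ok=True)
    out = dataset_root / "ro-crate-metadata.json"
    out.write_text(json.dumps(crate, indent=2))
    return out
```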

To use another data directory:

uv run ome-iris fetch --data-dir /tmp/ome-iris-data
uv run ome-iris verify --data-dir /tmp/ome-iris-data

What `download` does

ome-iris download creates a small, reproducible subset under the exact --output directory. It supports named dataset aliases such as nf1, preset sizes (tiny, small, benchmark), image limits, channel filters, plate/well/site filters, and Z/T/C ranges where filenames expose those values.
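Filename-based subsetting can stay reproducible by parsing fields from each name, filtering, and sorting before truncating. The sketch below assumes a hypothetical <plate>_<well>_s<site>_<channel>.tif naming convention. Real datasets such as nf1 may name files differently, and select_subset is not an ome_iris function. Here max_images counts matching files, not fields of view.

```python
import re
from typing import Iterable, Optional

# Hypothetical filename convention, used only for this illustration.
FILENAME_RE = re.compile(
    r"^(?P<plate>[^_]+)_(?P<well>[A-P]\d{2})_s(?P<site>\d+)_(?P<channel>[^_.]+)\.tiff?$"
)


def select_subset(
    filenames: Iterable[str],
    channels: Optional[set[str]] = None,
    wells: Optional[set[str]] = None,
    max_images: Optional[int] = None,
) -> list[str]:
    """Filter on parsed filename fields, then take the first max_images in sorted order."""
    selected: list[str] = []
    for name in sorted(filenames):  # sorting makes the result independent of listing order
        match = FILENAME_RE.match(name)
        if match is None:
            continue  # filename does not expose the fields we filter on
        if channels is not None and match["channel"] not in channels:
            continue
        if wells is not None and match["well"] not in wells:
            continue
        if max_images is not None and len(selected) >= max_images:
            break
        selected.append(name)
    return selected
```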

Downloaded subsets include manifest.json with the source dataset, selected subset options, downloaded file paths, source URLs, SHA-256 checksums, file sizes, image shapes, dtypes, and file metadata. Existing files are reused and included in the manifest. Use --validate-only to verify an existing subset cache against its manifest without downloading data:

uv run ome-iris download nf1 \
  --output .benchmark-data/ome-iris/nf1 \
  --validate-only
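The checksum bookkeeping behind manifest.json and --validate-only can be sketched with hashlib. This covers only path, size, and SHA-256. The field names and the write_manifest/validate_only helpers are hypothetical, and the real manifest also records source URLs, shapes, dtypes, and other metadata.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large images are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(output_dir: Path, relative_paths: list[str]) -> Path:
    """Record path, size, and checksum for each file (hypothetical field names)."""
    files = []
    for rel in sorted(relative_paths):
        path = output_dir / rel
        files.append({"path": rel, "size": path.stat().st_size, "sha256": sha256_of(path)})
    manifest_path = output_dir / "manifest.json"
    manifest_path.write_text(json.dumps({"files": files}, indent=2))
    return manifest_path


def validate_only(output_dir: Path) -> list[str]:
    """Return paths that are missing or whose size or checksum no longer match."""
    manifest = json.loads((output_dir / "manifest.json").read_text())
    problems = []
    for entry in manifest["files"]:
        path = output_dir / entry["path"]
        if not path.is_file():
            problems.append(entry["path"])
        elif path.stat().st_size != entry["size"] or sha256_of(path) != entry["sha256"]:
            problems.append(entry["path"])
    return problems
```

Comparing the size first is cheap, and Python's `or` short-circuits, so the full hash is only computed when sizes agree.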

Add a dataset

- Add or update a dataset manifest and catalog metadata.
- Include source, formats, and file-level metadata.
- Run:

uv run ome-iris verify

Starter scaffolding commands:

uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --append-csv
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --include-directory-entry --directory-path images --archive-format zip

The command guesses a dataset id, name, and formats, writes a starter YAML manifest, and prints a suggested datasets.csv row.
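One plausible way to guess these values is to take the source directory name as the identifier and to count recognized file extensions. The sketch below is a hypothetical heuristic, not what `ome-iris scaffold` actually does. The extension table, the slug rule for id, and both function names are assumptions.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-format table for illustration only.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".tif": "tiff",
    ".tiff": "tiff",
}


def guess_formats(source_path: Path) -> list[str]:
    """Count recognized extensions under source_path, most common first (ties by name)."""
    counts: Counter = Counter()
    for path in source_path.rglob("*"):
        if path.is_file():
            fmt = EXTENSION_FORMATS.get(path.suffix.lower())
            if fmt:
                counts[fmt] += 1
    return sorted(counts, key=lambda fmt: (-counts[fmt], fmt))


def guess_starter_manifest(source_path: Path) -> dict:
    """Build starter manifest fields from the directory name (hypothetical slug rule)."""
    name = source_path.name
    return {
        "id": name.lower().replace("_", "-"),
        "source_identifier": name,
        "name": name,
        "formats": guess_formats(source_path),
    }
```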

File entry patterns

- source_identifier is required at the top level of each manifest.
- All files[].path values are relative to <data-dir>/<source_identifier>/ (data/<source_identifier>/ by default).
- sha256 is optional for file entries.
- Use kind: directory to fetch everything under a directory source:
  - For GitHub tree URLs (https://github.com/<owner>/<repo>/tree/<ref>/<path>), OME-IRIS traverses files under that subtree.
  - For local directory paths, OME-IRIS recursively copies files.
  - For archive URLs, set archive_format (zip or tar) to extract the archive into the destination directory.
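The two path rules above can be illustrated with the standard library: splitting a GitHub tree URL into its parts, and resolving a files[].path under the dataset root while rejecting paths that escape it. Both helpers are hypothetical. The parser assumes <ref> contains no slash, because a branch such as feature/x is ambiguous in this URL form.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_github_tree_url(url: str) -> dict:
    """Split https://github.com/<owner>/<repo>/tree/<ref>/<path> (assumes a slash-free ref)."""
    parsed = urlparse(url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a tree URL: {url}")
    owner, repo, _, ref, *rest = parts
    return {"owner": owner, "repo": repo, "ref": ref, "path": "/".join(rest)}


def destination_for(source_identifier: str, file_path: str, data_dir: str = "data") -> PurePosixPath:
    """Resolve files[].path under <data_dir>/<source_identifier>/, rejecting escapes."""
    relative = PurePosixPath(file_path)
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"path must stay inside the dataset root: {file_path}")
    return PurePosixPath(data_dir) / source_identifier / relative
```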

Relationships

Use an optional top-level relationships list to describe links between dataset components.

- from: source file path (must match a files[].path)
- to: target file path (must match a files[].path)
- type: relationship label (for example links_to_images_by, links_to_masks_by, references_metadata)
- rocrate_predicate: explicit RO-Crate/JSON-LD predicate URI for export (required)
- via_columns (optional): explicit table columns used for linking
- filename_patterns (optional): standardized filename templates used by the relationship
- derived_from_columns (optional): columns used when deriving one component from another (for example images -> masks)
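The constraints on from, to, and rocrate_predicate lend themselves to a simple cross-reference check over a parsed manifest. The sketch below is a hypothetical helper operating on plain dicts. It treats type as required because it is not marked optional, and it is not the LinkML validation that `ome-iris verify` performs.

```python
def validate_relationships(manifest: dict) -> list[str]:
    """Return readable errors for a manifest's relationships (hypothetical check)."""
    known_paths = {entry["path"] for entry in manifest.get("files", [])}
    errors: list[str] = []
    for index, rel in enumerate(manifest.get("relationships", [])):
        # Fields that are not marked optional above.
        for key in ("from", "to", "type", "rocrate_predicate"):
            if not rel.get(key):
                errors.append(f"relationships[{index}]: missing {key}")
        # from/to must point at declared files[].path values.
        for key in ("from", "to"):
            value = rel.get(key)
            if value and value not in known_paths:
                errors.append(f"relationships[{index}]: {key}={value!r} is not a files[].path")
    return errors
```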

Example:

files:
  - path: profiles.parquet
  - path: images
    kind: directory

relationships:
  - from: profiles.parquet
    to: images
    type: links_to_images_by
    rocrate_predicate: http://schema.org/associatedMedia
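Because rocrate_predicate is a full URI, a relationship like the one above can become a JSON-LD property on the source entity. The sketch below shows one possible mapping and is not the output of `ome-iris export-rocrate`. It follows the RO-Crate convention that directory (Dataset) identifiers end with a slash, and it produces a fragment that would still need merging into the file's entity in @graph.

```python
def relationship_to_jsonld(rel: dict, directory_paths: set[str]) -> dict:
    """Express one relationship as a JSON-LD fragment keyed by its predicate URI."""

    def entity_id(path: str) -> str:
        # RO-Crate identifies directories with a trailing slash.
        return path.rstrip("/") + "/" if path in directory_paths else path

    return {
        "@id": entity_id(rel["from"]),
        rel["rocrate_predicate"]: {"@id": entity_id(rel["to"])},
    }
```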

Example directory entry:

files:
  - path: jump-plate/images
    kind: directory
    archive_format: zip
    url: https://example.org/jump-plate-images.zip
    sha256: ""  # optional
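Once an archive like this has been downloaded, extracting it into the entry's destination directory could look like the sketch below. extract_archive is a hypothetical helper. zipfile's extractall already strips absolute and ".." components, and for tar the sketch applies the "data" extraction filter when the running Python provides it (tarfile.data_filter), which is a sensible default for archives fetched from the web.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive_file: Path, destination: Path, archive_format: str) -> list[str]:
    """Unpack a zip or tar archive into destination and list the extracted files."""
    destination.mkdir(parents=True, exist_ok=True)
    if archive_format == "zip":
        with zipfile.ZipFile(archive_file) as zf:
            zf.extractall(destination)
    elif archive_format == "tar":
        # Mode "r" (the default) also handles gzip/bz2/xz-compressed tarballs.
        with tarfile.open(archive_file) as tf:
            if hasattr(tarfile, "data_filter"):
                tf.extractall(destination, filter="data")  # blocks unsafe members
            else:
                tf.extractall(destination)
    else:
        raise ValueError(f"unsupported archive_format: {archive_format}")
    return sorted(
        p.relative_to(destination).as_posix() for p in destination.rglob("*") if p.is_file()
    )
```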

Custom metadata (first-class)

OME-IRIS supports custom metadata as a first-class field via custom_metadata objects at manifest, source, and file levels.

Rules:

- custom_metadata must be an object/map.
- Keys must be strings.
- Values may be strings, numbers, booleans, null, lists, or nested objects.
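These three rules translate directly into a recursive check over the parsed YAML. The helper below is a hypothetical sketch, not the package's validator. YAML loaders usually turn unquoted dates into date objects, and this check flags those as unsupported, so quote date-like values if you want them kept as strings.

```python
from typing import Any


def check_custom_metadata(value: Any, where: str = "custom_metadata") -> list[str]:
    """Check that value is a map with string keys and JSON-like values."""
    if not isinstance(value, dict):
        return [f"{where}: must be an object/map, got {type(value).__name__}"]
    errors: list[str] = []
    for key, item in value.items():
        if not isinstance(key, str):
            errors.append(f"{where}: key {key!r} is not a string")
            continue
        errors.extend(_check_value(item, f"{where}.{key}"))
    return errors


def _check_value(item: Any, where: str) -> list[str]:
    # Scalars allowed by the rules: strings, numbers, booleans, null.
    if item is None or isinstance(item, (str, bool, int, float)):
        return []
    if isinstance(item, list):
        errors: list[str] = []
        for i, element in enumerate(item):
            errors.extend(_check_value(element, f"{where}[{i}]"))
        return errors
    if isinstance(item, dict):
        return check_custom_metadata(item, where)  # nested objects follow the same rules
    return [f"{where}: unsupported value type {type(item).__name__}"]
```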

Example:

id: jump-plate
source_identifier: JUMP_plate_BR00117006
name: JUMP plate BR00117006 (JUMP_plate_BR00117006) example
description: Plate-level cell painting benchmark subset.
tier: small
license: CC-BY-4.0
custom_metadata:
  study: jump-cp
  species: human
source:
  repository: https://example.org/repo
  path: datasets/JUMP_plate_BR00117006
  url: https://example.org/repo/tree/main/datasets/JUMP_plate_BR00117006
formats: [csv, tiff]
files:
  - path: profiles.csv
    url: https://example.org/files/profiles.csv
    sha256: "..."
    custom_metadata:
      role: profile_table

Why large files are not committed

Large image/profile files make repositories slow and fragile for contributors and CI. OME-IRIS tracks metadata and download locations, while actual data is fetched locally when needed.

Documentation

Build docs locally:

uv sync --group docs
uv run --frozen sphinx-build docs/src docs/build
