OME-IRIS is an open bioimage dataset catalog for benchmarking image input/output (IO), transformations, metadata management, and bioimage-linked workflows.
We also provide a small Python package by the same name (ome_iris) to help fetch and validate the datasets in the catalog.
Inspired by both the classic iris.csv dataset and the iris of the eye that brings images into focus, OME-IRIS aims to provide a collection of reference datasets for evaluating interoperable bioimage data formats, tools, and workflows.
- A lightweight manifest catalog for small benchmark datasets
- A fetch + verify workflow with a single CLI
- LinkML-based schema definitions for dataset manifests
- Not a data portal
- Not DVC-based
- Not a large-file git storage approach
- Not a full ontology or end-to-end benchmark system yet
uv run ome-iris fetch --tier small
uv run ome-iris verify
uv run ome-iris export-rocrate --dataset nf1-cellpainting-shrunken
Download a reproducible subset for local development or benchmarking:
uv run ome-iris download nf1 \
  --output .benchmark-data/ome-iris/nf1 \
  --preset tiny \
  --channel DAPI
Python API:
from ome_iris import datasets
datasets.download(
    "nf1",
    output_dir=".benchmark-data/ome-iris/nf1",
    subset={"images": 20, "channels": ["DAPI"]},
)
Fetch output modes:
uv run ome-iris fetch --tier small --verbose # show per-file labels + downloader progress
uv run ome-iris fetch --tier small --silent # suppress downloader progress output
High-level flow when you run ome-iris fetch:
- Loads dataset manifests from --manifests-dir.
- Applies optional filters (--dataset, --tier).
- Creates local dataset roots under --data-dir/<source_identifier>/.
- Writes ro-crate-metadata.json into each dataset root.
- Iterates over each files entry:
  - for kind: file: downloads the file URL (or skips if already present)
  - for kind: directory: traverses/downloads directory contents (or extracts archive sources)
- Reports a summary:
  - downloaded count + item list
  - skipped count + item list
  - missing URLs
  - failed downloads
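The per-entry decision in the flow above can be sketched in a few lines of Python. This is an illustration only, not the package's implementation: plan_fetch is a hypothetical helper, the summary keys are made up for the example, and it assumes an entry counts as "already present" whenever its target path exists under the dataset root.

```python
from pathlib import Path


def plan_fetch(manifest: dict, data_dir: Path) -> dict:
    """Classify each files[] entry roughly the way the fetch summary reports it.

    Hypothetical helper: it only decides what would happen and never downloads.
    """
    # Dataset root is <data_dir>/<source_identifier>/, as described above.
    root = data_dir / manifest["source_identifier"]
    summary = {"download": [], "skipped": [], "missing_url": []}
    for entry in manifest.get("files", []):
        target = root / entry["path"]
        if target.exists():
            # Already present locally, so a fetch would skip it.
            summary["skipped"].append(entry["path"])
        elif not entry.get("url"):
            # No URL to download from; reported separately in the summary.
            summary["missing_url"].append(entry["path"])
        else:
            summary["download"].append(entry["path"])
    return summary
```

Calling plan_fetch(manifest, Path("data")) with a manifest loaded as a plain dict returns the three lists without touching the network, which is handy for a dry run before a real fetch.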
Output layout example:
data/
  NF1_cellpainting_data_shrunken/
    ro-crate-metadata.json
    profiles.parquet
    images/
    masks/
Local files are stored under ./data/ by default.
Each dataset directory also gets ro-crate-metadata.json with source/provenance metadata from the manifest.
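To inspect what was written, the metadata file can be read with the standard library. The sketch below assumes only the general RO-Crate layout (a JSON-LD document whose entities sit in a top-level "@graph" list, each with an "@id"); which entities and properties OME-IRIS actually emits is not specified here, and list_crate_entities is a hypothetical helper name.

```python
import json
from pathlib import Path


def list_crate_entities(dataset_root: Path) -> list[str]:
    """Return the @id of every entity in the crate's @graph (hypothetical helper)."""
    crate_path = dataset_root / "ro-crate-metadata.json"
    crate = json.loads(crate_path.read_text(encoding="utf-8"))
    # Entities without an @id are ignored rather than treated as errors.
    return [entity["@id"] for entity in crate.get("@graph", []) if "@id" in entity]
```

For example, list_crate_entities(Path("data/NF1_cellpainting_data_shrunken")) would list the identifiers recorded for that dataset root after a fetch.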
To use another data directory:
uv run ome-iris fetch --data-dir /tmp/ome-iris-data
uv run ome-iris verify --data-dir /tmp/ome-iris-data
ome-iris download creates a small, reproducible subset under the exact --output
directory. It supports named dataset aliases such as nf1, preset sizes
(tiny, small, benchmark), image limits, channel filters, plate/well/site
filters, and Z/T/C ranges where filenames expose those values.
Downloaded subsets include manifest.json with the source dataset, selected
subset options, downloaded file paths, source URLs, SHA-256 checksums, file
sizes, image shapes, dtypes, and file metadata. Existing files are reused and
included in the manifest. Use --validate-only to verify an existing subset
cache against its manifest without downloading data:
uv run ome-iris download nf1 \
  --output .benchmark-data/ome-iris/nf1 \
  --validate-only
- Add or update a dataset manifest and catalog metadata.
- Include source, formats, and file-level metadata.
- Run:
uv run ome-iris verify
Starter scaffolding command:
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --append-csv
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --include-directory-entry --directory-path images --archive-format zip
The command guesses a dataset id/name/formats, writes a starter YAML manifest, and prints a suggested datasets.csv row.
- source_identifier is required at the top level of each manifest.
- All files[].path values are relative to data/<source_identifier>/.
- sha256 is optional for file entries.
- Use kind: directory to fetch everything under a directory source.
  - For GitHub tree URLs (https://github.com/<owner>/<repo>/tree/<ref>/<path>), OME-IRIS traverses files under that subtree.
  - For local directory paths, OME-IRIS recursively copies files.
  - For archive URLs, set archive_format (zip or tar) to extract an archive into the destination directory.
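Because sha256 is optional, a checker has to separate "verified" from "nothing to compare against". The following is a minimal sketch of that logic under the path rule above (files resolve under data/<source_identifier>/). The function names and the four status strings are assumptions made for this example and may not match what ome-iris verify reports.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file_entry(entry: dict, source_identifier: str, data_dir: Path = Path("data")) -> str:
    """Return 'missing', 'unchecked', 'ok', or 'mismatch' for one files[] entry."""
    # files[].path is relative to <data_dir>/<source_identifier>/.
    target = data_dir / source_identifier / entry["path"]
    if not target.is_file():
        return "missing"
    expected = entry.get("sha256")
    if not expected:
        # sha256 is optional: an absent or empty value cannot be verified.
        return "unchecked"
    return "ok" if sha256_of(target) == expected.lower() else "mismatch"
```

Treating an empty sha256 as "unchecked" rather than as a failure keeps starter manifests usable while checksums are still being filled in.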
Use an optional top-level relationships list to describe links between dataset components.
- from: source file path (must match a files[].path)
- to: target file path (must match a files[].path)
- type: relationship label (for example links_to_images_by, links_to_masks_by, references_metadata)
- rocrate_predicate: explicit RO-Crate/JSON-LD predicate URI for export (required)
- via_columns (optional): explicit table columns used for linking
- filename_patterns (optional): standardized filename templates used by the relationship
- derived_from_columns (optional): columns used when deriving one component from another (for example images -> masks)
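The field rules above lend themselves to a small consistency check on a manifest that has already been loaded into a dict. This is a sketch and not the LinkML-based validation the project uses. relationship_errors is a hypothetical name, and treating from, to, and type as required is an assumption (only rocrate_predicate is explicitly marked as required).

```python
REQUIRED_KEYS = ("from", "to", "type", "rocrate_predicate")


def relationship_errors(manifest: dict) -> list[str]:
    """Collect human-readable problems found in manifest['relationships']."""
    known_paths = {entry["path"] for entry in manifest.get("files", [])}
    errors = []
    for index, rel in enumerate(manifest.get("relationships", [])):
        for key in REQUIRED_KEYS:
            if not rel.get(key):
                errors.append(f"relationships[{index}]: missing {key}")
        for key in ("from", "to"):
            value = rel.get(key)
            # Both endpoints must name a path declared under files.
            if value and value not in known_paths:
                errors.append(f"relationships[{index}]: {key}={value!r} is not a files[].path")
    return errors
```

An empty list means every relationship has the expected keys and points at declared files.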
Example:
files:
  - path: profiles.parquet
  - path: images
    kind: directory
relationships:
  - from: profiles.parquet
    to: images
    type: links_to_images_by
    rocrate_predicate: http://schema.org/associatedMedia
Example directory entry:
files:
  - path: jump-plate/images
    kind: directory
    archive_format: zip
    url: https://example.org/jump-plate-images.zip
    sha256: "" # optional
OME-IRIS supports custom metadata as a first-class field via custom_metadata objects at manifest, source, and file levels.
Rules:
- custom_metadata must be an object/map.
- Keys must be strings.
- Values may be strings, numbers, booleans, null, lists, or nested objects.
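These rules translate directly into a recursive check. The sketch below is illustrative and is not the project's schema validation. Both function names are hypothetical, and it assumes nested objects follow the same string-key rule as the top-level map.

```python
def is_valid_value(value) -> bool:
    """Allow strings, numbers, booleans, null, lists, and nested objects."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return True
    if isinstance(value, list):
        return all(is_valid_value(item) for item in value)
    if isinstance(value, dict):
        # Assumption: nested objects obey the same rules as the top level.
        return is_valid_custom_metadata(value)
    return False


def is_valid_custom_metadata(obj) -> bool:
    """custom_metadata must be a map with string keys and allowed values."""
    if not isinstance(obj, dict):
        return False
    return all(isinstance(key, str) and is_valid_value(val) for key, val in obj.items())
```

For instance, is_valid_custom_metadata({"study": "jump-cp", "species": "human"}) passes, while a bare list or a map with a non-string key does not.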
Example:
id: jump-plate
source_identifier: JUMP_plate_BR00117006
name: JUMP plate BR00117006 (JUMP_plate_BR00117006) example
description: Plate-level cell painting benchmark subset.
tier: small
license: CC-BY-4.0
custom_metadata:
  study: jump-cp
  species: human
source:
  repository: https://example.org/repo
  path: datasets/JUMP_plate_BR00117006
  url: https://example.org/repo/tree/main/datasets/JUMP_plate_BR00117006
formats: [csv, tiff]
files:
  - path: profiles.csv
    url: https://example.org/files/profiles.csv
    sha256: "..."
    custom_metadata:
      role: profile_table
Large image/profile files make repositories slow and fragile for contributors and CI. OME-IRIS tracks metadata and download locations, while actual data is fetched locally when needed.
Build docs locally:
uv sync --group docs
uv run --frozen sphinx-build docs/src docs/build