Active Learning

active_learning is a backend-agnostic library for selecting samples by ID, attaching artifacts, and emitting results through explicit sinks.

This repository is the standalone home for the active_learning Python package. The core package can be installed without autocrane-cloud; CRID and Sama workflows live under explicit integration modules and require the CRID runtime to provide the interface package.

Installation

For library development:

uv pip install -e .

For the CRID-backed app and scripts, run the following from a shell in autocrane-cloud/apps/crid, pointing at your active-learning checkout:

uv pip install -e "/path/to/active-learning[app,crid]"

The app launchers expect autocrane-cloud next to this checkout. Set AUTOCRANE_CLOUD_PATH=/path/to/autocrane-cloud if it lives elsewhere.
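The lookup order above can be expressed as a small helper. This is a sketch, not the launchers' actual logic: resolve_autocrane_cloud is a hypothetical name, and treating "next to this checkout" as a sibling of the checkout's directory is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_autocrane_cloud(
    checkout: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Hypothetical helper: return the autocrane-cloud directory to use."""
    env = os.environ if env is None else env
    override = env.get("AUTOCRANE_CLOUD_PATH")
    if override:
        # An explicit AUTOCRANE_CLOUD_PATH always wins.
        return Path(override).expanduser()
    # Default (assumed): a sibling directory named autocrane-cloud.
    return Path(checkout).resolve().parent / "autocrane-cloud"
```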

Architecture

The flow is:

source IDs -> image provider -> scorers -> selectors -> SelectionResult -> sinks

The core rules are:

- SampleId is the canonical sample representation.
- Scorers derive per-sample artifacts and scores, including the cached brightness stats used for early filtering.
- Selectors choose the final subset.
- Sinks consume SelectionResult and handle outputs or side effects.
- Integrations adapt CRID and Sama to the core flow.
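The rules above can be sketched as a small scorer, selector, and sink pipeline. This is a minimal illustration, not the package's API: the SampleId alias, the SelectionResult fields, the Scorer/Selector/Sink protocols, run_selection, and the toy classes are all hypothetical, and the image provider is omitted (a real scorer would fetch pixels through it).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, NewType, Protocol, Sequence

# Assumption: a SampleId is modeled as a plain string alias here.
SampleId = NewType("SampleId", str)


@dataclass
class SelectionResult:
    # Hypothetical fields: the ordered selection plus the scores behind it.
    selected: List[SampleId]
    scores: Dict[SampleId, float] = field(default_factory=dict)


class Scorer(Protocol):
    def score(self, ids: Sequence[SampleId]) -> Dict[SampleId, float]: ...


class Selector(Protocol):
    def select(
        self, ids: Sequence[SampleId], scores: Dict[SampleId, float], n: int
    ) -> SelectionResult: ...


class Sink(Protocol):
    def emit(self, result: SelectionResult) -> None: ...


def run_selection(
    ids: Sequence[SampleId],
    scorer: Scorer,
    selector: Selector,
    sinks: Iterable[Sink],
    n: int,
) -> SelectionResult:
    """Score candidates, select a subset, then hand the result to every sink."""
    scores = scorer.score(ids)
    result = selector.select(ids, scores, n)
    for sink in sinks:
        sink.emit(result)  # sinks only consume the result; they never re-select
    return result


# Toy implementations, for illustration only.
class TableScorer:
    def __init__(self, table: Dict[str, float]) -> None:
        self.table = table

    def score(self, ids):
        return {i: self.table[i] for i in ids}


class TopKSelector:
    def select(self, ids, scores, n):
        ranked = sorted(ids, key=lambda i: scores[i], reverse=True)
        return SelectionResult(selected=ranked[:n], scores=scores)


class CollectingSink:
    def __init__(self) -> None:
        self.results: List[SelectionResult] = []

    def emit(self, result: SelectionResult) -> None:
        self.results.append(result)
```

Keeping sinks as pure consumers of SelectionResult means side effects such as mosaics, YAML handoffs, or exports can be added or dropped without touching scoring or selection.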

Package Layout

core/
- shared runtime primitives: config loading, image provider, selection orchestration, and core types
providers/
- model and inference utilities: Unet loading, batch extraction, and uncertainty scoring helpers
scorers/
- scoring, artifact derivation, and brightness-based pre-filtering, keyed by sample ID
selectors/
- final subset selection from candidates plus artifacts
sinks/
- SelectionResult consumers that emit outputs or side effects
integrations/
- backend-specific adapters, currently CRID and Sama, to the core flow
strategies/
- reusable selection recipes built from lower-level pieces
scripts/
- thin CLI entrypoints around the library pieces
tests/
- unit and integration coverage for the package

Notes

core/config.py owns config parsing and validation.
core/image_provider.py owns image materialization and caching.
providers/ is separate from core/; it contains model/inference utilities, not image storage or CRID access.
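The caching role of core/image_provider.py can be illustrated with a least-recently-used cache keyed by sample ID. This is a hypothetical sketch: CachingImageProvider, its load callable, and the max_items bound are not the module's actual API.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class CachingImageProvider(Generic[K, V]):
    """Hypothetical sketch: materialize images by sample ID with an LRU cache."""

    def __init__(self, load: Callable[[K], V], max_items: int = 256) -> None:
        self._load = load  # e.g. read from disk or fetch from a backend
        self._cache: "OrderedDict[K, V]" = OrderedDict()
        self._max_items = max_items

    def get(self, sample_id: K) -> V:
        if sample_id in self._cache:
            self._cache.move_to_end(sample_id)  # mark as most recently used
            return self._cache[sample_id]
        image = self._load(sample_id)
        self._cache[sample_id] = image
        if len(self._cache) > self._max_items:
            self._cache.popitem(last=False)  # evict the least recently used
        return image
```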

Strategies

The seed.py CLI accepts the following values for --strategy:

coreset
- Pure diversity selection over image features. Uses the configured feature model and any labeled seed images as the reference set.
uncertainty_coreset
- Computes uncertainty first, then balances uncertainty and diversity using coreset-style selection.
uncertainty_topk
- Pure uncertainty ranking. Selects the n most uncertain images without a diversity stage.
uncertainty_topk_coreset
- Two-stage uncertainty workflow: first keep the top uncertain candidates, then run coreset selection on that reduced pool.
alges
- Active Learning with Gradient Embeddings for Segmentation. Builds ALGES gradient embeddings from the configured segmentation model and selects with k-means++.
alges_coreset
- Two-stage ALGES workflow: run ALGES to form a candidate pool, then run coreset selection to diversify the final batch.
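Coreset-style selection is commonly implemented as greedy k-center (farthest-point) selection. A minimal pure-Python sketch, assuming precomputed feature vectors keyed by sample ID, with the labeled seed images also present in that mapping. coreset_greedy and uncertainty_topk_coreset are hypothetical names, and reading --candidate-multiplier as "shortlist n * multiplier candidates" is an assumption about the flag.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def coreset_greedy(
    features: Dict[str, Vector], n: int, labeled: Sequence[str] = ()
) -> List[str]:
    """Greedy k-center: repeatedly pick the candidate farthest from every
    already-chosen point. Labeled seed images count as chosen from the start."""
    labeled_set = set(labeled)
    candidates = [k for k in features if k not in labeled_set]
    centers = [features[k] for k in labeled]
    # Distance from each candidate to its nearest chosen point so far.
    nearest = {
        k: min((math.dist(features[k], c) for c in centers), default=math.inf)
        for k in candidates
    }
    picked: List[str] = []
    while candidates and len(picked) < n:
        best = max(candidates, key=lambda k: nearest[k])
        picked.append(best)
        candidates.remove(best)
        for k in candidates:
            nearest[k] = min(nearest[k], math.dist(features[k], features[best]))
    return picked


def uncertainty_topk_coreset(
    features: Dict[str, Vector],
    uncertainty: Dict[str, float],
    n: int,
    candidate_multiplier: int = 4,
) -> List[str]:
    """Two-stage: shortlist the most uncertain IDs, then diversify with coreset."""
    ranked = sorted(uncertainty, key=uncertainty.get, reverse=True)
    shortlist = ranked[: n * candidate_multiplier]
    return coreset_greedy({k: features[k] for k in shortlist}, n)
```

With labeled seeds, the first pick is the candidate farthest from the seeds, which is how labeled images steer a coreset run away from regions that are already covered.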

Related flags used by some strategies:

--provider {mc_dropout,entropy,bald}
- Used by the uncertainty-based strategies.
--aggregation {mean,topk_mean,max} and --topk-fraction
- Control how per-pixel uncertainty maps are reduced to one score per image.
--candidate-multiplier
- Used by uncertainty_topk_coreset to size the intermediate uncertainty shortlist.
--feature-model
- Used by coreset, uncertainty_coreset, uncertainty_topk_coreset, and the coreset stage of alges_coreset.
--method {image,semantic}
- Used by alges and alges_coreset to choose the ALGES embedding variant.
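The --aggregation options reduce a per-pixel uncertainty map to one score per image. A minimal sketch, assuming a flattened list of per-pixel values and using per-pixel entropy as the example measure (the mc_dropout and bald providers compute theirs differently). The function names and the rule that --topk-fraction keeps at least one pixel, rounding up, are assumptions.

```python
import math
from typing import Sequence


def pixel_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def aggregate(
    uncertainty_map: Sequence[float],
    aggregation: str = "mean",
    topk_fraction: float = 0.1,
) -> float:
    """Reduce per-pixel uncertainty values to a single image score."""
    values = list(uncertainty_map)
    if not values:
        raise ValueError("empty uncertainty map")
    if aggregation == "mean":
        return sum(values) / len(values)
    if aggregation == "max":
        return max(values)
    if aggregation == "topk_mean":
        # Average only the most uncertain fraction of pixels (at least one).
        k = max(1, math.ceil(topk_fraction * len(values)))
        top = sorted(values, reverse=True)[:k]
        return sum(top) / k
    raise ValueError(f"unknown aggregation: {aggregation!r}")
```

topk_mean sits between the other two: it is less diluted by large confident regions than mean and less sensitive to a single noisy pixel than max.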

Local Images

Use active-learning-local to run selection on a recursive directory of local images without CRID or Sama:

active-learning-local --images-dir /path/to/images --strategy coreset -n 50

The local runner scans .jpg, .jpeg, .png, .webp, and .bmp files, uses POSIX-style relative paths as sample IDs, and writes a mosaic plus YAML handoff next to the configured mosaic path. coreset is the recommended starter strategy because it only needs image features; uncertainty and ALGES strategies still require a configured UNet model.
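A minimal sketch of the scan described above, assuming a case-insensitive suffix match and sorted output; scan_local_images is a hypothetical name, not the runner's actual function.

```python
from pathlib import Path
from typing import List

# Extensions listed in this README; matched case-insensitively (assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def scan_local_images(images_dir: Path) -> List[str]:
    """Recursively list images and return POSIX-style relative paths as IDs."""
    root = Path(images_dir)
    ids = [
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return sorted(ids)
```

Using as_posix() keeps IDs identical across operating systems, so a selection made on Windows refers to the same sample IDs as one made on Linux or macOS.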

Example

For a CRID-backed active-learning run with ALGES and Sama export:

1. seed.py loads the seed config, queries CRID for candidate sample IDs, and builds an ImageProvider.
2. providers/ supplies the model side of the run: Unet loading, inference, and uncertainty utilities used by ALGES and uncertainty-based strategies.
3. Brightness filtering removes bad candidates, then scorers compute features, uncertainty, or ALGES embeddings.
4. A selector chooses the final SelectionResult.
5. sinks/mosaic.py can render a preview mosaic, and sinks/yaml.py writes the interactive seed handoff.
6. If sama_project_id is set, the CRID export sink submits the selected samples and the Sama sink creates the batch.

In practice, this is the path for "pick a batch of images from CRID, inspect the selection, and push it to Sama for annotation."
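The brightness step in this walkthrough can be approximated as a threshold on cached per-image statistics. A minimal sketch: BrightnessStats, luminance_mean, brightness_filter, and the 20/235 cut-offs are hypothetical, and Rec. 601 luma is one common brightness measure, not necessarily the one the scorers use.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

RGB = Tuple[int, int, int]


@dataclass(frozen=True)
class BrightnessStats:
    mean_luma: float  # mean luma on a 0-255 scale


def luminance_mean(pixels: Iterable[RGB]) -> float:
    """Mean Rec. 601 luma (0.299 R + 0.587 G + 0.114 B) over RGB pixels."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("no pixels")
    return total / count


def brightness_filter(
    stats: Dict[str, BrightnessStats], min_luma: float = 20.0, max_luma: float = 235.0
) -> List[str]:
    """Keep sample IDs whose cached mean luma is neither too dark nor blown out."""
    return [sid for sid, s in stats.items() if min_luma <= s.mean_luma <= max_luma]
```

Because the filter only reads cached stats, it can run before any model inference, which is why it belongs ahead of the more expensive scorers.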
