feat(auth): recognise known OpenFn clients and use the keys we hold for them#550

Open
stuartc wants to merge 7 commits into main from client-auth
@stuartc stuartc commented Jun 24, 2026

What & why

Apollo had no way to tell its callers apart. Every request brought its own Anthropic key, which we forwarded straight to the LLM, so we couldn't associate a request with a particular OpenFn instance or hold a key on a client's behalf. As self-hosted Lightning instances start pointing at our shared hosted Apollo, that matters.

This adds per-client recognition: a known client's request is served with the key we hold for them, while callers we don't recognise carry on exactly as they do on main today. No Lightning-side change is required.

Fixes #545

How it works

This introduces an "auth token": a key we provision for a client, store only as a hash, and map to an Anthropic API key.

An always-on authenticate hook on /services/* looks up the api_key already in the request body by its SHA-256 in a lightning_clients table:

  • Known client → the inbound key is never forwarded to the LLM. It's swapped for the client's stored anthropic_api_key, or stripped to fall back to the global key when that column is null.
  • Unknown key → forwarded only if it starts with sk-ant- (bring-your-own); an unknown key without the sk-ant- prefix is rejected with 401 so a Lightning credential can't leak to the LLM.
  • Internal Apollo-to-Apollo calls → exempt via a per-process token.
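
The three-way decision above can be sketched as a pure function. This is a minimal illustration rather than the hook in this PR: the `ClientRow` and `Resolution` types, the `resolveKey` and `hashToken` names, the hex encoding of the hash, and the in-memory `Map` standing in for the lightning_clients table are all assumptions, and the internal-call exemption is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; only the anthropic_api_key column is named in the PR.
type ClientRow = { anthropicApiKey: string | null };

// Hypothetical result type for the decision.
type Resolution =
  | { kind: "known"; apiKey: string | undefined } // undefined: fall back to the global key
  | { kind: "byo"; apiKey: string } // unknown sk-ant- key, forwarded as-is
  | { kind: "rejected"; status: 401 };

// SHA-256 of the inbound key. Hex encoding is an assumption.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

function resolveKey(inbound: string, clients: Map<string, ClientRow>): Resolution {
  const row = clients.get(hashToken(inbound));
  if (row) {
    // Known client: the inbound key is never forwarded.
    return { kind: "known", apiKey: row.anthropicApiKey ?? undefined };
  }
  if (inbound.startsWith("sk-ant-")) {
    return { kind: "byo", apiKey: inbound };
  }
  // Unknown and not Anthropic-shaped: refuse rather than leak it to the LLM.
  return { kind: "rejected", status: 401 };
}
```

Keeping the decision free of I/O makes each branch easy to test against fixed inputs, with the database lookup and caching layered around it.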

Stored Anthropic keys can be encrypted at rest using AES-256-GCM, and client lookups are cached in memory (fresh for ~60s).

Changes (a key being added or removed, or the clients DB becoming available or unavailable) take effect once the cached entry expires.
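
To illustrate why a change only shows up after expiry, here is a simplified time-to-live cache. It is a sketch under assumptions: the `TtlCache` class, its synchronous loader, and the injectable clock are hypothetical, and it does not implement the single-flight, stale-while-revalidate refresh described in the commit message.

```typescript
type Entry<V> = { value: V; expiresAt: number };

// Hypothetical TTL cache: an entry is served until it expires, then reloaded.
class TtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control time.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, load: (key: string) => V): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value; // still fresh, so the source of truth is not consulted
    }
    const value = load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Note that a miss (for example `null` for an unknown hash) is cached too, which is why a newly provisioned client may not be recognised until the old entry ages out.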

In production environments we advise keeping the clients schema in a separate database, so there is a second, optional env variable, APOLLO_CLIENTS_DB_URL, which falls back to POSTGRES_URL when absent.
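
The fallback can be expressed in one line. The `clientsDbUrl` helper below is hypothetical and only shows the intended precedence. Treating only an unset variable (not an empty string) as absent is an assumption.

```typescript
// Hypothetical helper: prefer the dedicated clients DB, else the main one.
function clientsDbUrl(env: Record<string, string | undefined>): string | undefined {
  return env.APOLLO_CLIENTS_DB_URL ?? env.POSTGRES_URL;
}
```

In a running process this would be called with `process.env`.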

A CLI script, bun run client, manages auth tokens and keys. Secrets are added and rotated via stdin so they never end up in logs or shell history. Because auth tokens are stored only as hashes, a token cannot be recovered after it is first printed in the terminal.

Commits

Seven commits, readable in order: Bun/lockfile chore, DB pool + migration runner, the auth hook, the provisioning CLI, tests, docs, then a Docker tagging change that keeps :latest off pre-release builds.

Other changes

Both CI and the Docker image build use the version of Bun declared in .tool-versions, and Bun is now pinned to the current latest (1.3.14).

Known Issues

The WebSockets endpoint doesn't work right now, because WS handshakes can't carry the body credential. This will be sorted in a follow-up.

stuartc added 7 commits June 24, 2026 14:03
Pin the Bun version in .tool-versions and replace the binary bun.lockb with the text-format bun.lock so dependency changes show up in diffs. Update CI and the Dockerfile to match.
Add a shared pg pool (db/index.ts) and a migration runner (db/migrate.ts) that applies the SQL files in platform/migrations in order. Includes the first migration, which creates the lightning_clients table that client auth reads from.
Add an always-on authenticate hook on /services/* that maps the api_key in the request body to a Lightning client via its SHA-256 in the lightning_clients table. On a known match the inbound key is never forwarded to the LLM: it is swapped for the client's stored anthropic_api_key, or stripped to fall back to the global key when that column is null. An unknown key is forwarded only if it starts with sk-ant-, otherwise it is rejected with 401. Internal Apollo-to-Apollo calls are exempt via a per-process token injected into each Python child.

Lookups are cached in memory with a single-flight, stale-while-revalidate refresh. Stored keys may be AES-256-GCM encrypted at rest. When the clients DB is unreachable the hook fails closed with 503, and decrypt, refresh and token-mismatch failures are reported to Sentry.
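
For readers unfamiliar with AES-256-GCM, the following sketch shows a round trip using the standard node:crypto API. It is not the encryption helper from this PR: the `encryptSecret` and `decryptSecret` names and the base64 layout (12-byte IV, 16-byte auth tag, then ciphertext) are assumptions, as is passing the 32-byte key in directly rather than deriving it from APOLLO_ENC_KEY.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12; // recommended nonce size for GCM
const TAG_BYTES = 16; // default GCM auth tag size

// Hypothetical helper: returns base64(iv || tag || ciphertext).
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // a fresh IV for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Hypothetical helper: throws if the key is wrong or the data was modified.
function decryptSecret(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, a tampered row or a wrong key surfaces as a thrown error at decrypt time, which fits the decrypt failures being reported to Sentry.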
Add a client CLI (bun run client) for provisioning Lightning clients: create, list, rotate and remove rows in lightning_clients, reading secrets from stdin and encrypting them at rest. Add a migrate script to run pending migrations.
Add tests for the authenticate hook and key resolution, the client CLI (store, commands, secret reading), token hashing against fixed vectors, the encryption helper, the migration runner, and server startup. Extend the existing server tests to inject a configured auth instance.
Document the client auth model in the root README and the platform/src/auth README, note the APOLLO_CLIENTS_DB_URL / POSTGRES_URL fallback and the APOLLO_ENC_KEY / APOLLO_INTERNAL_TOKEN variables in .env.example and CLAUDE.md, and add the changeset.
Build the tag list in the manipulate-tag step and append openfn/apollo:latest
only when the version has no hyphen. Pre-release tags (e.g. 1.4.0-pre.0) now push
only their versioned image and leave :latest pointing at the last final release.
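
The tagging rule can be sketched as follows. The real logic lives in the manipulate-tag CI step, so this TypeScript version is only an illustration. The `buildTags` name and the exact form of the versioned tag (`openfn/apollo:<version>`) are assumptions.

```typescript
// Hypothetical helper mirroring the rule: a hyphen marks a pre-release.
function buildTags(version: string): string[] {
  const tags = [`openfn/apollo:${version}`];
  if (!version.includes("-")) {
    tags.push("openfn/apollo:latest"); // only final releases move :latest
  }
  return tags;
}
```

For example, `buildTags("1.4.0-pre.0")` yields only the versioned tag, while `buildTags("1.4.0")` also includes `openfn/apollo:latest`.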
@stuartc stuartc marked this pull request as ready for review June 25, 2026 16:24

Development

Successfully merging this pull request may close these issues.

Epic: Authentication MVP

1 participant