Apollo is OpenFn's knowledge, AI and data platform, providing services to support the OpenFn toolchain.
Apollo is known as the God of (among other things) truth, prophecy and oracles.
This repo contains:
- A bunjs-based webserver
- A number of python-based AI services
- A number of TypeScript-based data services
This README covers running, debugging and deploying the server. Deeper docs live alongside the code:
- Architecture: see Server Architecture below, and each service's own README for service-specific detail.
- Contributing & service conventions: CONTRIBUTING.md explains how to add and structure a Python service (entry.py, imports, logging, code quality).
- Services: every service has its own README with payload specs and examples. See the Services index below.
- Instance auth: platform/src/auth/README.md covers authenticating /services/* per client and managing per-client Anthropic keys.
- Testing: services/testing/README.md documents the shared acceptance-test harness (YAML specs + LLM-as-judge); per-service guides live in each service's tests/.
To run this server locally, you'll need the following dependencies to be installed:
- python 3.11 (yes, 3.11 exactly, see Python Setup)
- poetry
- bunjs
We recommend using asdf with the python plugin installed.
To run the server locally, first install the python dependencies:
poetry install
Then start the server (note that bun install is not needed, see below):
bun start
To start a hot-reloading development server which watches your typescript, run:
bun dev
To see an index of the available language services, head to localhost:3000.
This repo uses poetry to manage dependencies.
We use an "in-project" venv, which means a .venv folder will be created in the
repo when you run poetry install.
All python is invoked through entry.py, which loads the environment properly
so that relative imports work.
You can invoke entry.py directly (i.e. without HTTP or any intermediate JS) through bun from the root:
bun py echo --input tmp/payload.json
Unlike npm, bun does not require a separate install step: you can run bun start
right after cloning the repo.
Bun will then install dependencies against the global cache on your machine.
This still uses the lockfile (bun.lockb).
To update a module version, run bun add <module>@<version>, which will update
your lockfile.
One drawback of this is that there is no intellisense, because IDEs rely on
node_modules to load d.ts files. If you want intellisense, you are welcome to
run bun install and work from a local node_modules folder. None of this affects python.
See bun's install docs for more details.
To communicate with and test the server, you can use @openfn/cli.
Use the apollo command with your service name and pass a json file:
openfn apollo echo tmp/payload.json
Pass --staging, --production or --local to call different deployments of
apollo.
To default to using your local server, you can set an env var:
export OPENFN_APOLLO_DEFAULT_ENV=local
Or pass an explicit URL if you're not running on the default port:
export OPENFN_APOLLO_DEFAULT_ENV=http://localhost:6666
Output will be shown in stdout by default. Pass -o path/to/output/.json to
save the output to disk.
You can get more help with:
openfn apollo help
Note that if a service returns a { files: {} } object in the payload, and you
pass -o with a folder, those files will be written to disk.
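The files behaviour described above can be pictured with a short Python sketch. It assumes that files maps relative paths to string contents, which this README does not spell out, and write_files is a hypothetical helper for illustration, not the CLI's own code.

```python
import tempfile
from pathlib import Path


def write_files(result: dict, out_dir: str) -> list[str]:
    """Write each entry of result["files"] under out_dir and return the relative paths written."""
    root = Path(out_dir).resolve()
    written = []
    for rel_path, content in result.get("files", {}).items():
        target = (root / rel_path).resolve()
        # Refuse paths that would escape the output folder (e.g. "../x").
        if not target.is_relative_to(root):
            raise ValueError(f"unsafe path: {rel_path}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target.relative_to(root).as_posix())
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical payload shape: one generated file keyed by its path.
        print(write_files({"files": {"src/job.js": "fn()"}}, folder))
```

The path check is a defensive choice of this sketch; whether the CLI performs a similar check is not stated here.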
Some services require API keys.
Rather than coding these into your JSON payloads directly, keys can be loaded
from the .env file at the root. See .env.example for the
full list of keys and env vars Apollo reads.
Also note that tmp dirs are untracked, so if you do want to store credentials
in your json, keep it inside a tmp dir and it'll remain safe and secret.
The server defaults to port 3000. You can test any service directly with curl to confirm Apollo is working independently of Lightning (or any other client).
For example, to trigger a workflow_chat stream:
curl -N -X POST http://localhost:3000/services/workflow_chat/stream \
-H "Content-Type: application/json" \
-d '{"content":"make a simple http workflow","history":[],"api_key":"<your-anthropic-api-key>"}'
The api_key field is your Anthropic API key. If ANTHROPIC_API_KEY is already
set in your .env, you can omit it. In Lightning, this is configured via the
ANTHROPIC_API_KEY environment variable and passed through to Apollo on each
request.
The -N flag disables buffering so SSE events appear as they arrive. You should
see a stream of event: log lines followed by event: complete. An
event: error response means the issue is inside Apollo.
If the stream returns successfully here but Lightning isn't receiving it, the
issue is on the Lightning side -- check that
APOLLO_ENDPOINT=http://localhost:3000 is set correctly in Lightning's
environment (no trailing slash).
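The triage described above (log events, then complete, or an error) can be sketched in Python. This is a minimal illustration that assumes the stream follows the standard SSE text framing of event: and data: lines separated by blank lines; the content of each data line is an assumption, and parse_sse and diagnose are hypothetical names.

```python
def parse_sse(text: str) -> list[tuple[str, str]]:
    """Split an SSE body into (event, data) pairs."""
    events = []
    event, data = "message", []
    # A trailing empty line flushes the last event even if the body lacks one.
    for line in text.splitlines() + [""]:
        if line == "":
            if data or event != "message":
                events.append((event, "\n".join(data)))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    return events


def diagnose(events: list[tuple[str, str]]) -> str:
    """Classify a finished stream following the rules of thumb in the text."""
    names = [name for name, _ in events]
    if "error" in names:
        return "apollo"      # the failure is inside Apollo
    if names and names[-1] == "complete":
        return "ok"          # Apollo streamed fine; check the client side next
    return "incomplete"      # stream ended without complete or error


if __name__ == "__main__":
    sample = "event: log\ndata: starting\n\nevent: complete\ndata: {}\n\n"
    print(diagnose(parse_sse(sample)))
```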
To check API key connectivity (Anthropic, OpenAI, Pinecone), hit the status service:
curl http://localhost:3000/services/status
If you get errors like poetry: command not found (error code 127), and poetry
is set up on your machine, you may need to add these env vars to your .bashrc
(or whatever you use):
export BUN_INSTALL="$HOME/.bun"
export PATH="$BUN_INSTALL/bin:$PATH"
The Apollo server uses bunjs with the Elysia framework.
It is a very lightweight server. By default it includes no authentication, but instance auth can be enabled (see Database below).
Python services are hosted at /services/<name>. Each service expects a POST
request with a JSON body, and will return JSON.
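As a rough illustration of that contract, the sketch below builds a POST request for a service using only the Python standard library. The default base URL matches the local port mentioned earlier; build_request and call_service are hypothetical helper names, and the payload shape depends entirely on the service being called.

```python
import json
import urllib.request


def build_request(service: str, payload: dict,
                  base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /services/<service>."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/services/{service}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_service(service: str, payload: dict,
                 base_url: str = "http://localhost:3000") -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(service, payload, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request("echo", {"hello": "world"})
    print(req.get_method(), req.full_url)
```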
The JSON structures are not yet formally standardised. The server may soon establish some conventions for better interoperability with the CLI.
Python scripts are invoked through a child process. Each call to a service runs in its own context.
Python modules are pretty free-form but must adhere to a minimal structure. See the Contribution Guide for details.
Each service lives in services/<name>/ and is auto-mounted by service
discovery. Every mounted service has its own README with payload specs and
examples — start there for anything service-specific.
| Service | What it does |
|---|---|
| global_chat | Orchestrator and single entry point for OpenFn AI chat; routes to subagents or escalates to a planner. Also see its PAYLOAD_SPEC.md. |
| job_chat | AI chat for OpenFn job code, with a code-suggestions/auto-patch mode. |
| workflow_chat | Generates and edits OpenFn workflow YAML, preserving job code and IDs. |
| doc_agent_chat | Agentic chat over a project's uploaded documents (RAG). |

| Service | What it does |
|---|---|
| search_docsite | Semantic search over the OpenFn docs (docsite Pinecone index). |
| embed_docsite | Downloads and indexes the OpenFn docs into the docsite index. |
| doc_agent_upload | Fetches and indexes project documents into the doc-agent index. |

| Service | What it does |
|---|---|
| load_adaptor_docs | Parses adaptor function docs into Postgres. |
| search_adaptor_docs | Queries adaptor docs back out of Postgres by version. |
| latest_adaptors | Fetches the latest adaptor versions from the OpenFn repo. |
| adaptor_apis | TypeScript service: produces a JSON schema of an adaptor's API. |

| Service | What it does |
|---|---|
| vocab_mapper | Maps medical vocabularies (LOINC/SNOMED) against the apollo-mappings index. (No README yet.) |
| embeddings | Vector-store wrapper used by the vocab services. |
| embed_loinc_dataset | Embeds the LOINC dataset into apollo-mappings. |
| embed_snomed_dataset | Embeds the SNOMED dataset into apollo-mappings. |
| embeddings_demo | Standalone embeddings demo (Zilliz). |

| Service | What it does |
|---|---|
| status | Health check: validates Anthropic, OpenAI and Pinecone keys. |
| echo | Test service that returns its input; useful for verifying the pipeline. |
| auth | Instance-auth hook + provisioning (server layer, under platform/, not a mounted service). See Database. |
| testing | Shared acceptance-test harness (not a mounted service). |

Apollo uses Postgres for two tables: adaptor_function_docs (parsed adaptor
docs, used by load_adaptor_docs / search_adaptor_docs) and
lightning_clients (the instance-auth allow-list).
The two tables are set up differently:
- adaptor_function_docs: defined in services/load_adaptor_docs/schema.sql, which you apply with psql. It is written with CREATE TABLE IF NOT EXISTS, so re-running it is safe. This table is also created lazily the first time load_adaptor_docs runs, so applying it by hand is optional.
- lightning_clients: created and kept current by the migration runner (platform/src/db/migrate.ts, migrations under platform/migrations/). It is applied automatically at Apollo startup when POSTGRES_URL is set; no manual psql step is needed.
First, make sure you've configured your desired POSTGRES_URL in your .env
file.
Create a Postgres DB matching your POSTGRES_URL from the .env file.
To start from a clean slate, you can drop both tables (this deletes their data):
set -a; . ./.env; set +a; psql "$POSTGRES_URL" -c "DROP TABLE IF EXISTS lightning_clients, adaptor_function_docs CASCADE;"
lightning_clients is migrated automatically at Apollo startup (see
platform/src/db/migrate.ts). Apply the adaptor-docs schema separately:
set -a; . ./.env; set +a; psql "$POSTGRES_URL" -f services/load_adaptor_docs/schema.sql
/services/* can be authenticated so that only known clients (e.g. specific Lightning
instances) may call it, with Apollo using each client's own Anthropic API
key for that client's requests.
- It is transparent and backward compatible (map-if-known-else-forward): the auth hook is always active but only swaps in a key when it recognises the caller. Clients are looked up in the lightning_clients table via POSTGRES_URL; if that table can't be reached, known-client swaps simply don't resolve and every caller degrades to the forward path (it does not blanket-reject).
- The credential is the api_key the caller already sends in the request body: there is no bearer token, no Authorization header, and no Lightning-side change. Apollo stores only a SHA-256 hash of it.
- On a match, the inbound api_key is treated purely as a credential and is never forwarded to the LLM: it is replaced with the client's stored Anthropic key (so LLM usage bills to that client), or stripped (falling back to the global ANTHROPIC_API_KEY) if the client has no stored key.
- An unrecognised key is forwarded unchanged only if it looks like an Anthropic key (prefix sk-ant-); this is the bring-your-own-key path. An unrecognised key that does not start with sk-ant- is a likely Lightning credential, so it is rejected (401) rather than forwarded, since forwarding would leak it to the LLM. A request with no api_key falls back to the global key.
- Health/root endpoints (/livez, /status, /) are outside /services/* and never subject to the auth hook. Internal Apollo-to-Apollo apollo() calls are exempt via a per-process internal token (APOLLO_INTERNAL_TOKEN), not by network position.
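The map-if-known-else-forward rules above can be summarised as a small decision function. This is a Python sketch of the logic as described, not the actual hook (which lives under platform/src/auth/): the clients dictionary stands in for the lightning_clients table, resolve_api_key is a hypothetical name, and the unreachable-database case is not modelled.

```python
import hashlib


def resolve_api_key(api_key, clients: dict, global_key: str):
    """Return (status, key_to_use) for a request to /services/*.

    clients is a stand-in for lightning_clients: it maps the SHA-256 hex digest
    of a client credential to that client's stored Anthropic key (or None).
    """
    if not api_key:
        return 200, global_key                      # no api_key: global fallback
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    if digest in clients:
        # Known client: the credential itself is never forwarded to the LLM.
        return 200, clients[digest] or global_key
    if api_key.startswith("sk-ant-"):
        return 200, api_key                         # bring-your-own-key path
    return 401, None                                # likely a Lightning credential


if __name__ == "__main__":
    credential = "lightning-instance-secret"        # made-up example value
    table = {hashlib.sha256(credential.encode("utf-8")).hexdigest(): "sk-ant-client-key"}
    print(resolve_api_key(credential, table, "sk-ant-global-key"))
```

Only the hash of the credential appears in the lookup table, which mirrors the point that Apollo stores a SHA-256 hash rather than the credential itself.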
To enable it and provision clients, see platform/src/auth/.
APOLLO_INTERNAL_TOKEN is the mechanism that lets internal apollo() self-calls
through the auth hook: a self-call carries it in the x-apollo-internal header and the
hook matches it. Because the global ANTHROPIC_API_KEY is dev-only, a token that
fails to match is a dead end (a 401), not a soft fallback to the global key — so
the match has to work in every topology.
- Always set APOLLO_INTERNAL_TOKEN to a shared value across the deployment in production. When it is set, the per-process minting path never runs.
- The per-process random mint is a dev-only convenience. Apollo assumes one Bun process per host; the apollo() self-call relies on loopback calls landing on the same process, so a minted token only works single-process-per-host.
- If reusePort clustering is ever enabled, a shared APOLLO_INTERNAL_TOKEN is required, not optional: a self-call can otherwise be routed to a sibling process that minted a different token and will 401. Startup logs the token's provenance (env vs minted) and warns when this dangerous combination is detected; a mismatch at the hook is logged as a distinct, greppable warning.
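The difference between a shared and a minted token can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Apollo's implementation: resolve_internal_token and is_internal_call are hypothetical names, the 32-byte token length is arbitrary, and the constant-time comparison is a choice made for this sketch.

```python
import hmac
import os
import secrets


def resolve_internal_token(env=None):
    """Return (token, provenance), where provenance is "env" or "minted"."""
    env = os.environ if env is None else env
    configured = env.get("APOLLO_INTERNAL_TOKEN")
    if configured:
        return configured, "env"             # shared value: minting never runs
    # Dev-only convenience: a random token known to this process alone.
    return secrets.token_hex(32), "minted"


def is_internal_call(headers: dict, token: str) -> bool:
    """True when the x-apollo-internal header carries the expected token."""
    presented = headers.get("x-apollo-internal", "")
    return hmac.compare_digest(presented.encode("utf-8"), token.encode("utf-8"))


if __name__ == "__main__":
    # Two processes without a shared token mint different values, so a
    # self-call routed to the sibling process would not match.
    token_a, _ = resolve_internal_token({})
    token_b, _ = resolve_internal_token({})
    print(is_internal_call({"x-apollo-internal": token_a}, token_b))
```

Running the two mints side by side shows why a shared value is needed as soon as more than one process can receive the self-call.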
Every service can receive connections in one of two ways:
- HTTP POST method
- Websocket connection
The same URL is used for both; clients must request an upgrade to use a websocket.
Websocket connections will receive a live log stream.
Websockets use the following events:
- start: sent by the client with a JSON payload in the data key.
- complete: sent by the server when the python script has completed. The result is a JSON payload in the data key.
- log: sent by the server whenever the python process logs a line through a logger object.
Note that print() statements do not get sent out to the web socket, as these
are intended for local debugging. Only logs from a logger object are diverted.
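The distinction between print() and a logger can be demonstrated with the standard library alone. In this sketch a custom handler collects log lines, standing in for whatever forwards them as log events; Apollo's own logging utility is not shown in this README, so ListHandler and the demo_service logger name are assumptions made for illustration.

```python
import io
import logging
from contextlib import redirect_stdout


class ListHandler(logging.Handler):
    """Collects log lines, standing in for the websocket log stream."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def run_service():
    logger = logging.getLogger("demo_service")
    print("debug only")            # plain stdout: not picked up by the handler
    logger.info("step 1 done")     # goes through the logger: picked up
    return {"ok": True}


handler = ListHandler()
log = logging.getLogger("demo_service")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False              # keep the demo output isolated

stdout = io.StringIO()
with redirect_stdout(stdout):
    result = run_service()

print("forwarded:", handler.lines)
print("stdout only:", stdout.getvalue().strip())
```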
To build the docker image:
docker build . -t openfn-apollo
To run it on port 3000:
docker run -p 3000:3000 openfn-apollo
See the Contribution Guide for more details about how and where to contribute to the Apollo platform.
New releases are assembled as Docker images whenever a version tag of the form
@openfn/apollo@x.y.z is pushed to GitHub.
This tag is automatically generated upon merging to main.
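For scripts that need to recognise release tags, a pattern along these lines may help. It is a sketch that assumes plain numeric major.minor.patch versions with no pre-release suffix; parse_release_tag is a hypothetical helper and is not part of the release tooling.

```python
import re

# Matches tags such as @openfn/apollo@1.2.3 (numeric parts only).
TAG_PATTERN = re.compile(r"^@openfn/apollo@(\d+)\.(\d+)\.(\d+)$")


def parse_release_tag(tag: str):
    """Return (major, minor, patch) for a release tag, or None if it does not match."""
    match = TAG_PATTERN.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


if __name__ == "__main__":
    print(parse_release_tag("@openfn/apollo@1.2.3"))
```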
GitHub's main should represent the latest production version of apollo.
Ideally, releases should be assembled on a branch - usually release/next or
release/1.2.3. But this is not required - releases can be cut straight from a
fix or feature branch, or even from main.
To release a new apollo version:
- Check out the branch that contains the release
- Run bun changeset version
- (if there are no changesets, you can either run bun changeset to create one, or manually bump package.json and update changelog.md)
- Sanity check the new version number and changelog updates, just to be sure there's no funny stuff.
- Commit changes and push
- When the PR is merged to main, a new tag is generated and a new Docker image is built