feat: add Linear (linear.app) data source plugin #8900

Open
eduardoarantes wants to merge 29 commits into apache:main from eduardoarantes:feat/linear-plugin

eduardoarantes commented Jun 3, 2026

Summary

Adds a new Linear (linear.app) data source plugin, plus its config-ui registration and a Grafana dashboard.

The plugin follows DevLake's 3-stage ETL using framework helpers:

  • Collect (GraphQL → _raw_linear_*) via NewStatefulApiCollector + CreateAsyncGraphqlClient with rate-limit pacing; incremental collection uses a server-side updatedAt filter.
  • Extract (_raw_linear_* → _tool_linear_*).
  • Convert (_tool_linear_* → domain tables).

Scope = Linear Team → ticket.Board. Auth = personal API key (Authorization: <key>). Status is mapped deterministically from WorkflowState.type (no user-supplied mapping): triage,backlog,unstarted → TODO, started → IN_PROGRESS, completed,canceled → DONE.
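
As a rough illustration of the deterministic mapping described above, a minimal Go sketch might look like the following. The function name `mapStateType` and the constant names are hypothetical and are not taken from the plugin; only the state types and target statuses come from this description, with OTHER as the fallback for unrecognized types.

```go
package main

import "fmt"

// Domain status values named in the PR description.
// The constant names themselves are illustrative assumptions.
const (
	statusTodo       = "TODO"
	statusInProgress = "IN_PROGRESS"
	statusDone       = "DONE"
	statusOther      = "OTHER"
)

// mapStateType (hypothetical name) maps a Linear WorkflowState.type
// to a domain status. Unknown types fall back to OTHER so that
// unexpected API values stay visible instead of being mislabeled.
func mapStateType(stateType string) string {
	switch stateType {
	case "triage", "backlog", "unstarted":
		return statusTodo
	case "started":
		return statusInProgress
	case "completed", "canceled":
		return statusDone
	default:
		return statusOther
	}
}

func main() {
	types := []string{"triage", "backlog", "unstarted", "started", "completed", "canceled", "mystery"}
	for _, t := range types {
		fmt.Printf("%s -> %s\n", t, mapStateType(t))
	}
}
```

Keeping the mapping in a single switch is one way to make it total: every input yields exactly one status, and no scope-config transformation is needed.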

Domain mappings:

  • Team → boards
  • Issues → issues + board_issues + issue_assignees (with assignee/creator names)
  • Comments → issue_comments
  • Labels → issue_labels
  • Cycles → sprints + sprint_issues
  • Issue history → issue_changelogs, and lead/cycle time derived from in-progress→done transitions
  • Users → accounts

Also includes:

  • config-ui: Linear connection form (endpoint + personal API key), Teams data scope backed by remote-scopes, and the scope-id mapping.
  • Grafana: grafana/dashboards/Linear.json (per-tool dashboard, like Jira/Asana).

Does this close any open issues?

Closes #8901

Tests

  • e2e DataFlowTester tests for every extractor and convertor under backend/plugins/linear/e2e, plus unit tests for status mapping, the incremental filter, scope/remote-scope mapping, and board/lead-time/long-title edge cases.
  • make build and golangci-lint run are clean; the full plugins/linear/... suite passes against MySQL.

Other Information

  • All commits are DCO signed-off and use conventional commit messages.
  • Edge cases covered: every WorkflowState.type (incl. triage/unknown→OTHER), unassigned issue, issue with no cycle, multiple labels, lead/cycle-time from history, resolution-before-creation guard, and long issue titles/URLs.
  • A Linear dashboard screenshot can be added to this PR description in the GitHub UI.

eduardoarantes and others added 29 commits May 29, 2026 10:58
Add the Linear plugin's tool-layer data models (connection, team scope,
scope config, account, issue, comment, issue label, workflow state, cycle,
issue history) and the initial schema migration with archived snapshots.

The connection authenticates with a personal API key passed verbatim in the
Authorization header (Linear uses no Bearer prefix).

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
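
The bare Authorization header from the commit above can be sketched with the standard library alone. This is an illustration, not the plugin's client code: `newLinearRequest` is a hypothetical helper, and the endpoint URL and API key value are placeholders assumed for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newLinearRequest (hypothetical helper) builds a GraphQL POST request.
// The API key is placed in the Authorization header verbatim,
// without a "Bearer " prefix.
func newLinearRequest(endpoint, apiKey, body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", apiKey)
	return req, nil
}

func main() {
	// Endpoint and key are placeholder values assumed for this sketch.
	req, err := newLinearRequest(
		"https://api.linear.app/graphql",
		"example-personal-api-key",
		`{"query":"{ viewer { id } }"}`,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("Authorization:", req.Header.Get("Authorization"))
}
```
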
Wire the Linear plugin entry point and implement all required plugin
interfaces (meta, init, task, api, model, source, migration, blueprint v200,
closeable). Add connection/scope/scope-config CRUD via the data-source helper,
a test-connection endpoint that runs a GraphQL viewer query, and a rate-limited
async GraphQL client that injects the API key via a bare Authorization header.

SubTaskMetas is intentionally empty; collectors are added per entity in
following commits.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add the users GraphQL collector (paginated), extractor to
_tool_linear_accounts, and convertor to the domain crossdomain.Account
table, wired as the first three subtasks. Includes an e2e dataflow test
with raw fixtures and verified snapshots.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add the team-scoped workflow states GraphQL collector and extractor into
_tool_linear_workflow_states. These states (backlog/unstarted/started/
completed/canceled) drive deterministic issue status mapping. Includes an
e2e test covering all five state types.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add the team-scoped issues GraphQL collector (incremental via updatedAt
ordering, inline labels), extractor to _tool_linear_issues and
_tool_linear_issue_labels, and convertor to domain ticket.Issue and
ticket.BoardIssue.

Status maps deterministically from Linear's WorkflowState.type
(backlog/unstarted->TODO, started->IN_PROGRESS, completed/canceled->DONE);
priority maps to its label; lead time falls back to resolution minus
creation. Includes an e2e test spanning all state types, unassigned issues,
issues without a cycle, and multi-label issues.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add a per-issue comments GraphQL collector (driven by an input iterator over
collected issues, with pagination), an extractor that recovers the owning
issue id from the raw input column, and a convertor to domain
ticket.IssueComment. Includes an e2e dataflow test.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add the convertor from _tool_linear_issue_labels (populated inline by the
issue extractor) into the domain ticket.IssueLabel table. Includes an e2e
test covering issues with multiple labels and with none.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add the team-scoped cycles GraphQL collector and extractor, plus convertors
producing domain ticket.Sprint and ticket.BoardSprint (status derived from
completedAt), and ticket.SprintIssue linking issues to their cycle. Includes
an e2e dataflow test covering closed/active cycles and issues with/without a
cycle.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Add a per-issue history GraphQL collector (input iterator over issues, with
pagination), an extractor capturing state transitions including state types,
and a convertor to domain ticket.IssueChangelogs with mapped from/to status
values. Lead time is already derived from the issue's native
startedAt/completedAt. Includes an e2e test of a full
backlog->started->completed lifecycle.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Cover makeScopesV200: a team scope with the ticket entity produces the
expected domain board scope id, and a scope without the ticket entity
produces none.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Document the Linear plugin: supported entities, tool/domain mapping tables,
deterministic status mapping, priority/type/lead-time handling, API-key auth,
connection/scope/pipeline setup examples, rate limiting, and the roadmap
(OAuth, label-based type mapping, config-ui integration).

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
A resolution timestamp (completedAt/canceledAt) earlier than createdAt —
from clock skew or migrated/imported issues — produced a negative duration
that, cast to uint, yields platform-dependent garbage (0 on arm64, ~1.8e19
on amd64). Skip the fallback unless the resolution is after creation so lead
time stays unset instead.

Adds an isolated e2e dataflow test with a fixture whose canceledAt precedes
createdAt, asserting lead_time_minutes is empty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
The WorkflowState.type 'triage' (the inbox state issues land in before being
accepted) previously fell through to OTHER, contradicting the documented total
mapping and silently mislabeling triage issues. Map it to TODO; keep OTHER as
the fallback for genuinely unrecognized types so unexpected API values surface.

Adds a unit test covering every documented state type plus triage and an
unknown value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
The struct was documented as the shared inline-user shape but was never
referenced; each collector declares its own inline user struct. Removing it
avoids misleading a maintainer into editing a type nothing reads.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Every other Linear collector uses a page size of 100; issues used 50, which
doubled the number of issue-page round-trips and the iterator size that drives
the per-issue comment/history collectors. Linear permits first: 250.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
The issue convertor set only assignee_id/creator_id, leaving the denormalized
assignee_name/creator_name columns blank and writing no issue_assignees rows,
so dashboards reading those columns or joining through issue_assignees showed
blank names. Preload account display names (matching the account convertor's
displayName-then-name rule) and emit an IssueAssignee per assigned issue.

The issue dataflow test now loads accounts before conversion and asserts the
names plus issue_assignees; the lead-time test flushes accounts to stay
order-independent on the shared test DB.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
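
A simplified sketch of the behaviour described in the commit above: account names are preloaded into a map, the display name is preferred over the plain name, and one assignee row is emitted per assigned issue. All type and function names here are hypothetical stand-ins, not the plugin's or DevLake's actual models.

```go
package main

import "fmt"

// Simplified stand-ins for the tool-layer and domain rows (assumptions).
type account struct{ DisplayName, Name string }
type issue struct{ ID, AssigneeID string }
type issueAssignee struct{ IssueID, AssigneeID, AssigneeName string }

// accountLabel applies the displayName-then-name rule.
func accountLabel(a account) string {
	if a.DisplayName != "" {
		return a.DisplayName
	}
	return a.Name
}

// buildAssignees emits one row per issue that has an assignee.
// Unassigned issues produce no row.
func buildAssignees(issues []issue, accounts map[string]account) []issueAssignee {
	var rows []issueAssignee
	for _, is := range issues {
		if is.AssigneeID == "" {
			continue
		}
		rows = append(rows, issueAssignee{
			IssueID:      is.ID,
			AssigneeID:   is.AssigneeID,
			AssigneeName: accountLabel(accounts[is.AssigneeID]),
		})
	}
	return rows
}

func main() {
	accounts := map[string]account{
		"u1": {DisplayName: "Ada", Name: "ada.l"},
		"u2": {Name: "grace.h"},
	}
	issues := []issue{{ID: "i1", AssigneeID: "u1"}, {ID: "i2"}, {ID: "i3", AssigneeID: "u2"}}
	for _, row := range buildAssignees(issues, accounts) {
		fmt.Printf("%s assigned to %s (%s)\n", row.IssueID, row.AssigneeName, row.AssigneeID)
	}
}
```
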
Sprint membership is derived from each issue's cycle_id, and the batch divider
only deletes outdated rows when it produces at least one row of the type. When
every issue is moved out of its cycle the convertor emits nothing, so the
divider never fires and prior sprint_issues rows linger, leaving issues shown
in sprints they no longer belong to. Delete the team's sprint_issues up front
so the result is correct regardless of how many issues remain in a cycle.

Adds a two-run e2e test that empties every issue's cycle and asserts
sprint_issues is empty afterward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
The LinearIssue.LeadTimeMinutes field was never populated, so lead time always
fell back to the coarse createdAt -> resolutionDate span. Derive it instead from
the recorded history: the span from an issue's first transition into an
in-progress state to its first transition into a done state thereafter (active
cycle time), which is the value that genuinely requires history. ConvertIssues
still seeds the fallback; ConvertIssueHistory now overrides it when the
transitions exist, and issues lacking them keep the fallback.

Adds an e2e test asserting issue-1 (started 05-02, completed 05-03) resolves to
1440 minutes from history rather than its 2880-minute created->resolved span.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Other first-party ticket plugins (jira, asana, github) expose
connections/:connectionId/remote-scopes so the config UI can browse and select
scopes from the API. Linear had none, forcing users to hand-craft a PUT /scopes
with raw team UUIDs they had no in-product way to discover. Wire the standard
DsRemoteApiProxyHelper + DsRemoteApiScopeListHelper and a lister that queries
the GraphQL teams connection (flat list, cursor-paginated) through the
connection's authenticated client.

Adds unit tests for the response->scope-entry mapping, the pagination cursor,
and the route registration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Both child collectors used a plain GraphqlCollector and swept every issue in
the team on every run, issuing one request per issue with no since filter -
tens of thousands of requests per run on a large team against Linear's ~1500
req/hour budget. Switch them to a stateful collector and restrict the driving
cursor to issues updated since the last successful collection, so steady-state
runs scale with the change delta rather than the whole backlog. A full sync
(since == nil) still sweeps every issue.

Adds a unit test for the incremental cursor-clause builder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Incremental collection relied on the issues query returning newest-first and a
client-side early-stop, but the query pinned no sort direction (Linear's orderBy
is a scalar enum with no direction operand). If the server default were
ascending, the early-stop would fire on the first (oldest) row and collect
almost nothing. Pass a server-side IssueFilter { updatedAt: { gt: since } }
instead and drop the early-stop, so correctness no longer depends on an
undocumented default ordering. A full sync passes an empty filter (match all).

Adds a unit test pinning the filter's JSON shape to Linear's IssueFilter input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
The _tool_linear_issues.lead_time_minutes column was never populated (the
collector never requested it and no extractor set it). Now that lead time is
derived into the domain ticket.Issue directly -- from state-transition history
when available, otherwise the createdAt->resolutionDate fallback in the issue
convertor -- the tool-layer field is pure dead weight. Remove it from the model,
the init migration's archived model, the convertor, and the extractor snapshot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Adds the Linear data source to config-ui so it appears in the connection
picker: connection form (endpoint + personal API key + proxy + rate limit),
a flat Teams data-scope backed by the plugin's remote-scopes endpoint, and the
Linear logo. No scope-config transformation — Linear's status mapping is
deterministic. Wired into the plugin registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
getPluginScopeId fell through to the default (scope.id) for Linear, but a
LinearTeam scope is keyed by teamId and has no id field — so the blueprint
referenced an undefined scopeId and patching failed with 'LinearTeam not found'.
Add a linear case returning scope.teamId.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
Adds grafana/dashboards/Linear.json (cloned from the Asana ticket-dashboard
template) so Linear ships a per-tool dashboard like every other ticket plugin.
Its board picker is scoped to Linear (boards id like 'linear%'); the 13 panels
(throughput, lead/cycle time, status distribution, delivery rate, sprints) read
the shared domain tables. Auto-loaded via Grafana file provisioning.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
board_issues and sprint_issues referenced a board_id (boardIdGen over
LinearTeam), but nothing ever created the ticket.Board row itself, so the
domain boards table stayed empty. Board-scoped dashboards (whose board picker
is 'boards where id like linear%') and any board join therefore returned no
data. Add a ConvertTeams subtask that converts the team scope in
_tool_linear_teams into a ticket.Board keyed identically to those references.

Adds an e2e test asserting the board is produced with the matching id.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
_tool_linear_issues.title and .url were varchar(255), but Linear titles can
exceed 255 chars (and the issue URL embeds a title slug), so extraction failed
with 'Error 1406: Data too long for column title'. Drop the varchar limit so
both are longtext, matching the domain issues.title and jira's tool summary.

Adds an e2e test extracting a 300-char title without truncation.

Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
The GraphQL collector stores the query variables (which carry issueId) in the
raw row's input column, but the comment and history extractors parsed it as
{"Id":...} (SimpleLinearIssue.Id), so the owning issue id came out empty and
the convertor joins produced zero domain comments/changelogs on real data. The
e2e fixtures hand-wrote {"Id":...}, masking it. Parse issueId (with an Id
fallback) and update the fixtures to the real collector shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Rodrigues <2961314+eduardoarantes@users.noreply.github.com>
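
The input-column parsing fix in the commit above might look something like this sketch. The struct and function names are hypothetical; the example only shows reading `issueId` first and falling back to the older `Id` key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawInput mirrors the two shapes the input column may hold (assumed):
// the collector's query variables ({"issueId": ...}) and the older
// hand-written fixture shape ({"Id": ...}).
type rawInput struct {
	IssueId string `json:"issueId"`
	Id      string `json:"Id"`
}

// owningIssueId (hypothetical name) recovers the owning issue id from
// the raw row's input column, preferring issueId over Id.
func owningIssueId(input []byte) (string, error) {
	var in rawInput
	if err := json.Unmarshal(input, &in); err != nil {
		return "", err
	}
	if in.IssueId != "" {
		return in.IssueId, nil
	}
	return in.Id, nil
}

func main() {
	for _, raw := range []string{`{"issueId":"issue-1"}`, `{"Id":"issue-2"}`} {
		id, err := owningIssueId([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", raw, id)
	}
}
```
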

Development

Successfully merging this pull request may close these issues.

[Feature][Linear] Add Linear (linear.app) data source plugin