Research async building architecture by nforro · Pull Request #233 · packit/research

nforro · 2026-05-20T14:18:05Z

No description provided.

Signed-off-by: Nikola Forró <nforro@redhat.com>
Assisted-by: Claude Opus 4.6 via Claude Code

majamassarini

Of the proposed solutions, I would personally start with Option A (async multiplexing).

However, I am not entirely sure we should dismiss the idea of relying on CI builds.
We would like to rely on CI tests in the future, right? So why not on CI builds as well?
If we mark the MR as incomplete (WIP/draft), I think we can iterate on it multiple times based on CI feedback. The iteration loop would be almost the same for builds as for all the other CI checks. We could also collect "intermediate feedback from maintainers" this way, which the other solutions will not give us.
I think it really depends on how many times we usually need to iterate on a build. I may be wrong, but as we improve things, I think we iterate on builds less and less.

nforro · 2026-05-25T12:15:28Z

> If we mark the MR as incomplete (WIP/draft)

Can we do that? I think we wanted to avoid opening clearly wrong MRs and spamming maintainers. If we can open draft MRs, and those can be configured to stay silent until converted to regular MRs, then I think we can indeed move the feedback loop to after the MR is created, with the advantage of incorporating both CI and human feedback.

TomasTomecek · 2026-05-25T16:45:03Z

+
+**Advantages:**
+
+- Minimal code changes — workflows are already async


Really love that: no need for Celery, just a few code changes.
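A minimal sketch of what Option A's async multiplexing could look like, assuming the memory-heavy step is source processing and that waiting on a build is a cheap `await` inside a polling loop. `process_sources`, `wait_for_build`, `run_workflow` and `MAX_HEAVY` are hypothetical names, not code from the repository:

```python
import asyncio

# Hypothetical cap on workflows in the memory-heavy phase at once (e.g. N=2).
MAX_HEAVY = 2


async def process_sources(issue: str) -> None:
    """Hypothetical stand-in for memory-heavy source processing."""
    await asyncio.sleep(0.01)


async def wait_for_build(build_id: str, polls: int = 3, interval: float = 0.01) -> str:
    """Hypothetical stand-in for the polling loop behind a build tool call.

    Each `await asyncio.sleep(...)` yields to the event loop, so other
    workflows keep running while this one waits for its build.
    """
    for _ in range(polls):
        await asyncio.sleep(interval)  # a real loop would query the build status here
    return f"{build_id}: succeeded"


async def run_workflow(issue: str, heavy: asyncio.Semaphore) -> str:
    async with heavy:  # only the heavy phase is bounded
        await process_sources(issue)
    # The build wait does not hold a slot, so many workflows can wait at once.
    return await wait_for_build(f"build-{issue}")


async def main(issues: list[str]) -> list[str]:
    heavy = asyncio.Semaphore(MAX_HEAVY)
    # gather() returns results in the same order as `issues`.
    return await asyncio.gather(*(run_workflow(i, heavy) for i in issues))


if __name__ == "__main__":
    print(asyncio.run(main(["RHEL-1", "RHEL-2", "RHEL-3"])))
```

Bounding only the heavy phase, rather than the whole workflow, is a design choice under the stated assumption: idle build waits cost little memory, so only source processing needs a limit.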

TomasTomecek · 2026-05-25T16:46:22Z

+  since the `build_package` tool call goes through the same async polling loop
+- Single pod, single process — simpler operations
+
+**Challenges:**


How about logs? How hard will it be for us to diagnose issues from them?

Good point. We would probably have to augment the log messages to differentiate between tasks, similarly to what packit-service does with Celery.

Makes sense. We can prefix the messages with a session ID or, ideally, with a human-readable identifier, something like `f"{jira}-{branch}-{attempt}"`.

TomasTomecek · 2026-05-25T16:47:25Z

+- **Error blast radius**: one task's crash could affect others in the same
+  process.


It would be great to know whether we can prepare the codebase for this change.

Response from Claude:

Yes, several incremental prep steps would make Option A easier without committing to it yet:

Add a task/correlation ID to the workflow context and thread it through all log messages (using Python's logging filter or contextvars). This is useful regardless of concurrency — it improves traceability even with single-task pods.
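As an illustration of the correlation-ID step, here is a minimal sketch using `contextvars` and a `logging.Filter`, with the human-readable `{jira}-{branch}-{attempt}` format suggested earlier in the review. The logger name, format string, Jira keys and branch names are assumptions:

```python
import asyncio
import contextvars
import io
import logging

# Hypothetical per-task identifier, e.g. "{jira}-{branch}-{attempt}".
task_id = contextvars.ContextVar("task_id", default="-")


class TaskIdFilter(logging.Filter):
    """Attach the current task ID to every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.task_id = task_id.get()
        return True  # annotate only, never drop records


def make_logger(stream: io.TextIOBase) -> logging.Logger:
    logger = logging.getLogger("workflow")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.addFilter(TaskIdFilter())
    handler.setFormatter(logging.Formatter("[%(task_id)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


async def workflow(jira: str, branch: str, attempt: int, logger: logging.Logger) -> None:
    # Each asyncio task runs in its own copy of the context, so this
    # assignment is not visible to other concurrently running workflows.
    task_id.set(f"{jira}-{branch}-{attempt}")
    logger.info("starting")
    await asyncio.sleep(0)  # let the other workflow interleave its log lines
    logger.info("done")


async def main(logger: logging.Logger) -> None:
    await asyncio.gather(
        workflow("RHEL-1", "c10s", 1, logger),
        workflow("RHEL-2", "c9s", 1, logger),
    )


if __name__ == "__main__":
    buf = io.StringIO()
    asyncio.run(main(make_logger(buf)))
    print(buf.getvalue(), end="")
```

Because `asyncio.gather` wraps each coroutine in a task that runs in a copy of the current context, each workflow's `task_id.set(...)` stays local to that workflow even when log lines interleave.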

Audit shared mutable state — check if any module-level globals, singletons, or shared file paths (e.g. /git-repos/{issue}/) would collide when two tasks run concurrently. The PVC paths already use {issue} as a namespace, which is good.
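One way to guard against collisions on the per-issue PVC paths is a lock per working directory, so two tasks for the same issue (e.g. a retry overlapping the original run) serialize while different issues proceed in parallel. This is a hypothetical sketch: `RepoLocks` and its methods are illustrative, and the code only computes paths, it never touches the disk:

```python
import asyncio
from collections import defaultdict
from contextlib import asynccontextmanager
from pathlib import Path


class RepoLocks:
    """Hypothetical registry: one asyncio.Lock per per-issue working directory."""

    def __init__(self, root: Path = Path("/git-repos")) -> None:
        self._root = root
        self._locks = defaultdict(asyncio.Lock)  # a lock is created on first use

    @asynccontextmanager
    async def exclusive(self, issue: str):
        path = self._root / issue
        async with self._locks[path]:  # same issue: wait; other issues: no contention
            yield path


async def demo() -> list[str]:
    locks = RepoLocks()
    events: list[str] = []

    async def task(name: str, issue: str) -> None:
        async with locks.exclusive(issue) as path:
            events.append(f"{name} enter {path.name}")
            await asyncio.sleep(0.01)  # stand-in for git operations in the repo
            events.append(f"{name} exit {path.name}")

    # "a" and "b" share an issue and must serialize; "c" runs alongside "a".
    await asyncio.gather(task("a", "RHEL-1"), task("b", "RHEL-1"), task("c", "RHEL-2"))
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```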

Ensure MCP gateway session isolation: verify that the `_response_streams` dict keyed by `RequestId` truly handles concurrent tool calls without cross-talk.
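For reference, request/response correlation over a shared stream usually maps each request ID to a pending future. The sketch below is a generic, hypothetical illustration of that pattern, not the gateway's `_response_streams` implementation, and `get_logs` is a made-up tool name. The audit would check that the real code keeps IDs unique and resolves exactly the matching entry:

```python
import asyncio
import itertools


class PendingRequests:
    """Hypothetical request/response correlation keyed by request ID.

    Each outgoing call registers a Future under a fresh ID; the reader loop
    resolves exactly that Future when the matching response arrives, so
    concurrent calls never receive each other's results.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)  # unique, monotonically increasing IDs
        self._pending: dict[int, asyncio.Future] = {}

    def register(self) -> tuple[int, asyncio.Future]:
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        return request_id, future

    def resolve(self, request_id: int, result: object) -> None:
        future = self._pending.pop(request_id, None)
        if future is None:
            raise KeyError(f"response for unknown request {request_id}")
        if not future.done():  # the caller may have been cancelled meanwhile
            future.set_result(result)


async def demo() -> list[str]:
    pending = PendingRequests()
    outbox: list[tuple[int, str]] = []  # stands in for the shared stream

    async def call_tool(name: str) -> str:
        request_id, future = pending.register()
        outbox.append((request_id, name))
        return await future

    callers = [asyncio.create_task(call_tool(n)) for n in ("build_package", "get_logs")]
    await asyncio.sleep(0)  # let both callers register their requests
    for request_id, name in reversed(outbox):  # answer out of order on purpose
        pending.resolve(request_id, f"result of {name}")
    return await asyncio.gather(*callers)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Even though responses arrive in reverse order, each caller receives its own result.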

Profile actual RSS on a representative set of packages during source processing. This determines whether N=2 is feasible without a quota bump.
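A minimal sketch for the RSS profiling step, using the standard `resource` module. It assumes source processing runs in child processes (for example an `rpmbuild -bp` prep run), which is an assumption about the workflow; `peak_rss_mib` and `run_and_report` are hypothetical helpers:

```python
import resource
import subprocess
import sys


def peak_rss_mib(who: int = resource.RUSAGE_SELF) -> float:
    """Peak resident set size in MiB for this process or its children.

    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    maxrss = resource.getrusage(who).ru_maxrss
    return maxrss / (1024 * 1024 if sys.platform == "darwin" else 1024)


def run_and_report(cmd: list[str]) -> float:
    """Run a command and return the children's peak RSS in MiB.

    With RUSAGE_CHILDREN, ru_maxrss is the RSS of the largest child waited
    for so far, not only this command, so profile one package per fresh
    Python process to keep measurements separate.
    """
    subprocess.run(cmd, check=True)
    return peak_rss_mib(resource.RUSAGE_CHILDREN)


if __name__ == "__main__":
    # Stand-in workload: a child that touches about 64 MiB of memory.
    child = [sys.executable, "-c", "data = b'x' * (64 * 1024 * 1024)"]
    print(f"child peak RSS: {run_and_report(child):.0f} MiB")
    print(f"own peak RSS:   {peak_rss_mib():.0f} MiB")
```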

Claude: implement it and propose a PR 😅
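On the error blast radius point: within one event loop, a Python exception in one workflow can be caught at the task boundary, so it is logged and reported as a failed result instead of propagating out of the shared `gather` (or cancelling siblings, as a `TaskGroup` would). A hypothetical sketch; the workflow and the simulated failure are made up:

```python
import asyncio
import logging
from typing import Optional

log = logging.getLogger("workflow")


async def workflow(issue: str) -> str:
    """Hypothetical workflow; one issue fails on purpose for the demo."""
    await asyncio.sleep(0.01)
    if issue == "RHEL-2":
        raise RuntimeError("agent crashed")
    return f"{issue}: done"


async def isolated(issue: str) -> Optional[str]:
    """Contain one workflow's exception so sibling tasks keep running.

    asyncio.CancelledError derives from BaseException (Python 3.8+), so
    `except Exception` does not swallow cancellation, e.g. on pod shutdown.
    """
    try:
        return await workflow(issue)
    except Exception:
        log.exception("workflow for %s failed", issue)  # keep the traceback
        return None


async def main(issues: list[str]) -> list[Optional[str]]:
    return await asyncio.gather(*(isolated(i) for i in issues))


if __name__ == "__main__":
    print(asyncio.run(main(["RHEL-1", "RHEL-2", "RHEL-3"])))
```

`asyncio.gather(..., return_exceptions=True)` is an alternative that returns exceptions as results. Neither approach helps if the process itself dies (for example an OOM kill during source extraction), which takes every task in the pod down with it.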

lbarcziova · 2026-05-27T14:00:27Z

+  error message, upstream patches, repo path). This means conversation
+  continuity across the build wait is **not required**.
+
+## Alternative: Skip Build Verification Entirely


While this would simplify things a lot, I am a bit skeptical: I remember the agent sometimes iterated many times on build failures during the pilot, and re-pushing 5-10 times to a draft MR might be quite noisy. On the other hand, this might happen only in a minority of cases (and maybe more often in rebases than in backports?). It is definitely worth validating against recent runs. @opohorel @abobrov, do you have any data from your experience on this?

I would advise against skipping the build verification. Quite a few backports do not pass it, which forces the agent to improve the backport. Although I have seen the number of consecutive build failures decrease with newer models, we are not there yet.
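Since the research notes that conversation continuity across the build wait is not required, the state that must survive a build can be small and serializable. A hypothetical sketch: the field names mirror the items listed in the hunk (error message, upstream patches, repo path) but are not the project's actual data model, and the example values are made up:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BuildRetryState:
    """Hypothetical state the agent needs after a failed build.

    Field names are illustrative and mirror the items listed in the research:
    error message, upstream patches, repo path.
    """

    repo_path: str
    error_message: str = ""
    upstream_patches: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, data: str) -> "BuildRetryState":
        return cls(**json.loads(data))


if __name__ == "__main__":
    state = BuildRetryState(
        repo_path="/git-repos/RHEL-1",
        error_message="error: patch 0002 does not apply",
        upstream_patches=["0001-fix-overflow.patch", "0002-add-test.patch"],
    )
    payload = state.to_json()
    assert BuildRetryState.from_json(payload) == state  # lossless round trip
    print(f"serialized state: {len(payload.encode())} bytes")
```

Serializing the state also gives a cheap way to measure its size, which relates to the state-size question listed in the research.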

lbarcziova · 2026-05-27T14:03:12Z

+1. **What is the actual RSS of a backport pod during source processing?**
+   Feedback from the SE deployment shows large packages (Firefox, Thunderbird)
+   need ~4Gi just for source extraction, making even N=2 concurrency
+   challenging. Profiling smaller packages would clarify the typical case.
+2. **What's the namespace quota expansion path?** Determines feasibility of
+   Option A at higher concurrency.
+3. **How large are typical agent conversations?** Affects state size
+   estimates and memory profiling for Option A.


@nforro, would you create a POC issue so that we can prioritise it and decide which option to go with?
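Given that large packages need ~4Gi just for source extraction, a plain `Semaphore(N)` may be too coarse. A hypothetical alternative is memory-weighted admission, where each task reserves its estimated footprint from a fixed budget. `MemoryBudget` is an illustrative name, and the 6144 MiB budget and per-package estimates below are made-up numbers; real ones would come from the profiling and quota questions above:

```python
import asyncio
from contextlib import asynccontextmanager


class MemoryBudget:
    """Hypothetical memory-weighted admission control.

    Each task reserves its estimated footprint; tasks wait until enough of
    the budget is free, so one large package can run next to a few small
    ones instead of counting as a single slot like in Semaphore(N).
    """

    def __init__(self, total_mib: int) -> None:
        self._total = total_mib
        self._available = total_mib
        self._cond = asyncio.Condition()

    @asynccontextmanager
    async def reserve(self, mib: int):
        if mib > self._total:
            raise ValueError(f"needs {mib} MiB but the budget is {self._total} MiB")
        async with self._cond:
            await self._cond.wait_for(lambda: self._available >= mib)
            self._available -= mib
        try:
            yield
        finally:
            async with self._cond:
                self._available += mib
                self._cond.notify_all()  # waiters re-check their predicate


async def demo() -> int:
    budget = MemoryBudget(total_mib=6144)  # made-up pod memory budget
    in_use = peak = 0

    async def job(mib: int) -> None:
        nonlocal in_use, peak
        async with budget.reserve(mib):
            in_use += mib
            peak = max(peak, in_use)
            await asyncio.sleep(0.01)  # stand-in for source processing
            in_use -= mib

    # Made-up estimates: two ~4 GiB packages and two small ones.
    await asyncio.gather(job(4096), job(1024), job(1024), job(4096))
    return peak


if __name__ == "__main__":
    print(f"peak in use: {asyncio.run(demo())} MiB")
```

Here the second 4096 MiB job waits until enough of the budget is released, so the total reserved never exceeds 6144 MiB.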

usercont-release-bot added this to Packit pull requests May 20, 2026

github-project-automation Bot moved this to New in Packit pull requests May 20, 2026

nforro force-pushed the async-builds branch from b83f949 to 270cb6a May 20, 2026 14:19

nforro moved this from New to In review in Packit pull requests May 20, 2026

nforro force-pushed the async-builds branch from 270cb6a to ce6c2c6 May 20, 2026 14:53

Research async building architecture

f5d12b9

Signed-off-by: Nikola Forró <nforro@redhat.com>
Assisted-by: Claude Opus 4.6 via Claude Code

nforro force-pushed the async-builds branch from ce6c2c6 to f5d12b9 May 20, 2026 15:09

majamassarini approved these changes May 25, 2026


TomasTomecek reviewed May 25, 2026


lbarcziova approved these changes May 27, 2026


Research async building architecture #233

nforro wants to merge 1 commit into packit:main from nforro:async-builds

- TomasTomecek, May 25, 2026
- TomasTomecek, May 25, 2026
- nforro, May 27, 2026
- TomasTomecek, May 27, 2026
- TomasTomecek, May 25, 2026
- nforro, May 27, 2026
- TomasTomecek, May 27, 2026
- lbarcziova, May 27, 2026
- opohorel, May 27, 2026
- lbarcziova, May 27, 2026

6 participants