Fix ResizableFile.write() crash when a single resize is not enough by Generalsimus · Pull Request #199 · bakwc/PySyncObj

Generalsimus · 2026-06-17T18:40:21Z

What breaks and why

ResizableFile.write() contains two bugs that together cause an
IndexError: mmap slice assignment is wrong size crash on Linux when a
late-joining raft node receives its first large catch-up batch from the leader.

Bug 1 — `if` instead of `while` (wrong size after one resize)

# before
if offset + size > self.__mm.size():
    self.__mm.resize(int(self.__mm.size() * self.__resizeFactor))  # doubles once
self.__mm[offset:offset + size] = values  # ← IndexError if still too small

resizeFactor is 2.0, so the mmap doubles once. If the value being
written is larger than 2 × current_mmap_size, one doubling is not
enough and the assignment crashes.

Concrete example (Patroni / pysyncobj 0.3.15, Ubuntu 24.04, Python 3.12):
The journal starts at 1 KB (initialSize=1024). When a new node joins an
existing 3-node cluster it receives the full committed raft log as a
catch-up batch. That batch includes the initial DCS bootstrap config
(PostgreSQL parameters, pg_hba, etc.) which serialises to several KB.
The first write is at offset 40 with size ≈ 3–5 KB. One resize brings
the mmap to 2 KB — still too small — and the write crashes.

Bug 2 — `except SystemError` misses `OSError` on Linux / Python 3

try:
    self.__mm.resize(newSize)
except SystemError:       # ← never triggered on Linux
    self.__extand(...)    # ← fallback never runs

On Linux with Python 3, mmap.resize() calls ftruncate + mremap.
When mremap fails it raises OSError, not SystemError. The
__extand fallback that re-opens the file is therefore silently skipped
on every Linux deployment, leaving the mmap at its original size.

Fix

Change if → while so we keep resizing until the mmap is actually
large enough.
Use max(doubled_size, offset + size) so a single iteration always
covers arbitrarily large values without looping.
Catch (SystemError, OSError) in both resize call-sites so the
__extand fallback works on Linux.

# after
while offset + size > self.__mm.size():
    currSize = self.__mm.size()
    newSize = max(int(currSize * self.__resizeFactor), offset + size)
    try:
        self.__mm.resize(newSize)
    except (SystemError, OSError):
        self.__extand(newSize - currSize)
self.__mm[offset:offset + size] = values

How to reproduce

Start a 3-node pysyncobj/Patroni cluster simultaneously. The third node
to join (which receives the catch-up batch rather than bootstrapping from
scratch) will log:

IndexError: mmap slice assignment is wrong size
  File ".../pysyncobj/journal.py", line 104, in write
    self.__mm[offset:offset + size] = values

and then loop indefinitely unable to reach the leader, while the other
two nodes operate normally.

Observed with: pysyncobj 0.3.15, Patroni 4.1.3, Python 3.12, Ubuntu 24.04 (Linux 6.x, KVM).

Two bugs in ResizableFile: 1. write() used `if` instead of `while` for the resize guard. A single doubling of the mmap may still not be large enough if the value being written is bigger than 2× the current mmap size. The assignment then fails with: IndexError: mmap slice assignment is wrong size The fix loops until the mmap is actually large enough, and also uses max(doubled_size, offset+size) so one iteration always suffices even for arbitrarily large values. 2. Both resize call-sites caught only SystemError as the fallback trigger for __extand(). On Linux with Python 3, mmap.resize() raises OSError (not SystemError) when mremap() fails. The fallback was therefore never reached on Linux, leaving the mmap at its original size and causing the write to crash. Fix catches both exception types. Observed in the wild with Patroni 4.1.3 / pysyncobj 0.3.15 on Ubuntu 24.04 (Python 3.12). When a late-joining raft node receives its first catch-up batch from the leader the batch can be several kilobytes; with a 1 KB initial mmap a single resize to 2 KB is still not enough, hitting bug 1. Bug 2 masks the __extand() fallback on every Linux deployment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix ResizableFile.write() crash when a single resize is not enough#199

Fix ResizableFile.write() crash when a single resize is not enough#199
Generalsimus wants to merge 1 commit into
bakwc:masterfrom
Generalsimus:fix-resizable-file-write-overflow

Generalsimus commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Generalsimus commented Jun 17, 2026

What breaks and why

Bug 1 — if instead of while (wrong size after one resize)

Bug 2 — except SystemError misses OSError on Linux / Python 3

Fix

How to reproduce

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug 1 — `if` instead of `while` (wrong size after one resize)

Bug 2 — `except SystemError` misses `OSError` on Linux / Python 3