Skip to content

Fix ResizableFile.write() crash when a single resize is not enough#199

Open
Generalsimus wants to merge 1 commit into
bakwc:masterfrom
Generalsimus:fix-resizable-file-write-overflow
Open

Fix ResizableFile.write() crash when a single resize is not enough#199
Generalsimus wants to merge 1 commit into
bakwc:masterfrom
Generalsimus:fix-resizable-file-write-overflow

Conversation

@Generalsimus

Copy link
Copy Markdown

What breaks and why

ResizableFile.write() contains two bugs that together cause an
IndexError: mmap slice assignment is wrong size crash on Linux when a
late-joining raft node receives its first large catch-up batch from the leader.

Bug 1 — if instead of while (wrong size after one resize)

# before
if offset + size > self.__mm.size():
    self.__mm.resize(int(self.__mm.size() * self.__resizeFactor))  # doubles once
self.__mm[offset:offset + size] = values  # ← IndexError if still too small

resizeFactor is 2.0, so the mmap doubles once. If the value being
written is larger than 2 × current_mmap_size, one doubling is not
enough and the assignment crashes.

Concrete example (Patroni / pysyncobj 0.3.15, Ubuntu 24.04, Python 3.12):
The journal starts at 1 KB (initialSize=1024). When a new node joins an
existing 3-node cluster it receives the full committed raft log as a
catch-up batch. That batch includes the initial DCS bootstrap config
(PostgreSQL parameters, pg_hba, etc.) which serialises to several KB.
The first write is at offset 40 with size ≈ 3–5 KB. One resize brings
the mmap to 2 KB — still too small — and the write crashes.

Bug 2 — except SystemError misses OSError on Linux / Python 3

try:
    self.__mm.resize(newSize)
except SystemError:       # ← never triggered on Linux
    self.__extand(...)    # ← fallback never runs

On Linux with Python 3, mmap.resize() calls ftruncate + mremap.
When mremap fails it raises OSError, not SystemError. The
__extand fallback that re-opens the file is therefore silently skipped
on every Linux deployment, leaving the mmap at its original size.

Fix

  1. Change ifwhile so we keep resizing until the mmap is actually
    large enough.
  2. Use max(doubled_size, offset + size) so a single iteration always
    covers arbitrarily large values without looping.
  3. Catch (SystemError, OSError) in both resize call-sites so the
    __extand fallback works on Linux.
# after
while offset + size > self.__mm.size():
    currSize = self.__mm.size()
    newSize = max(int(currSize * self.__resizeFactor), offset + size)
    try:
        self.__mm.resize(newSize)
    except (SystemError, OSError):
        self.__extand(newSize - currSize)
self.__mm[offset:offset + size] = values

How to reproduce

Start a 3-node pysyncobj/Patroni cluster simultaneously. The third node
to join (which receives the catch-up batch rather than bootstrapping from
scratch) will log:

IndexError: mmap slice assignment is wrong size
  File ".../pysyncobj/journal.py", line 104, in write
    self.__mm[offset:offset + size] = values

and then loop indefinitely unable to reach the leader, while the other
two nodes operate normally.

Observed with: pysyncobj 0.3.15, Patroni 4.1.3, Python 3.12, Ubuntu 24.04 (Linux 6.x, KVM).

Two bugs in ResizableFile:

1. write() used `if` instead of `while` for the resize guard.
   A single doubling of the mmap may still not be large enough if the
   value being written is bigger than 2× the current mmap size.  The
   assignment then fails with:
     IndexError: mmap slice assignment is wrong size
   The fix loops until the mmap is actually large enough, and also uses
   max(doubled_size, offset+size) so one iteration always suffices even
   for arbitrarily large values.

2. Both resize call-sites caught only SystemError as the fallback trigger
   for __extand().  On Linux with Python 3, mmap.resize() raises OSError
   (not SystemError) when mremap() fails.  The fallback was therefore
   never reached on Linux, leaving the mmap at its original size and
   causing the write to crash.  Fix catches both exception types.

Observed in the wild with Patroni 4.1.3 / pysyncobj 0.3.15 on Ubuntu
24.04 (Python 3.12).  When a late-joining raft node receives its first
catch-up batch from the leader the batch can be several kilobytes; with
a 1 KB initial mmap a single resize to 2 KB is still not enough, hitting
bug 1.  Bug 2 masks the __extand() fallback on every Linux deployment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant