decode non-utf8 sql bytes as latin-1 not unicode-escape #852

Open
alhudz wants to merge 1 commit into andialbrecht:master from alhudz:lexer-latin1-fallback

@alhudz alhudz commented Jun 9, 2026

Repro: sqlparse.parse(b"SELECT '\\x41', '\\n' \xff"), i.e. bytes input that contains literal backslash sequences and is not valid UTF-8 (a single stray \xff is enough to take the fallback branch).
Cause: the non-UTF-8 fallback in Lexer.get_tokens decodes with unicode-escape, which evaluates backslash escape sequences in the SQL bytes (\x41 becomes A, \n becomes a newline, and likewise for octal and \u… escapes). The parsed token stream then no longer matches the raw bytes the database receives, so anything that inspects or sanitises the SQL through sqlparse sees a different statement from the one that runs.
Fix: decode the fallback as latin-1, which maps all 256 byte values one-to-one to code points and never evaluates escapes. For input without backslashes the decoded text is identical to today's; only the escape reinterpretation is dropped.
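
The difference between the two codecs can be checked with the standard library alone. The snippet below is a minimal sketch, not the patch itself: it does not call sqlparse, and `decode_sql_bytes` is a hypothetical helper name used only to illustrate the UTF-8-then-latin-1 fallback described above.

```python
def decode_sql_bytes(raw: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, fall back to latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps each byte 0x00-0xFF to the code point of the same
        # value and does not interpret backslash sequences.
        return raw.decode("latin-1")


# Literal backslash sequences plus one stray 0xff byte (not valid UTF-8).
raw = b"SELECT '\\x41', '\\n' \xff"

old = raw.decode("unicode-escape")  # behaviour the description attributes to the current fallback
new = decode_sql_bytes(raw)         # proposed behaviour

print(repr(old))
print(repr(new))

# latin-1 round-trips to the original bytes; unicode-escape does not,
# because \x41 and \n were already evaluated during decoding.
print(new.encode("latin-1") == raw)  # True
print(old.encode("latin-1") == raw)  # False

# Without backslashes the two decodings agree.
plain = b"SELECT 1 \xff"
print(plain.decode("unicode-escape") == decode_sql_bytes(plain))  # True
```

The round-trip check is the property that matters here: with latin-1, re-encoding the decoded text gives back exactly the bytes that were passed in.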

  • ran the tests (pytest)
  • all style issues addressed (ruff)
  • your changes are covered by tests
  • your changes are documented, if needed

codecov Bot commented Jun 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.27%. Comparing base (7334ac9) to head (0262a1a).
⚠️ Report is 102 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #852      +/-   ##
==========================================
+ Coverage   97.04%   97.27%   +0.22%     
==========================================
  Files          20       31      +11     
  Lines        1558     3664    +2106     
  Branches        0      328     +328     
==========================================
+ Hits         1512     3564    +2052     
- Misses         46       59      +13     
- Partials        0       41      +41     

