stacknil · May 23, 2026 · May 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,7 +6,10 @@ All notable user-visible changes should be recorded here.
 
 ### Added
 
-- None yet.
+- Added sanitized golden `report.md` / `report.json` regression fixtures to lock report contracts.
+- Expanded parser coverage for `Accepted publickey` and selected `pam_faillock` / `pam_sss` variants.
+- Added compact host-level summaries for multi-host reports.
+- Added optional CSV export for findings and warnings when explicitly requested.
 
 ### Changed
 

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -21,6 +21,7 @@ target_include_directories(loglens_lib
 
 add_executable(loglens src/main.cpp)
 target_link_libraries(loglens PRIVATE loglens_lib)
+target_compile_definitions(loglens PRIVATE LOGLENS_VERSION="${PROJECT_VERSION}")
 
 include(CTest)
 if(BUILD_TESTING)
@@ -32,6 +33,10 @@ if(BUILD_TESTING)
     target_link_libraries(test_detector PRIVATE loglens_lib)
     add_test(NAME detector COMMAND test_detector)
 
+    add_executable(test_report tests/test_report.cpp)
+    target_link_libraries(test_report PRIVATE loglens_lib)
+    add_test(NAME report COMMAND test_report)
+
     add_executable(test_cli tests/test_cli.cpp)
     target_link_libraries(test_cli PRIVATE loglens_lib)
     add_test(

diff --git a/README.md b/README.md
@@ -11,6 +11,8 @@ It parses `auth.log` / `secure`-style syslog input and `journalctl --output=shor
 
 LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.
 
+Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md) and [`docs/reviewer-brief.md`](./docs/reviewer-brief.md).
+
 ## Why This Project Exists
 
 Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.
@@ -58,23 +60,31 @@ LogLens currently detects:
 - One IP trying multiple usernames within 15 minutes
 - Bursty sudo activity from the same user within 5 minutes
 
-LogLens currently parses and reports these additional auth patterns beyond the core detector inputs, broadening coverage across common Linux auth families:
+LogLens currently parses and reports these additional auth patterns beyond the core detector inputs:
 
 - `Accepted publickey` SSH successes
+- `Accepted keyboard-interactive/pam` SSH successes
 - `Failed publickey` SSH failures, which count toward SSH brute-force detection by default
+- `Failed keyboard-interactive/pam` and `maximum authentication attempts exceeded` SSH failures, which count toward SSH brute-force detection by default
+- `sudo` command, password-failure, and sudoers policy-denial audit lines
+- `su` success and failure audit lines
 - `pam_unix(...:auth): authentication failure`
 - `pam_unix(...:session): session opened`
 - selected `pam_faillock(...:auth)` failure variants
 - selected `pam_sss(...:auth)` failure variants
 
 LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including:
 
+- `total_input_lines`
 - `total_lines`
+- `skipped_blank_lines`
 - `parsed_lines`
 - `unparsed_lines`
 - `parse_success_rate`
 - `top_unknown_patterns`
 
+For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md).
+
 LogLens does not currently detect:
 
 - Lateral movement
@@ -96,9 +106,11 @@ For fresh-machine setup and repeatable local presets, see [`docs/dev-setup.md`](
 ## Run
 
 ```bash
+./build/loglens --help
+./build/loglens --version
 ./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
-./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
-./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
+./build/loglens --mode journalctl ./assets/sample_journalctl_short_full.log ./out-journal
+./build/loglens --config=./assets/sample_config.json ./assets/sample_auth.log ./out-config
 ./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
 ```
 
@@ -114,14 +126,16 @@ When you add `--csv`, LogLens also writes:
 - `findings.csv`
 - `warnings.csv`
 
-Without `--csv`, LogLens does not create, overwrite, or delete any existing CSV files in the output directory.
-
 The CSV schema is intentionally small and stable:
 
 - `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
-- `warnings.csv`: `kind`, `message`
+- `warnings.csv`: `kind`, `line_number`, `message`
+
+Without `--csv`, LogLens does not create, overwrite, or delete any existing CSV files in the output directory.
 
-When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic. In `report.md` this appears as a host summary table, and in `report.json` it appears as a `host_summaries` array.
+Formula-like CSV text fields are neutralized with a leading single quote so spreadsheet tools treat them as text.
+When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic.
+Markdown table fields escape table separators, line breaks, and HTML-sensitive characters so unusual log tokens cannot break report layout.
 
 ## Sample Output
 
@@ -172,6 +186,14 @@ The config file schema is intentionally small and strict:
       "counts_as_attempt_evidence": true,
       "counts_as_terminal_auth_failure": true
     },
+    "ssh_failed_keyboard_interactive": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
+    "ssh_max_auth_tries": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
     "pam_auth_failure": {
       "counts_as_attempt_evidence": true,
       "counts_as_terminal_auth_failure": false
@@ -180,12 +202,13 @@ The config file schema is intentionally small and strict:
 }
 ```
 
-This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it.
+This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it. The `ssh_failed_keyboard_interactive` and `ssh_max_auth_tries` mapping keys are optional, so older configs that omit them still load; when omitted, both default to terminal failure evidence.
 
 Timestamp handling is now explicit:
 
-- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
-- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`
+- `--mode syslog`, `--mode syslog-legacy`, or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
+- `--year` and `timestamp.assume_year` must use a four-digit year, for example `2026`
+- `--mode journalctl`, `--mode journalctl-short-full`, or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`
 
 ## Example Input
 
@@ -213,7 +236,7 @@ Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authen
 
 - `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly.
 - `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations.
-- Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
+- Parser coverage is still selective: it covers common `sshd`, `sudo`, `su`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
 - Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
 - `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
 - Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats.
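
Reviewer note on the CSV neutralization described in this README diff: the rule "formula-like CSV text fields are neutralized with a leading single quote" could look roughly like the sketch below. The helper names `neutralize_csv_field` and `quote_csv_field`, and the trigger set `=`, `+`, `-`, `@`, are assumptions made for illustration; the diff does not show which characters LogLens actually treats as formula-like.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: prefix a single quote when a text field starts with a
// character that spreadsheet tools commonly interpret as a formula.
// The trigger set is an assumption, not LogLens's documented list.
inline std::string neutralize_csv_field(std::string_view field) {
    constexpr std::string_view triggers = "=+-@";
    if (!field.empty() && triggers.find(field.front()) != std::string_view::npos) {
        std::string out;
        out.reserve(field.size() + 1);
        out.push_back('\'');
        out.append(field);
        return out;
    }
    return std::string(field);
}

// Hypothetical helper: conventional CSV quoting. Fields containing a comma,
// double quote, CR, or LF are wrapped in double quotes, and embedded double
// quotes are doubled.
inline std::string quote_csv_field(std::string_view field) {
    if (field.find_first_of(",\"\r\n") == std::string_view::npos) {
        return std::string(field);
    }
    std::string out = "\"";
    for (const char c : field) {
        if (c == '"') {
            out += '"';
        }
        out += c;
    }
    out += '"';
    return out;
}

// Usage examples: neutralize first, then quote.
inline void run_examples() {
    assert(neutralize_csv_field("=SUM(A1:A2)") == "'=SUM(A1:A2)");
    assert(neutralize_csv_field("alice") == "alice");
    assert(quote_csv_field(neutralize_csv_field("=cmd,1")) == "\"'=cmd,1\"");
}
```

Neutralizing before quoting keeps the two concerns separate: the first step changes how a spreadsheet interprets the cell, the second only keeps the CSV row well-formed.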

diff --git a/assets/parser_fixture_matrix_journalctl_short_full.log b/assets/parser_fixture_matrix_journalctl_short_full.log
@@ -6,6 +6,13 @@ Tue 2026-03-10 09:02:30 UTC example-host pam_unix(sudo:session): session opened
 Tue 2026-03-10 09:03:05 UTC example-host pam_unix(su-l:session): session opened for user root by bob(uid=1001)
 Tue 2026-03-10 09:03:28 UTC example-host sshd[3008]: Accepted password for alice from 203.0.113.41 port 52003 ssh2
 Tue 2026-03-10 09:03:34 UTC example-host sshd[3009]: Accepted publickey for carol from 203.0.113.42 port 52004 ssh2: ED25519 SHA256:SANITIZEDKEY2
+Tue 2026-03-10 09:03:35 UTC example-host sshd[3012]: Accepted keyboard-interactive/pam for dave from 203.0.113.43 port 52005 ssh2
+Tue 2026-03-10 09:03:36 UTC example-host sudo[3013]:    alice : 1 incorrect password attempt ; TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl status ssh
+Tue 2026-03-10 09:03:37 UTC example-host sudo[3014]:    bob : user NOT in sudoers ; TTY=pts/1 ; PWD=/home/bob ; USER=root ; COMMAND=/usr/bin/id
+Tue 2026-03-10 09:03:38 UTC example-host su[3015]: FAILED SU (to root) carol on pts/1
+Tue 2026-03-10 09:03:39 UTC example-host su[3016]: Successful su for root by dave
+Tue 2026-03-10 09:03:39 UTC example-host sshd[3017]: Failed keyboard-interactive/pam for eve from 203.0.113.44 port 52006 ssh2
+Tue 2026-03-10 09:03:39 UTC example-host sshd[3018]: maximum authentication attempts exceeded for frank from 203.0.113.45 port 52007 ssh2 [preauth]
 Tue 2026-03-10 09:03:40 UTC example-host sshd[3003]: Connection closed by user alice 203.0.113.50 port 52010 [preauth]
 Tue 2026-03-10 09:04:05 UTC example-host sshd[3004]: Connection closed by authenticating user carol 203.0.113.51 port 52011 [preauth]
 Tue 2026-03-10 09:04:28 UTC example-host sshd[3005]: Connection closed by invalid user deploy 203.0.113.52 port 52012 [preauth]

diff --git a/assets/parser_fixture_matrix_syslog.log b/assets/parser_fixture_matrix_syslog.log
@@ -6,6 +6,13 @@ Mar 10 09:02:30 example-host pam_unix(sudo:session): session opened for user roo
 Mar 10 09:03:05 example-host pam_unix(su-l:session): session opened for user root by bob(uid=1001)
 Mar 10 09:03:28 example-host sshd[2008]: Accepted password for alice from 203.0.113.41 port 52003 ssh2
 Mar 10 09:03:34 example-host sshd[2009]: Accepted publickey for carol from 203.0.113.42 port 52004 ssh2: ED25519 SHA256:SANITIZEDKEY2
+Mar 10 09:03:35 example-host sshd[2012]: Accepted keyboard-interactive/pam for dave from 203.0.113.43 port 52005 ssh2
+Mar 10 09:03:36 example-host sudo[2013]:    alice : 1 incorrect password attempt ; TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl status ssh
+Mar 10 09:03:37 example-host sudo[2014]:    bob : user NOT in sudoers ; TTY=pts/1 ; PWD=/home/bob ; USER=root ; COMMAND=/usr/bin/id
+Mar 10 09:03:38 example-host su[2015]: FAILED SU (to root) carol on pts/1
+Mar 10 09:03:39 example-host su[2016]: Successful su for root by dave
+Mar 10 09:03:39 example-host sshd[2017]: Failed keyboard-interactive/pam for eve from 203.0.113.44 port 52006 ssh2
+Mar 10 09:03:39 example-host sshd[2018]: maximum authentication attempts exceeded for frank from 203.0.113.45 port 52007 ssh2 [preauth]
 Mar 10 09:03:40 example-host sshd[2003]: Connection closed by user alice 203.0.113.50 port 52010 [preauth]
 Mar 10 09:04:05 example-host sshd[2004]: Connection closed by authenticating user carol 203.0.113.51 port 52011 [preauth]
 Mar 10 09:04:28 example-host sshd[2005]: Connection closed by invalid user deploy 203.0.113.52 port 52012 [preauth]

diff --git a/assets/sample_config.json b/assets/sample_config.json
@@ -19,6 +19,14 @@
       "counts_as_attempt_evidence": true,
       "counts_as_terminal_auth_failure": true
     },
+    "ssh_failed_keyboard_interactive": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
+    "ssh_max_auth_tries": {
+      "counts_as_attempt_evidence": true,
+      "counts_as_terminal_auth_failure": true
+    },
     "pam_auth_failure": {
       "counts_as_attempt_evidence": true,
       "counts_as_terminal_auth_failure": false
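
Reviewer note on the two new mapping keys: the README diff says `ssh_failed_keyboard_interactive` and `ssh_max_auth_tries` are optional in older configs and default to terminal failure evidence. A minimal sketch of that fallback follows. `SignalMapping`, `SignalMappings`, and `resolve_signal` are hypothetical names, not the project's `AuthSignalConfig` API. Treating the default as "both flags true" is an assumption based on the sample config, and returning an empty optional for any other missing key is only a placeholder, since the diff does not show how LogLens reports a missing required key.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical mirror of one entry in the config's signal mapping object.
struct SignalMapping {
    bool counts_as_attempt_evidence;
    bool counts_as_terminal_auth_failure;
};

using SignalMappings = std::map<std::string, SignalMapping>;

// Hypothetical lookup: use the configured entry when present; fall back to
// terminal failure evidence only for the two keys documented as optional.
inline std::optional<SignalMapping> resolve_signal(const SignalMappings& configured,
                                                   const std::string& key) {
    if (const auto it = configured.find(key); it != configured.end()) {
        return it->second;
    }
    if (key == "ssh_failed_keyboard_interactive" || key == "ssh_max_auth_tries") {
        return SignalMapping{true, true};  // assumed default: both flags true
    }
    return std::nullopt;  // placeholder for "required key missing"
}

// Usage example: an older config that predates the two new keys.
inline void run_examples() {
    const SignalMappings older_config = {
        {"ssh_failed_password", {true, true}},
        {"pam_auth_failure", {true, false}},
    };

    const auto max_tries = resolve_signal(older_config, "ssh_max_auth_tries");
    assert(max_tries.has_value());
    assert(max_tries->counts_as_terminal_auth_failure);

    const auto pam = resolve_signal(older_config, "pam_auth_failure");
    assert(pam.has_value());
    assert(!pam->counts_as_terminal_auth_failure);
}
```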

diff --git a/docs/parser-contract.md b/docs/parser-contract.md
@@ -0,0 +1,88 @@
+# Parser contract
+
+LogLens treats parser behavior as reviewable output, not as a hidden implementation detail. A line is either recognized as a typed event, skipped as blank input, or surfaced as a warning with coverage telemetry.
+
+The guiding rule is:
+
+> Parser observability > silent detection claims.
+
+## Supported input modes
+
+| Mode | Typical source | Timestamp behavior | Review anchor |
+| --- | --- | --- | --- |
+| `syslog_legacy` | `auth.log` / `secure` style lines such as `Mar 10 08:11:22 example-host sshd[1234]: ...` | Requires an explicit four-digit year from `--year` or `timestamp.assume_year` | [`assets/parser_fixture_matrix_syslog.log`](../assets/parser_fixture_matrix_syslog.log) |
+| `journalctl_short_full` | `journalctl --output=short-full` style lines such as `Tue 2026-03-10 08:11:22 UTC example-host sshd[1234]: ...` | Uses the embedded year and supported timezone token | [`assets/parser_fixture_matrix_journalctl_short_full.log`](../assets/parser_fixture_matrix_journalctl_short_full.log) |
+
+Supported timezone tokens for `journalctl_short_full` are intentionally narrow: `UTC`, `GMT`, `Z`, and numeric offsets such as `+0000` or `+00:00`.
+
+## Recognized event families
+
+The parser currently recognizes common authentication evidence from:
+
+- `sshd`
+- `sudo`
+- `su`
+- `pam_unix(...)`
+- selected `pam_faillock(...)` variants
+- selected `pam_sss(...)` variants
+
+Recognized SSH failure families include failed password, invalid user, failed publickey, failed keyboard-interactive/pam, and maximum-authentication-attempts-exceeded lines. These are normalized into event types and can become detection signals.
+
+Recognized success or audit families include accepted password, accepted publickey, accepted keyboard-interactive/pam, sudo command audit lines, sudo password failures, sudoers policy denials, su success/failure audit lines, and selected PAM session/auth lines.
+
+## Line handling contract
+
+| Input line outcome | Parser behavior | Report behavior |
+| --- | --- | --- |
+| Recognized auth line | Emits a typed `Event` with timestamp, hostname, program, optional pid, message, source IP, username, event type, and line number | Can contribute to summaries, reports, and configured detection signals |
+| Blank line | Skips the line and increments `skipped_blank_lines` | Does not become a warning or parsed event |
+| Malformed header | Emits a parser warning with the original line number and structural reason | Counts toward `unparsed_lines` and `top_unknown_patterns` |
+| Well-formed but unsupported auth pattern | Emits a parser warning with an unknown-pattern bucket | Stays visible as telemetry instead of being silently ignored |
+
+This is the main trust boundary: unsupported input should remain inspectable, even when it does not produce a finding.
+
+## Detection signal boundary
+
+Parsing a line does not automatically mean it should drive a detector. LogLens keeps that boundary explicit through `AuthSignalConfig`.
+
+Default terminal SSH failure evidence:
+
+- `ssh_failed_password`
+- `ssh_invalid_user`
+- `ssh_failed_publickey`
+- `ssh_failed_keyboard_interactive`
+- `ssh_max_auth_tries`
+
+Default lower-confidence attempt evidence:
+
+- `pam_auth_failure`, which is attempt evidence but not terminal failure evidence unless configured otherwise
+
+Default sudo burst evidence:
+
+- `sudo_command`
+
+Parsed successes and audit-only events remain reportable but do not count as brute-force or multi-user failure evidence by default.
+
+## Test corpus map
+
+| Artifact | What it proves |
+| --- | --- |
+| [`tests/test_parser.cpp`](../tests/test_parser.cpp) | Unit-level parser expectations, malformed-line behavior, mode aliases, fixture-matrix counts, and unknown-pattern buckets |
+| [`tests/test_detector.cpp`](../tests/test_detector.cpp) | Detection signal mapping and default counting behavior after parsing |
+| [`assets/parser_fixture_matrix_syslog.log`](../assets/parser_fixture_matrix_syslog.log) | Syslog known/unknown parser matrix |
+| [`assets/parser_fixture_matrix_journalctl_short_full.log`](../assets/parser_fixture_matrix_journalctl_short_full.log) | Journalctl short-full known/unknown parser matrix |
+| [`assets/parser_auth_families_syslog.log`](../assets/parser_auth_families_syslog.log) | Syslog PAM/auth-family parser coverage |
+| [`assets/parser_auth_families_journalctl_short_full.log`](../assets/parser_auth_families_journalctl_short_full.log) | Journalctl PAM/auth-family parser coverage |
+| [`tests/test_report_contracts.cpp`](../tests/test_report_contracts.cpp) | Stable report-shape expectations for generated artifacts |
+
+## Non-goals
+
+The parser does not try to:
+
+- infer missing syslog years
+- support every Linux authentication log variant
+- classify unsupported lines as findings
+- correlate across files or hosts
+- produce incident verdicts
+
+Those boundaries are intentional for the MVP. The current priority is to keep parser coverage explicit and safely extensible.
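
Reviewer note on the line handling contract in `docs/parser-contract.md`: the table's three outcomes (typed event, skipped blank line, warning with telemetry) can be sketched as below. Every type and function name here (`LineOutcome`, `Coverage`, `Warning`, `handle_line`, `Recognizer`) is hypothetical and not taken from the LogLens source. The counter names match the documented telemetry fields, but two relationships are assumptions: that `total_lines` equals `total_input_lines` minus `skipped_blank_lines`, and that `parse_success_rate` is `parsed_lines / total_lines`. The sketch also collapses malformed headers and unsupported patterns into one warning path, whereas the contract distinguishes them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

enum class LineOutcome { Parsed, SkippedBlank, Unparsed };

struct Warning {
    std::size_t line_number;
    std::string message;
};

struct Coverage {
    std::size_t total_input_lines = 0;
    std::size_t skipped_blank_lines = 0;
    std::size_t parsed_lines = 0;
    std::size_t unparsed_lines = 0;

    // Assumption: total_lines excludes blank input lines.
    std::size_t total_lines() const { return total_input_lines - skipped_blank_lines; }

    // Assumption: rate is parsed / non-blank lines, and 0.0 for empty input.
    double parse_success_rate() const {
        const std::size_t n = total_lines();
        return n == 0 ? 0.0 : static_cast<double>(parsed_lines) / static_cast<double>(n);
    }
};

inline bool is_blank(std::string_view line) {
    return line.find_first_not_of(" \t\r\n") == std::string_view::npos;
}

// Stand-in for the real parser: returns true when a line is recognized.
using Recognizer = std::function<bool(std::string_view)>;

// Every input line ends in exactly one outcome, so nothing is silently dropped.
inline LineOutcome handle_line(std::string_view line, std::size_t line_number,
                               const Recognizer& recognize, Coverage& coverage,
                               std::vector<Warning>& warnings) {
    ++coverage.total_input_lines;
    if (is_blank(line)) {
        ++coverage.skipped_blank_lines;
        return LineOutcome::SkippedBlank;
    }
    if (recognize(line)) {
        ++coverage.parsed_lines;
        return LineOutcome::Parsed;
    }
    ++coverage.unparsed_lines;
    warnings.push_back({line_number, "unrecognized line"});
    return LineOutcome::Unparsed;
}

// Usage example: one recognized line, one blank line, one unsupported line.
inline void run_examples() {
    const Recognizer recognize = [](std::string_view l) {
        return l.find("Failed password") != std::string_view::npos;
    };
    const std::vector<std::string> lines = {
        "Mar 10 09:00:00 example-host sshd[1]: Failed password for alice", "", "garbage"};

    Coverage coverage;
    std::vector<Warning> warnings;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        handle_line(lines[i], i + 1, recognize, coverage, warnings);
    }
    assert(coverage.parsed_lines == 1 && coverage.skipped_blank_lines == 1);
    assert(coverage.unparsed_lines == 1 && warnings.front().line_number == 3);
    assert(coverage.parse_success_rate() == 0.5);
}
```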
diff --git a/docs/reviewer-path.md b/docs/reviewer-path.md
@@ -0,0 +1,85 @@
+# Reviewer Path
+
+This path is for reviewers who want to understand LogLens quickly without reading the whole repository first.
+
+## 30-second orientation
+
+Read:
+
+- [`README.md`](../README.md)
+- [`docs/reviewer-brief.md`](./reviewer-brief.md)
+
+Confirm:
+
+- LogLens is an offline C++20 CLI for Linux authentication log analysis.
+- It parses `auth.log` / `secure` style syslog input and `journalctl --output=short-full` style input.
+- It emits deterministic Markdown, JSON, and optional CSV reports.
+- Parser coverage telemetry is part of the output, not an internal-only detail.
+
+Core review lens:
+
+> Parser observability > silent detection claims.
+
+## 5-minute artifact review
+
+Inspect:
+
+- [`assets/sample_auth.log`](../assets/sample_auth.log)
+- [`assets/sample_journalctl_short_full.log`](../assets/sample_journalctl_short_full.log)
+- [`tests/fixtures/report_contracts/syslog_legacy/report.md`](../tests/fixtures/report_contracts/syslog_legacy/report.md)
+- [`tests/fixtures/report_contracts/syslog_legacy/report.json`](../tests/fixtures/report_contracts/syslog_legacy/report.json)
+- [`docs/parser-contract.md`](./parser-contract.md)
+
+Look for parser coverage fields:
+
+- `total_input_lines`
+- `total_lines`
+- `skipped_blank_lines`
+- `parsed_lines`
+- `unparsed_lines`
+- `parse_success_rate`
+- `top_unknown_patterns`
+
+Good stopping point: the reviewer can explain what LogLens parses, what it reports, and how unsupported lines remain visible.
+
+## 15-minute local check
+
+Run:
+
+```bash
+cmake -S . -B build
+cmake --build build
+ctest --test-dir build --output-on-failure
+./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
+```
+
+Then inspect:
+
+- `out/report.md`
+- `out/report.json`
+
+Optional CSV check:
+
+```bash
+./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
+```
+
+Then inspect:
+
+- `out-csv/findings.csv`
+- `out-csv/warnings.csv`
+
+Good stopping point: the reviewer can build, test, run a sample, and compare generated artifacts with the report-contract fixtures.
+
+## Boundaries
+
+LogLens is intentionally narrow:
+
+- no live collection
+- no credential attack automation
+- no exploitation, persistence, or offensive workflow support
+- no SIEM replacement
+- no cross-host correlation engine
+- no incident verdict or attribution claim
+
+Findings are rule-based triage aids. The parser boundary is the main trust boundary: recognized lines become typed events, unsupported lines become warnings and telemetry, and malformed lines should surface as parser warnings rather than being silently dropped.