Merged
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -6,7 +6,10 @@ All notable user-visible changes should be recorded here.

### Added

- None yet.
- Added sanitized golden `report.md` / `report.json` regression fixtures to lock report contracts.
- Expanded parser coverage for `Accepted publickey` and selected `pam_faillock` / `pam_sss` variants.
- Added compact host-level summaries for multi-host reports.
- Added optional CSV export for findings and warnings when explicitly requested.

### Changed

5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -21,6 +21,7 @@ target_include_directories(loglens_lib

add_executable(loglens src/main.cpp)
target_link_libraries(loglens PRIVATE loglens_lib)
target_compile_definitions(loglens PRIVATE LOGLENS_VERSION="${PROJECT_VERSION}")

include(CTest)
if(BUILD_TESTING)
@@ -32,6 +33,10 @@ if(BUILD_TESTING)
target_link_libraries(test_detector PRIVATE loglens_lib)
add_test(NAME detector COMMAND test_detector)

add_executable(test_report tests/test_report.cpp)
target_link_libraries(test_report PRIVATE loglens_lib)
add_test(NAME report COMMAND test_report)

add_executable(test_cli tests/test_cli.cpp)
target_link_libraries(test_cli PRIVATE loglens_lib)
add_test(
45 changes: 34 additions & 11 deletions README.md
@@ -11,6 +11,8 @@ It parses `auth.log` / `secure`-style syslog input and `journalctl --output=shor

LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.

Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md) and [`docs/reviewer-brief.md`](./docs/reviewer-brief.md).

## Why This Project Exists

Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.
@@ -58,23 +60,31 @@ LogLens currently detects:
- One IP trying multiple usernames within 15 minutes
- Bursty sudo activity from the same user within 5 minutes

LogLens currently parses and reports these additional auth patterns beyond the core detector inputs, broadening coverage across common Linux auth families:
LogLens currently parses and reports these additional auth patterns beyond the core detector inputs:

- `Accepted publickey` SSH successes
- `Accepted keyboard-interactive/pam` SSH successes
- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default
- `Failed keyboard-interactive/pam` and `maximum authentication attempts exceeded` SSH failures, which count toward SSH brute-force detection by default
- `sudo` command, password-failure, and sudoers policy-denial audit lines
- `su` success and failure audit lines
- `pam_unix(...:auth): authentication failure`
- `pam_unix(...:session): session opened`
- selected `pam_faillock(...:auth)` failure variants
- selected `pam_sss(...:auth)` failure variants

LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including:

- `total_input_lines`
- `total_lines`
- `skipped_blank_lines`
- `parsed_lines`
- `unparsed_lines`
- `parse_success_rate`
- `top_unknown_patterns`

For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md).

LogLens does not currently detect:

- Lateral movement
@@ -96,9 +106,11 @@ For fresh-machine setup and repeatable local presets, see [`docs/dev-setup.md`](
## Run

```bash
./build/loglens --help
./build/loglens --version
./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
./build/loglens --mode journalctl ./assets/sample_journalctl_short_full.log ./out-journal
./build/loglens --config=./assets/sample_config.json ./assets/sample_auth.log ./out-config
./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
```

@@ -114,14 +126,16 @@ When you add `--csv`, LogLens also writes:
- `findings.csv`
- `warnings.csv`

Without `--csv`, LogLens does not create, overwrite, or delete any existing CSV files in the output directory.

The CSV schema is intentionally small and stable:

- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
- `warnings.csv`: `kind`, `message`
- `warnings.csv`: `kind`, `line_number`, `message`

Without `--csv`, LogLens does not create, overwrite, or delete any existing CSV files in the output directory.

When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic. In `report.md` this appears as a host summary table, and in `report.json` it appears as a `host_summaries` array.
Formula-like CSV text fields are neutralized with a leading single quote so spreadsheet tools treat them as text.
When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic.
Markdown table fields escape table separators, line breaks, and HTML-sensitive characters so unusual log tokens cannot break report layout.
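
The CSV neutralization described above could look roughly like the following. This is a minimal sketch, not the LogLens implementation: `looks_like_formula` and `neutralize_csv_field` are hypothetical names, and the set of trigger characters (`=`, `+`, `-`, `@`) is an assumption based on common spreadsheet behavior rather than on the LogLens sources.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not taken from the LogLens sources.
// Assumption: a field is "formula-like" when it starts with '=', '+', '-', or '@'.
inline bool looks_like_formula(std::string_view field) {
    if (field.empty()) {
        return false;
    }
    const char first = field.front();
    return first == '=' || first == '+' || first == '-' || first == '@';
}

// Prefixes formula-like text with a single quote, then applies ordinary CSV
// quoting (wrap in double quotes and double any embedded double quotes).
inline std::string neutralize_csv_field(std::string_view field) {
    std::string text;
    if (looks_like_formula(field)) {
        text.push_back('\'');
    }
    text.append(field);

    const bool needs_quotes = text.find_first_of(",\"\r\n") != std::string::npos;
    if (!needs_quotes) {
        return text;
    }

    std::string quoted = "\"";
    for (const char ch : text) {
        if (ch == '"') {
            quoted.push_back('"');  // escape an embedded quote by doubling it
        }
        quoted.push_back(ch);
    }
    quoted.push_back('"');
    return quoted;
}
```

With this sketch, a field such as `=SUM(A1)` would be written as `'=SUM(A1)`, while ordinary text such as `alice` passes through unchanged.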

## Sample Output

@@ -172,6 +186,14 @@ The config file schema is intentionally small and strict:
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": true
},
"ssh_failed_keyboard_interactive": {
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": true
},
"ssh_max_auth_tries": {
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": true
},
"pam_auth_failure": {
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": false
@@ -180,12 +202,13 @@ The config file schema is intentionally small and strict:
}
```

This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it.
This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it. The `ssh_failed_keyboard_interactive` and `ssh_max_auth_tries` mapping keys are optional in older configs and default to terminal failure evidence.

Timestamp handling is now explicit:

- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`
- `--mode syslog`, `--mode syslog-legacy`, or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
- `--year` and `timestamp.assume_year` must use a four-digit year, for example `2026`
- `--mode journalctl`, `--mode journalctl-short-full`, or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`
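
The mode aliases and the four-digit year rule listed above can be sketched as follows. The enum and function names (`InputMode`, `parse_mode`, `parse_four_digit_year`, `year_is_required`) are hypothetical and chosen for illustration only; the real CLI may structure this differently.

```cpp
#include <optional>
#include <string_view>

// Hypothetical types and helpers, not taken from the LogLens sources.
enum class InputMode { SyslogLegacy, JournalctlShortFull };

// Maps the CLI aliases named in the README onto the two input modes.
inline std::optional<InputMode> parse_mode(std::string_view value) {
    if (value == "syslog" || value == "syslog-legacy") {
        return InputMode::SyslogLegacy;
    }
    if (value == "journalctl" || value == "journalctl-short-full") {
        return InputMode::JournalctlShortFull;
    }
    return std::nullopt;
}

// Accepts exactly four ASCII digits, for example "2026".
// Assumption: no further range check is applied in this sketch.
inline std::optional<int> parse_four_digit_year(std::string_view value) {
    if (value.size() != 4) {
        return std::nullopt;
    }
    int year = 0;
    for (const char ch : value) {
        if (ch < '0' || ch > '9') {
            return std::nullopt;
        }
        year = year * 10 + (ch - '0');
    }
    return year;
}

// Only the legacy syslog mode needs an externally supplied year.
inline bool year_is_required(InputMode mode) {
    return mode == InputMode::SyslogLegacy;
}
```

Returning `std::optional` keeps the "no implicit guess" rule visible in the type: a caller has to handle the missing-year case explicitly.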

## Example Input

@@ -213,7 +236,7 @@ Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authen

- `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly.
- `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations.
- Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
- Parser coverage is still selective: it covers common `sshd`, `sudo`, `su`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
- Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
- `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
- Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats.
7 changes: 7 additions & 0 deletions assets/parser_fixture_matrix_journalctl_short_full.log
@@ -6,6 +6,13 @@ Tue 2026-03-10 09:02:30 UTC example-host pam_unix(sudo:session): session opened
Tue 2026-03-10 09:03:05 UTC example-host pam_unix(su-l:session): session opened for user root by bob(uid=1001)
Tue 2026-03-10 09:03:28 UTC example-host sshd[3008]: Accepted password for alice from 203.0.113.41 port 52003 ssh2
Tue 2026-03-10 09:03:34 UTC example-host sshd[3009]: Accepted publickey for carol from 203.0.113.42 port 52004 ssh2: ED25519 SHA256:SANITIZEDKEY2
Tue 2026-03-10 09:03:35 UTC example-host sshd[3012]: Accepted keyboard-interactive/pam for dave from 203.0.113.43 port 52005 ssh2
Tue 2026-03-10 09:03:36 UTC example-host sudo[3013]: alice : 1 incorrect password attempt ; TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl status ssh
Tue 2026-03-10 09:03:37 UTC example-host sudo[3014]: bob : user NOT in sudoers ; TTY=pts/1 ; PWD=/home/bob ; USER=root ; COMMAND=/usr/bin/id
Tue 2026-03-10 09:03:38 UTC example-host su[3015]: FAILED SU (to root) carol on pts/1
Tue 2026-03-10 09:03:39 UTC example-host su[3016]: Successful su for root by dave
Tue 2026-03-10 09:03:39 UTC example-host sshd[3017]: Failed keyboard-interactive/pam for eve from 203.0.113.44 port 52006 ssh2
Tue 2026-03-10 09:03:39 UTC example-host sshd[3018]: maximum authentication attempts exceeded for frank from 203.0.113.45 port 52007 ssh2 [preauth]
Tue 2026-03-10 09:03:40 UTC example-host sshd[3003]: Connection closed by user alice 203.0.113.50 port 52010 [preauth]
Tue 2026-03-10 09:04:05 UTC example-host sshd[3004]: Connection closed by authenticating user carol 203.0.113.51 port 52011 [preauth]
Tue 2026-03-10 09:04:28 UTC example-host sshd[3005]: Connection closed by invalid user deploy 203.0.113.52 port 52012 [preauth]
7 changes: 7 additions & 0 deletions assets/parser_fixture_matrix_syslog.log
@@ -6,6 +6,13 @@ Mar 10 09:02:30 example-host pam_unix(sudo:session): session opened for user roo
Mar 10 09:03:05 example-host pam_unix(su-l:session): session opened for user root by bob(uid=1001)
Mar 10 09:03:28 example-host sshd[2008]: Accepted password for alice from 203.0.113.41 port 52003 ssh2
Mar 10 09:03:34 example-host sshd[2009]: Accepted publickey for carol from 203.0.113.42 port 52004 ssh2: ED25519 SHA256:SANITIZEDKEY2
Mar 10 09:03:35 example-host sshd[2012]: Accepted keyboard-interactive/pam for dave from 203.0.113.43 port 52005 ssh2
Mar 10 09:03:36 example-host sudo[2013]: alice : 1 incorrect password attempt ; TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl status ssh
Mar 10 09:03:37 example-host sudo[2014]: bob : user NOT in sudoers ; TTY=pts/1 ; PWD=/home/bob ; USER=root ; COMMAND=/usr/bin/id
Mar 10 09:03:38 example-host su[2015]: FAILED SU (to root) carol on pts/1
Mar 10 09:03:39 example-host su[2016]: Successful su for root by dave
Mar 10 09:03:39 example-host sshd[2017]: Failed keyboard-interactive/pam for eve from 203.0.113.44 port 52006 ssh2
Mar 10 09:03:39 example-host sshd[2018]: maximum authentication attempts exceeded for frank from 203.0.113.45 port 52007 ssh2 [preauth]
Mar 10 09:03:40 example-host sshd[2003]: Connection closed by user alice 203.0.113.50 port 52010 [preauth]
Mar 10 09:04:05 example-host sshd[2004]: Connection closed by authenticating user carol 203.0.113.51 port 52011 [preauth]
Mar 10 09:04:28 example-host sshd[2005]: Connection closed by invalid user deploy 203.0.113.52 port 52012 [preauth]
8 changes: 8 additions & 0 deletions assets/sample_config.json
@@ -19,6 +19,14 @@
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": true
},
"ssh_failed_keyboard_interactive": {
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": true
},
"ssh_max_auth_tries": {
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": true
},
"pam_auth_failure": {
"counts_as_attempt_evidence": true,
"counts_as_terminal_auth_failure": false
88 changes: 88 additions & 0 deletions docs/parser-contract.md
@@ -0,0 +1,88 @@
# Parser contract

LogLens treats parser behavior as reviewable output, not as a hidden implementation detail. A line is either recognized as a typed event, skipped as blank input, or surfaced as a warning with coverage telemetry.

The guiding rule is:

> Parser observability > silent detection claims.

## Supported input modes

| Mode | Typical source | Timestamp behavior | Review anchor |
| --- | --- | --- | --- |
| `syslog_legacy` | `auth.log` / `secure` style lines such as `Mar 10 08:11:22 example-host sshd[1234]: ...` | Requires an explicit four-digit year from `--year` or `timestamp.assume_year` | [`assets/parser_fixture_matrix_syslog.log`](../assets/parser_fixture_matrix_syslog.log) |
| `journalctl_short_full` | `journalctl --output=short-full` style lines such as `Tue 2026-03-10 08:11:22 UTC example-host sshd[1234]: ...` | Uses the embedded year and supported timezone token | [`assets/parser_fixture_matrix_journalctl_short_full.log`](../assets/parser_fixture_matrix_journalctl_short_full.log) |

Supported timezone tokens for `journalctl_short_full` are intentionally narrow: `UTC`, `GMT`, `Z`, and numeric offsets such as `+0000` or `+00:00`.
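
As an illustration of how narrow that token set is, a check along these lines would accept only the forms listed above. This is a sketch under stated assumptions: `is_supported_timezone_token` is a hypothetical name, negative offsets are assumed to be accepted alongside the `+` examples, and the hour and minute values are not range-checked.

```cpp
#include <string_view>

// Hypothetical helper, not taken from the LogLens sources.
// Accepts "UTC", "GMT", "Z", and numeric offsets shaped like +HHMM or +HH:MM.
// Assumption: a leading '-' is treated the same way as '+'.
inline bool is_supported_timezone_token(std::string_view token) {
    if (token == "UTC" || token == "GMT" || token == "Z") {
        return true;
    }
    if (token.size() < 5 || (token.front() != '+' && token.front() != '-')) {
        return false;
    }

    const auto is_digit = [](char ch) { return ch >= '0' && ch <= '9'; };
    const std::string_view digits = token.substr(1);

    if (digits.size() == 4) {  // HHMM
        return is_digit(digits[0]) && is_digit(digits[1]) &&
               is_digit(digits[2]) && is_digit(digits[3]);
    }
    if (digits.size() == 5) {  // HH:MM
        return is_digit(digits[0]) && is_digit(digits[1]) && digits[2] == ':' &&
               is_digit(digits[3]) && is_digit(digits[4]);
    }
    return false;
}
```

Under this sketch, an abbreviation such as `PST` is rejected rather than guessed at, which matches the stated intent of keeping unsupported input visible.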

## Recognized event families

The parser currently recognizes common authentication evidence from:

- `sshd`
- `sudo`
- `su`
- `pam_unix(...)`
- selected `pam_faillock(...)` variants
- selected `pam_sss(...)` variants

Recognized SSH failure families include failed password, invalid user, failed publickey, failed keyboard-interactive/pam, and maximum-authentication-attempts-exceeded lines. These are normalized into event types and can become detection signals.

Recognized success or audit families include accepted password, accepted publickey, accepted keyboard-interactive/pam, sudo command audit lines, sudo password failures, sudoers policy denials, su success/failure audit lines, and selected PAM session/auth lines.

## Line handling contract

| Input line outcome | Parser behavior | Report behavior |
| --- | --- | --- |
| Recognized auth line | Emits a typed `Event` with timestamp, hostname, program, optional pid, message, source IP, username, event type, and line number | Can contribute to summaries, reports, and configured detection signals |
| Blank line | Skips the line and increments `skipped_blank_lines` | Does not become a warning or parsed event |
| Malformed header | Emits a parser warning with the original line number and structural reason | Counts toward `unparsed_lines` and `top_unknown_patterns` |
| Well-formed but unsupported auth pattern | Emits a parser warning with an unknown-pattern bucket | Stays visible as telemetry instead of being silently ignored |

This is the main trust boundary: unsupported input should remain inspectable, even when it does not produce a finding.
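
The table above implies a small set of counters. The following sketch shows one way to keep them; it is not the LogLens data model. `LineOutcome` and `CoverageCounters` are hypothetical names, only three of the documented fields are modeled, and the formula for `parse_success_rate` (parsed lines divided by parsed plus unparsed lines, with blank lines excluded) is an assumption, since the contract does not define it.

```cpp
#include <cstddef>

// Hypothetical types, not taken from the LogLens sources.
enum class LineOutcome { Parsed, Blank, Unparsed };

struct CoverageCounters {
    std::size_t skipped_blank_lines = 0;
    std::size_t parsed_lines = 0;
    std::size_t unparsed_lines = 0;

    // Every input line lands in exactly one bucket.
    void record(LineOutcome outcome) {
        switch (outcome) {
            case LineOutcome::Parsed:
                ++parsed_lines;
                break;
            case LineOutcome::Blank:
                ++skipped_blank_lines;
                break;
            case LineOutcome::Unparsed:
                ++unparsed_lines;
                break;
        }
    }

    // Assumption: blank lines are excluded from the denominator, and an
    // input with no non-blank lines reports a rate of 0.0.
    double parse_success_rate() const {
        const std::size_t considered = parsed_lines + unparsed_lines;
        if (considered == 0) {
            return 0.0;
        }
        return static_cast<double>(parsed_lines) / static_cast<double>(considered);
    }
};
```

Because each line is recorded in exactly one bucket, the three counters always add up to the number of lines read, which gives reviewers a quick consistency check.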

## Detection signal boundary

Parsing a line does not automatically mean it should drive a detector. LogLens keeps that boundary explicit through `AuthSignalConfig`.

Default terminal SSH failure evidence:

- `ssh_failed_password`
- `ssh_invalid_user`
- `ssh_failed_publickey`
- `ssh_failed_keyboard_interactive`
- `ssh_max_auth_tries`

Default lower-confidence attempt evidence:

- `pam_auth_failure`, which is attempt evidence but not terminal failure evidence unless configured otherwise

Default sudo burst evidence:

- `sudo_command`

Parsed successes and audit-only events remain reportable but do not count as brute-force or multi-user failure evidence by default.
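
A rough sketch of that boundary, using the two flag names from the sample config, might look like this. `SignalMapping`, `default_signal_mappings`, and `counts_as_terminal_failure` are hypothetical names and do not describe the actual `AuthSignalConfig` type. Treating every default SSH failure key as attempt evidence is also an assumption, and the separate `sudo_command` burst evidence is left out.

```cpp
#include <map>
#include <string>

// Hypothetical types, not the actual AuthSignalConfig from LogLens.
struct SignalMapping {
    bool counts_as_attempt_evidence = false;
    bool counts_as_terminal_auth_failure = false;
};

// Defaults as described in this section. Assumption: every SSH failure key
// counts as both attempt evidence and terminal failure evidence.
inline std::map<std::string, SignalMapping> default_signal_mappings() {
    return {
        {"ssh_failed_password", {true, true}},
        {"ssh_invalid_user", {true, true}},
        {"ssh_failed_publickey", {true, true}},
        {"ssh_failed_keyboard_interactive", {true, true}},
        {"ssh_max_auth_tries", {true, true}},
        {"pam_auth_failure", {true, false}},  // lower-confidence by default
    };
}

// Event types without a mapping (successes, audit-only lines) never count.
inline bool counts_as_terminal_failure(
    const std::map<std::string, SignalMapping>& mappings,
    const std::string& event_type) {
    const auto it = mappings.find(event_type);
    return it != mappings.end() && it->second.counts_as_terminal_auth_failure;
}
```

Keeping the lookup separate from parsing means a line can be parsed and reported without ever reaching a detector, which is the boundary this section describes.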

## Test corpus map

| Artifact | What it proves |
| --- | --- |
| [`tests/test_parser.cpp`](../tests/test_parser.cpp) | Unit-level parser expectations, malformed-line behavior, mode aliases, fixture-matrix counts, and unknown-pattern buckets |
| [`tests/test_detector.cpp`](../tests/test_detector.cpp) | Detection signal mapping and default counting behavior after parsing |
| [`assets/parser_fixture_matrix_syslog.log`](../assets/parser_fixture_matrix_syslog.log) | Syslog known/unknown parser matrix |
| [`assets/parser_fixture_matrix_journalctl_short_full.log`](../assets/parser_fixture_matrix_journalctl_short_full.log) | Journalctl short-full known/unknown parser matrix |
| [`assets/parser_auth_families_syslog.log`](../assets/parser_auth_families_syslog.log) | Syslog PAM/auth-family parser coverage |
| [`assets/parser_auth_families_journalctl_short_full.log`](../assets/parser_auth_families_journalctl_short_full.log) | Journalctl PAM/auth-family parser coverage |
| [`tests/test_report_contracts.cpp`](../tests/test_report_contracts.cpp) | Stable report-shape expectations for generated artifacts |

## Non-goals

The parser does not try to:

- infer missing syslog years
- support every Linux authentication log variant
- classify unsupported lines as findings
- correlate across files or hosts
- produce incident verdicts

Those boundaries are intentional for the MVP. The current priority is to keep parser coverage explicit and safely extensible.
85 changes: 85 additions & 0 deletions docs/reviewer-path.md
@@ -0,0 +1,85 @@
# Reviewer Path

This path is for reviewers who want to understand LogLens quickly without reading the whole repository first.

## 30-second orientation

Read:

- [`README.md`](../README.md)
- [`docs/reviewer-brief.md`](./reviewer-brief.md)

Confirm:

- LogLens is an offline C++20 CLI for Linux authentication log analysis.
- It parses `auth.log` / `secure` style syslog input and `journalctl --output=short-full` style input.
- It emits deterministic Markdown, JSON, and optional CSV reports.
- Parser coverage telemetry is part of the output, not an internal-only detail.

Core review lens:

> Parser observability > silent detection claims.

## 5-minute artifact review

Inspect:

- [`assets/sample_auth.log`](../assets/sample_auth.log)
- [`assets/sample_journalctl_short_full.log`](../assets/sample_journalctl_short_full.log)
- [`tests/fixtures/report_contracts/syslog_legacy/report.md`](../tests/fixtures/report_contracts/syslog_legacy/report.md)
- [`tests/fixtures/report_contracts/syslog_legacy/report.json`](../tests/fixtures/report_contracts/syslog_legacy/report.json)
- [`docs/parser-contract.md`](./parser-contract.md)

Look for parser coverage fields:

- `total_input_lines`
- `total_lines`
- `skipped_blank_lines`
- `parsed_lines`
- `unparsed_lines`
- `parse_success_rate`
- `top_unknown_patterns`

Good stopping point: the reviewer can explain what LogLens parses, what it reports, and how unsupported lines remain visible.

## 15-minute local check

Run:

```bash
cmake -S . -B build
cmake --build build
ctest --test-dir build --output-on-failure
./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
```

Then inspect:

- `out/report.md`
- `out/report.json`

Optional CSV check:

```bash
./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
```

Then inspect:

- `out-csv/findings.csv`
- `out-csv/warnings.csv`

Good stopping point: the reviewer can build, test, run a sample, and compare generated artifacts with the report-contract fixtures.

## Boundaries

LogLens is intentionally narrow:

- no live collection
- no credential attack automation
- no exploitation, persistence, or offensive workflow support
- no SIEM replacement
- no cross-host correlation engine
- no incident verdict or attribution claim

Findings are rule-based triage aids. The parser boundary is the main trust boundary: recognized lines become typed events, unsupported lines become warnings and telemetry, and malformed input should fail gracefully.