Add CycloneDX SBOM generation for Ballerina packages #44574
📝 Walkthrough
The changes introduce SBOM (Software Bill of Materials) generation for Bala archives. A new …
Sequence Diagram(s)
sequenceDiagram
participant BalaWriter
participant SbomGenerator
participant ManifestFile as Manifest<br/>(Ballerina.toml/<br/>dependencies.toml)
participant FileSystem
participant BalaArchive
BalaWriter->>SbomGenerator: generateBom(manifestPath, outputPath)
activate SbomGenerator
SbomGenerator->>ManifestFile: read & parse TOML
activate ManifestFile
ManifestFile-->>SbomGenerator: project metadata & dependencies
deactivate ManifestFile
SbomGenerator->>SbomGenerator: extract packages & libraries
SbomGenerator->>SbomGenerator: build PURL mappings<br/>(Ballerina & Maven)
SbomGenerator->>SbomGenerator: resolve versionless PURLs
SbomGenerator->>SbomGenerator: merge & construct<br/>CycloneDX JSON
SbomGenerator->>FileSystem: write bom.cdx.json
FileSystem-->>SbomGenerator: ✓ file written
deactivate SbomGenerator
BalaWriter->>FileSystem: locate bom.cdx.json
FileSystem-->>BalaWriter: ✓ file path
BalaWriter->>BalaArchive: add bom.cdx.json
BalaArchive-->>BalaWriter: ✓ added to archive
BalaWriter->>FileSystem: cleanup temporary files
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 12
🧹 Nitpick comments (12)
compiler/ballerina-lang/src/main/java/io/ballerina/projects/BalaWriter.java (1)
146-146: Naming: prefer addBom over addBOM to match the surrounding camelCase convention.
Other helpers in this class are addBalaJson, addPackageJson, addDependenciesJson, etc.; addBOM is the only all-caps acronym. Rename to addBom for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/BalaWriter.java` at line 146, Rename the method addBOM to addBom in the BalaWriter class and update all call sites (e.g., the call currently written as addBOM(balaOutputStream)) to use addBom to follow camelCase convention; change the method declaration name from addBOM to addBom and update any internal references, Javadoc, and tests that refer to addBOM to the new name to avoid compilation errors.
compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java (3)
34-34: Test class should be final and follow the project's test conventions.
Minor: most other test classes in compiler/ballerina-lang/src/test/java are final and avoid wildcard imports (line 31: import java.util.*;). Aligning here keeps the pattern consistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java` at line 34, Make the test class final and remove the wildcard java import: change the class declaration SbomGeneratorTest to "final class SbomGeneratorTest" and replace the "import java.util.*;" wildcard with explicit imports for only the types used in this file (e.g., List, Map, Optional, etc.) so the test follows the project's conventions and matches other tests in the directory.
95-97: Loose assertions weaken these tests.
buildPurl(...) is deterministic: when org="ballerina" (no dot, non-empty), the produced purl is always pkg:ballerina/ballerina/io@1.8.0. The contains(...) || contains(...) form lets a future refactor silently change the emitted format on one branch without the test noticing. Pin the exact expected purl and assert just that, and include the actual purl set in the failure message so debugging is fast:
🔧 Proposed fix
- Assert.assertTrue(purls.contains("pkg:ballerina/ranvin/math_utils_v2@2.1.2"));
- Assert.assertTrue(purls.contains("pkg:ballerina/ballerina/io@1.8.0") || purls.contains("pkg:ballerina/io@1.8.0"),
-         "Expected ballerina io component");
- Assert.assertTrue(purls.contains("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"), "Expected maven log4j component");
+ Assert.assertTrue(purls.contains("pkg:ballerina/ranvin/math_utils_v2@2.1.2"), "purls=" + purls);
+ Assert.assertTrue(purls.contains("pkg:ballerina/ballerina/io@1.8.0"),
+         "Expected ballerina io component, purls=" + purls);
+ Assert.assertTrue(purls.contains("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"),
+         "Expected maven log4j component, purls=" + purls);
@@
- Assert.assertTrue(depsSet.contains("pkg:ballerina/ballerina/io@1.8.0") || depsSet.contains("pkg:ballerina/io@1.8.0"));
- Assert.assertTrue(depsSet.contains("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"));
+ Assert.assertTrue(depsSet.contains("pkg:ballerina/ballerina/io@1.8.0"), "depsSet=" + depsSet);
+ Assert.assertTrue(depsSet.contains("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"),
+         "depsSet=" + depsSet);
Also applies to: 109-110
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java` around lines 95 - 97, The assertions in SbomGeneratorTest use a loose "contains(a) || contains(b)" pattern for purls produced by buildPurl; change these to assert equality against the exact deterministic purl string (e.g., "pkg:ballerina/ballerina/io@1.8.0" and "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1") and include the actual purls set (the variable purls) in the assertion failure message so failures show what was produced; update the two occurrences around the checks that reference buildPurl/purls (the io and log4j assertions) accordingly.
34-34: Add coverage for failure / no-op paths.
The two tests cover the happy paths but miss several branches that are easy to regress on:
- A Ballerina.toml missing required fields (e.g., no name/version): should surface a TomlException per extractPackageMetadata.
- A manifest where [package] is absent and componentsByPurl ends up empty: currently generateBom writes nothing and doesn't create an output file. BalaWriter.addBOM then silently produces no bom.cdx.json in the bala. Both behaviors deserve a pinned test.
- Verifying that the output filename is normalized to end with .cdx.json even when the caller passed bom-output.json (the tests already pass this, but never assert the filename of the file actually written; they only locate any *.cdx.json).
- A duplicate-dependency case across Ballerina.toml (Maven) and dependencies.toml (Ballerina) to lock in the de-duplication-by-purl behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java` at line 34, Add unit tests in SbomGeneratorTest that cover the failure/no-op branches: (1) a test that feeds a Ballerina.toml missing required fields and asserts extractPackageMetadata throws TomlException; (2) a test that builds a manifest with no [package] section so generateBom yields an empty componentsByPurl and verify that BalaWriter.addBOM produces no bom file (assert no output file is created or that generateBom returns an expected no-op result); (3) a test that requests an output filename like "bom-output.json" and asserts the actual file written by generateBom/BalaWriter ends with ".cdx.json" (normalize behavior); and (4) a duplicate-dependency test where the same dependency appears in Ballerina.toml (Maven style) and dependencies.toml (Ballerina style) and assert the final BOM deduplicates by purl. Locate and exercise extractPackageMetadata, generateBom, and BalaWriter.addBOM when adding these tests.
compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java (7)
32-37: Class Javadoc could use @since and tighten the description.
This class is part of a public package (io.ballerina.projects.util is exported in module-info.java), so the Javadoc is part of the API contract. Add @since and clarify that generateBom is the sole entry point and that it writes one *.cdx.json file per call. Currently the second sentence in the class Javadoc starts in the description block without a blank <p> paragraph break, which renders awkwardly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` around lines 32 - 37, Update the SbomGenerator class Javadoc to include an `@since` tag and reword/format the description: add a blank paragraph break (<p>) before the second sentence, state that generateBom(...) is the sole public entry point, and explicitly document that each call to generateBom writes a single "*.cdx.json" CycloneDX BOM file; reference the class name SbomGenerator and the public method generateBom in the Javadoc so the API contract is clear.
30-30: Avoid wildcard import.
import java.util.*; masks which collection types this class actually relies on (LinkedHashMap, LinkedHashSet, UUID, Comparator, etc.) and is at odds with the rest of io.ballerina.projects.util, which uses explicit imports. Same concern in SbomGeneratorTest.java line 31. Replace with explicit imports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` at line 30, Replace the wildcard import in SbomGenerator.java (and the similar wildcard in SbomGeneratorTest.java) with explicit java.util imports used by the class: remove "import java.util.*;" and add the specific types referenced (for example LinkedHashMap, LinkedHashSet, UUID, Comparator, List, Map, Set, ArrayList — or whichever exact collection/utility types SbomGenerator and its test actually use) so the file shows explicit imports instead of a wildcard.
264-265: Long lines exceed typical Ballerina compiler module style (120 cols).
The exception messages here run well past 120 columns on a single line, which doesn't match the surrounding style in compiler/ballerina-lang. Consider splitting them. Same applies to lines 75, 79, 98, 115 in this file. (Static analysis may flag these once Checkstyle is run.)
Also applies to: 313-314, 373-374
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` around lines 264 - 265, The long single-line exception messages in SbomGenerator (e.g., the throw new IllegalArgumentException("Package metadata is incomplete...") and similar throws around the class) exceed the 120-column style; update each throw site in SbomGenerator to split the message into concatenated string literals or build the message with String.format/variables so no source line exceeds 120 columns (identify the sites by the throw new IllegalArgumentException/RuntimeException statements and the long messages currently around the class, including the ones you noted at the earlier line ranges) and ensure each split preserves the exact message content when concatenated at runtime.
78-82: Catching Exception is too broad and risks masking programming errors.
These catch (Exception e) blocks turn any RuntimeException (NPE, ClassCastException, etc.) thrown by the TOML parser or by your own helpers into a TomlException, hiding the original location. Narrow to the exceptions toml4j actually throws (IllegalStateException for type mismatches, plus IOException from the underlying read, already declared via read(File)), or at the very least re-throw Error/RuntimeException when they're not parser errors.
Note: line 78 also catches IllegalArgumentException, which the previous catch on line 74 already handles, so the first catch is partially redundant. Coordinating the catches makes the intent clearer.
Also applies to: 97-101, 113-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` around lines 78 - 82, The catch blocks in SbomGenerator that currently use "catch (Exception e)" (and a redundant catch for IllegalArgumentException) are too broad; change them to only handle the TOML/parser exceptions you expect (e.g., IllegalStateException and IOException from toml4j or the file read), preserve the original cause by wrapping those specific exceptions into a TomlException, and rethrow unchecked Throwables (RuntimeException and Error) unchanged so you don't mask programming errors; also remove the duplicate IllegalArgumentException catch and coordinate the remaining catches in the methods of class SbomGenerator where TomlException is constructed so only parser/IO exceptions are converted.
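A minimal sketch of the narrowing described above, not the PR's actual code. It assumes a hypothetical ManifestException standing in for the project's TomlException and a generic parse step standing in for the toml4j call; treating IllegalStateException as the parser's failure type follows the review's claim and is an assumption here. Only that expected type is wrapped, so a NullPointerException from a helper keeps its original stack trace.

```java
import java.util.function.Supplier;

public class NarrowCatchSketch {

    /** Hypothetical stand-in for the project's TomlException. */
    public static class ManifestException extends Exception {
        public ManifestException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs a parse step and converts only the expected parser failure
     * (IllegalStateException, by assumption) into ManifestException.
     * Any other RuntimeException or Error propagates unchanged.
     */
    public static <T> T parseOrWrap(Supplier<T> parseStep, String what) throws ManifestException {
        try {
            return parseStep.get();
        } catch (IllegalStateException e) {
            // Keep the original exception as the cause so the failing location stays visible.
            throw new ManifestException("Failed to parse " + what + ": " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            parseOrWrap(() -> {
                throw new IllegalStateException("expected a string value");
            }, "Ballerina.toml");
        } catch (ManifestException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the catch names a single type, there is no overlap with a separate IllegalArgumentException handler, which is the coordination the comment asks for.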
597-597: Heuristic org.contains(".") to decide Maven vs Ballerina is brittle.
The PURL kind is being inferred from whether the org string contains a dot. This conflates two different concerns:
- A Ballerina org that happens to contain a . (the validator may evolve over time) is silently mis-categorized as Maven.
- A Maven groupId without a dot (rare but valid; junit:junit:4.x historical examples) is mis-categorized as Ballerina.
Pass the kind explicitly. The two callers already know which kind they're emitting:
- extractMavenDependencies → always Maven.
- extractPackageMetadata / extractBallerinaDependencies → always Ballerina.
🔧 Sketch
private static String buildBallerinaPurl(String org, String name, String version) { ... }
private static String buildMavenPurl(String group, String artifactId, String version) { ... }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` at line 597, The heuristic using "org.contains(\".\")" to set isMavenDependency in SbomGenerator is brittle; remove that check and change the PURL construction to two explicit builders (e.g., buildBallerinaPurl(String org, String name, String version) and buildMavenPurl(String group, String artifactId, String version)) or add an explicit kind parameter to the existing PURL builder, then update the callers: have extractMavenDependencies always call the Maven builder (or pass "maven") and have extractPackageMetadata/extractBallerinaDependencies always call the Ballerina builder (or pass "ballerina"), ensuring the logic no longer relies on org.contains(".") and that PURL creation uses the correct builder based on the caller.
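The two-builder split suggested above could look roughly like the following. This is an illustrative sketch: the method names come from the review's own sketch, but the bodies, the shared build helper, and the choice to reject a blank namespace are assumptions, and percent-encoding of purl segments is deliberately left out.

```java
public class PurlSketch {

    /** Builds a purl for a Ballerina package; the caller states the kind, no dot heuristic. */
    public static String buildBallerinaPurl(String org, String name, String version) {
        return build("ballerina", org, name, version);
    }

    /** Builds a purl for a Maven artifact, including group ids without a dot. */
    public static String buildMavenPurl(String group, String artifactId, String version) {
        return build("maven", group, artifactId, version);
    }

    // Hypothetical shared helper: pkg:<type>/<namespace>/<name>[@<version>]
    private static String build(String type, String namespace, String name, String version) {
        if (namespace == null || namespace.isBlank() || name == null || name.isBlank()) {
            throw new IllegalArgumentException("namespace and name are required for a " + type + " purl");
        }
        StringBuilder purl = new StringBuilder("pkg:")
                .append(type).append('/')
                .append(namespace).append('/')
                .append(name);
        // A missing version yields a versionless purl instead of an error.
        if (version != null && !version.isBlank()) {
            purl.append('@').append(version);
        }
        return purl.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildBallerinaPurl("ballerina", "io", "1.8.0"));
        System.out.println(buildMavenPurl("junit", "junit", "4.13.2"));
    }
}
```

With the kind fixed by the caller, a dotless Maven group such as junit stays under pkg:maven, and a Ballerina org containing a dot stays under pkg:ballerina.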
522-522: Consider bumping specVersion to a current CycloneDX version.
CycloneDX 1.4 is from 2022; 1.5 (2023), 1.6 (2024), and 1.7 (2025) are widely supported by tools like Xray, Dependency-Track, and Trivy. Since this generator is brand new, picking 1.5–1.7 avoids an immediate follow-up bump and unlocks fields like lifecycles, licenses w/ SPDX expressions, and improved bom-ref handling. While newer versions maintain practical backward compatibility (tools ignore unknown fields), strict schema-compliant 1.4 consumers may fail; weigh this against the benefits of staying current.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` at line 522, Update the CycloneDX specVersion set in SbomGenerator (the bom.put("specVersion", ...) call in SbomGenerator.java) to a current supported version (e.g., "1.7" or at least "1.6"/"1.5") by changing the literal value; after updating, run/adjust any SBOM schema validations or unit tests that assert the previous "1.4" string to ensure compatibility with the newer spec and any new fields you plan to emit.
38-38: Consider adopting cyclonedx-core-java (v12.1.0) to replace hand-rolled SBOM generation.
This file implements ~840 lines of functionality that org.cyclonedx:cyclonedx-core-java already provides out-of-the-box: PURL construction via string concatenation, PURL parsing via manual string operations (.indexOf(), .substring(), .split()), a component model, and JSON serialization. The library includes Bom, Component, Dependency, BomGenerator, BomJsonGenerator classes, and PackageURL (via com.github.package-url:packageurl-java), all schema-validated against the official CycloneDX specification.
Benefits of migrating:
- Eliminates brittle string parsing in parsePurlToComponent and parseDependencyString (replace with PackageURL).
- Guarantees output validates against the published CycloneDX JSON schema (currently targeting v1.4; library supports up to v1.6.1).
- Spec version upgrades become trivial.
- Reduces codebase by ~600 LOC.
This is a significant refactor worth prioritizing before the code accumulates more dependents. Consider opening a tracking issue if not done immediately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java` at line 38, Replace the hand-rolled SBOM generation in SbomGenerator with cyclonedx-core-java v12.1.0: stop building PURLs and parsing them manually in methods like parsePurlToComponent and parseDependencyString and instead use com.github.packageurl.PackageURL to parse package URLs and org.cyclonedx.models.Bom / Component / Dependency and org.cyclonedx.generators.json.BomJsonGenerator to build and serialize the BOM; update SbomGenerator to construct Component instances from PackageURL (mapping type/name/version/namespace), wire dependencies via Dependency objects, and emit schema-compliant JSON via BomJsonGenerator (ensuring you remove the manual string concatenation/parsing logic and replace it with these library calls).
compiler/ballerina-lang/src/main/java/module-info.java (1)
23-23: Use the existing TOML parsing APIs instead of pulling in toml4j.
The codebase already has io.ballerina.toml and related APIs (BallerinaToml, DependencyManifest, PackageManifest) that are actively used throughout the compiler module for parsing manifests. Introducing toml4j adds unnecessary dependency surface, divergence in TOML parsing behavior, and supply-chain risk by pinning a second TOML library.
Additionally, toml4j (com.moandjiezana.toml) is unmaintained (last release several years ago) and does not ship as a proper JPMS named module. The requires toml4j; declaration in module-info.java will rely on an automatic module name, which is fragile and not recommended for a long-lived compiler module.
Refactor to use the existing manifest APIs or io.ballerina.toml directly through PackageContext, which also gives you access to resolved/effective dependency versions instead of re-parsing raw TOML.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compiler/ballerina-lang/src/main/java/module-info.java` at line 23, The module declares a new requires toml4j in module-info.java which introduces an unnecessary, unmaintained TOML dependency; replace uses of com.moandjiezana.toml by switching callers to the existing compiler manifest APIs (e.g. BallerinaToml, DependencyManifest, PackageManifest) or use PackageContext to obtain parsed/effective manifest data instead of re-parsing with toml4j; update any code that referenced toml4j parsing to call the appropriate API (BallerinaToml.parse / PackageContext.getManifest / DependencyManifest accessors) and remove the requires toml4j line from module-info.java so the module relies on the in-tree io.ballerina.toml implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/BalaWriter.java`:
- Around line 187-188: The ternary that computes the local variable pkg (String
pkg = packageContext != null ? packageContext.packageName().toString() :
"<unknown";) is dead and unused; remove that declaration and the null-check, and
simplify the exception throw at the end of the SBOM generation block (the throw
new ProjectException(..., se) line) so it does not depend on pkg or redundant
packageContext checks—leave the throw using se.getMessage() and se as before.
Target the BalaWriter code paths around packageContext and the SBOM generation
failure throw to remove the unused pkg variable and any unnecessary null logic.
- Around line 192-194: Replace the direct call to Files.list(tmpDir) with a
try-with-resources to ensure the directory stream is closed: declare
Optional<Path> generated outside the block, then use try (Stream<Path> stream =
Files.list(tmpDir)) { generated = stream.filter(p ->
p.getFileName().toString().toLowerCase().endsWith(".cdx.json")).findFirst(); }
so the stream returned by Files.list(tmpDir) is closed; update the BalaWriter
code where generated is computed to use this pattern.
- Around line 183-218: The current inner catch around SbomGenerator.generateBom
in BalaWriter (catch (TomlException | IOException se) that throws new
ProjectException) will abort packaging; change this to best-effort: replace the
throw with logging a warning (include packageContext.packageName() if non-null
and se.getMessage()) and skip adding SBOM (e.g., return from the SBOM-generation
block or continue) so bala packaging proceeds. Also ensure the outer catch
(IOException e) no longer silently swallows errors — log an error/warning with
the exception details (instead of empty comment) so real IO failures (like
Files.createTempDirectory) are visible. Target symbols:
SbomGenerator.generateBom, the inner catch handling the variable se, the thrown
ProjectException, and the outer catch (IOException e).
- Around line 162-178: Replace hardcoded manifest filenames in BalaWriter (the
block that computes manifestPath using packageContext.project().sourceRoot())
with ProjectConstants.BALLERINA_TOML and ProjectConstants.DEPENDENCIES_TOML;
specifically, look up the canonical ProjectConstants.BALLERINA_TOML first, then
fall back to ProjectConstants.DEPENDENCIES_TOML (and avoid checking a
lowercase-only variant), so the code uses the established constants and the
canonical names in the checks and Files.exists calls.
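For the Files.list item in the list above (lines 192-194), the try-with-resources pattern might be sketched as follows. The class and method names are hypothetical; only Files.list, Stream, and Optional are standard JDK API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Stream;

public class FindBomSketch {

    /**
     * Returns the first *.cdx.json file directly under dir.
     * The directory stream is closed when the try block exits,
     * which releases the underlying directory handle.
     */
    public static Optional<Path> findGeneratedBom(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".cdx.json"))
                    .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmpDir = Files.createTempDirectory("sbom-sketch");
        Path bom = Files.writeString(tmpDir.resolve("bom.cdx.json"), "{}");
        System.out.println(findGeneratedBom(tmpDir).isPresent());
        // Clean up the temporary files created for the demonstration.
        Files.delete(bom);
        Files.delete(tmpDir);
    }
}
```

findFirst is a terminal operation, so the result is fully computed before the stream is closed. Locale.ROOT keeps the case folding independent of the default locale.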
In
`@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java`:
- Line 87: The code in SbomGenerator uses
manifestPath.getParent().resolve("dependencies.toml") which will miss the
canonical Ballerina filename on case-sensitive filesystems; change the lookup to
use the canonical constant ProjectConstants.DEPENDENCIES_TOML (or otherwise
check both "Dependencies.toml" and "dependencies.toml") when resolving the
dependencies file, updating any usage around the manifestPath variable in
SbomGenerator to rely on ProjectConstants.DEPENDENCIES_TOML to ensure the
correct file is found.
- Around line 122-201: The current branch quietly does nothing when
componentsByPurl is empty, resulting in no SBOM file; update SbomGenerator so
that when componentsByPurl.isEmpty() it still builds and writes a minimal
CycloneDX BOM JSON (with bomFormat, specVersion, serialNumber, metadata and
empty components/dependencies arrays) by calling buildBomJson(...) with an empty
components map and empty dependency list, then writing to the path produced by
ensureCdxJsonExtension(outputPath) (creating parent dirs and writing to
finalOutputPath as done in the existing try-with-resources). Also, add a clear
signal (either a return value or an exception) or ensure BalaWriter logs a
warning when the generator writes just the minimal BOM so callers can
distinguish "generated empty SBOM" from "skipped".
- Around line 61-63: The code dereferences manifestPath.getFileName() and
manifestPath.getParent() unguarded, which can NPE for relative/root paths; in
SbomGenerator.generateBom (and any constructor/usage creating ballerinaTomlPath)
normalize or guard manifestPath first: convert manifestPath to an
absolute/normalized Path (e.g., manifestPath =
manifestPath.toAbsolutePath().normalize()) or check for nulls before use, then
compute ballerinaTomlPath by first ensuring getFileName()!=null and using
getParent()!=null (fallback to manifestPath itself or Paths.get(".") as
appropriate) before calling resolve("Ballerina.toml"); update both the
ballerinaTomlPath assignment and the similar logic at the other occurrence
(around line 87) to use the same null-safe/normalized approach.
- Around line 198-200: Replace the PrintWriter usage that swallows IOExceptions
with the underlying BufferedWriter so write errors propagate: in SbomGenerator
(the method that opens finalOutputPath) remove the try-with-resources creating
new PrintWriter(Files.newBufferedWriter(finalOutputPath)) and instead use the
BufferedWriter returned by Files.newBufferedWriter(finalOutputPath) directly to
write bomJson (and let IOException propagate); also remove the now-unused import
java.io.PrintWriter. Ensure the resource is closed via try-with-resources and no
checkError() calls are used.
- Around line 822-833: The if-block in SbomGenerator that checks "if
(s.contains(\"/\") && s.contains(\"@\"))" is unreachable because earlier logic
already returns for any s containing "@"; remove this entire branch (the check
and its local variables atPos, left, version, slash, group, artifact and the
call to buildPurl) to eliminate dead code and keep the remaining parsing logic
intact in SbomGenerator.java.
- Line 518: The current SBOM serial is non-deterministic because SbomGenerator
builds serial with "urn:uuid:"+UUID.randomUUID(), which breaks reproducible bala
builds; change SbomGenerator to compute the serial deterministically from the
package coordinates (e.g., organization, package name, version and any module
identifier) by creating a stable byte representation of those fields and
deriving a UUID via UUID.nameUUIDFromBytes(...) or by hashing (SHA-1/SHA-256)
the concatenated coordinates and formatting as
"urn:uuid:"+UUID.nameUUIDFromBytes(hashBytes). Replace the UUID.randomUUID()
usage where the local variable serial is assigned so the same source always
yields the same serialNumber in bom.cdx.json. Ensure you use the same stable
field order and encoding when building the input bytes.
- Around line 289-314: The code currently throws IllegalArgumentException when a
Maven coord is incomplete (version null), breaking valid path-only dependencies;
update SbomGenerator so it only requires groupId and artifactId (allowing
version to be null) and do not throw for path-only entries: in the block that
currently checks if (groupId != null && artifactId != null && version != null)
change the condition to require only groupId and artifactId, call
buildPurl(groupId, artifactId, version) which already tolerates null version,
populate componentsByPurl and dependencyNodes as before, and for entries that
lack groupId/artifactId (path-only platform.javaXX.dependency) simply
skip/continue instead of throwing; keep extractDependencyRefs, componentsByPurl,
and dependencyNodes usage intact.
- Line 275: Replace the hardcoded loop over new
String[]{"java8","java11","java17","java21"} in SbomGenerator (the for loop that
processes javaVersion) with code that discovers Java platforms dynamically:
obtain the "platform" table from the parsed project model, iterate its child
keys, filter keys that start with "java" (e.g., "java25"), collect them (skip
null table), and iterate that list so SBOM generation (the existing logic that
consumes javaVersion) covers any newly added JDK platforms instead of silently
missing them.
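For the serial-number item in the list above (line 518), a deterministic derivation might be sketched like this. The class name, method name, and the newline separator are assumptions for illustration; UUID.nameUUIDFromBytes is standard JDK API and returns a name-based (version 3, MD5) UUID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class SerialSketch {

    /**
     * Derives a stable serialNumber from the package coordinates.
     * The same org, name, and version always produce the same value.
     */
    public static String deterministicSerial(String org, String name, String version) {
        // Fixed field order and a fixed encoding; the newline separator is assumed
        // not to occur inside any of the fields.
        String coordinates = org + "\n" + name + "\n" + version;
        UUID uuid = UUID.nameUUIDFromBytes(coordinates.getBytes(StandardCharsets.UTF_8));
        return "urn:uuid:" + uuid;
    }

    public static void main(String[] args) {
        System.out.println(deterministicSerial("ranvin", "math_utils_v2", "2.1.2"));
    }
}
```

A separator between fields keeps distinct coordinates such as ("ab", "c") and ("a", "bc") from collapsing into the same input bytes.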
---
Nitpick comments:
In `@compiler/ballerina-lang/src/main/java/io/ballerina/projects/BalaWriter.java`:
- Line 146: Rename the method addBOM to addBom in the BalaWriter class and
update all call sites (e.g., the call currently written as
addBOM(balaOutputStream)) to use addBom to follow camelCase convention; change
the method declaration name from addBOM to addBom and update any internal
references, Javadoc, and tests that refer to addBOM to the new name to avoid
compilation errors.
In
`@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java`:
- Around line 32-37: Update the SbomGenerator class Javadoc to include an `@since`
tag and reword/format the description: add a blank paragraph break (<p>) before
the second sentence, state that generateBom(...) is the sole public entry point,
and explicitly document that each call to generateBom writes a single
"*.cdx.json" CycloneDX BOM file; reference the class name SbomGenerator and the
public method generateBom in the Javadoc so the API contract is clear.
- Line 30: Replace the wildcard import in SbomGenerator.java (and the similar
wildcard in SbomGeneratorTest.java) with explicit java.util imports used by the
class: remove "import java.util.*;" and add the specific types referenced (for
example LinkedHashMap, LinkedHashSet, UUID, Comparator, List, Map, Set,
ArrayList — or whichever exact collection/utility types SbomGenerator and its
test actually use) so the file shows explicit imports instead of a wildcard.
- Around line 264-265: The long single-line exception messages in SbomGenerator
(e.g., the throw new IllegalArgumentException("Package metadata is
incomplete...") and similar throws around the class) exceed the 120-column
style; update each throw site in SbomGenerator to split the message into
concatenated string literals or build the message with String.format/variables
so no source line exceeds 120 columns (identify the sites by the throw new
IllegalArgumentException/RuntimeException statements and the long messages
currently around the class, including the ones you noted at the earlier line
ranges) and ensure each split preserves the exact message content when
concatenated at runtime.
- Around line 78-82: The catch blocks in SbomGenerator that currently use "catch
(Exception e)" (and a redundant catch for IllegalArgumentException) are too
broad; change them to only handle the TOML/parser exceptions you expect (e.g.,
IllegalStateException and IOException from toml4j or the file read), preserve
the original cause by wrapping those specific exceptions into a TomlException,
and rethrow unchecked Throwables (RuntimeException and Error) unchanged so you
don't mask programming errors; also remove the duplicate
IllegalArgumentException catch and coordinate the remaining catches in the
methods of class SbomGenerator where TomlException is constructed so only
parser/IO exceptions are converted.
- Line 597: The heuristic using "org.contains(\".\")" to set isMavenDependency
in SbomGenerator is brittle; remove that check and change the PURL construction
to two explicit builders (e.g., buildBallerinaPurl(String org, String name,
String version) and buildMavenPurl(String group, String artifactId, String
version)) or add an explicit kind parameter to the existing PURL builder, then
update the callers: have extractMavenDependencies always call the Maven builder
(or pass "maven") and have extractPackageMetadata/extractBallerinaDependencies
always call the Ballerina builder (or pass "ballerina"), ensuring the logic no
longer relies on org.contains(".") and that PURL creation uses the correct
builder based on the caller.
- Line 522: Update the CycloneDX specVersion set in SbomGenerator (the
bom.put("specVersion", ...) call in SbomGenerator.java) to a current supported
version (e.g., "1.7" or at least "1.6"/"1.5") by changing the literal value;
after updating, run/adjust any SBOM schema validations or unit tests that assert
the previous "1.4" string to ensure compatibility with the newer spec and any
new fields you plan to emit.
- Line 38: Replace the hand-rolled SBOM generation in SbomGenerator with
cyclonedx-core-java v12.1.0: stop building PURLs and parsing them manually in
methods like parsePurlToComponent and parseDependencyString and instead use
com.github.packageurl.PackageURL to parse package URLs and
org.cyclonedx.models.Bom / Component / Dependency and
org.cyclonedx.generators.json.BomJsonGenerator to build and serialize the BOM;
update SbomGenerator to construct Component instances from PackageURL (mapping
type/name/version/namespace), wire dependencies via Dependency objects, and emit
schema-compliant JSON via BomJsonGenerator (ensuring you remove the manual
string concatenation/parsing logic and replace it with these library calls).
In `@compiler/ballerina-lang/src/main/java/module-info.java`:
- Line 23: The module declares a new requires toml4j in module-info.java which
introduces an unnecessary, unmaintained TOML dependency; replace uses of
com.moandjiezana.toml by switching callers to the existing compiler manifest
APIs (e.g. BallerinaToml, DependencyManifest, PackageManifest) or use
PackageContext to obtain parsed/effective manifest data instead of re-parsing
with toml4j; update any code that referenced toml4j parsing to call the
appropriate API (BallerinaToml.parse / PackageContext.getManifest /
DependencyManifest accessors) and remove the requires toml4j line from
module-info.java so the module relies on the in-tree io.ballerina.toml
implementation.
In
`@compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java`:
- Line 34: Make the test class final and remove the wildcard java import: change
the class declaration SbomGeneratorTest to "final class SbomGeneratorTest" and
replace the "import java.util.*;" wildcard with explicit imports for only the
types used in this file (e.g., List, Map, Optional, etc.) so the test follows
the project's conventions and matches other tests in the directory.
- Around line 95-97: The assertions in SbomGeneratorTest use a loose
"contains(a) || contains(b)" pattern for purls produced by buildPurl; change
these to assert equality against the exact deterministic purl string (e.g.,
"pkg:ballerina/ballerina/io@1.8.0" and
"pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1") and include the actual
purls set (the variable purls) in the assertion failure message so failures show
what was produced; update the two occurrences around the checks that reference
buildPurl/purls (the io and log4j assertions) accordingly.
- Line 34: Add unit tests in SbomGeneratorTest that cover the failure/no-op
branches: (1) a test that feeds a Ballerina.toml missing required fields and
asserts extractPackageMetadata throws TomlException; (2) a test that builds a
manifest with no [package] section so generateBom yields an empty
componentsByPurl and verify that BalaWriter.addBOM produces no bom file (assert
no output file is created or that generateBom returns an expected no-op result);
(3) a test that requests an output filename like "bom-output.json" and asserts
the actual file written by generateBom/BalaWriter ends with ".cdx.json"
(normalize behavior); and (4) a duplicate-dependency test where the same
dependency appears in Ballerina.toml (Maven style) and dependencies.toml
(Ballerina style) and assert the final BOM deduplicates by purl. Locate and
exercise extractPackageMetadata, generateBom, and BalaWriter.addBOM when adding
these tests.
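The explicit-builder suggestion for line 597 above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `PurlSketch` and the private helper `join` are hypothetical, and the sketch assumes coordinates need no percent-encoding. It shows how the caller, rather than an `org.contains(".")` check, decides the PURL type, and it yields the deterministic strings that the tightened test assertions would compare against.

```java
// Hypothetical sketch: explicit PURL builders so the caller chooses the type.
final class PurlSketch {
    private PurlSketch() { }

    // Builds a Ballerina-style purl, e.g. pkg:ballerina/<org>/<name>@<version>.
    static String buildBallerinaPurl(String org, String name, String version) {
        return join("ballerina", org, name, version);
    }

    // Builds a Maven-style purl, e.g. pkg:maven/<groupId>/<artifactId>@<version>.
    static String buildMavenPurl(String groupId, String artifactId, String version) {
        return join("maven", groupId, artifactId, version);
    }

    // Assumption: inputs contain no characters that require percent-encoding.
    private static String join(String type, String namespace, String name, String version) {
        StringBuilder sb = new StringBuilder("pkg:")
                .append(type).append('/')
                .append(namespace).append('/')
                .append(name);
        // A missing version produces a versionless purl instead of a trailing '@'.
        if (version != null && !version.isEmpty()) {
            sb.append('@').append(version);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildBallerinaPurl("ballerina", "io", "1.8.0"));
        System.out.println(buildMavenPurl("org.apache.logging.log4j", "log4j-core", "2.14.1"));
    }
}
```

With builders like these, `extractMavenDependencies` would always call the Maven variant and the Ballerina extraction paths would always call the Ballerina variant, so a dotted Ballerina org name could no longer be misclassified as a Maven group.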
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1253ccc6-5297-4c6d-8e7c-986e0d26851e
📒 Files selected for processing (4)
compiler/ballerina-lang/src/main/java/io/ballerina/projects/BalaWriter.java
compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java
compiler/ballerina-lang/src/main/java/module-info.java
compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java
    private static void extractMavenDependencies(Toml toml, Map<String, Map<String, Object>> componentsByPurl,
            List<Map<String, Object>> dependencyNodes) {
        // Look for all java/maven dependencies
        for (String javaVersion : new String[]{"java8", "java11", "java17", "java21"}) {
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm what platform.* targets are actually used in the codebase / sample projects.
rg -n --type=toml '^\[platform\.' | head -40
Repository: ballerina-platform/ballerina-lang
Length of output: 1775
🏁 Script executed:
# First, locate and examine SbomGenerator.java around the target line
find . -name "SbomGenerator.java" -type f
Repository: ballerina-platform/ballerina-lang
Length of output: 163
🏁 Script executed:
# Once located, examine the code around line 275 with broader context
cat -n compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java | sed -n '250,300p'
Repository: ballerina-platform/ballerina-lang
Length of output: 3176
🏁 Script executed:
# Check if there's any documentation or configuration about supported platforms
rg -n "java8|java11|java17|java21|supported.*platform" compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java -A 2 -B 2 | head -80
Repository: ballerina-platform/ballerina-lang
Length of output: 449
🏁 Script executed:
# Search the broader codebase to see how other code handles platform discovery
rg -n "platform\." compiler/ballerina-lang/src/main/java/io/ballerina/projects/ -A 1 -B 1 | grep -E "(java8|java11|java17|java21|keySet|toMap)" | head -30
Repository: ballerina-platform/ballerina-lang
Length of output: 254
🏁 Script executed:
# Check if java8 is referenced elsewhere in the codebase as a supported platform
rg -i "java8" compiler/ballerina-lang/src/main/java/io/ballerina/projects/ -B 2 -A 2 | head -60
Repository: ballerina-platform/ballerina-lang
Length of output: 849
🏁 Script executed:
# Look for any documentation or constants about supported platforms
rg -n "supported.*platform|platform.*version|java.*version" compiler/ballerina-lang/src/main/java/io/ballerina/ --type=java | grep -i "8\|11\|17\|21\|version" | head -30
Repository: ballerina-platform/ballerina-lang
Length of output: 2156
🏁 Script executed:
# Check what other keys exist under "platform" table in test TOML files
rg -A 10 '^\[platform\.' compiler/ballerina-lang/src/test/resources/ballerina-toml/ | grep -E '^\[|^[a-z]' | head -50
Repository: ballerina-platform/ballerina-lang
Length of output: 3508
🏁 Script executed:
# Check the git history to see if java8 was recently removed or deprecated
git log --oneline -p compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java | grep -B 5 -A 5 "java8\|java11\|java17\|java21" | head -80
Repository: ballerina-platform/ballerina-lang
Length of output: 840
🏁 Script executed:
# Look for configuration files or documentation about supported versions
fd -e properties -e xml -e yaml -e yml . compiler/ballerina-lang/src/main/java/io/ballerina/ | xargs grep -l "java\|platform\|version" 2>/dev/null | head -10
Repository: ballerina-platform/ballerina-lang
Length of output: 59
🏁 Script executed:
# Check if there are any tests for the extractMavenDependencies method
rg -n "extractMavenDependencies" compiler/ballerina-lang/src/test/ -B 2 -A 10 | head -80
Repository: ballerina-platform/ballerina-lang
Length of output: 59
🏁 Script executed:
# Look for any issue tracking or comments about platform versions
rg -n "TODO|FIXME|BUG|XXX" compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java
Repository: ballerina-platform/ballerina-lang
Length of output: 59
🏁 Script executed:
# Check what the actual platform table looks like in a real test - get more context
cat -n compiler/ballerina-lang/src/test/resources/ballerina-toml/valid-ballerina.toml
Repository: ballerina-platform/ballerina-lang
Length of output: 1012
Hardcoded Java platform list will silently drop new JDK dependencies from SBOM.
The current code only checks ["java8", "java11", "java17", "java21"]. When Ballerina adds support for new JDKs (e.g., java25), Maven dependencies under platform.java25 will be missed by SBOM generation—silently—without error or warning. This defeats Xray scanning and other security scanning tools that rely on complete dependency lists.
Discover platforms dynamically by iterating the platform table's children with a java* prefix filter. This is safe because all test TOML files use only [platform.java*] entries (no non-Java platform keys to accidentally include).
🔧 Proposed fix
- // Look for all java/maven dependencies
- for (String javaVersion : new String[]{"java8", "java11", "java17", "java21"}) {
- String platformKey = "platform." + javaVersion;
- Toml platformTable = toml.getTable(platformKey);
- if (platformTable != null) {
+ // Look for all java/maven dependencies under any platform.java* table
+ Toml platformsTable = toml.getTable("platform");
+ if (platformsTable != null) {
+ Map<String, Object> platformMap = platformsTable.toMap();
+ for (String javaVersion : platformMap.keySet()) {
+ if (!javaVersion.startsWith("java")) {
+ continue;
+ }
+ Toml platformTable = platformsTable.getTable(javaVersion);
+ if (platformTable == null) {
+ continue;
+ }
// Get the dependency(s) from the platform table
List<Toml> depTables = getTomlTables(platformTable, "dependency");
...
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java`
at line 275, Replace the hardcoded loop over new
String[]{"java8","java11","java17","java21"} in SbomGenerator (the for loop that
processes javaVersion) with code that discovers Java platforms dynamically:
obtain the "platform" table from the parsed project model, iterate its child
keys, filter keys that start with "java" (e.g., "java25"), collect them (skip
null table), and iterate that list so SBOM generation (the existing logic that
consumes javaVersion) covers any newly added JDK platforms instead of silently
missing them.
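The dynamic discovery described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: the class `PlatformDiscoverySketch` and the method `javaPlatformKeys` are hypothetical, and the `platform` table is modeled as a plain `Map<String, Object>` (the kind of structure a TOML parser's map view would provide) so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: discover platform.java* tables instead of hardcoding them.
final class PlatformDiscoverySketch {
    private PlatformDiscoverySketch() { }

    // Returns the keys of the "platform" table that name Java platforms.
    // Assumption: each platform entry is itself a nested table (a Map).
    static List<String> javaPlatformKeys(Map<String, Object> platformTable) {
        List<String> keys = new ArrayList<>();
        if (platformTable == null) {
            return keys; // no [platform.*] section at all
        }
        for (Map.Entry<String, Object> entry : platformTable.entrySet()) {
            // Keep only java* keys whose value is a table; skip anything else.
            if (entry.getKey().startsWith("java") && entry.getValue() instanceof Map) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Object> platform = new LinkedHashMap<>();
        platform.put("java17", new HashMap<String, Object>());
        platform.put("java25", new HashMap<String, Object>()); // a future JDK target
        System.out.println(javaPlatformKeys(platform));
    }
}
```

The existing per-platform logic would then iterate over the returned keys, so a newly supported JDK target is picked up without touching the generator.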
    if (groupId != null && artifactId != null && version != null) {
        String purl = buildPurl(groupId, artifactId, version);
        Map<String, Object> comp = new LinkedHashMap<>();
        comp.put("type", "library");
        comp.put("group", groupId);
        comp.put("name", artifactId);
        comp.put("version", version);
        comp.put("purl", purl);
        componentsByPurl.putIfAbsent(purl, comp);

        Object depsObj = depMap.get("dependencies");
        if (depsObj == null) {
            depsObj = depMap.get("dependsOn");
        }

        // Extracts the dependsOn dependencies and store them in the form of purl's
        List<String> depRefs = extractDependencyRefs(depsObj, componentsByPurl);

        // Bring ref & dependsOn dependencies to a single data structure
        Map<String, Object> depNode = new LinkedHashMap<>();
        depNode.put("ref", purl);
        depNode.put("dependsOn", depRefs);
        dependencyNodes.add(depNode);
    } else {
        throw new IllegalArgumentException("Maven dependency is incomplete in the TOML file. 'groupId', 'artifactId', and 'version' are required fields.");
    }
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether path-only platform.javaXX.dependency entries exist in test fixtures.
rg -n --type=toml -B1 -A4 '^\[\[platform\.java[0-9]+\.dependency\]\]' | head -80
rg -n --type=toml -B1 -A4 '\bpath\s*=' compiler/ballerina-lang/src/test 2>/dev/null | head -40
Repository: ballerina-platform/ballerina-lang
Length of output: 12597
🏁 Script executed:
cat -n compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java | sed -n '280,320p'
Repository: ballerina-platform/ballerina-lang
Length of output: 2492
🏁 Script executed:
rg -A 20 'private.*buildPurl' compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java
Repository: ballerina-platform/ballerina-lang
Length of output: 950
🏁 Script executed:
rg -n 'version != null' compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java
Repository: ballerina-platform/ballerina-lang
Length of output: 332
🏁 Script executed:
rg -n 'SbomGenerator' compiler/ballerina-lang/src/test --type=java | head -20
Repository: ballerina-platform/ballerina-lang
Length of output: 539
🏁 Script executed:
fd -name '*SbomGenerator*Test*' compiler/ballerina-lang/src/test
Repository: ballerina-platform/ballerina-lang
Length of output: 308
🏁 Script executed:
rg -B5 -A10 'path.*=' compiler/ballerina-lang/src/test/resources/ballerina-toml/local-dependencies.toml
Repository: ballerina-platform/ballerina-lang
Length of output: 557
🏁 Script executed:
cat -n compiler/ballerina-lang/src/test/java/io/ballerina/projects/util/SbomGeneratorTest.java | head -100
Repository: ballerina-platform/ballerina-lang
Length of output: 5416
🏁 Script executed:
rg -B2 -A5 'path.*=' compiler/ballerina-lang/src/test/resources/ballerina-toml/ --type=toml | grep -E '(^\[|path|groupId|artifactId|version)' | head -50
Repository: ballerina-platform/ballerina-lang
Length of output: 5831
🏁 Script executed:
# Check if there are any test fixtures with ONLY path and no groupId/artifactId
rg --type=toml '^\[\[platform\.java' compiler/ballerina-lang/src/test/resources -A5 | grep -B5 'path =' | grep -v 'groupId\|artifactId\|version'
Repository: ballerina-platform/ballerina-lang
Length of output: 6809
🏁 Script executed:
cat compiler/ballerina-lang/src/test/resources/ballerina-toml/additional-props-ballerina.toml
Repository: ballerina-platform/ballerina-lang
Length of output: 739
🏁 Script executed:
cat compiler/ballerina-lang/src/test/resources/ballerina-toml/platform-without-org-name-version.toml
Repository: ballerina-platform/ballerina-lang
Length of output: 449
🏁 Script executed:
# Check if SbomGenerator test or other tests explicitly test path-only entries
rg -B5 -A5 'path.*only' compiler/ballerina-lang/src/test --type=java
Repository: ballerina-platform/ballerina-lang
Length of output: 59
Skip or gracefully handle path-only platform.javaXX.dependency entries in SBOM generation.
Throwing IllegalArgumentException when version is null breaks valid Ballerina manifests that use path instead of Maven coordinates. Test fixtures (e.g., additional-props-ballerina.toml, project_no_class_def_found/Ballerina.toml) confirm path-only entries are legitimate; the current code makes them un-packageable.
The buildPurl() method already handles null version safely (it checks if (version != null && !version.isEmpty())), so either skip incomplete entries silently or conditionally build components for path-only entries without requiring Maven coordinates.
🔧 Proposed fix
- if (groupId != null && artifactId != null && version != null) {
+ if (groupId != null && artifactId != null) {
String purl = buildPurl(groupId, artifactId, version);
...
- } else {
- throw new IllegalArgumentException("Maven dependency is incomplete in the TOML file. 'groupId', 'artifactId', and 'version' are required fields.");
- }
+ }
+ // Skip path-based or URL-based platform entries without Maven coordinates.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java`
around lines 289 - 314, The code currently throws IllegalArgumentException when
a Maven coord is incomplete (version null), breaking valid path-only
dependencies; update SbomGenerator so it only requires groupId and artifactId
(allowing version to be null) and do not throw for path-only entries: in the
block that currently checks if (groupId != null && artifactId != null && version
!= null) change the condition to require only groupId and artifactId, call
buildPurl(groupId, artifactId, version) which already tolerates null version,
populate componentsByPurl and dependencyNodes as before, and for entries that
lack groupId/artifactId (path-only platform.javaXX.dependency) simply
skip/continue instead of throwing; keep extractDependencyRefs, componentsByPurl,
and dependencyNodes usage intact.
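A rough sketch of the tolerant handling described above follows. It rests on assumptions rather than the PR's code: `MavenEntrySketch`, `toComponent`, and `asString` are hypothetical names, each `[[platform.javaXX.dependency]]` entry is modeled as a plain `Map<String, Object>`, and the purl is assembled inline rather than through the PR's `buildPurl`. Entries without Maven coordinates are skipped, and a missing version yields a versionless purl.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: skip path-only platform dependencies instead of throwing.
final class MavenEntrySketch {
    private MavenEntrySketch() { }

    // Returns a CycloneDX-style component map, or empty for path-only entries.
    static Optional<Map<String, Object>> toComponent(Map<String, Object> dep) {
        String groupId = asString(dep.get("groupId"));
        String artifactId = asString(dep.get("artifactId"));
        String version = asString(dep.get("version"));
        if (groupId == null || artifactId == null) {
            return Optional.empty(); // no Maven coordinates: nothing to report
        }
        boolean hasVersion = version != null && !version.isEmpty();
        String purl = "pkg:maven/" + groupId + "/" + artifactId + (hasVersion ? "@" + version : "");

        Map<String, Object> comp = new LinkedHashMap<>();
        comp.put("type", "library");
        comp.put("group", groupId);
        comp.put("name", artifactId);
        if (hasVersion) {
            comp.put("version", version); // omit the field rather than emit null
        }
        comp.put("purl", purl);
        return Optional.of(comp);
    }

    private static String asString(Object value) {
        return value instanceof String ? (String) value : null;
    }

    public static void main(String[] args) {
        Map<String, Object> pathOnly = new HashMap<>();
        pathOnly.put("path", "libs/native.jar");
        System.out.println(toComponent(pathOnly).isPresent());
    }
}
```

Returning an empty `Optional` keeps the skip decision explicit at the call site, which would make it straightforward to log a warning for skipped entries if that turns out to be useful.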
    // Maven slash form or simple path-like: group/artifact@version
    if (s.contains("/") && s.contains("@")) {
        int atPos = s.indexOf('@');
        String left = s.substring(0, atPos);
        String version = s.substring(atPos + 1);
        int slash = left.indexOf('/');
        if (slash > 0) {
            String group = left.substring(0, slash);
            String artifact = left.substring(slash + 1);
            return buildPurl(group, artifact, version);
        }
    }
Dead code: this branch is unreachable.
The block at lines 807–820 already handles every s.contains("@") case (with or without /) and returns. By the time control reaches line 823, s.contains("@") is necessarily false, so the s.contains("/") && s.contains("@") test can never be true. Drop it.
🔧 Proposed fix
- // Maven slash form or simple path-like: group/artifact@version
- if (s.contains("/") && s.contains("@")) {
- int atPos = s.indexOf('@');
- String left = s.substring(0, atPos);
- String version = s.substring(atPos + 1);
- int slash = left.indexOf('/');
- if (slash > 0) {
- String group = left.substring(0, slash);
- String artifact = left.substring(slash + 1);
- return buildPurl(group, artifact, version);
- }
- }
-
// As a last resort, return null to avoid malformed purls
return null;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compiler/ballerina-lang/src/main/java/io/ballerina/projects/util/SbomGenerator.java`
around lines 822 - 833, The if-block in SbomGenerator that checks "if
(s.contains(\"/\") && s.contains(\"@\"))" is unreachable because earlier logic
already returns for any s containing "@"; remove this entire branch (the check
and its local variables atPos, left, version, slash, group, artifact and the
call to buildPurl) to eliminate dead code and keep the remaining parsing logic
intact in SbomGenerator.java.
Purpose
Summary
This pull request adds CycloneDX SBOM (Software Bill of Materials) generation capability to Ballerina package archives. The implementation creates a bom.cdx.json file within each bala package, enabling dependency analysis and component tracking.
Changes
New SBOM Generation Utility
SbomGenerator class that converts package manifests (Ballerina.toml and dependencies.toml) into CycloneDX JSON format
Integration into Bala Archive Creation
BalaWriter to invoke SBOM generation during package archiving
Module Dependencies
toml4j module requirement to support TOML manifest parsing
Quality Assurance
Outcome
Packages now include machine-readable component and dependency information, enabling downstream tools to perform analysis of package contents and transitive dependencies.