Changes by Jules#104
Open
akutuva21 wants to merge 622 commits into
Open
Conversation
🧪 Add unit tests for isInComplexWith in pathwaycommons
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
… in rate functions Resolved an issue in PyBioNetGen's Atomizer (bngModel.py) where volume adjustment logic inside rate functions implicitly assumed the presence of only a single species. Updated the logic to iterate correctly through all species, safely escape regex search patterns, and properly evaluate overlapping substitutions.
Fixed an issue where reactants and products were incorrectly grouped together in createMetaRule. Separated the grouping logic to properly distinguish between reactants and products.
Replaced inefficient custom dependency sorting algorithm (marked with a FIXME comment) with networkx.topological_sort in bionetgen/atomizer/bngModel.py:reorder_functions. The previous implementation used a custom stack-based loop with O(N^2) worst-case complexity and a latent bug due to modifying lists during iteration.
Fixed issue where reactant/compartment info was not properly pulled for rule pointer functions in the atomizer.
Auto-load PyBioNetGen CLI version via module metadata instead of hardcoding. Adds --version flag support to the CLI.
This commit addresses the issue where the PyBioNetGen CLI lacked an automated way to load and display the PyBioNetGen version from the module's metadata without incurring module-load I/O. - Added a lazy `__version__` attribute in `bionetgen/__init__.py` using PEP 562 (`__getattr__`) and `importlib.metadata`. - Updated `versionAction.__call__` in `bionetgen/main.py` to pull `bng.__version__` instead of relying on the manual `VERSION` file read via `get_version()`. - Removed the obsolete `# TODO: Auto-load in BioNetGen version here` comment. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
- Dynamically load version via importlib.metadata in bionetgen/__init__.py - Update main.py version action to use module version variable - Replace mocker fixture with standard unittest.mock.patch in tests to fix CI errors Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Auto-load PyBioNetGen CLI version via module metadata (importlib.metadata). Clean recreation of PR #237.
Fixed parameter namespace collisions in bngModel assignment rules. Also fixed CI test issues with os.chdir cleanup and mock fixture failures.
Refactored BNGResult.find_dat_files and BNGResult.load_results to be standalone functions for better testability and reusability.
Cached annotationIDs lookups to avoid full table refetch on each call, improving performance.
…ex-1994341575432634614 🧪 [Testing Improvement] Add unit tests for `_extract_odes_from_cvode_mex`
Replaced a convoluted list comprehension and for-loop with a simple `in` operator to check if `rawArule[0]` exists in `observablesDict`. This improves code readability and maintainability without changing functionality. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…991341537741170 🔒 fix insecure pickle deserialization in atomizer utils
…val-12651291564637042192 Fix compartment removal in reaction rates for sbml2bngl
…g-9621468861655201185 fix: update comment and warning message for rate rules over non-zero parameters
…ation-5890336867955565148 🔒 Fix insecure deserialization vulnerability in contactMap.py
…3761045350582060 🧹 [Code Health] Simplify dictionary lookup logic in sbml2bngl.py
…recated actions - Run black on 9 files failing formatting check - Fix _resolve_block_adder to raise BNGModelError instead of ValueError - Fix test assertion expectations to match actual exception types - Fix simulator tests to properly skip when BNG2.pl unavailable - Update deprecated actions/checkout@v2/v3 to v4 - Update deprecated docker actions from v2 to v3/v5/v6
…ssertions, mock exception types, and SBML skip
Extracted duplicate logic for grouping equivalent molecules from different rules in `contextAnalyzer.py` into a new `groupEquivalentItems` helper function. This satisfies the long-standing "TODO: i have to find the way to group together equivalent molecules from different rules and find the metarule" and improves code readability. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
…g slicing. * Fix contactMap.py to only process the first cluster using slicing. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * Fix contactMap.py to only process the first cluster using slicing. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * Fix contactMap.py to only process the first cluster using slicing. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Fix string copying bug in analyzeSBML by performing token-based prefix matching
The original logic was iterating character-by-character over
`tmpRuleList[1][0]` and checking if the character itself (`in sym`)
was present in the list of symbols (`sym`). However, `sym` can contain
multi-character strings (e.g. '~P', '~U'), meaning the first character
('~') alone evaluates to False, causing the loop to fail to copy the
symbols properly.
The fix sorts the symbols by length in descending order to handle
overlapping prefixes correctly and then uses `.startswith` from the
current index to greedily match the longest possible valid string
token in `sym`.
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
* Fix string copying bug in analyzeSBML by performing token-based prefix matching
The original logic was iterating character-by-character over
`tmpRuleList[1][0]` and checking if the character itself (`in sym`)
was present in the list of symbols (`sym`). However, `sym` can contain
multi-character strings (e.g. '~P', '~U'), meaning the first character
('~') alone evaluates to False, causing the loop to fail to copy the
symbols properly.
The fix sorts the symbols by length in descending order to handle
overlapping prefixes correctly and then uses `.startswith` from the
current index to greedily match the longest possible valid string
token in `sym`.
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
* Fix string copying bug in analyzeSBML by performing token-based prefix matching
The original logic was iterating character-by-character over
`tmpRuleList[1][0]` and checking if the character itself (`in sym`)
was present in the list of symbols (`sym`). However, `sym` can contain
multi-character strings (e.g. '~P', '~U'), meaning the first character
('~') alone evaluates to False, causing the loop to fail to copy the
symbols properly.
The fix sorts the symbols by length in descending order to handle
overlapping prefixes correctly and then uses `.startswith` from the
current index to greedily match the longest possible valid string
token in `sym`.
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
* fix(atomizer): apply greedy matching fix to analyzeSBML string copy
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Implemented the verbosity check in `bionetgen/modelapi/bngparser.py` that was mentioned in a TODO comment. If `self.verbose` is set to true, it will now print "Parsing XML" before calling `xml_file.read()`. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* fix(parser): handle 0 as a valid dictionary key in BNGL action arguments
Modified `curly_arg_token` in `ActionList.define_parser` to allow `0` as a bare dictionary key using `pp.Literal("0")`. This resolves the TODO for handling the 0 case and ensures action strings like `simulate({print_functions=>{0=>1}})` are properly parsed instead of failing with a syntax error.
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
* fix: adjust curly_arg_token formatting to pass black lint check
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
…tion in analyzeSBML * perf: optimize levenshtein distance calculations using memoization The `levenshtein` method in `analyzeSBML.py` is repeatedly called during string processing and matching. Adding the `@pmemoize` decorator caches its results for duplicate inputs, significantly speeding up nested string comparisons across the atomizer module. The method was safely converted to a `@staticmethod` to ensure compatibility with `pmemoize`'s underlying `marshal` logic, which cannot serialize `self`. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * perf: optimize levenshtein distance calculations using memoization The `levenshtein` method in `analyzeSBML.py` is repeatedly called during string processing and matching. Adding the `@pmemoize` decorator caches its results for duplicate inputs, significantly speeding up nested string comparisons across the atomizer module. The method was safely converted to a `@staticmethod` to ensure compatibility with `pmemoize`'s underlying `marshal` logic, which cannot serialize `self`. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* 🧪 Add error test for shutil.rmtree in CSimulator Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * Fix syntax error and compile errors in test_csimulator.py - Fixed Python 3.8/3.9 syntax error by unwrapping parenthesized context managers. - Fixed `distutils.errors.CompileError` by correctly mocking `bionetgen.simulator.csimulator._new_ccompiler` instead of `ccompiler` directly, returning `compile_shared_lib` compatibility with tests. - Reverted unintentional use of `Exception` instead of `ValueError` in test expectations. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* fix: migrate BNGL function creation logic to bngModel class This commit adds a new `add_bngl_function` method to the `bngModel` class which encapsulates the repetitive logic of instantiating a Function object, parsing its definition from a BNGL string, assigning its properties, and adding it to the model. It then updates the `sbml2bngl.py` converter to use this new method instead of executing the logic inline, satisfying the previous "TODO: Add to bngModel functions" task. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * fix: migrate BNGL function creation logic to bngModel class This commit adds a new `add_bngl_function` method to the `bngModel` class which encapsulates the repetitive logic of instantiating a Function object, parsing its definition from a BNGL string, assigning its properties, and adding it to the model. It then updates the `sbml2bngl.py` converter to use this new method instead of executing the logic inline, satisfying the previous "TODO: Add to bngModel functions" task. Ran black to fix formatting issues caught by CI. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * fix: migrate BNGL function creation logic to bngModel class This commit adds a new `add_bngl_function` method to the `bngModel` class which encapsulates the repetitive logic of instantiating a Function object, parsing its definition from a BNGL string, assigning its properties, and adding it to the model. It then updates the `sbml2bngl.py` converter to use this new method instead of executing the logic inline, satisfying the previous "TODO: Add to bngModel functions" task. Includes proper black formatting. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * fix: migrate BNGL function creation logic to bngModel class This commit adds a new `add_bngl_function` method to the `bngModel` class which encapsulates the repetitive logic of instantiating a Function object, parsing its definition from a BNGL string, assigning its properties, and adding it to the model. It then updates the `sbml2bngl.py` converter to use this new method instead of executing the logic inline, satisfying the previous "TODO: Add to bngModel functions" task. Includes proper black formatting. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * fix: migrate BNGL function creation logic to bngModel class This commit adds a new `add_bngl_function` method to the `bngModel` class which encapsulates the repetitive logic of instantiating a Function object, parsing its definition from a BNGL string, assigning its properties, and adding it to the model. It then updates the `sbml2bngl.py` converter to use this new method instead of executing the logic inline, satisfying the previous "TODO: Add to bngModel functions" task. Includes proper black formatting. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * fix: migrate BNGL function creation logic to bngModel class This commit adds a new `add_bngl_function` method to the `bngModel` class which encapsulates the repetitive logic of instantiating a Function object, parsing its definition from a BNGL string, assigning its properties, and adding it to the model. It then updates the `sbml2bngl.py` converter to use this new method instead of executing the logic inline, satisfying the previous "TODO: Add to bngModel functions" task. Includes proper black formatting. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Remove hardcoded BioGRID API key - Removed the hardcoded API key (`f74b8d6f4c394fcc9d97b11c8c83d7f3`) from `bionetgen/atomizer/utils/pathwaycommons.py`. - Updated `queryBioGridByName` to retrieve the key from the `BIOGRID_API_KEY` environment variable. - Added a graceful fallback: if the key is not set, a warning is logged (`WARNING:ATO006`), and the function returns `False`. - Updated tests in `tests/test_pathwaycommons.py` to mock `os.environ` so they continue passing. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * Remove hardcoded BioGRID API key - Removed the hardcoded API key (`f74b8d6f4c394fcc9d97b11c8c83d7f3`) from `bionetgen/atomizer/utils/pathwaycommons.py`. - Updated `queryBioGridByName` to retrieve the key from the `BIOGRID_API_KEY` environment variable. - Added a graceful fallback: if the key is not set, a warning is logged (`WARNING:ATO006`), and the function returns `False`. - Updated tests in `tests/test_pathwaycommons.py` to mock `os.environ` so they continue passing. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * 🔒 fix(pathwaycommons): Remove hardcoded API key Refactor `queryBioGridByName` in `bionetgen/atomizer/utils/pathwaycommons.py` to retrieve the BioGRID API key via the `BIOGRID_API_KEY` environment variable instead of using a hardcoded string. A fallback was added to return `False` gracefully if the key is not set, logging a warning. This addresses the security vulnerability of hardcoded secrets in the code. Tests in `tests/test_pathwaycommons.py` were also updated to mock the environment variable to ensure existing tests pass. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> * 🔒 fix(pathwaycommons): Remove hardcoded API key Refactor `queryBioGridByName` in `bionetgen/atomizer/utils/pathwaycommons.py` to retrieve the BioGRID API key via the `BIOGRID_API_KEY` environment variable instead of using a hardcoded string. A fallback was added to return `False` gracefully if the key is not set, logging a warning. This addresses the security vulnerability of hardcoded secrets in the code. Tests in `tests/test_pathwaycommons.py` were also updated to mock the environment variable to ensure existing tests pass. Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
…ix CI test failures
Previously the BNGsim backend hook rejected PLA actions and routed them to the BNG2.pl subprocess, where the bundled run_network binary fails on the '-p pla' method. Now PLA is treated like ode/ssa/psa: the Perl hook intercepts it and delegates to the backend helper, which passes it through to BNGsim when available. The method alias map, routing checks, and test patch hook are all updated accordingly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PyBioNetGen: Security, Performance, Code Health, and Test Coverage Overhaul
Summary
This PR rolls up 419 commits spanning April 13 – June 4, 2026 into a single contribution against PyBioNetGen. It touches 151 unique files with roughly +10,800 / −9,100 lines changed across the run. The work is a broad maintenance and hardening pass over the codebase — closing security holes, eliminating long-standing
TODO/FIXME/assertdebt, optimizing hot paths, and substantially expanding test coverage — alongside a set of targeted bug fixes in the SBML→BNGL atomizer and the simulator/runner stack.Work was developed with assistance from AI coding agents (Jules), with all commits co-authored and reviewed;
Co-authored-bytrailers creditgoogle-labs-jules[bot]on ~115 commits.Commit breakdown by category
Most-affected files
atomizer/sbml2bngl.py,core/tools/visualize.py,modelapi/runner.py,atomizer/bngModel.py,simulator/csimulator.py,atomizer/libsbml2bngl.py,modelapi/bngfile.py, andmerging/namingDatabase.py, plus a large number of new and expanded test modules undertests/.1. Security fixes
Hardening focused on unsafe deserialization, code injection vectors, XML parsing, path traversal, and leaked secrets.
pickle.loadwithjson.loadinatomizer/contactMap.pyandannotationComparison.py, and fixed insecure deserialization indetectOntology.py.eval()→ast.literal_eval()across the atomizer andpostAnalysis.pyto remove arbitrary-code-execution risk.ast.literal_evalDoS vulnerabilities fixed inpostAnalysis.pyandanalyzeTrends(untrusted assumption strings), with one path further hardened by switching tojson.loads.readBNGXML.pyand a full migration todefusedxmlfor safe XML parsing.tarfile.extractallresolved.yaml.load→yaml.safe_load).pathwaycommons.py; the key is now read from theBIOGRID_API_KEYenvironment variable with a graceful warning-and-skip fallback when unset.molec.namein the atomizer.2. Performance optimizations
NamingDatabaseSQLite layer: persistent connection handling and reduced per-query overhead, reported as a ~3.7x query speedup; fixed an N+1 query in annotation insertion; batched annotation-ID caching to avoid full-table refetches; batched SQL in namespace detection.join/buffered builds inbnglReaction,bnglWriter.py,BNGResult.__repr__,bngmodel.__str__, andside_string.moleculeCreation,smallStructures,classifyReactions, atomizer constant lists).bnglWriter.pyanddistanceToModification; Levenshtein distance memoized inanalyzeSBML.type() == listchecks withisinstance(), removed redundant.keys()lookups andlen() == 0checks, and added early-exit traversal indeleteMolecule.queryActiveSitewithin theresolveSCTloop.3. Code health & refactoring
assertstatements to structuredBNGError/BNGParseErrorplusBNGLoggeracross reactions parsing,csimulatorinit,network.py, and model parsing; replaced bareexceptclauses with specific exceptions; converted silent assignment failures to logged warnings.os.chdir/ CWD discipline: a large sustained effort to eliminate working-directory leaks — refactoredVisResultandvisualize.pyto avoidos.chdir, wrapped directory changes intry/finally, replaced a module-level app instantiation that corrupted CWD, and usedTemporaryDirectorycontexts correctly. This resolved widespread WindowsPermissionError/WinError 32andFileNotFoundErrorfailures in CI.libsbml2bngl.pyandbngModel.pyrefactored to usenetworkxtopological sort (Kahn's algorithm).TODO/FIXMEitems and removed dead code throughout the atomizer, network structs, and model API; standardizedshutilimport (dropped theimport shutil as spawnalias); refactored oversized files (e.g. splittingsbml2bngl.pyhelpers into utility modules) and long methods (ActionList.__init__).blackformatting and removal of stray test/build artifacts.4. Bug fixes
observablesDictwhen overridden by assignment rules. Fixednl/nrassignment when reactants/products appear in rate expressions.Pattern, dimer component classification, volume adjustment for multiple species in rate functions, double-modification queueing inSCTSolver, and parameter namespace collisions inbngModelassignment rules.gdiff): fixed single-node-to-list conversion and nested node recoloring/renaming.bngl2xmlnow runs with a multiprocessing/subprocess timeout; fixedrunner.pySectionProxyattribute error; restored CWD infinallyblocks across visualize/runner/atomizer/simulator.--versionflag (after one revert/re-land cycle); fixedBNGPATHresolution whenBNG2.plis onPATH.ListOfBondstag handling inxmltodictparsing; invalid escape sequences in parameter rate-rule regexes;setup.pyduplicate manifest inclusions.5. New features
poplevel,check_product_scale) and apsasimulation method.sympyinParameterBlock/NetworkBlock.add_item.BNGResultextension-based file filtering; standalonefind_dat_files/load_resultsmethods; consolidated path argument in the constructor.add_bngl_functionhelper migrated into thebngModelclass.6. Test coverage
~88 commits add or expand tests, with new suites for
pathwaycommons(BioGRID/Reactome/Uniprot query paths and HTTPError fallbacks),detectOntology.levenshtein,sbml2jsonutilities, thecsimulator/libRRSimulatorwrappers,BNGSimulatorproperties,gdiff,sympy_odes,Patternequality,ModelObj/ActionBlockoperations, the runner, and thenotebook/graphdiffCLI commands. Several commits specifically add exception-path and edge-case coverage (e.g.add_block,_safe_rmtreeOS errors,_get_color_id).Notes for reviewers
blackformatting,pytest-mock→unittest.mock, Python 3.8 compatibility for parenthesizedwith) that account for some apparent duplication in the log; the net effect is a green, cross-platform test suite.Merging PR #340–#367) and a few others fold in earlier sub-PRs from the same branch series.