chore: update downloads schema for tier 2/3 granularity #4150
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
…dates Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Pull request overview
This PR updates the OSS packages initial schema to support improved download tracking granularity and to align criticality ranking with the new download metric storage approach.
Changes:
- Renames the Tier-3 cached universe download snapshot column to `packages_universe.downloads_last_30d` and updates `rank_packages_universe()` to use it.
- Removes denormalized download columns/indexes (`packages.downloads_last_month`, `versions.download_count`) and introduces new download history tables (`downloads_daily`, `downloads_last_30d`).
- Adds an FK from `downloads_daily.package_id` to `packages(id)` and updates downloads-related schema comments.
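As a rough illustration of what dropping `packages.downloads_last_month` implies for readers of the schema, the sketch below derives a 30-day total by summing `downloads_daily` rows. It runs on SQLite purely as a stand-in for PostgreSQL (no partitioning, no pg_partman). The `day` and `downloads` column names, the `purl` column on `packages`, and the helper `rolling_30d_total` are assumptions made for the example; only `downloads_daily.package_id` and its FK to `packages(id)` come from this PR.

```python
import sqlite3
from datetime import date, timedelta


def rolling_30d_total(conn, package_id, end_day):
    """Sum the 30 daily rows ending at end_day (inclusive) for one package."""
    start_day = end_day - timedelta(days=29)
    row = conn.execute(
        "SELECT COALESCE(SUM(downloads), 0) FROM downloads_daily "
        "WHERE package_id = ? AND day BETWEEN ? AND ?",
        (package_id, start_day.isoformat(), end_day.isoformat()),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript(
    """
    CREATE TABLE packages (
        id   INTEGER PRIMARY KEY,
        purl TEXT NOT NULL UNIQUE          -- assumed column
    );
    CREATE TABLE downloads_daily (
        package_id INTEGER NOT NULL REFERENCES packages (id),
        day        TEXT    NOT NULL,       -- assumed name, ISO-8601 date
        downloads  INTEGER NOT NULL,       -- assumed name
        PRIMARY KEY (package_id, day)
    );
    """
)

package_id = conn.execute(
    "INSERT INTO packages (purl) VALUES (?)", ("pkg:npm/example",)
).lastrowid

# Forty days of history at 10 downloads per day; only the last 30 should count.
end_day = date(2024, 3, 31)
conn.executemany(
    "INSERT INTO downloads_daily (package_id, day, downloads) VALUES (?, ?, ?)",
    [(package_id, (end_day - timedelta(days=i)).isoformat(), 10) for i in range(40)],
)

total = rolling_30d_total(conn, package_id, end_day)
print(total)  # 300
```

With the FK in place, inserting a `downloads_daily` row for a `package_id` that does not exist in `packages` is rejected, which is the behavior the new constraint is meant to guarantee.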
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8e07b22. Configure here.

This pull request updates the schema for package download tracking and criticality ranking. The changes focus on improving how download statistics are stored and accessed, introducing new tables for daily and monthly download counts, and refactoring related columns and indexes. The most important changes are grouped below:
Download tracking table changes:
- Adds `downloads_last_30d` to the `packages_universe` table to cache the latest 30-day download snapshot, replacing the previous `downloads_30d` column. This value is written by the weekly ranking worker and used directly by the ranking function for efficiency.
- Removes the `downloads_last_month` column from the `packages` table and the `download_count` column from the `versions` table, as these are now tracked in dedicated download tables.
New download history tables:
- `downloads_daily` table (partitioned by month): stores daily download counts per package, with a foreign key to `packages`.
- `downloads_monthly` table: stores monthly download counts keyed by `purl`, ensuring historical data persists across weekly truncations of `packages_universe`. Includes a unique constraint on `(purl, month)` and an index for efficient queries.
Index and schema cleanup:
- Removes the index on `downloads_last_month` in the `packages` table, as this column has been dropped.
Criticality ranking function update:
- Updates `rank_packages_universe()` to use the `downloads_last_30d` column instead of the old `downloads_30d` column, ensuring the ranking logic aligns with the new schema.
Note
Medium Risk
Foundational schema for criticality ranking and partitioned time-series tables; wrong worker contracts or missing pg_partman partitions would break inserts and ranking inputs.
Overview
This PR reshapes how download metrics are stored for tier 2 (`packages`) vs tier 3 (`packages_universe`) and wires criticality ranking to the new names.
Universe (tier 3): `downloads_30d` becomes `downloads_last_30d`, documented as the denormalized latest rolling 30-day total (written with upserts into a new history table). `rank_packages_universe()` now scores using `downloads_last_30d` instead of `downloads_30d`.
Packages / versions: `downloads_last_month` on `packages` and its partial index are removed; tier 2 windows are expected to come from `downloads_daily` (sum over days). Per-version `download_count` on `versions` is dropped.
New / tightened download tables: `downloads_daily` gains a `REFERENCES packages (id)` FK. A new range-partitioned `downloads_last_30d` table stores rolling windows by `purl` and `end_date` (survives weekly `packages_universe` truncation), with an index on `(purl, end_date DESC)` and pg_partman setup notes mirroring `downloads_daily`.
Docs in migration: A consolidated DOWNLOADS section explains the two-table split and upsert pattern for the 30-day timeline.
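The two-table split and upsert pattern described above could look roughly like the following. This is a minimal sketch, not the migration's actual SQL: it uses SQLite instead of PostgreSQL (so there is no range partitioning or pg_partman), the `downloads` value column and the helper `record_window` are hypothetical, and the weekly truncation is simulated with a plain `DELETE`. Only the `purl` and `end_date` keys, the `downloads_last_30d` table, and the `packages_universe.downloads_last_30d` column are taken from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE packages_universe (
        purl               TEXT PRIMARY KEY,
        downloads_last_30d INTEGER            -- denormalized latest window
    );
    CREATE TABLE downloads_last_30d (
        purl      TEXT    NOT NULL,
        end_date  TEXT    NOT NULL,           -- ISO-8601 date closing the window
        downloads INTEGER NOT NULL,           -- assumed column name
        PRIMARY KEY (purl, end_date)
    );
    """
)


def record_window(conn, purl, end_date, downloads):
    """Upsert one rolling window, then refresh the cached latest value."""
    # History table: one row per (purl, end_date); re-runs overwrite in place.
    conn.execute(
        "INSERT INTO downloads_last_30d (purl, end_date, downloads) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (purl, end_date) DO UPDATE SET downloads = excluded.downloads",
        (purl, end_date, downloads),
    )
    # Snapshot column: always mirrors the newest window in the history table.
    latest = conn.execute(
        "SELECT downloads FROM downloads_last_30d "
        "WHERE purl = ? ORDER BY end_date DESC LIMIT 1",
        (purl,),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO packages_universe (purl, downloads_last_30d) VALUES (?, ?) "
        "ON CONFLICT (purl) DO UPDATE "
        "SET downloads_last_30d = excluded.downloads_last_30d",
        (purl, latest),
    )


record_window(conn, "pkg:npm/example", "2024-03-24", 1000)
record_window(conn, "pkg:npm/example", "2024-03-31", 1200)
record_window(conn, "pkg:npm/example", "2024-03-31", 1250)  # re-run, same window

history = conn.execute(
    "SELECT end_date, downloads FROM downloads_last_30d "
    "WHERE purl = ? ORDER BY end_date DESC",
    ("pkg:npm/example",),
).fetchall()
snapshot = conn.execute(
    "SELECT downloads_last_30d FROM packages_universe WHERE purl = ?",
    ("pkg:npm/example",),
).fetchone()[0]

# Simulate the weekly truncation of packages_universe; history is untouched.
conn.execute("DELETE FROM packages_universe")
surviving = conn.execute("SELECT COUNT(*) FROM downloads_last_30d").fetchone()[0]
```

Keying the history by `purl` rather than by a `packages_universe` row id is what lets the timeline outlive the truncation; the snapshot column can be rebuilt from the newest history row at any time.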