chore: update downloads schema for tier 2/3 granularity #4150

Merged: joanagmaia merged 3 commits into main from chore/downloads-schema on May 27, 2026

@joanagmaia commented May 27, 2026

This pull request updates the schema for package download tracking and criticality ranking. The changes focus on improving how download statistics are stored and accessed, introducing new tables for daily and monthly download counts, and refactoring related columns and indexes. The most important changes are grouped below:

Download tracking table changes:

  • Added the downloads_last_30d column to the packages_universe table, replacing the previous downloads_30d column. It caches the latest 30-day download snapshot: the weekly ranking worker writes it, and the ranking function reads it directly for efficiency.
  • Removed the downloads_last_month column from the packages table and the download_count column from the versions table, as these values are now tracked in dedicated download tables.

New download history tables:

  • Introduced the downloads_daily table (partitioned by month) to store daily download counts per package, with a foreign key to packages.
  • Added the downloads_monthly table to store monthly download counts keyed by purl, ensuring historical data persists across weekly truncations of packages_universe. Includes a unique constraint on (purl, month) and an index for efficient queries.
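
As a minimal sketch of how a unique constraint on (purl, month) supports idempotent writes, the snippet below uses SQLite from the Python standard library as a stand-in for PostgreSQL. The downloads column name, the ISO-string month format, and the record_month helper are assumptions made for illustration; they are not taken from the migration.

```python
import sqlite3

# SQLite stands in for PostgreSQL; only `purl` and `month` come from the PR text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE downloads_monthly (
        purl      TEXT NOT NULL,
        month     TEXT NOT NULL,      -- assumed: first day of the month, ISO format
        downloads INTEGER NOT NULL,   -- hypothetical column name
        UNIQUE (purl, month)
    )
    """
)

UPSERT = """
    INSERT INTO downloads_monthly (purl, month, downloads)
    VALUES (?, ?, ?)
    ON CONFLICT (purl, month) DO UPDATE SET downloads = excluded.downloads
"""


def record_month(purl: str, month: str, downloads: int) -> None:
    """Insert a monthly count, or overwrite the existing row for that month."""
    conn.execute(UPSERT, (purl, month, downloads))


record_month("pkg:npm/left-pad", "2026-04-01", 1000)
record_month("pkg:npm/left-pad", "2026-05-01", 1200)
record_month("pkg:npm/left-pad", "2026-05-01", 1500)  # re-run overwrites May

rows = conn.execute(
    "SELECT month, downloads FROM downloads_monthly WHERE purl = ? ORDER BY month",
    ("pkg:npm/left-pad",),
).fetchall()
```

Because the conflict target matches the unique constraint, re-running a worker for the same month updates the row instead of failing or duplicating it.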

Index and schema cleanup:

  • Removed the index on downloads_last_month in the packages table, as this column has been dropped.

Criticality ranking function update:

  • Updated the ranking function and related comments to use the new downloads_last_30d column instead of the old downloads_30d column, so the ranking logic matches the new schema.

Note

Medium Risk
Foundational schema for criticality ranking and partitioned time-series tables; wrong worker contracts or missing pg_partman partitions would break inserts and ranking inputs.

Overview
This PR reshapes how download metrics are stored for tier 2 (packages) vs tier 3 (packages_universe) and wires criticality ranking to the new names.

Universe (tier 3): downloads_30d becomes downloads_last_30d, documented as the denormalized latest rolling 30-day total (written with upserts into a new history table). rank_packages_universe() now scores using downloads_last_30d instead of downloads_30d.

Packages / versions: downloads_last_month on packages and its partial index are removed—tier 2 windows are expected to come from downloads_daily (sum over days). Per-version download_count on versions is dropped.
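
The "sum over days" approach for tier 2 could look roughly like the following sketch. It again uses SQLite as a stand-in for PostgreSQL and ignores partitioning; the day and downloads column names and the downloads_in_window helper are hypothetical, and only the package_id foreign key to packages (id) is taken from the PR text.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE packages (id INTEGER PRIMARY KEY, purl TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE downloads_daily (
        package_id INTEGER NOT NULL REFERENCES packages (id),
        day        TEXT NOT NULL,     -- hypothetical column name, ISO date
        downloads  INTEGER NOT NULL,  -- hypothetical column name
        PRIMARY KEY (package_id, day)
    )
    """
)
conn.execute("INSERT INTO packages (id, purl) VALUES (1, 'pkg:npm/left-pad')")

end = date(2026, 5, 27)
# 40 days of history at 10 downloads per day, ending on `end`.
conn.executemany(
    "INSERT INTO downloads_daily (package_id, day, downloads) VALUES (1, ?, 10)",
    [((end - timedelta(days=i)).isoformat(),) for i in range(40)],
)


def downloads_in_window(package_id: int, end_day: date, days: int = 30) -> int:
    """Sum daily counts over the inclusive window of `days` ending on `end_day`."""
    start_day = end_day - timedelta(days=days - 1)
    (total,) = conn.execute(
        """
        SELECT COALESCE(SUM(downloads), 0)
        FROM downloads_daily
        WHERE package_id = ? AND day BETWEEN ? AND ?
        """,
        (package_id, start_day.isoformat(), end_day.isoformat()),
    ).fetchone()
    return total


last_30d = downloads_in_window(1, end)
unknown = downloads_in_window(2, end)
```

Deriving the window at query time is what makes a stored downloads_last_month column on packages unnecessary, at the cost of scanning up to 30 daily rows per package.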

New / tightened download tables: downloads_daily gains a REFERENCES packages (id) FK. A new range-partitioned downloads_last_30d table stores rolling windows by purl and end_date (survives weekly packages_universe truncation), with an index on (purl, end_date DESC) and pg_partman setup notes mirroring downloads_daily.

Docs in migration: A consolidated DOWNLOADS section explains the two-table split and upsert pattern for the 30-day timeline.
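
One way to picture the two-table split is the sketch below: each snapshot is upserted into the history table and mirrored into the cached column, so the history remains queryable after the universe table is emptied. SQLite stands in for PostgreSQL, a plain DELETE stands in for the weekly truncation, and the downloads column, the primary key on (purl, end_date), the index name, and both helper functions are assumptions rather than details from the migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE packages_universe (
        purl               TEXT PRIMARY KEY,
        downloads_last_30d INTEGER
    );
    CREATE TABLE downloads_last_30d (
        purl      TEXT NOT NULL,
        end_date  TEXT NOT NULL,
        downloads INTEGER NOT NULL,       -- hypothetical column name
        PRIMARY KEY (purl, end_date)      -- assumed uniqueness for the upsert
    );
    CREATE INDEX downloads_last_30d_purl_end_idx
        ON downloads_last_30d (purl, end_date DESC);
    """
)


def write_snapshot(purl: str, end_date: str, downloads: int) -> None:
    """Upsert one rolling-window row and refresh the denormalized cache."""
    conn.execute(
        "INSERT INTO downloads_last_30d (purl, end_date, downloads) VALUES (?, ?, ?) "
        "ON CONFLICT (purl, end_date) DO UPDATE SET downloads = excluded.downloads",
        (purl, end_date, downloads),
    )
    conn.execute(
        "INSERT INTO packages_universe (purl, downloads_last_30d) VALUES (?, ?) "
        "ON CONFLICT (purl) DO UPDATE SET downloads_last_30d = excluded.downloads_last_30d",
        (purl, downloads),
    )


def latest_snapshot(purl: str):
    """Return (end_date, downloads) for the most recent window, or None."""
    return conn.execute(
        "SELECT end_date, downloads FROM downloads_last_30d "
        "WHERE purl = ? ORDER BY end_date DESC LIMIT 1",
        (purl,),
    ).fetchone()


write_snapshot("pkg:npm/left-pad", "2026-05-20", 900)
write_snapshot("pkg:npm/left-pad", "2026-05-27", 950)
cached = conn.execute(
    "SELECT downloads_last_30d FROM packages_universe WHERE purl = ?",
    ("pkg:npm/left-pad",),
).fetchone()[0]

conn.execute("DELETE FROM packages_universe")  # stand-in for the weekly truncation
latest = latest_snapshot("pkg:npm/left-pad")
```

The lookup orders by end_date descending within a single purl, which is the access pattern an index on (purl, end_date DESC) is meant to serve.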

Reviewed by Cursor Bugbot for commit 8e07b22. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
…dates

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
@joanagmaia requested review from Copilot and epipav on May 27, 2026, 15:01
@github-actions

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.


Copilot AI left a comment

Pull request overview

This PR updates the OSS packages initial schema to support improved download tracking granularity and to align criticality ranking with the new download metric storage approach.

Changes:

  • Renames the Tier-3 cached universe download snapshot column to packages_universe.downloads_last_30d and updates rank_packages_universe() to use it.
  • Removes denormalized download columns/indexes (packages.downloads_last_month, versions.download_count) and introduces new download history tables (downloads_daily, downloads_last_30d).
  • Adds an FK from downloads_daily.package_id to packages(id) and updates downloads-related schema comments.

Comment thread backend/src/osspckgs/migrations/V1779710880__initial_schema.sql Outdated
Comment thread backend/src/osspckgs/migrations/V1779710880__initial_schema.sql
Comment thread backend/src/osspckgs/migrations/V1779710880__initial_schema.sql
epipav previously approved these changes May 27, 2026
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 8e07b22.

Comment thread backend/src/osspckgs/migrations/V1779710880__initial_schema.sql
@joanagmaia merged commit 3e8feca into main on May 27, 2026
15 checks passed
@joanagmaia deleted the chore/downloads-schema branch on May 27, 2026, 15:41